The promise of machine learning in education is tempered by the standard concerns: biased data, privacy violations, inequitable outcomes, and so on. There remains a significant gap between how the technology is imagined and how, in fact, it is deployed and used.
“We are pioneering a participatory AI approach with the goal of developing ethical, human-centered, and equitable AI solutions for education,” says Stanford Graduate School of Education (GSE) Assistant Professor Hariharan Subramonyam. “By engaging teachers and students from the outset, our goal is to ensure that our data practices are inclusive and representative of diverse learner experiences and needs.”
Together with Mei Tan, a doctoral student at the GSE; Hansol Lee, a PhD candidate in Education Data Science at the GSE; and Northeastern University Associate Professor Dakuo Wang, Subramonyam explored how gathering different stakeholders at the same table can help address these needs. Working in the world of education, the researchers convened 10 meetings in which engineers, designers, legal specialists, teachers, and students discussed the training data specifications for four different machine learning tools. (These included measuring student engagement from images, recommending careers based on resumes, assessing dropout risk, and grading essays automatically.) In a new preprint paper, the researchers distill the results of this process into a framework that ensures the right mix of people is in the room and able to have productive conversations.
Subramonyam and Tan spoke with Stanford HAI to outline how their findings inform a more conscientious and effective method for designing machine learning algorithms in education and beyond.
I want to start with a contextual question: Can you talk about the difference between model-centric and data-centric AI practices?
Subramonyam: For a long time, we assumed that data quality basically didn’t matter for these models, as long as you had a lot of data. Machine learning engineers and AI researchers would take data as a given and focus mostly on fine-tuning and improving the model. That approach hit a ceiling in recent years, and people started to realize that improvements in model performance required thinking about data quality. It’s also expensive to collect lots of data. So now we’re thinking about how to gather less data that’s higher quality.
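The distinction is easiest to see side by side. The sketch below is not from the interview; it is a minimal illustration using scikit-learn and synthetic data, improving the same classifier in two ways: a model-centric loop tunes hyperparameters on the data as given, while a data-centric step keeps the model fixed and filters out likely mislabeled examples.

```python
# Illustrative sketch only: contrasting model-centric and data-centric iteration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data with deliberately noisy labels (flip_y injects label errors).
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.15, random_state=0)

# Model-centric iteration: take the data as given and tune the model.
model_centric = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5
).fit(X, y)

# Data-centric iteration: hold the model fixed and improve the data, here by
# dropping rows the baseline model finds highly inconsistent with their labels.
baseline = LogisticRegression(max_iter=1000).fit(X, y)
true_class_proba = baseline.predict_proba(X)[np.arange(len(y)), y]
keep = true_class_proba > 0.2  # crude noise filter, for illustration only
data_centric = cross_val_score(LogisticRegression(max_iter=1000), X[keep], y[keep], cv=5)

print("model-centric CV accuracy:", round(model_centric.best_score_, 3))
print("data-centric CV accuracy: ", round(data_centric.mean(), 3))
```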
In your recent education work, you look at a multi-stakeholder collaborative process that considered what data should be collected. What prompted this, and how did you think about who should be involved?
Subramonyam: Issues around data transparency — how it has been collected and labeled and so on — have been around for a while, but this all happens after the fact: The data is already collected, the model is already built. To me, that has always seemed like the warning label on cigarette packets. It’s an acknowledgment of something bad. In my lab, in the Graduate School of Education, I am interested in how we can be more proactive about addressing these downstream problems.
Tan: In the world of AI and education, there's a good amount of research stating the obvious, which is that a lot of AI tools don't serve the needs of practitioners. And there's a lot of worry about the harms that poorly built AI tools can cause in education, with many of these harms rooted in the quality of the data.
Given this, we wanted to take a front-end, data-centered approach where we involved domain experts, the people who can right that wrong of how we represent the world of education through data. When we thought about whom to involve, we obviously started with teachers and students. From the industry perspective, we were thinking about machine learning engineers, but also UX designers and legal specialists, given the privacy concerns around the treatment of minors.
In thinking further about this data and this framework of participatory design, the model we'd like to advocate for is not one where the researchers or the developers of ML products define who the domain experts are. Rather, we should ask the community: Who do you think are the valuable stakeholders in this case? The teachers, for instance, often raised administrators and school counselors as people who could make important contributions.
These group dynamics aren’t simple. What are the challenges of getting a team like this to work together?
Tan: I want to emphasize that what we’ve done here is a proof of concept, because no one has put a group like this together before. What we watched unfold was a bit of an experiment.
In practice, the machine learning engineers had to translate what a decision about data collection might mean for the model and, ultimately, for the end user. It took a very experienced machine learning engineer to be both willing and able to engage in that dialogue with a teacher or a student. We saw some do it well and others do it less well.
The same was true with teachers. There were some teachers who were very willing to engage with the decisions and technical details in front of them, and there were others for whom the barrier to entry was too high.
We learned that for this process to work, we need scaffolding for both sides. We need to help teachers understand some of the basic parameters of a machine learning model and how the data that’s collected fits into the overall picture. And we need to help ML engineers do that work of translation and understand more about the domain they’re working in — education, in this case. There is often emphasis on the first step: helping non-technical experts understand the technical. There is much less focus on helping technical experts understand the domain. We need to do a better job of that.
What are some of the problems that result from bad data, and what promise did you see in this approach?
Tan: There are all kinds of harms that can result from bad or biased data. To take just two examples, there’s research about adaptive learning systems that shows racial and gender biases in the way these systems evaluate a student’s understanding. There is also evidence that technology used to assess engagement based on an image of a student’s face is full of bias depending on the characteristics of the student’s appearance.
In our work, we saw how domain experts shape key variables that go into data collection and thereby alleviate some of these concerns or errors. Representation is one example. When we think about representation in developing machine learning data sets today, we think about demographic variables: whether there is equity across gender, race, and socioeconomic lines. But the domain experts had much broader ideas of representation when it comes to education. So, for example, is this a private or public school? Is this a small classroom or large lecture? What are the individual learning needs? Is there neurodiversity? What subject are we talking about? Is this a group activity? All of these play a massive role in the classroom and can dramatically affect a student's experience — and therefore the act of data collection and the models that you build.
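To make that concrete, here is a hypothetical sketch of what a data-collection specification could look like if it recorded those classroom-context variables alongside demographics. The field names are our own illustrative assumptions, not taken from the study's actual specifications.

```python
# Hypothetical data-collection spec; field names are illustrative, not from the study.
from dataclasses import dataclass, field

@dataclass
class ClassroomContext:
    school_type: str        # "public" or "private"
    class_size: int         # small classroom vs. large lecture section
    subject: str            # e.g. "algebra" or "world history"
    group_activity: bool    # individual work vs. collaborative task
    learning_needs: list = field(default_factory=list)  # e.g. neurodiversity, IEPs

@dataclass
class CollectionRecord:
    student_id: str             # pseudonymized before any data leaves the school
    demographics: dict          # gender, race, socioeconomic indicators
    context: ClassroomContext   # the broader notion of representation raised by teachers
    consent_obtained: bool      # teachers and guardians sign off before collection

# A purely demographic notion of "representation" would ignore everything in `context`.
record = CollectionRecord(
    student_id="s-001",
    demographics={"gender": "F", "race": "Black", "ses": "low"},
    context=ClassroomContext("public", 32, "algebra", group_activity=True),
    consent_obtained=True,
)
```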
Beyond education, what did this show about how to think through who's involved in a process like this and what kind of scaffolding can help it succeed?
Subramonyam: There are a few things we need to think about. The first is the incentive structure. In industry, product teams are created with efficiency in mind, which means they’re divided up to work in ways where machine learning engineers are incentivized differently from people in finance, and where workflows favor separation. Our first recommendation is that we need to bring all of these people together at the beginning of the decision process. It takes enormous effort to collect data and train models, so it’s best to get things right at the outset.
The second is around infrastructure. We need to think about the kinds of tools that we need in order to support collaboration. One of my students built a visualization related to these ML models to support collaboration between data scientists and domain experts, where the visualization offloads some of the burden from the engineer who typically has to translate things for other people.
Tan: Another thing we need to think about is the fact that language and standards are not shared across these domains. Teachers want to see evaluation metrics about how much a student is learning. This is very common language among teachers, but it’s very hard for ML engineers to define in an equation. Meanwhile, engineers care about things like precision in the model, which means nothing to teachers. We need to establish groundwork so these different stakeholders can talk about the same thing without a bunch of roundabout discussions.
And, finally, I’d mention the need for continuous iteration. In this case, we brought everyone together for one moment in time, whereas the process of developing an ML model can go on for months. As things change, these stakeholders need to be brought back together to recalibrate the direction that things are moving. Our research shows the benefits of upstream collaboration, but more work is needed to understand how to sustain continuous engagement.
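Picking up the earlier point about shared language: one lightweight way to bridge vocabularies is to report an engineer's metric next to a plain-language reading a teacher can act on. The sketch below is our own assumption rather than a tool from the study; it borrows the dropout-risk example mentioned above and uses made-up labels.

```python
# Hedged sketch: pairing a precision score with a teacher-facing interpretation.
# Labels are invented for illustration; 1 = student actually left / was flagged at risk.
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # what actually happened
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]   # what the model flagged

precision = precision_score(y_true, y_pred)
print(f"Precision: {precision:.2f}")
print(f"In classroom terms: of the students the tool flagged as at risk, "
      f"about {precision:.0%} actually needed support.")
```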
This research was funded by the Stanford McCoy Family Center for Ethics in Society. Dakuo Wang was a visiting researcher at HAI, supported by IBM Research.
This story was originally published by Stanford HAI.