Chatting about chatbots: How AI tools can support teachers

On this episode of School’s In, Stanford GSE Assistant Professor Dora Demszky discusses how chatbots can be used to give teachers feedback.

November 7, 2024

By Olivia Peterkin

While much has been said about the potential positive and negative effects of generative artificial intelligence (AI) in education as it relates to students, less has been said about how AI tools can be used to support teachers.

Stanford Graduate School of Education (GSE) Assistant Professor Dora Demszky, whose research combines machine learning, natural language processing, linguistics, and input from educators, is currently working on a project called M-Powering Teachers that provides feedback for teachers in the classroom.

“It’s really rooted in the idea that we want to empower teachers,” said Demszky, who teaches education data science at the GSE. “We’re not trying to tell them what to do. We’re just providing them with opportunities to reflect on what they did.”

The M-Power tool (the m stands for machine) utilizes natural language processing to analyze verbal classroom interactions and provides formative feedback to teachers.

“A lot of the feedback is actually more just providing them with things that they did, highlighting things and moments in their lesson for them to reflect on and asking them good reflection questions and goal-setting questions so there’s less opportunity for risks or for error,” she said.

Demszky joins hosts GSE Dean Dan Schwartz and Senior Lecturer Denise Pope on School’s In as they discuss artificial intelligence as a tool for positive feedback and support for educators. Her research focuses on developing natural language processing methods to support equitable and student-centered instruction.

In the episode she explains how her team is trying to identify practices like cultivating growth mindset, using supportive language, and building on student ideas as focal points for teacher feedback and professional learning.

“We know from the literature — like, decades of literature — that when students feel heard, when they feel that their ideas matter and that their teachers are building on it rather than just funneling them to a very specific answer, that really facilitates learning,” says Demszky. “So we identify practices that are related to that, like building on ideas, mindset-supportive talk, asking questions that probe students’ thinking, and then we build algorithms.”

Never miss an episode! Subscribe to School’s In on Spotify, Apple Podcasts, or wherever you get your podcasts.

Transcript

Dora Demszky (00:00):

Because of these technical limitations of these algorithms and our current philosophical approach, we are focusing on the good only.

Denise Pope (00:09):

Today, we're discussing a very hot topic, one that we will revisit this season again and again and again because it is just that hot, AI in education. More specifically, we'll be talking about how artificial intelligence tools can improve feedback for teachers. We're always, as educators, looking to get more feedback to improve our practice, right Dan?

Dan Schwartz (00:31):

I don't know, Denise. Maybe it's just you who wants that feedback. I think I'm pretty good.

Denise Pope (00:36):

All right, Dan. Well, let's get into our episode and find out.

(00:42):

Welcome to School's In, your go-to podcast for cutting edge insights in learning. Each episode, we dive into the latest trends, innovations, and challenges facing learners. I'm Denise Pope, senior lecturer at Stanford Graduate School of Education and co-founder of Challenge Success. And I'm with my co-host, Dan Schwartz, dean of the Stanford GSE and faculty director of the Stanford Accelerator for Learning.

Dan Schwartz (01:11):

We're very fortunate today. We have someone who's state-of-the-art thinking about how to support teachers, how to get them feedback. This is Professor Dora Demszky at the Stanford Graduate School of Education. She combines machine learning, natural language processing, linguistics and input from educators. And one of the projects I want to talk with her about is how can chatbots, basically the computer, provide actionable feedback to teachers? Give me the model. I'm a teacher and I'm in the middle of teaching a class and suddenly in my earpiece it says, do this instead. What's our vision?

Denise Pope (01:52):

What? Oh my God, that's crazy.

Dan Schwartz (01:54):

Well, I just made that up. I have no idea if this is what-

Denise Pope (01:57):

Oh, oh, okay.

Dora Demszky (02:00):

Yeah, yeah, I can clarify that. So it depends on the setting, right? So are we talking about physical classrooms or face-to-face instruction where oftentimes yes, giving them real-time feedback can be really distracting. So one component is about the what, like what are you actually telling them to do? It has to be good feedback. The other question is the when and the how, which are just as important. So in the context that we've deployed it, it's been oftentimes online context with novice tutors or teachers and they get feedback after they taught. So you teach your lesson, your lesson gets recorded, which is pretty simple in online context. Then we run the analyses on your lesson and then you get some insights on it.

(02:47):

In physical classrooms, it's pretty much the same thing. The difference is that you upload, you use a device to record your lesson, and then you upload that to your app or something and then you get feedback afterwards. We have thought about real time, but as you say, it can be very distracting.

Denise Pope (03:05):

How do you know that the quality of the feedback that the bot is giving, is it called the bot, I don't know, is good? Are you checking that? Are you behind that? Is it Dora really giving the feedback?

Dora Demszky (03:18):

Great question. So we call this M-Powering Teachers. That's the name of the tool that we built. It's really rooted in the idea that we want to empower teachers. We're not trying to tell them what to do. We're just providing them with opportunities to reflect on what they did. So kind of like a Fitbit, when you run and it gives you some metrics, it doesn't necessarily tell you, oh, you should burn this many calories a day, or you should run this fast. It's so person dependent and so context dependent that you are the only person that knows.

(03:51):

So that may answer some of your question about how do we know it's good? So we first obviously validate it in different ways, validate it by having teachers, experts look over what our models are predicting and what they're outputting, do they seem right or seem wrong? But also, a lot of the feedback is actually more just providing them with things that they did, highlighting things and moments in their lesson for them to reflect on and asking them good reflection questions and goal setting questions so there's less opportunity for risks or for error.

Dan Schwartz (04:32):

Let's take the Zoom example. It's online. I've got three or four students that I'm talking to. Afterwards, it'll come back and it'll say, you said to that student they should try harder. How do you think that student felt about that? I mean, what's an example of some feedback?

Dora Demszky (04:51):

So what you just brought up is something that has to do with growth mindset, which is something we worked on, you know, "try harder." The idea that intelligence is not something that's fixed, that you can improve your brain and your skills can constantly develop if you are working on it. But just telling them to try harder may not be the best way to do that, but actually recognizing that things are challenging and giving them the tools and strategies to improve their skills. Anyways, that's a tangent, but we are trying to identify these types of practices like growth mindset, supportive language, or building on a student's idea. So we know from the literature, like decades of literature that when students feel heard, when they feel that their ideas matter and that their teachers are building on it rather than just funneling them to a very specific answer, that really facilitates learning.

(05:47):

It creates that type of a collaborative process that you mentioned at the beginning that it's not just the teacher lecturing and the students passively receiving things, but instead, students are active participants. So we identify practices that are related to that like building on ideas, mindset-supportive talk, asking questions that probe students' thinking, and then we build algorithms. It's not really a chatbot. These are algorithms that can identify moments when you do that and then highlight them for you to revisit, to think about, what did I do in this moment when that led to this great example, and could I do this more? Where are there missed opportunities?

Dan Schwartz (06:36):

Is the approach generally to find good examples and tell the teacher what you did here was good, as opposed to saying, don't ever do that again?

Dora Demszky (06:48):

Exactly. There are multiple reasons for it. Just pedagogically speaking, it's better to celebrate the good, and also especially for novices. So a lot of the contexts we're working, we are working with novice instructors. For them, it's really important to give them the positive feedback early on so that they can build on it and grow and not feel like, oh, I'm just doing everything wrong.

(07:12):

Second of all, it's actually extremely hard to reliably be able to tell this was wrong because as I mentioned, everything is so context-dependent. Maybe your model cannot actually identify the teacher's true intention with let's say not building on a student's idea at the moment. Maybe they were just trying to get ideas from everybody before then synthesizing it. There are different contexts where we may not necessarily want to directly build on what the student said right away, and that's just really hard and complex to algorithmically detect.

Dan Schwartz (07:53):

There's a study that occurred, I don't know, 25 years ago, and it was about clinical psychologists, but what they discovered was clinical psychologists who'd been doing it for 20 years were no better than clinical psychologists in their first year of practice.

Denise Pope (08:10):

Oh. That's not good at all.

Dan Schwartz (08:12):

Right. They weren't getting very good feedback. You give a diagnosis or a prescription to your patient, they never come back, and so they don't know whether it works or not. So do you think the same thing's true with teachers? I've been teaching for 20 years. Have I just gotten better and better or is that just too hard of a task to improve massively?

Denise Pope (08:36):

I mean, I think it completely depends on the context. If you are getting zero feedback and you're just doing the same thing, and a lot of people who haven't had really good training often teach the way that they were taught, and we know that that's kind of antiquated at this point. People believe that talking and lecturing is the best method, which we know is not true. And kids sitting in rows, spitting back answers. Don't act surprised, Dan, I know you know this.

(09:05):

So I would say it really depends, and I think what it depends on is feedback. And if you think of a clinician maybe in the office with just a patient, I could see how that would be hard because nobody's watching them. Nobody is following up with the patient. And in some sense, that's kind of like teachers because they're alone. Usually they're a single adult in the classroom, and so it might be hard to know what you're doing well and what you're not doing well. So I think it depends is my answer.

Dan Schwartz (09:35):

Okay, so that's a nice hedge. I think another thing for teachers is they can experiment with curriculum so they can create conditions that help them get feedback. So when I teach my courses in college, let's say I teach intro stats, it's a pretty established curriculum, but every couple of weeks, I'll try something brand new and see how it works out.

Denise Pope (10:01):

Well, I mean actually what you just said is very true for when I was teaching high school, my third period class always went better than my first period class because with a new lesson, you're running the lesson for the first time and then you're like, okay, next time do this a little differently. By the time you get to sixth period, you're like, we got this down.

Dan Schwartz (10:21):

Really by the time I get to sixth period, I'm bored.

Denise Pope (10:23):

Okay, well, you're also already exhausted.

Dan Schwartz (10:25):

I've mastered it.

Denise Pope (10:27):

You do forget what you've said, how many times you've said it, right? You do forget that.

(10:37):

So okay, I'm not that techie. So I'm going to ask a question that might be kind of embarrassing, but if someone is programming, it's not a bot, whoever or whatever is giving the feedback, right? So how does this thing that's giving the feedback know that, oh, that's academic press, pushing someone to ask or build on a question. Oh, that's growth mindset. How is this all working, without getting too techie, without getting too techie? Sorry.

Dora Demszky (11:05):

Of course. Yes. Our general approach is the traditional way you train machine learning models is you get training data. Our training data includes a bunch, thousands of transcripts of talk between students and teachers that then we label manually with experts, not all of them, but a subset of them for these particular moves or examples that we're interested in. And then we train the model on this data to learn to predict those same labels.
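
Editor's sketch: the label-then-train workflow described above can be illustrated with a very small supervised text classifier. This is only a toy under stated assumptions, not the team's model. The example turns, the label names `builds_on_idea` and `other`, and the choice of a naive Bayes classifier are all hypothetical and do not come from the episode.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-written teacher turns with toy labels. Real training data
# would be expert-annotated classroom transcripts, as described in the episode.
LABELED_TURNS = [
    ("where did the 70 come from", "builds_on_idea"),
    ("can you say more about how you added those", "builds_on_idea"),
    ("why did you choose that number", "builds_on_idea"),
    ("open your books to page ten", "other"),
    ("please sit down and be quiet", "other"),
    ("the homework is due on friday", "other"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(LABELED_TURNS)
print(predict(model, "where did that number come from"))
```

With six training sentences the output means little; the point is only the shape of the process: experts label a subset of turns, and a model learns to predict those same labels on new turns.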

Denise Pope (11:41):

That's so cool. And how accurate is it, because I know you hear about hallucinations in AI and all this stuff, like how accurate is the model?

Dora Demszky (11:49):

For this type of classification, I would say it's moderately accurate. I want to stress something, which is that humans disagree a lot. These are very subjective things like these are not just like, oh, is this a cat or a dog? These are examples that are very context-dependent, subjective practices as I mentioned. And the agreement between two experts is also in the moderate range. So what we are finding is that our models are able to approximate that level of agreement that humans have between one another, but they're not perfect because yeah, it is subjective. So we're trying to mitigate that by obviously hedging and saying that this is not perfect, and also by coming out with the positive examples, which obviously have a lot lower risk for harm than providing the negative examples.
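
Editor's sketch: one common way to quantify agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The episode does not say which statistic the team uses, so the metric choice and the toy labels below are assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Share of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected if each annotator labeled at random with the
    # same label frequencies they actually used.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight teacher turns (1 = "builds on a student's idea").
expert_1 = [1, 1, 1, 1, 0, 0, 0, 0]
expert_2 = [1, 1, 1, 0, 0, 0, 0, 1]

kappa = cohens_kappa(expert_1, expert_2)
print(f"{kappa:.2f}")
```

In this toy case the experts agree on six of eight turns (75 percent), but half of that agreement would be expected by chance, so kappa comes out at 0.50. The same function could compare a model's predictions with one expert's labels, which is one way to check whether a model approximates human-level agreement.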

(12:39):

I do want to add though that I know from Dan your work about the importance of contrastive cases, teaching people by showing them contrast of, oh, this is good, and in contrast, this is not so good. So ultimately, it would be really great to incorporate something like that, but just because of these technical limitations of these algorithms and our current philosophical approach, we are focusing on the good only, but we hope to get there at some point.

Dan Schwartz (13:06):

Focusing on the good will work if you help people see which part of it was good. So you show me a three-minute video of my lesson and you say, that's really good, and I'm sort of like, which part? So some way just to get people to see what's the thing. So the reflection prompts are probably really important here.

Dora Demszky (13:26):

So we are very specific. We actually identify a single conversational turn with some limited context. This is what you said, showing them that built on a student's idea. Students said, I added 30 to 70, and the teacher asked, where did the 70 come from? This is just a really simple example, but of building on and improving the student's thinking. So we give them these examples and then ask them, what were some strategies that you used in this context to build on a student's contribution and what else could you do next time? And which one of these strategies you might want to use again next time, thinking about your next lesson topic, for example, like ratios, making it up, it's depending on the context.

(14:12):

A lot of what we work in is math or computer science, like STEM learning environments. And then we give them resources that can facilitate these reflections. Here are some examples, for example, from other instructors. In the case of this online course, there were several instructors teaching the same curriculum the same period of time. So we could actually give them examples from their transcript, de-identified obviously that were really great examples that they can incorporate in their own reflection, like maybe I want to use a question like this next time.
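
Editor's sketch: the single-turn highlights described above can be pictured as a small data structure holding the flagged teacher turn, a little preceding context, and a reflection prompt. The `Turn` class, the `build_highlight` function, and the `context_size` parameter are hypothetical names invented for this illustration; the example exchange and the prompt wording are adapted from the conversation.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "teacher" or "student"
    text: str


# Prompt wording adapted from the episode; the real tool's prompts may differ.
REFLECTION_PROMPT = (
    "What were some strategies that you used in this context to build on a "
    "student's contribution, and what else could you do next time?"
)


def build_highlight(turns, index, context_size=1):
    """Bundle the flagged turn with up to `context_size` preceding turns."""
    if not 0 <= index < len(turns):
        raise IndexError("flagged turn index is out of range")
    start = max(0, index - context_size)
    return {
        "context": turns[start:index],
        "highlight": turns[index],
        "prompt": REFLECTION_PROMPT,
    }


lesson = [
    Turn("teacher", "How did you get 100?"),
    Turn("student", "I added 30 to 70."),
    Turn("teacher", "Where did the 70 come from?"),
]

# Assume a model has flagged the turn at index 2 as building on a student's idea.
card = build_highlight(lesson, index=2)
for turn in card["context"] + [card["highlight"]]:
    print(f"{turn.speaker}: {turn.text}")
print(card["prompt"])
```

Keeping the context window small mirrors the "single conversational turn with some limited context" idea: the teacher sees exactly which moment is being highlighted rather than a long stretch of the lesson.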

Dan Schwartz (14:47):

So you're also matching across different instructors that allows you to do a lot.

Dora Demszky (14:52):

And they can share their reflection with others, so that's another motivation. Pro-social motivation for them to reflect is not just self-serving, but they can actually opt in to share that with others and see what others are reflecting. So it's kind of a shared community for professional learning among instructors.

Dan Schwartz (15:16):

That's great. So could you do this for students? I'm sort of imagining student groups.

Denise Pope (15:21):

Students in a study group. Often it's like the blind leading the blind, but if they had this extra help, this machine that was saying, hey, you're on the right track, keep doing, I mean, it becomes sort of like another teacher.

Dora Demszky (15:36):

Yeah. So this is something we're working on thinking about, especially focused on a specific population of students, multilingual learners who we know are often underserved, especially in a monolingual teaching environment where they might not have the vocabulary, but not just the vocabulary, just like they might have been taught about different concepts in different ways. And currently, it's really hard for a single teacher to support all of them and their separate language backgrounds and needs. It would be really amazing if these AI technologies, which we know are supposedly amazing for different languages and dialects, still to be tested in this particular context, but it has a lot of potential for doing exactly that. Like you said, kind of a collaborative assistant in maybe a group discussion.

(16:24):

So my work, a lot of it takes the stance of always having the teacher in the loop, for multiple reasons. One, because the technology is not entirely reliable and it poses some risks if you directly interface it with minors and kids. And two, because I really worry about the inequities it could create to replace a teacher with an AI bot and then certain kids just not ever having access to a human teacher, so I just worry about the incentives that creates. But I think there could be different structures where an AI bot like that could really facilitate group work or peer work or even just a single student working on the problem through the teacher or even by themselves, but with some type of a supervision.

Dan Schwartz (17:17):

So I could imagine having a unit on cooperating right, and so you have three kids working together and there's an after action review where the AI sort of says this bit where you said, I like your idea, how about if we do this was really good. I don't feel like that's chasing the teacher out of the room. It seems like it could be super helpful to kids who have a lot of trouble giving feedback. And what do you think, Dora?

Dora Demszky (17:49):

Yeah, I think that specifically focused on these collaborative norms, it could be super, super helpful. There's something about that a group, like CU Boulder has actually created, it's called COBE. It currently, I don't think it intervenes in the conversation, but it listens and then identifies these moments of collaboration when students engage with each other's ideas or when they fail to do so. And then those insights are surfaced to the teacher and back to the students.

Dan Schwartz (18:20):

Is there anybody who could do that for my home life?

Denise Pope (18:26):

What you just said to this person who you care a lot about was really good. Say more of that. And then you see the person in the background going, hey, bot, tell Dan to do the dishes. Tell him that's good. Doing the dishes is good. Look, Dora, can I just say one of the coolest things about this I think, because teaching is a really lonely profession. You talk to teachers all across the US, it's very rare that they get supervised, that they get any kind of feedback that someone's actually in their classroom for more than 15 minutes with a sort of high level like, hey, good job. Keep it up. Right?

(19:02):

And when you tell teachers that you want them to go and look in other people's classrooms and take the time and actually build in time for teachers to watch teachers, it's a little bit awkward. They don't know how to give feedback to their peers. It's not part of that culture. So you've kind of taken the ouch out a little bit and you've also allowed more camaraderie to happen.

Dan Schwartz (19:28):

That's an interesting question. So these automated feedback systems in medicine, the doctors didn't want them.

Denise Pope (19:33):

Yeah.

Dan Schwartz (19:35):

So Dora, when teachers see this are they like thank you so much, or are they sort of like, sorry, I do what I do, I don't appreciate it. What is the response to this?

Dora Demszky (19:48):

I would say just like with every tool, everything, there's variation. So there are going to be the power users: I want this for everything. Can you add this feature and that feature, and can it look at my historical data, and can it give me suggestions for X and Y and so many things? And there are people, I think honestly we haven't really heard teachers saying I don't want feedback on my teaching. What we hear though is when they disagree with something or when they find something to be inaccurate, and that oftentimes comes from inaccurate transcription, which is a bottleneck that we are facing and we're working on addressing. It's really hard to transcribe speech accurately, especially in a noisy classroom environment, and there are going to be mistakes. And then it propagates down to the feedback.

(20:40):

But yes, even if you take all of that into account, just like I don't use a Fitbit when I run, I just like that to be my thing. I like when I'm not measured by anybody. I think that it's normal that some people want more or less of this. And so I don't think a teacher would say I don't want this ever because I don't think that's a good attitude towards professional learning, but maybe not as often as some others.

Denise Pope (21:11):

I love it. I'm so excited about it. So I mean, I can imagine someone listening saying, how can we get this? Coming to classrooms soon. Are we a year out from everybody using it? And then we'll wrap this.

Dora Demszky (21:25):

So good question. We have a beta version of the tool that is already public. It's not currently ready for wide release, but anyone can sign up on the wait list. We have a wait list. If you Google my name, you can find this project page, M-Powering Teachers, and you can sign up to the wait list there. As a researcher or a teacher or educator in any capacity, this tool could help you. There are also commercial tools out there like TeachFX, which pretty much does what I described to you already, and they have thousands of users across the country and abroad. And they incorporate many of the models that we have developed, so they're really great.

Dan Schwartz (22:12):

Dora, thank you so much. It's great to see into the future and what's possible.

Denise Pope (22:18):

I know, I know. It's super, super exciting. Thank you. Thank you. Especially the information on where we can start to access some of these tools as educators. I can't believe they're already here, right? I mean, it's not something that's out in the future. Okay. Dan, I wonder, we started this out saying that you didn't really want feedback, and now I want to know, do you think you can use some of this feedback now that you know all the things that AI can do?

Dan Schwartz (22:45):

You mean about all the things I'm doing right? I'd love it, especially if it gave me points for everything I got right, it gave me a couple of points.

Denise Pope (22:53):

Yes, because everything is a competition for you, Dan, so you are motivated by the points. I will say this, I do actually like that. It's almost like caught you doing good, one of those exercises that we do with the kids. It's meant to really shine a light on the good and to find out where we as educators are doing right, and to highlight it in a way that gives you a chance to grow. So in that sense, Dan, you're not that far off. I mean, I think you're right. It's very positive and it would have a lot to tell you and others about what they're doing well.

Dan Schwartz (23:30):

No, I like that. It's hard to know, and teachers do want the feedback, especially if it's sort of saying, this is the right behavior, keep it up. And I like the idea that eventually, this tool could be used to help students get feedback about what they're doing right. I think that's really an exciting, exciting possibility.

Denise Pope (23:49):

Totally. Totally. A hundred percent agree. Really, really good stuff. Thank you again Dora for this great conversation, and thank all of you for joining us on this episode of School's In. Remember to subscribe to our show on Spotify, Apple Podcasts, or wherever you tune in. I'm Denise Pope.

Dan Schwartz (24:05):

And I'm Dan Schwartz. Denise, did I get that right? Can I get some feedback, some points for that?

Faculty mentioned in this article: Dan Schwartz, Denise Pope, Dora Demszky
