Study: Teacher Evaluations Have Zero Correlation with Learning

Inside Higher Ed reports on a study that found,
“Our up-to-date meta-analysis of all multi-section studies revealed no significant correlations between [evaluation] ratings and learning.”

The article points out the effects of bias in evaluations and notes that actual learning is affected by student knowledge, motivation, and other factors.

https://www.insidehighered.com/news/2016/09/21/new-study-could-be-another-nail-coffin-validity-student-evaluations-teaching

We know there are some professors who are interesting and fun, and others who are boring or even incomprehensible. But, apparently, students learn pretty much the same amount either way. One explanation, I suppose, is that students who take a class have to perform whether the professor sucks or not. I had the occasional foreign-born prof whose lectures were impossible to understand. I had research-oriented profs who put zero effort into preparation and discouraged student contact. But all this meant was more reliance on the TAs and textbooks.

I think evaluations still have some use in guiding students toward more engaging profs. They might not learn more, but they will enjoy the class more.

And, why not use evaluations in salary or tenure decisions? Focusing solely on “learning” is a bit like comparing restaurants based only on the nutritional value of their meals.

Wow. I am NOT buying this, not even a little bit. I am a Dean at a large community college in the State of Washington, and I review the student surveys (yes, they are SURVEYS, not evaluations) for nearly 100 teachers. At the same time, I have visited the classes of most of those teachers and evaluated their teaching ability, and the level of learning that appears to be taking place. I can tell you that, in my experience, there is a strong correlation between student survey ratings of teachers and classroom learning.

Sometimes it is hard to quantify ‘learning’, because grades alone do not necessarily translate directly into learning. However, in our Math Department, we tracked how students performed in the class taken AFTER each specific teacher’s class. This allowed us to make an informed judgment about how well a particular teacher had prepared students to perform in the class that depended on what they learned previously. Not surprisingly, our highest student-rated teachers had students who performed the best in their next class.
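Purely as an illustration (the numbers, instructor names, and column names below are made up, not our actual data), the kind of tracking described above can be set up roughly like this in Python/pandas: group students by who taught them the prerequisite class, average their grades in the follow-on class, and compare that against each instructor’s mean survey rating.

```python
# Hypothetical numbers and column names -- just to illustrate the idea of
# linking each prior instructor's students to their grades in the next class.
import pandas as pd

# One row per student who completed both the prerequisite and the next class.
records = pd.DataFrame({
    "prior_instructor": ["A", "A", "B", "B", "C", "C"],
    "next_class_grade": [3.7, 3.3, 2.9, 3.1, 3.6, 3.8],  # grade points in the follow-on class
})

# Mean end-of-term student survey rating for each instructor (1-5 scale).
survey_rating = pd.Series({"A": 4.6, "B": 3.2, "C": 4.4}, name="survey_rating")

# How well-prepared, on average, were each instructor's former students?
preparation = records.groupby("prior_instructor")["next_class_grade"].mean()

# Put the two side by side and check the correlation.
summary = pd.concat([preparation, survey_rating], axis=1)
print(summary)
print("Correlation:", round(summary["next_class_grade"].corr(summary["survey_rating"]), 2))
```

In practice you would want many more students per instructor, and you would have to account for students self-sorting into easier sections, a limitation discussed further down the thread.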

To be sure, there are some fun and engaging teachers who garner high student survey ratings despite the fact that their coursework is not that rigorous. And, to be sure, some students give higher ratings to teachers who are deemed ‘easier’. However, it is the extraordinarily rare teacher who has poor student survey ratings yet whose students learned a great deal. And, likewise, it is the extraordinarily rare teacher who has strong student survey ratings yet whose students did not learn much.

In any event, no competent academic supervisor uses student surveys alone to evaluate the performance of a teacher. Any good evaluation is based on a series of data points, only one of which is information from student surveys.

As a professor, I find those end-of-semester evaluations most useful when I read the comments section (which is optional, but which I strongly encourage my students to complete). Most of the standardized questions my university uses either measure whether the student has read the syllabus (“readings were related to course objectives,” “policy for absences was clearly stated”) or are arbitrary (“most classes started and ended on time,” “professor encourages class participation”).

Personally, I find that transparency with the students tends to boost my ratings: “we’re going to do this irritating writing assignment and you’re probably not going to enjoy it, but the end goal is x” leads to “stradmom really cares about our learning,” but if I just hand out the assignment, the eval says something like “uses too many boring assignments.”

So which part of this (seemingly credible, large-scale) meta-study’s methodology do you think is flawed? Because your anecdata definitely doesn’t refute it in the slightest.

Student learning is certainly not independent of the professor. For any given student, if they are more motivated to learn, they are almost certainly going to learn more, and professors can certainly influence that motivation level. That said, student evaluations have a lot of issues with them, and as @stradmom suggested, the comments are much more valuable than the faux-objective numerical scores on the survey. Students have a pretty substantial tendency to evaluate based on how difficult they found the class regardless of whether that difficulty was the result of poor teaching, poor study habits, or the material simply being inherently difficult. With the short-answer questions, it is much easier to remove the outliers who are evaluating based on what grade they got so that you can focus on the real, valuable feedback from students who put some thought into it.

Someone in DH’s dept ran a little side analysis and realized that student satisfaction correlated heavily with grade expectation.

There were teachers usually marked as difficult or demanding who nonetheless ranked high in student satisfaction with what they had learned and how they had been challenged to learn and use the information. But we can’t escape the fact that these are student opinions, and many of the questions are flawed. A lot depends on how the student sees the process of learning.

See this study on the “Effects of Instructor Attractiveness on Learning”: http://tandfonline.com/doi/full/10.1080/00221309.2016.1200529
According to this study, college students answer more questions correctly when they think their instructor is highly attractive than when they do not. Not a huge difference, but statistically significant.
Maybe learning is more nuanced than we think.

Forgot to mention that the students only see a photo of the instructor while listening to an online lecture.

Since I don’t really want to spend $41.95 to buy access to this study, does anyone know how student learning was defined and measured in the study/array of studies? It seems to be a rather important thing to understand if we want to try to have an intelligent conversation on the issue.

The popularity of Cap’n Crunch and Lucky Charms cereal has nothing to do with nutrition.

Well, it does a little, otherwise nobody would be eating bran flakes :wink:

It all comes down to how they defined and measured learning, as @Sue22 pointed out. For example, if they used grades in the classes, those come with tons of problems around relative curving and how each teacher grades. Standardizing across a lot of colleges is going to be difficult. @ALF’s measure of success in subsequent classes would be a better measure, IMO.

Beyond that, not all subjects are going to be equally affected. An interesting math teacher versus an interesting history teacher, when compared to less desirable counterparts, could produce different results.

I think that when something this hard and subjective to measure runs so strongly against intuition, skepticism is very reasonable. Anyone in education knows that student motivation is one of the key factors in learning: if a student doesn’t want to learn, they aren’t going to. Teachers who inspire, motivate, and get students excited about the subject may not actually teach the content significantly better than the average teacher, but the motivation they inspire can make the bigger difference.

Student evals are really flawed and objective and seem to correlate far more with a prof’s sex and race than actual “learning.” Women and people of color routinely get much lower ratings even when objective measures (such as uniform finals through all sections) show no real difference between their students’ outcomes and their white, male colleagues’ students.

ACK! That should say subjective NOT objective. Oops!!!

Totally disagree with this, Roman. [-(

That seems unnecessarily negative.

@PengsPhils: I know that in a research university, the “added value” idea is tricky, because as long as there are disparities in the “difficulty” level of instructors, students will exploit them when they have a chance. Say students in a particular major or pre-professional track tend to take one sequence in one year and a supposedly related one the next. One would think the related course would reflect the training, or lack thereof, from the previous year. However, what if there were 4-6 non-standardized sections of varying difficulty (and that variation was well known, as it normally is), and the students simply self-sorted, with many avoiding any situation where they would be held accountable for prerequisite knowledge at a high level? I’ve seen this happen in STEM and econ, for example. Unless there is some minimum threshold of standards or standardized exams, you won’t necessarily be able to judge added value. If the subsequent course is a “bottleneck” with only, say, one large section that is reasonably challenging, then maybe some information can be obtained from that. Or you can zoom in on sections meant to be much more challenging than normal and see if there are substantial differences between those who took certain instructors in the previous course versus others, but I promise you that the previous instructors who intentionally ran a less intensive course would not like to be exposed.

At my undergrad, the chemistry department decided to give an evaluation at the end of organic chemistry 2 that required a bit of higher-order thinking and reading, and needless to say, the instructors whose classes did not typically “emphasize those elements” (code: easier for most students) panicked because they a) knew they had gotten mostly the weaker students from the first semester and b) their courses did in fact assess more lower-level items or allow for more algorithmic problem-solving approaches. One honestly attempted to sabotage the results by telling his students it wasn’t that important (and thus some students left it blank and he had low participation rates).

Point is, college instructors seem really antsy about being held accountable in certain ways, especially those on the tenure track. The ones who don’t want to go a little further do not want to be exposed or held accountable. They seem quite happy hiding behind the bubble-sheet evaluations (one tenure-track instructor, who will leave, actually used to receive the highest ratings for ochem 2, but his class would fully participate and get one of the poorest scores on that assessment in both years it was given). The two highest-performing sections belonged to the two instructors who were, of course, the most difficult. They both rate quite well, and one in fact usually rated number one before the substantially easier person came along.

I also suspect students give a boost based on expectations. For example, students were going to the easier instructor mostly to escape or avoid some harder ones. As long as that easier instructor met expectations of relative ease, he got a boost that had little to do with learning (that guy’s class drew a significant chunk of students from the easiest first-semester sections and another chunk unsatisfied with their first-semester results under the harder instructors, as in they did not make an A). This reveals itself in the fact that the average in the easiest instructor’s class (his exams are indeed significantly less rigorous than his colleagues’) was often substantially lower than the others, but would be generously curved. Those students who knew they could not compete against more ambitious students on more difficult exams knew they were getting a sweet deal where the bar was set lower in terms of the cognitive complexity and the level of competition in the course. You would be surprised at the amount of calculus we did when choosing between instructors for the same course.

"@ALF wrote:
Wow. I am NOT buying this, not even a little bit.

@marvin100 So which part of this (seemingly credible, large-scale) meta-study’s methodology do you think is flawed? Because your anecdata definitely doesn’t refute it in the slightest."

I don’t know how well evaluations correlate with learning, because I haven’t seen the evaluations, but I do know that teacher quality correlates significantly with learning. If I had this research result, I would want to go back and look at my assumptions and the specific surveys used.

Maybe the evaluations in the study were poorly written. If the teacher evaluations are not well written, you may not learn what you want to know from them. That is obvious to me. Poorly written evaluations may just be telling you which teachers are easy, nice, or assign less homework.

The teacher my D learned the most from is an excellent teacher, but also a first-class jerk of the highest rank. Is the survey written well enough to identify his very bad behaviors, while also recognizing that his students learn more than any other instructor’s in the department? If I were @ALF, I would want to understand both of those nuances as I worked out which teachers are adding value and why.

I agree with @ALF et al. that professors make a huge difference. If s/he is thoughtful, s/he should be able to get valuable information from the surveys. The fact that this research on other people’s surveys shows no correlation just suggests to me that you have to be thoughtful about the specific questions and about how the students are likely to think about each question when responding. You also need to avoid asking overly generic questions. If you just ask students to rate a professor from 1 to 10, it is too broad, and I am not surprised by the result at all.

Sure, @Much2learn – I don’t think anyone reasonable would interpret this research as a denial that evaluations can be useful, but it’s pretty unequivocal that they aren’t useful in aggregate, and that’s a big deal. Those who champion teacher evaluations and those who want them to be useful now have the ball in their court: devise better ones so that the next time someone does similar research, the results come out better. But that’s a far cry from saying this study is flawed or unfounded, and a response like @ALF’s “I am NOT buying this, not even a little bit,” backed with SSS anecdata, is a far cry from a refutation or even a rebuttal.

@marvin100: Nothing like people alluding to the fact that students tend to let the personality traits of the instructor, along with the course’s difficulty and expected grade, dominate the ratings. I honestly suspect that the written portion, when offered, is useful. However, the bubble-sheet portions offered by most schools are sketchy, to say the least. Some students will randomly give all high marks when the teacher may only be “decent,” because they want to hurry up and leave the classroom. Others may give low marks if they were very challenged in the course. Some instructors announce in advance that they will be doing evals the next class, and then many students skip (and these tend to be the ones who care less or who would perhaps give lower ratings). Others give no heads-up, so students show up as if it were another day of class and the instructor gets much higher participation rates.

The participation issue is certainly an effect in large lectures. In smaller classes, maybe not so much, as students don’t want to be noticed skipping.