Class Test Averages - Are They Indicative of Professor's 'Skill'?

Hello all. Just wanted to gather some opinions on the title question, to gauge whether my report is reasonably solid.

Backdrop: There are two different lectures for the same class, under different professors. They are not different in size by too much (both around 100-120 students). Recently, both classes had a midterm. The average results for both were noticeably different.

Now, the question is: can we reliably use the average test scores to indicate if the professor is doing a…less-than-stellar job?

As part of a project for a class, I surveyed exactly 15 students from each class (I only knew a small number of them, to minimize any bias or whatnot). The sample size is a bit small, but I didn’t want to draw too much attention to myself. The reported results were as follows:

Class A (The slightly bigger class that I am not in): The total scores were all around the same range, with one or two out of that range. The most reported scores were in the range of 30-35 percent, with the two outliers being at 68% and 85%. The average score, if all are taken into account, was 38% (rounded to a whole number).

Class B (The slightly smaller class I am in): We had the opposite situation, where scores varied wildly. There were ‘clusters’, but no other observable trend (so far as I could see). The most reported scores were in the range of 45-60 percent (which already indicates a stark difference to me). The average score was 61% (scores outside the numerous range tended to be above 60%).
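To illustrate how much two high scores can move a 15-person average, here is a minimal sketch. The individual scores are hypothetical (the survey's raw numbers aren't listed here); they are only chosen to match the Class A description: 13 scores in the 30-35% range plus the two reported outliers.

```python
from statistics import mean, median

# Hypothetical scores, NOT the actual survey data: 13 students spread
# across the reported 30-35% range, plus the two reported outliers.
clustered = [30, 30, 31, 31, 32, 32, 32, 33, 33, 33, 34, 34, 35]
outliers = [68, 85]
class_a = clustered + outliers

mean_all = mean(class_a)             # average with the outliers included
median_all = median(class_a)         # middle score, barely affected by outliers
mean_clustered = mean(clustered)     # average of the clustered scores only

print(round(mean_all))               # 38
print(median_all)                    # 33
print(round(mean_clustered, 1))      # 32.3
```

Under these made-up numbers, the two outliers alone lift the mean by about 6 points, while the median stays inside the cluster. With samples this small, reporting the median alongside the mean may give a fairer picture.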

If both classes are learning the same subject under the same syllabus/lesson plan (I checked them, they are pretty similar with the exception of 1 or 2 topics), why are the average scores so different?

Some possible explanations I thought up were:
-The 1st professor is a harsh grader
-The 1st professor included harder topics, or introduced new ones, in the middle of the test (some of our professors are known to have done this before)
-The 1st professor made the questions more difficult (I am doubtful of this one since their practice midterm was identical to ours).
-Since the sample sizes are small, the difference may be due to sampling error.
-The smarter people congregated to one lecture(?)

AND the one that I want to focus on:
-The 1st professor’s students didn’t learn/know the topics as well as the 2nd professor’s students.

Could the reason the 1st class did worse on the test reflect on the professor’s teaching method and skill? Most of his class’s scores were packed together, because they tended to miss similar topics. At the same time, most of them reported having way less time for each question than they needed.

In contrast, the 2nd class felt they did worse on parts that differed for most of the group. Also (being in the class), I noticed that many students left early since they finished quickly.

Taking these both into account, is it fair to assume that the first professor overestimated how much his students knew? Or how well they mastered the topics? If his test’s average is far below what a normal average would be for a math class (around 60-70?), then does that mean the professor is failing on some level to prepare the students?

Of course, it is expected for students to master these topics outside of lectures, and most of them do. Almost all of them reported studying at least 2-3 days before the exam. Those same people then reported that they were caught off guard anyways. Again, is this the professor expecting his students to know more than what they’re told/suggested? If a professor exhibits this trait, can he/she be considered a ‘bad professor’?

This is the gist of my report. What I would like are any observations or questions that maybe I overlooked, as well as y’all’s personal opinions.


TLDR: Does a lower test score average mean the professor might be ‘bad’? Or should other factors be to blame? Opinions, observations, all welcome.

(Note: Sorry for the long ol’ post. It’s past midnight, and in my bored stupor, I decided I may as well gather some opinions on my basic report before I show it to my professor)

I took a lower division biology class with a professor who had the lowest average out of 3 other professors. Worst professor ever, did not teach during the following semester.

Later I took an upper division biology class with a professor who had the lowest average out of 2 other professors. Apparently is the “best professor teaching the class” and is tenured.

There are too many variables to factor in. Solely looking at average test scores is not enough to gauge a professor’s competency.

A professor can create a well-written test to achieve any desired type of score distribution, whether it be from 70-100, or from 0-50 (on a scale of 100). The mean score is tied to the difficulty of the test, which is completely different from how much the students have learned.

100 students per class is not a small sample size.

  1. Is the identical test being given to both sections?

  2. Are the sections populated by students of the same types of major? Because of schedule, is one class mostly full of engineering majors and the other class mostly full of sociology majors?

Same test or different tests?

If different tests, the other variable you are not considering is that one instructor may be giving a harder test than the other.

To build on Post #3, the task of ensuring that students enrolled in the classes randomly is complex. There are at least two additional factors that could skew the results:

  1. Savvy students (who would get good grades anyway) might have enrolled in one professor's class because of his/her good reputation, thereby "confirming" the professor's reputation.
  2. Foolish students might have enrolled in a class that met at a time of day when they were not fully alert -- like 8:00am or right after lunch.

As a professor myself, I see both of these factors playing a very large role in student outcomes.

Again, building on Post #3, population can make a big difference. At times, I have had classes with a large percentage of: football players, international students, dance majors, pharmacy majors . . . and the class becomes a very different experience.

@OhSorryYo I am aware that upper division classes can be a different matter. This class, however, is just a lower division class.

@hebegebe I understand the mean score can vary because the professor intended it to. However, shouldn’t the professors seek to make a test where the average matches up with a normal 0-100 scale? I would assume that most wouldn’t actively seek to create tests where grading requires some sort of curve, or aim to create a mean score far below what other professors who have taught the class usually achieve.

That might be my personal opinion.

@GMTplus7 The total class size is not small, but my random samplings for survey/questioning were small (15 people from each lecture). That is what I meant by small sample size (below 20% of total population for each class).
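For a rough sense of how uncertain a 15-person average is, here is a sketch of a margin-of-error calculation. The function name `mean_with_margin` and the multiplier `z=2.0` are my own illustrative choices, and it assumes the 15 students were a simple random sample of the class, which a survey of volunteers may not be.

```python
import math
from statistics import mean, stdev

def mean_with_margin(sample, population_size, z=2.0):
    """Return (sample mean, rough margin of error) for a surveyed class.

    Uses the standard error of the mean with a finite population
    correction, because the sample is drawn without replacement from a
    fairly small class. z=2.0 is a rough ~95% multiplier (assumption);
    with only 15 scores, a t-based multiplier of about 2.14 would give a
    slightly wider interval.
    """
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # Correction shrinks the error as the sample approaches the whole class.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return mean(sample), z * standard_error * fpc

# Usage with placeholder scores (hypothetical, not survey data):
avg, margin = mean_with_margin([50, 60, 70], population_size=100)
print(f"{avg:.0f}% plus or minus about {margin:.1f} points")
```

If the two classes' intervals overlap heavily, the gap in sample averages could plausibly be sampling noise; if they are far apart, small sample size alone is a less convincing explanation.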

Luckily, the professors let us take the tests with us after the midterm. I compared the midterms for our class and theirs. For the most part, the questions that were of similar topic were the same difficulty, with one or the other test having certain questions be more difficult. There were, however, topics covered exclusively on one or the other test, which might account for the difference in mean score.

As for majors, both lectures are populated with the following majors: Engineering (all of them), Computer Science, Math, Actuarial Science, Statistics, Chemistry, Bio, and Physics (I might have missed some of them). As the list implies, no other majors other than math-focused majors take the class (in other words, no Humanities majors are interested in the class).

I also considered the difference in time of day. My lecture is during the morning (at 8am), while the other class is in the afternoon (1pm). This should mean that the afternoon class would perform better, due to being more awake, yes?

@ucbalumnus Like I stated above, the tests were of comparable difficulty, where they overlapped. Within the overlapped topics, the difficulty would be higher in one class, and sometimes in the other. The topics covered diverged towards the end of the exam, which might have created differences in difficulty, if the differing topics were of differing difficulty.

@WasatchWriter The second factor is one that confuses me. My lecture is the 8am lecture, while the other is at 1pm. However, the mean score is much higher for our lecture than for theirs, despite both classes being around the same size and pure STEM majors. This is what led me to believe maybe the difference was in the professors.

As for the first point, there is a reported difference in average grades for both professors. Using ratemyprofessors.com, the first professor’s past students reported a B- average grade, while the second professor has a reported A- average grade.

I don’t believe test difficulties were the main divider of average grades, since the tests were both of similar difficulty. In fact, I might say that the 2nd professor’s tests are more confusing, as they are mostly theorem/proof based, while the 1st professor’s test is more about solving, with minimal proof-based questions.

It could be a number of things. Not quite conclusive.

Unless these tests were identical and were graded by the same person, the comparison of grades across the two professors would not be very meaningful. Consider a prof who gives half credit to anyone who attempts a question and also only has three choices rather than 4 or 5 for each multiple choice question. This prof will tend to have higher grades than someone else who doesn’t give as much partial credit or has more choices to consider on multiple choice exams.
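As a toy illustration of that point (a deliberately simplified model of my own, not a description of either professor's actual exam), suppose a student truly knows some fraction of the questions and guesses on the rest:

```python
def expected_score(fraction_known, n_choices, wrong_answer_credit=0.0):
    """Expected exam score (0 to 1) under a simple guessing model.

    Assumptions (all hypothetical): the student answers known questions
    correctly, guesses uniformly at random on the rest, and receives
    `wrong_answer_credit` (e.g. 0.5 for half credit) on a wrong attempt.
    """
    p_lucky = 1.0 / n_choices
    per_unknown = p_lucky + (1.0 - p_lucky) * wrong_answer_credit
    return fraction_known + (1.0 - fraction_known) * per_unknown

# Same student (knows 40% of the material), three grading setups:
lenient = expected_score(0.4, n_choices=3, wrong_answer_credit=0.5)
three_choices = expected_score(0.4, n_choices=3)
five_choices = expected_score(0.4, n_choices=5)
print(round(lenient, 2), round(three_choices, 2), round(five_choices, 2))
# 0.8 0.6 0.52
```

In this sketch the same level of knowledge yields anywhere from 52% to 80% depending only on exam format and grading policy, which is why raw averages from different tests are hard to compare.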

Also, the number of students from each class who post on ratemyprofessor is so small that that comparison isn’t very meaningful either.

It sounds like there are too many confounding factors (different tests being the biggest one) to draw a good conclusion from this data.

I will say, however, that when I taught college, I certainly looked at test scores as an indicator of my skill. If two students blew the midterm, shame on them. If the whole class blew the midterm, shame on me.

In big lower division courses, “grading on a curve” is often done because carefully calibrating the difficulty of test problems is not an easy task (consider how the ETS prototypes SAT questions with experimental sections, something that college instructors do not do). It also allows giving difficult problems rather than loading up the test with easy problems like in high school where C students are supposed to get 70% of them correct.

At the end of the day, does it really matter?

You seem to have put a lot of energy into proving that your professor was worse than the other one.

Great…let’s assume that you’ve proved it.

Now what? What does your report accomplish? Where do you go with that info? Does it put a little asterisk on your transcript? A little PS on your job applications?

It’s entirely possible that you got the worse professor— or the professor who was worse for you. OK, so next time there’s a choice you choose the other professor.

But at the end of the day, you play the hand you’re dealt. It’s like one of my geometry classes… the kids in it are significantly less motivated than my other 3 geometry classes (or even my one algebra class.) And in addition, I have them at the end of the day, after lunch, when kids tend to be a lot more hyper.

So I’ve got all the excuses I need. But I’m still expected to teach them geometry… and to get them to learn geometry. (And, no, those are not the same thing.)

But to answer your question, I think that it’s very subjective. Of course kids are going to say they studied 3 days before a test, particularly if they’re trying to make the case that the professor is ineffective. There are so many variables… what time of day did the two courses meet? How was attendance? Were the majors of the two classes comparable? (It’s entirely possible that the timing of another class meant that kids of a particular major all took your course at the same time.)

Is it possible that the easier professor is actually feeding too much test info to his students, as opposed to expecting them to learn it?

If there’s one thing I’ve learned with 100% certainty between high school, college, medical, and graduate school, it’s that you can never trust someone’s self reporting of their studying. Not everyone will skew in the same way, but almost no one did whatever you think “3 days of studying” means. Some people did “3 days of studying” but they’ve been reviewing class notes and the textbook every day before and after class the entire semester. Some people did “3 days of studying” but that means they opened the textbook for the first time 3 days before the exam and spent 2 hours each day in the library and 85% of the time they were on Facebook.

As @UCBalumnus points out, the “0-100” scale that most high school students are used to is a pretty bad scale. Failing is often under 60 so you’re really only left with 40 points of room to differentiate people who didn’t fail (and only 10 points for each letter) in contrast to 60 points of room to differentiate the students who did fail - you don’t need that much stratification of bad students. In many schools, many profs want to differentiate great students from truly outstanding students so they make the grades overall lower because it gives more room at the top where an outstanding student can pull away from the pack and similarly, gives them more room to work with in separating the passing students in general.

@bjkmom I’m a bit confused as to how you arrived at the conclusion that I think my professor is worse? In fact, I would say my professor is better for me, from what I’ve seen. I have no problem with the class either way, as I’m passing it with an A currently.

Just to clarify: everything I’ve reported on has been about a different professor, NOT my current professor. I have been comparing to results found in my current class, just for comparison’s sake. Also, this isn’t for some “it wasn’t me, it was the professor” kind of thing; I’m simply doing a small research project for one of my classes; of course I’m putting a lot of effort into this, because it’s a project. Does that clear up some of the confusion?

Anyhow, my biggest doubt has been about this point: the self reported studying time. Obviously, I don’t monitor them, so I have to take what they say with a grain of salt.

If there is one fact that might have made them honest, it is this: the class is essential for ALL STEM majors (which is the only group taking it). If they cannot pass this class, then they are advised to change majors. Given how high the stakes are, even though they may not be completely honest in their assessment, I can probably believe that most of them put in the effort needed for the test.

@iwannabe_Brown I see. Looking at grading through the previous HS scale is a failure on my part; I didn’t consider the differently formatted scale you mention.

The HS-style scale is followed closely in my class, and is reflected in the syllabus, which lists a grading policy that puts a grade below 60 in failing territory. The other class, however, needs a massive curve to be similar to our grading system.

In the end, there might be too many variables for me to make an assumption, as has been pointed out many times in the previous posts.