Most common answers on the SAT

PWNtheSAT · March 14, 2011, 3:13pm

So I saw someone on Yahoo! Answers suggesting that C was the most common answer choice on the SAT. I hear people say this all the time and it drives me batty, mostly because I’m afraid students will actually consider such misinformation actionable and fill in a bunch of C’s. You deserve what you get when you do that, I know, but still I worry.

Anyway, I decided to spend some time this morning counting up all the choices in the Blue Book (leaving out Grid-Ins, obviously) to prove a point, and ended up proving the point much less firmly than I had hoped to. There’s actually a fair amount of variability over what seems to me to be a big enough sample to mitigate most of the noise.

Here’s a link to the data.
https://spreadsheets.google.com/pub?hl=en&hl=en&key=0Agar2-EatDDSdGZPeUI5UE9mSEtEcnBrWVRzbnpmSkE&output=html

I don’t really expect any of this to be useful information, but I did want to solicit your opinions on this, since this is a community of people who seem to think about the SAT as much as I do.

Also, I know there are some people on this board who are at university and still poke around here. I’m interested in hearing from someone who’s taken a stats class more recently than me about statistical significance of, say, the infrequency of A. So, you know, if you wanna nerd it up with me, I’m up for that. PM me.

EAsoccer10 · March 14, 2011, 3:33pm

lol do you have any hobbies, sports, or friends to entertain you? sorry but this is kind of extreme hahahaha ;)

PWNtheSAT · March 14, 2011, 3:35pm

A fair question, to be sure. I get paid pretty well to be very good at the SAT, though. This is work, but I also kinda enjoy it.

EAsoccer10 · March 14, 2011, 3:36pm

ya i was going to say this is kind of interesting though :)

fignewton · March 14, 2011, 4:32pm

A little while ago, I did a study of this question using 13 past (QAS) exams (math only).

The results are consistent with a uniform random distribution of answers, i.e., p(A) = p(B) = p(C) = p(D) = p(E) = 0.2.

For a total of 572 multiple choice questions, here were the letter frequency numbers and corresponding z-scores:

Letter  Count  z-score
A       108    -0.67
B       116     0.17
C       119     0.48
D       120     0.59
E       109    -0.56

(Expected count for each letter: 0.2 × 572 = 114.4.)

You would need a z-score greater than 1.96 or less than -1.96 to say with 95% confidence that a letter’s frequency is NOT uniformly random.
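The z-scores above come from the normal approximation to the binomial: if every letter has probability p = 0.2, a letter’s count out of n questions has mean np and standard deviation sqrt(np(1 - p)). A minimal Python sketch under that assumption (the function name is illustrative, not from any library):

```python
import math

def letter_z_scores(counts, p=0.2):
    """Return each letter's z-score, assuming every answer letter has probability p."""
    n = sum(counts.values())
    mean = n * p                     # expected count per letter
    sd = math.sqrt(n * p * (1 - p))  # binomial standard deviation
    return {letter: (c - mean) / sd for letter, c in counts.items()}

# fignewton's counts from 572 math questions
counts = {"A": 108, "B": 116, "C": 119, "D": 120, "E": 109}
for letter, z in letter_z_scores(counts).items():
    verdict = "non-uniform at 95%" if abs(z) > 1.96 else "consistent with p = 0.2"
    print(f"{letter}: z = {z:+.2f} ({verdict})")
```

Checking five letters one at a time slightly inflates the chance of a false alarm; a chi-squared goodness-of-fit test checks all five letters at once.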

BillyMc · March 14, 2011, 4:43pm

Actually, little do most people know, each SAT has a secret code built into the answers. If you get the first 5-6 right in the section, it should only take you 10-15 minutes to crack the rest of the code. In June 2007, they used the Fibonacci Sequence (alphabetical with the usual encoding exclusions, of course) for most of the Writing section, but it was too obvious; that’s why so many people got 760-800 on the June 2007 Writing. But there is suspicion that people knew ahead of time because a North Korean stole the code.

I was able to crack my last Reading one (ha, and who says Tuvaluan history never came in handy), but screwed up the code on one of the math sections. For the essay, I went with the ol’ binary bypass, but the computer picked up the 874th 0 as an O, so I didn’t get a 12.

fignewton · March 14, 2011, 4:47pm

Wait, I thought the code was the digits of pi (mod 5)!

BillyMc · March 14, 2011, 4:49pm

Nah, they tried that November 2008, but just once; nearly as obvious as the whole Fibonacci fiasco.

buffalowizard · March 14, 2011, 5:11pm

I don’t know stats, but I just had some fun at random.org asking for sets of 1600 integers from 1-5 (link: RANDOM.ORG - Integer Generator)

Here were my results:

1st Run:
1: 326
2: 311
3: 336
4: 305
5: 322

2nd Run:
1: 315
2: 338
3: 309
4: 314
5: 324

3rd Run:
1: 294
2: 323
3: 322
4: 328
5: 333

4th Run:
1: 329
2: 321
3: 360
4: 291
5: 299

The first three runs were fairly even, but the fourth was even more lopsided than the Blue Book analysis, and it was generated by completely random atmospheric noise!
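How lopsided is the 4th run, really? Its counts span 291 to 360, a spread of 69. A quick Monte Carlo sketch (Python standard library; the seed and trial count are arbitrary choices) estimates how often 1600 fair draws from 1 to 5 produce a spread at least that large. It uses Python’s pseudo-random generator rather than atmospheric noise, which is assumed to be close enough for this purpose:

```python
import random

def spread_probability(n_draws=1600, spread=69, trials=2000, seed=2011):
    """Estimate P(max count - min count >= spread) for n_draws fair draws from 5 faces."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        counts = [0] * 5
        for face in rng.choices(range(5), k=n_draws):
            counts[face] += 1
        if max(counts) - min(counts) >= spread:
            hits += 1
    return hits / trials

# 69 = 360 - 291, the spread in the 4th run
print(f"P(spread >= 69) is roughly {spread_probability():.3f}")
```

Under a normal approximation the answer should land near 5%, so roughly one run in twenty should look at least this lopsided.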

BillyMc · March 14, 2011, 5:14pm

Atmospheric noise isn’t random; the aliens are trying to contact us.

Unless… College Board… is… Oh my God.

Run.

PWNtheSAT · March 14, 2011, 5:21pm

I guess I deserve the ribbing for going so aggressively public with my nerdery this morning. Thanks, though, fignewton, for admitting that you’ve also done something similar before.

I wasn’t expecting any surprising results, but the difference of 50 between A and D raised my eyebrow a tad. After having a look at buffalowizard’s post…not as surprising as I thought. :)

Thanks guys!

pckeller · March 14, 2011, 7:44pm

Nothing too nerdy about this at all! Some related nerdery: pick up any section, look at the answers and you will find things like: 12 in a row with no ‘d’, only 2 e’s in an entire section…other similar weird stretches – and these are all completely normal. People just don’t expect weird sequences as often as they come up.

I forget where I read this, but it seems related: ask half the students in a class to flip a coin 100 times and record the results, while the other half only pretend to flip and instead make up a random-looking string of 100 h’s and t’s. Then examine the data. You can usually tell who did the experiment and who faked it: the fake data doesn’t have as many weird stretches, such as long runs of h’s in a row. For example, 5 in a row seems unlikely, but in a random sequence of 100 flips it is actually more likely than not to have at least one such stretch.
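The “5 in a row” claim can be checked exactly rather than by simulation. A small dynamic program tracks the probability of each possible length of the current run of heads; a sketch, assuming a fair coin and counting only runs of heads (the function name is illustrative):

```python
def prob_heads_run(n, k):
    """Exact probability that n fair coin flips contain at least one run of k heads."""
    # state[j] = P(no k-run yet and the flips so far end in exactly j heads), 0 <= j < k
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        nxt = [0.0] * k
        for j, p in enumerate(state):
            nxt[0] += p / 2          # tails: the run of heads resets
            if j + 1 < k:
                nxt[j + 1] += p / 2  # heads: the run grows
            # heads with j + 1 == k completes a k-run; that mass leaves the state
        state = nxt
    return 1 - sum(state)

print(f"P(at least 5 heads in a row in 100 flips) = {prob_heads_run(100, 5):.3f}")
```

The result is about 0.81, comfortably more likely than not. If runs of either heads or tails count, the probability is higher still.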

Of course, that leads to a probability question, too hard for the SAT: you flip a coin 100 times. What is the probability that you get no stretch of 6 h’s or 6 t’s in a row?
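One way to attack this puzzle: a run of 6 identical faces means 5 consecutive “same as the previous flip” outcomes among the 99 transitions, and those transitions are themselves independent fair coin flips. So it is enough to track the length of the current run of either face. A Python sketch, assuming a fair coin (the function name is illustrative):

```python
def prob_no_long_run(n, k):
    """Exact probability that n >= 1 fair flips contain no run of k identical faces (k >= 2)."""
    # run[L] = P(no k-run so far and the current run, of either face, has length L), 1 <= L < k
    run = [0.0] * k
    run[1] = 1.0  # after the first flip the current run has length 1
    for _ in range(n - 1):
        nxt = [0.0] * k
        for length in range(1, k):
            p = run[length]
            nxt[1] += p / 2               # opposite face: a new run starts
            if length + 1 < k:
                nxt[length + 1] += p / 2  # same face: the run grows
            # same face with length + 1 == k completes a k-run; that mass is dropped
        run = nxt
    return sum(run)

print(f"P(no run of 6 h's or 6 t's in 100 flips) = {prob_no_long_run(100, 6):.4f}")
```

It comes out to roughly 0.19, so about four times in five a genuine sequence of 100 flips does contain a stretch of 6.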

mrprez29 · October 5, 2013, 10:00pm

Is this a joke?

MITer94 · October 5, 2013, 10:20pm

Fibonacci numbers mod 5 would be interesting except that every fifth Fibonacci number is 0 mod 5.
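A quick Python check of the Fibonacci numbers mod 5 (starting from F_0 = 0, F_1 = 1) shows the zero landing on every fifth index, with the whole pattern repeating every 20 terms:

```python
def fib_mod(m, count):
    """First `count` Fibonacci numbers F_0, F_1, ... reduced mod m."""
    a, b = 0, 1
    out = []
    for _ in range(count):
        out.append(a)
        a, b = b, (a + b) % m
    return out

residues = fib_mod(5, 40)
print(residues[:20])  # [0, 1, 1, 2, 3, 0, 3, 3, 1, 4, 0, 4, 4, 3, 2, 0, 2, 2, 4, 1]
print("zero at indices:", [i for i, r in enumerate(residues) if r == 0])  # [0, 5, 10, ..., 35]
```

Over one 20-term period each residue 0 to 4 appears exactly four times, so the letter counts alone would look perfectly uniform; it is the rigid position of the zeros that would give such a code away.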

This reminds me of the 2012 AMC12A, where the answers to #22 through #25 were all C.

QuantMech · October 6, 2013, 12:02am

PWNtheSAT: Great analysis. I think it’s quite interesting to look at the data, and it’s perfectly valid as a hobby.

If I were an evil SAT test designer, I’d make the answers to the Level 5 questions more likely to be A or E than anything else, the theory being that those who were just guessing would go with B, C, or D. Bwa ha ha! (Oops, kind of a giveaway.)

malkovichio · October 6, 2013, 2:49am

Hey, you’re that guy who’s the reason I did so well on all the sections! YOU ROCK!

pckeller · October 6, 2013, 6:54am

@Quantmech

Your evil plan raises an interesting question: does the distribution of answers change as the test questions get harder? Maybe if Fignewton still has his data set, he can break it down…

I’m guessing the answer is no. But if the answer is yes, it doesn’t necessarily indicate evilness. They just may not want students getting the right answer for the wrong reason. You can accidentally write a question that weaker students get right more often than stronger ones, so that test-takers scoring 500 and 750 get it right but the kids in the 600s miss it.

fignewton · October 6, 2013, 8:18pm

^That kind of accidental question would be caught in the pre-testing, I imagine.

Anyway, here are the stats for all the math questions I have:

Total 5-choice questions:   968
Expected number per letter: 193.6
Expected range per letter:  181.2 - 206.0 (one std deviation)

Results:

   observed      expected  one-var z  chi-sq
   -----------   --------  ---------  ------
A  187 (19.3%)   193.6     z = -0.53  0.225
B  201 (20.8%)   193.6     z =  0.59  0.283
C  200 (20.7%)   193.6     z =  0.51  0.212
D  201 (20.8%)   193.6     z =  0.59  0.283
E  179 (18.5%)   193.6     z = -1.17  1.101

For one-variable z, we need |z| > 1.96 for 95% confidence of a non-uniform distribution.

Total chi-squared = 2.10 (4 degrees of freedom).

Here are the stats for the level 4 and level 5 questions only:

Total 5-choice questions:   245
Expected number per letter: 49.0
Expected range per letter:  42.7 - 55.3 (one std deviation)

Results:

   observed      expected  one-var z  chi-sq
   -----------   --------  ---------  ------
A   53 (21.6%)   49.0      z =  0.64  0.327
B   47 (19.2%)   49.0      z = -0.32  0.082
C   45 (18.4%)   49.0      z = -0.64  0.327
D   43 (17.6%)   49.0      z = -0.96  0.735
E   57 (23.3%)   49.0      z =  1.28  1.306

Total chi-squared = 2.78 (4 degrees of freedom).

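Looks like the harder questions are consistent with a uniform distribution as well. Both chi-squared totals are far below 9.49, the critical value at the 5% level for 4 degrees of freedom.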
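The chi-squared totals can be turned into p-values without a stats package: with 4 degrees of freedom (5 letters minus 1), the chi-squared upper-tail probability has the closed form e^(-x/2)(1 + x/2). A minimal sketch, assuming as above that each letter has probability 0.2 (function names are illustrative):

```python
import math

def chi_squared(counts):
    """Goodness-of-fit statistic against equal expected counts for every letter."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def upper_tail_4df(x):
    """P(X >= x) for a chi-squared variable with 4 degrees of freedom (closed form)."""
    return math.exp(-x / 2) * (1 + x / 2)

datasets = {
    "all math questions": [187, 201, 200, 201, 179],
    "level 4 and 5 only": [53, 47, 45, 43, 57],
}
for label, counts in datasets.items():
    stat = chi_squared(counts)
    print(f"{label}: chi-squared = {stat:.2f}, p = {upper_tail_4df(stat):.2f}")
```

Both p-values come out around 0.6 to 0.7, nowhere near 0.05, which matches the conclusion that neither data set shows a detectable letter bias.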

johnstucky · October 6, 2013, 9:27pm

lol I can’t imagine the College Board going through all the trouble to make the answer choices based on the digits of pi. This would be especially difficult for the math section, as answers tend to be in order from least to greatest, so they couldn’t just shuffle the answer choices around.

QuantMech · October 6, 2013, 10:28pm

Personally, if you are into guessing and the question looks relatively easy, I’d say go with B, C, or D. If it seems hard, go with E by preference, or A for variety.
