PWNtheSAT
So I saw someone on Yahoo! Answers suggesting that C was the most common answer choice on the SAT. I hear people say this all the time and it drives me batty, mostly because I'm afraid students will actually consider such misinformation actionable and fill in a bunch of C's. You deserve what you get when you do that, I know, but still I worry.

Anyway, I decided to spend some time this morning counting up all the choices in the Blue Book (leaving out Grid-Ins, obviously) to prove a point, and ended up proving the point much less firmly than I had hoped to. There's actually a fair amount of variability over what seems to me to be a big enough sample to mitigate most of the noise.

Here's a link to the data.

https://spreadsheets.google.com/pub?hl=en&hl=en&key=0Agar2-EatDDSdGZPeUI5UE9mSEtEcnBrWVRzbnpmSkE&output=html

I don't really expect any of this to be useful information, but I did want to solicit your opinions on this, since this is a community of people who seem to think about the SAT as much as I do.

Also, I know there are some people on this board who are at university and still poke around here. I'm interested in hearing from someone who's taken a stats class more recently than me about statistical significance of, say, the infrequency of A. So, you know, if you wanna nerd it up with me, I'm up for that. PM me.

For a total of 572 multiple choice questions, here were the letter frequency numbers and corresponding z-scores:

Letter  Count  z-score
A       108    -0.67
B       116     0.17
C       119     0.48
D       120     0.59
E       109    -0.56

(Expected value for each letter: 572 / 5 = 114.4.)

You would need a z-score of more than 1.96 or less than -1.96 (the usual two-sided 5% cutoff) to say with a decent level of confidence that a letter's frequency is NOT consistent with uniformly random answer choices. None of the five letters comes close.
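For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces those z-scores. It assumes each answer is an independent draw with a 1-in-5 chance per letter and uses the normal approximation to the binomial; the function name `z_scores` is just made up for illustration.

```python
import math

# Blue Book letter counts quoted above (572 multiple-choice questions in total).
counts = {"A": 108, "B": 116, "C": 119, "D": 120, "E": 109}

def z_scores(counts):
    """z-score of each count, assuming every letter is equally likely."""
    n = sum(counts.values())             # total number of questions
    p = 1 / len(counts)                  # assumed chance of each letter: 1/5
    expected = n * p                     # 114.4 for this data
    sd = math.sqrt(n * p * (1 - p))      # binomial standard deviation
    return {letter: (c - expected) / sd for letter, c in counts.items()}

scores = z_scores(counts)
for letter, z in scores.items():
    flag = "unusual" if abs(z) > 1.96 else "ordinary"
    print(f"{letter} {counts[letter]:4d} {z:+.2f} {flag}")
```

Every letter lands well inside the ±1.96 band, so on this data alone there is no evidence that any letter is favored.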

I was able to crack my last Reading one (ha, and who says Tuvaluan history never came in handy), but screwed up the code on one of the math sections. For the essay, I went with the ol' binary bypass, but the computer picked up the 874th 0 as an O, so I didn't get a 12.

Here were my results:

Number  1st Run  2nd Run  3rd Run  4th Run
1       326      315      294      329
2       311      338      323      321
3       336      309      322      360
4       305      314      328      291
5       322      324      333      299

The first three runs were fairly even, but the fourth was even more lopsided than the Blue Book analysis, and it was generated by completely random atmospheric noise!
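One way to put a number on "lopsided" is a chi-square goodness-of-fit statistic. This is a swapped-in technique, not what anyone above actually computed, so treat it as a rough sketch. It assumes every value is equally likely, and the 9.488 cutoff is the standard 5% critical value for 4 degrees of freedom, hard-coded here rather than looked up from a library.

```python
# 5% critical value of the chi-square distribution with 4 degrees of freedom
# (5 categories minus 1). Hard-coded as an assumption of this sketch.
CRITICAL_5PCT_DF4 = 9.488

def chi_square_uniform(counts):
    """Chi-square statistic against the hypothesis that all categories are equally likely."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Counts copied from the posts above.
datasets = {
    "Blue Book": [108, 116, 119, 120, 109],
    "1st Run": [326, 311, 336, 305, 322],
    "2nd Run": [315, 338, 309, 314, 324],
    "3rd Run": [294, 323, 322, 328, 333],
    "4th Run": [329, 321, 360, 291, 299],
}

results = {name: chi_square_uniform(c) for name, c in datasets.items()}
for name, stat in results.items():
    verdict = "beyond" if stat > CRITICAL_5PCT_DF4 else "within"
    print(f"{name:10s} chi-square = {stat:5.2f} ({verdict} the 5% cutoff)")
```

By this measure the fourth run comes out around 9.26, just under the cutoff, while the Blue Book counts score about 1.09. So the truly random run does look far more lopsided than the real answer key.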

Unless... College Board... is... Oh my God.

Run.

I wasn't expecting any surprising results, but the difference of 50 between A and D raised my eyebrow a tad. After having a look at buffalowizard's post...not as surprising as I thought. :)

Thanks guys!

I forget where I read this, but it seems related: ask half the students in a class to flip a coin 100 times and record the results, while the other half only pretends to flip coins and makes up a "random" string of 100 h's and t's. Then examine the data. You can usually tell who did the experiment and who faked it. The fake data does not have as many weird stretches, such as long runs of h's in a row. For example, 5 in a row seems unlikely, but in a random set of 100 flips, it is actually more likely than not that there is at least one such stretch.

Of course, that leads to a probability question, one too hard for the SAT: you flip a coin 100 times. What is the probability that you do not get at least one stretch of 6 h's or 6 t's in a row?
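If you would rather compute it than derive it by hand, here is a small sketch that counts the qualifying sequences exactly. It assumes a fair coin and independent flips; the function name `prob_no_run` and its parameters are made up for illustration. It tracks how many sequences end in a run of each length and drops any sequence whose run would reach the forbidden length.

```python
from fractions import Fraction

def prob_no_run(n_flips, run_len):
    """Exact probability that n_flips fair coin flips contain no run of
    run_len or more identical outcomes (heads or tails)."""
    if n_flips == 0:
        return Fraction(1)
    if run_len <= 1:
        return Fraction(0)  # any single flip is already a run of length 1
    # counts[j] = number of allowed sequences whose trailing run has length j + 1
    counts = [0] * (run_len - 1)
    counts[0] = 2  # after one flip: "h" or "t", each a run of length 1
    for _ in range(n_flips - 1):
        new = [0] * (run_len - 1)
        new[0] = sum(counts)          # next flip differs: run resets to 1
        for j in range(1, run_len - 1):
            new[j] = counts[j - 1]    # next flip matches: run grows by 1
        counts = new                  # runs that would hit run_len are dropped
    return Fraction(sum(counts), 2 ** n_flips)

answer = prob_no_run(100, 6)
print(f"P(no run of 6 or more in 100 flips) = {float(answer):.4f}")
```

Calling `prob_no_run(100, 5)` instead checks the "5 in a row" claim the same way.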

This reminds me of the 2012 AMC12A, where the answers to #22 through #25 were all C.

If I were an evil SAT test designer, I'd make the answers to the Level 5 questions more likely to be A or E than anything else--the theory being that those who were just guessing would go with B, C, or D. Bwa ha ha! (Oops, kind of a giveaway.)

