Which universities have the most students with over 1500 SATs?

<p>Ok, feeling curious…here’s the CDS 2005-2006 for Yale University:
<a href="http://64.233.187.104/u/yaleu?q=cache:l10mYoijlsUJ:www.yale.edu/oir/cds.pdf+common+data+set&hl=en&gl=us&ct=clnk&cd=2&ie=UTF-8">http://64.233.187.104/u/yaleu?q=cache:l10mYoijlsUJ:www.yale.edu/oir/cds.pdf+common+data+set&hl=en&gl=us&ct=clnk&cd=2&ie=UTF-8</a></p>

<p>We note that there is a distribution of SAT scores from 700-800, per this excerpt of the cited CDS pdf:</p>

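<p>Percent of first-time, first-year (freshman) students with scores in each range:</p>
<table>
<tr><th>Score range</th><th>SAT I Verbal</th><th>SAT I Math</th></tr>
<tr><td>700-800</td><td>78%</td><td>78%</td></tr>
<tr><td>600-699</td><td>19%</td><td>20%</td></tr>
<tr><td>500-599</td><td>3%</td><td>2%</td></tr>
<tr><td>400-499</td><td>&lt;1%</td><td>&lt;1%</td></tr>
<tr><td>300-399</td><td></td><td></td></tr>
<tr><td>200-299</td><td></td><td></td></tr>
<tr><td>Total</td><td>100%</td><td>100%</td></tr>
</table>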

<p>We also note that the same CDS shows that 1321 men and women enrolled as freshmen at Yale.</p>

<p>Maybe I’m being too simplistic here, but why not just take this data and multiply it out to get the SAT range you want? I just don’t see why this should be complicated…unless this is some kind of math research project or analysis you are conducting. Of course, this method gives accurate results for the 1400-1600 range, not the 1500-1600 range, for Yale in 2005-2006.</p>
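<p>A minimal sketch of the "multiply it out" idea, using only the two figures quoted from the CDS excerpt (1321 freshmen, 78% in the 700-800 band on each section). The variable names are illustrative. Note the assumption it leans on: the CDS reports each section separately, so these are per-section headcounts, and they equal the number with a combined 1400+ only if the same students are in both top bands.</p>

```python
# Figures quoted in the thread from the Yale CDS 2005-2006.
freshmen = 1321
share_700_plus = {"verbal": 0.78, "math": 0.78}

# Headcount in the 700-800 band for each section, rounded to whole students.
counts = {section: round(freshmen * share)
          for section, share in share_700_plus.items()}

for section, count in counts.items():
    print(f"{section}: about {count} freshmen scored 700-800")
```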

<p>Oh wow - now I’m wondering what that Afghan Taliban kid at Yale scored on his SAT? Will the sparks fly when he meets up with some strong American females?</p>

<p>parent2noles-
I plugged the Yale data into my formulas. My estimate of the percent of students with over 1400 SATs was 75.2% versus the actual 78%. I think this is very close. This supports the accuracy of my technique even with high-end schools where the SAT “ceiling effect” is most likely to result in a non-normal (skewed) distribution.</p>

<p>There is a curved, not straight-line, decline in number of students as SAT scores increase. That is why it is necessary to use the fancy math.</p>

<p>My technique resulted in an estimate of 47% over 1500 at Yale. What would your method estimate? Yes, you could accurately figure out the number over 1400 using the Common Data Set information, but only when the math and verbal percents are the same. Simple multiplication. But usually the percent over 700 in math is not the same as the percent over 700 in verbal. You would have to do math and verbal separately.</p>

<p>Sorry if I missed it, but how did you deal with the factor that makes the distribution definitely NOT normal at the top schools: the truncation of the scores at 800? Once a large proportion of the students truncate at 800, the assumption of a normal distribution fails.</p>

<p>Edit: I took the time to look back and see that this was discussed, but not the answer. Caltech, for example, is going to have a lot of 800 scores. Even if the distribution is normal up to 800, this truncation will reduce the SD and distort your estimates.</p>


<p>This would be fine if the underlying distribution were normal within this range. But if the listed 75th percentile is 800, then one does not know what the real 75th percentile might be. For example, at Caltech, I think that considerably more than 25% of students got 800 in math. The real variance in scores is smaller than estimated.</p>

<p>This approach to finding the SD should work at a “second tier” school where the proportion of truncated scores is low. Then of course there is the problem that the distribution at Yale may deviate from normal even within the interquartile range. Your correct estimate of % over 1400 illustrates the problem. At Yale, 1400 is not a particularly high score, so all your data is bunched above this range.</p>

<p>Observations, not criticisms. I love your work collegehelp.</p>

<p>“This is strong support for the validity of my data.”</p>

<p>1) What data is it strong support for? I thought you had little data. If you had the data then why use this assumed model? Just use the data.</p>

<p>2) I haven’t studied statistics very recently; perhaps someone here can refresh my memory:</p>

<p>What is the statistical confidence level one should ascribe to a prediction of population characteristics, extrapolated solely from one individual data point, when the underlying population consists of many data points? Or, for that matter, extrapolated from two data points?</p>

<p>Apparently you believe this confidence level is very high; hence the “strong support” conclusion.</p>

<p>My admittedly fuzzy recollection is otherwise.</p>

<p>There of course is an absolute truncation at 1600. But for each school there will be a substantial reduction, bordering on truncation past a certain point, in the size of the distribution below the level of brainpower that the particular school typically deals in. Then, as other posters have suggested, there likely will be a substantial reduction in the curve above the SAT level that the particular school attracts, because the most adept students may increasingly turn down the school for even higher-brainiac institutions.</p>

<p>So the distribution may exhibit skew (? or kurtosis? I forget my statistics) in both directions, asymmetrically. For some schools there might also be 2 humps, as other posters have pointed out.</p>

<p>My guess is that a normal distribution is not the best-fit curve for this underlying data. One can fit many curves to a pool of data, but some curves will fit better than others, i.e. have less standard error and therefore more predictive value. When students have to be first selected by the school, and then they themselves have to select the school, this is not the sort of random pool that normal distributions frequently reflect with great predictive accuracy. IMO.</p>

<p>collegehelp - </p>

<p>I understand the distribution is not linear; however, your technique has an error rate that is too high at almost 3%, I would suggest. I do appreciate the neat way you’ve put it together, though, and it is fun to think about. </p>

<p>You know, the numbers will be so few at this level is it even material to spend time on? For example, the CDS sets the upper break point at 700-800. I would offer that this limit is born of convention that has substantially proven accurate, in terms of running universities, over millions of kids and decades.</p>

<p>Just a thought. :)</p>

<p><em>cracks knuckles</em></p>

<p>This is cool. It will help me prepare for my doctoral qualifying exam.</p>

<p>As I understand it, your goal is to estimate the percentile associated with a score of 1500 at every school in your data set. You know the mean (or median, I can’t tell), the 25th and 75th percentiles, and the total undergraduate population.</p>

<p>There are several problems. First of all, most of the modern statistical research into estimating percentiles starts from sample data. Then there’s a rich and constantly developing literature on percentile estimates, based on Monte Carlo experiments. Some of the popular percentile estimators are the Harrell-Davis estimator and various combinations of order statistics. There’s lots of work here, and much of it has taken place in the past few years. BUT, most statisticians are faced with a different problem than yours: they have the sample data, and they’re trying to estimate a population parameter.</p>

<p>You, however, have only summary statistics. What’s more, there would be no sampling involved. The actual populations are finite and knowable. I think most statisticians would find such a problem ‘uninteresting’. If one is going to pay for a statistician to help, one is going to give him (or her) the data.</p>

<p>As for the question of normality, I want to clear up some of the statements folks have made. You cannot determine normality of a population from looking at a frequency plot. I’ve been taught that it’s a judgment call one makes from looking at the Q-Q plot and the probit plot, and (perhaps) calculating sample kurtosis and skewness. (Plot inspection, however, is considered more reliable.)</p>

<p>I think the deviation from normality might defeat you. Some schools may be skewed, while others are symmetrical, but heavy-tailed. Modern practitioners would insist on using a different methodology, depending on the population’s deviation from normality. And, of course, if you changed your estimating model from school to school, well, who would believe or accept the results? </p>

<p>Allow me to suggest that you’re making the model more complicated than you need. “Keep it as simple as possible, but NO SIMPLER,” Albert Einstein once said. Wouldn’t you do better by ranking by 75th percentile and weighting by size? Alternatively, you could rank by “number of merit finalists”. That uses the PSAT in the same way you are attempting to use the SAT.</p>

<p>Anyway, I congratulate you for asking a probing and interesting question.</p>

<p>Ranking by 75th percentile doesn’t answer the question I, for one, find interesting about this topic. For example: Despite the fact that the 75th percentile at Teeny Smart College is 1400, and the 75th percentile at Huge State U is 1250, Huge State might have just as many high-achieving (>1500) kids as Teeny Smart does. That’s what I’d be interested in seeing.</p>

<p>On the other hand, counting the number of National Merit Semi-Finalists would get you to a reasonably equivalent place. IF this data was available. Probably though they just track Scholarship winners, which statistic is somewhat distorted due to corporate and school sponsored NM scholarships.</p>

<p>Collegehelp-</p>

<p>How did William & Mary and Virginia come out with exactly the same proportion coefficient? W&M’s midpoint is 10-30 points higher than Virginia’s. Sounds fishy…</p>

<p>monydad-
I wish I could determine the shape of the distribution at each school. But, I have confidence that the distributions approximate a bell curve. So far, my predictions have proven pretty accurate and my confidence in the normality assumption is still strong. If my predictions prove fairly accurate, that’s all that matters. Mother nature seems to like normal distributions, especially when it comes to social science.</p>

<p>afan-
Regarding the SAT ceiling and SAT distributions truncated as they approach the ceiling…the worst cases would be Harvard and Yale. But, based on the 25th and 75th percentiles, I estimate the mean/median SAT to be 1490 at both Harvard and Yale. Fifty percent score above the median. 1490 is just slightly below 1500. 10 SAT points represents .076 standard deviations or 3% of the area under the curve. So maybe 53% of Harvard and Yale students actually have SAT scores above 1500. My estimate was 47%, which is pretty close to 53%, and we are talking about the worst-case scenarios Harvard and Yale.</p>
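<p>The arithmetic in the post above can be checked with a short sketch. It takes the z-distance of 0.076 standard deviations from the post as given and assumes a normal curve, which is the post's own working assumption; the function name is illustrative.</p>

```python
import math

def area_between_median_and(z):
    """Share of a normal curve lying between the median and z standard
    deviations above it: Phi(z) - 0.5, computed with the error function."""
    return 0.5 * math.erf(z / math.sqrt(2))

z_gap = 0.076  # 10 SAT points expressed in standard deviations, per the post
sliver = area_between_median_and(z_gap)

# If the median sits 10 points on one side of 1500, the share on the far
# side of 1500 is 50% plus or minus this sliver.
print(f"area within 10 points of the median: {sliver:.1%}")
print(f"50% + sliver = {0.5 + sliver:.1%}, 50% - sliver = {0.5 - sliver:.1%}")
```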

<p>No other schools come as close to the SAT ceiling as Harvard and Yale. Caltech has higher SATs but its distribution is narrower. Therefore, my conclusion is that my estimations are pretty accurate even when the distributions are affected by the SAT ceiling.</p>

<p>macsuile-
W&M and UVA came up with the same proportion purely by chance.</p>

<p>W&M SAT range is 1260-1440, midpoint = 1350, interquartile range=180, standard deviation=180/1.36=132, 1500 minus 1350 = 150, 150/132=1.136 z scores which corresponds to 12.9% of the area under the normal curve.</p>

<p>UVA SAT range is 1220-1430, midpoint = 1325, interquartile range = 210, standard deviation=210/1.36=154, 1500 minus 1325 = 175, 175/154=1.136 z scores which corresponds to 12.9% of the area under the normal curve.</p>
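<p>The two worked examples above can be reproduced with a small sketch. It follows the steps in the post (midpoint of the 25th-75th percentile range, interquartile range divided by 1.36, z-score for 1500, upper-tail area of a normal curve). The function name is made up for illustration, and the normal-distribution assumption is the post's, not something the data confirms. The divisor 1.36 is the one used in the post; for an exactly normal curve the ratio is closer to 1.349.</p>

```python
import math

IQR_TO_SD = 1.36  # divisor used in the post (exact normal value is about 1.349)

def share_above(q25, q75, cutoff=1500):
    """Estimate the share of students scoring above `cutoff`, assuming
    scores are normal with the given 25th and 75th percentiles."""
    midpoint = (q25 + q75) / 2
    sd = (q75 - q25) / IQR_TO_SD
    z = (cutoff - midpoint) / sd
    # Upper-tail area of the standard normal curve beyond z.
    return 0.5 * math.erfc(z / math.sqrt(2))

wm = share_above(1260, 1440)   # William & Mary range from the post
uva = share_above(1220, 1430)  # UVA range from the post

print(f"W&M: {wm:.1%}  UVA: {uva:.1%}")
```

<p>The match is less of a coincidence than it looks: for both schools the distance from the midpoint to 1500 is exactly 5/6 of the interquartile range (150/180 and 175/210), so any method that depends only on that ratio will return the same proportion for both.</p>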

<p>i just take this as further proof that harvard is #1</p>


<p>collegehelp, how was the “proportion” calculated?</p>

<p>did you adjust for class size?</p>

<p>“i just take this as further proof that harvard is #1”</p>

<p>-Who in the world disputes this?</p>

<p>US News, apparently :rolleyes:</p>

<p>agoodfella-
see the examples in post #51 and the explanation in post #1.</p>

<p>Perhaps you are asking where I got the conversion from z-score to the proportion of the area under the normal curve above the z-score. There is an online calculator for that:</p>

<p><a href="http://www.danielsoper.com/statcalc/calc20.aspx">http://www.danielsoper.com/statcalc/calc20.aspx</a></p>

<p>You can also find tables in statistics textbooks.</p>

<p>“Mother nature seems to like normal distributions”</p>

<p>Mother nature does not serve on college admissions committees, and only indirectly dictates the decisions of selected students to attend once accepted. These are not random events; they are correlated with the very parameters you are seeking to analyze.</p>

<p>Mother nature probably does like normal distributions in the population at large, but once again the pool here is not the mother nature pool; rather it is selected from this underlying mother nature pool.</p>

<p>Let’s say you took this same pool, and selected only students who had 1300 SATs or 1400 SATs. That is a selected pool, and, though it would have a standard deviation, its distribution would not be normal. There would be no 1500 scorers in this selected pool.</p>

<p>The pool you’re working with is also a selected pool. And SAT scores are a criterion in the selection, not a random variable. So it is a biased pool, and therefore the general tendencies of mother nature are not necessarily applicable to it.</p>

<p>I don’t know how to make this any clearer, so I give up after this.</p>

<p>I was able to get the actual SAT information from U Texas Austin.</p>

<p>from 2005 data
total number of freshmen = 6449 who submitted SAT scores
number with SATs over 1500 = 275
percent with 1500 or above = 4.26%</p>

<p>my estimate of the percent over 1500 = 7.47%</p>

<p>I think my estimate was pretty close considering that:
(a) the percents over 1500 at various universities ranged from 2% to 62%…an error of 3.21 percentage points doesn’t seem too big
(b) I had very little actual data to start with…just the 25th and 75th percentiles</p>

<p>But, yeah, my UT estimate was high.</p>

<p>By the way, the actual data above is for one freshman class of 6449, with 275 freshmen scoring over 1500 on the SATs. My estimate of 2500 plus at UT with over 1500 was for the entire undergrad student body of about 33,000. If you multiply the actual percent, 4.26%, by 33,000 you get about 1400.</p>

<p>So I was off by 3.2% and by about 1100 students at UT. That is a pretty high percent error but 1400 and 2500 (out of 33,000) are in the same ballpark, I would say.</p>
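<p>For anyone who wants to rerun the UT numbers, here is a rough sketch using only the figures quoted in the posts above. It carries over the same assumption the post makes, namely that the share measured in one freshman class applies to all of the roughly 33,000 undergraduates. Variable names are illustrative.</p>

```python
# Figures quoted in the thread for U Texas Austin (2005 data).
freshmen_with_sat = 6449
freshmen_over_1500 = 275
undergrads = 33_000       # approximate size of the undergraduate body
estimated_share = 0.0747  # the normal-curve estimate from the post

actual_share = freshmen_over_1500 / freshmen_with_sat

# Assumption: the freshman share holds for every undergraduate class.
actual_headcount = actual_share * undergrads
estimated_headcount = estimated_share * undergrads

print(f"actual share: {actual_share:.2%}")
print(f"headcount from actual share:    {actual_headcount:.0f}")
print(f"headcount from estimated share: {estimated_headcount:.0f}")
print(f"difference: {estimated_headcount - actual_headcount:.0f} students")
```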

<p>At the beginning of this thread, I described a technique for estimating the percent/number of students over a certain SAT score, in this case over 1500. The method was based on the 25th and 75th percentile SAT scores at a school and assuming a normal distribution of SATs.</p>

<p>I was able to find the actual percent over 1500 at U Florida.
<a href="http://www.fldcu.org/factbook/2005-2006/xls/t04_00_0506_f.xls">http://www.fldcu.org/factbook/2005-2006/xls/t04_00_0506_f.xls</a>
The published percent is 3.8%.
My estimate was 5%.</p>

<p>My estimate was very close, which suggests that my method based on 25th and 75th percentiles works fairly well.</p>
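<p>One way to put the three accuracy checks quoted in this thread side by side is to look at both the gap in percentage points and the gap relative to the actual figure. The sketch below uses only the estimate/actual pairs as quoted in the posts (Yale over 1400, UT Austin over 1500, U Florida over 1500); the labels and helper function are illustrative, and whether a given gap counts as "close" is a judgment call the code does not settle.</p>

```python
# (label, estimated percent, actual percent) as quoted in the thread.
checks = [
    ("Yale, over 1400", 75.2, 78.0),
    ("UT Austin, over 1500", 7.47, 4.26),
    ("U Florida, over 1500", 5.0, 3.8),
]

def errors(estimate, actual):
    """Return (gap in percentage points, gap as a fraction of the actual)."""
    absolute = estimate - actual
    relative = absolute / actual
    return absolute, relative

results = {name: errors(est, act) for name, est, act in checks}

for name, (absolute, relative) in results.items():
    print(f"{name}: {absolute:+.2f} points, {relative:+.1%} relative")
```

<p>The point-gap is similar in size across the three schools, but the relative gap is much larger where the actual share is small, which is worth keeping in mind when the estimates are turned into headcounts.</p>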

<p>Look for the “Big 3” state schools (Cal, Michigan and UVa) to have significant leaps in those numbers. Last year, roughly 25% of their freshmen had 1500+ on their SAT. If they can somehow maintain that standard for the next 3-4 years, Michigan should have 6,000 students with 1500+ SAT scores, Cal 5,000 and UVa 3,500.</p>