TA confession: I'm sorry, but most of your children (my students) are average

One needs to be careful about drawing broad conclusions from a correlation with a single variable in a population that was selected across a large number of variables. For example, suppose SAT score is correlated with the quality of career-related skills learned during college, and the quality of career-related skills learned during college is correlated with job performance. In such a hypothetical scenario, SAT score could show a decent correlation with job performance, and yet add no significant benefit to hiring beyond testing of career-specific knowledge.

Or suppose SAT score is correlated with just about every applicant evaluation criterion, including college major, college GPA, interview success, degree of relevant experience, and various other criteria employers hire for. If so, then it’s possible for the score to add little benefit beyond the existing hiring criteria, yet still show a decent correlation with job performance alone. In such a hypothetical scenario the score might instead have the negative effect of decreasing SES diversity, while not notably improving hiring. I’m not saying SAT score has no benefit, only that looking at correlations with just one criterion in a non-random population can lead to misleading conclusions. The more interesting measure would be comparing what happens if a company keeps all of its hiring procedure the same except for adding/removing test scores. In studies of SAT correlation with academic success in college, the testing benefits decrease tremendously when controlling for other applicant evaluation criteria like this.
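Here is a quick simulation sketch of how that can happen (my own toy numbers and variable names, not from any study): if the score only relates to performance through skills learned in college, it can show a decent raw correlation with performance yet add essentially nothing once a skills test is already part of the hiring criteria.

```python
# Hypothetical illustration: SAT -> skills -> performance, with no direct SAT effect.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

sat = rng.normal(size=n)                          # standardized test score
skills = 0.6 * sat + rng.normal(size=n)           # career skills, partly driven by SAT
performance = 0.7 * skills + rng.normal(size=n)   # performance driven only by skills

print("r(SAT, performance):", np.corrcoef(sat, performance)[0, 1])  # ~0.3, a "decent" correlation

def r_squared(predictors, y):
    """Variance in y explained by a linear model on the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("R^2, skills test only:     ", r_squared([skills], performance))
print("R^2, skills test plus SAT: ", r_squared([skills, sat], performance))  # nearly unchanged
```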

Regarding management consulting, it depends on the firm. There is no universal hiring procedure. For example, the study at http://www.sciencedirect.com/science/article/pii/S027656241000065X looked at the hiring procedures at 120 “elite” management consulting, law, and investment banking firms. The author, Rivera, describes a large variation in hiring methods, but the typical hiring procedure was quite different from your description, with fewer than 1/3 of these “elite” employers even using standardized testing in hiring decisions for entry-level employees. She describes the hiring process as:

She also asked the “elite” employers to list all the criteria they use in hiring decisions and select the one from that list that is most important. 40% of the surveyed “elite” management consulting firms said cultural fit was the most important criterion, rather than anything related to performance on the sample case test. The paper also emphasizes that the extracurricular screen is a way of checking cultural fit, in addition to the interviews. For example, one hiring manager wrote,

Another wrote

Other industries usually have completely different hiring procedures. In surveys of general employers (not just “elite” banking/consulting/…), employers typically say they emphasize relevant experience, and that college reputation has little impact. Most say they do not use scores. For example, I mentioned that when I was a new engineering grad applying for jobs at west coast tech companies, none of the many tech companies I interviewed with requested SAT, GRE, or other standardized tests; yet all of them had interview-style testing of technical knowledge that was specific to my field and often to the job position.

Anecdotally I’ve observed that when an SAT screen is used, it is typically at jobs that emphasize training a new skill that has little relation to the focus of college coursework. For example, a CS major applying to Apple for a CS-related job that closely ties in to what he learned in college probably isn’t going to be required to submit SAT scores and is instead going to be tested on his CS knowledge during a series of interviews; while a CS major applying to an “elite” management consulting job that does not particularly relate to the CS skills he learned during college is more likely to get an SAT screen. This fits with the studies you referenced where testing was more predictive of training success than of direct performance.

@Periwinkle Those were their initial results, but surprises come later:

https://www.insidehighered.com/views/2011/06/16/connor_essay_on_why_majors_matter_in_how_much_college_students_learn

According to the CLA (Collegiate Learning Assessment), then, the best majors for improving critical thinking are sociology, multi- and interdisciplinary studies, foreign languages, physical education, math, and business (after factoring out cognitive ability to make sure we are comparing apples to apples).

So after 4 years of study, phys ed majors show greater growth in critical thinking than English majors, and education majors do as well as history majors. What happened to physical science majors? What happened to business majors after their sophomore year?

Here is another example of the power of standardized testing: it shows us things we would never have guessed, as well as things we may prefer not to know.

@ucbalumnus A philosophical question requires a philosophical answer. I do not judge them on an absolute moral level, but I do evaluate them on a relative plane, as students and as employees.

You have offered a derivative of St. Anselm’s Proof. His argument turns on the word “existence”; yours turns on the word “judge”.

@Data10 You have touched upon the problem of “multiplicity” in statistical inference. Psychometrics is well aware of the problem of trying to test multiple hypotheses all at once. Social sciences aside, it is a serious issue in medical research. Ioannidis once stated that most medical research results are wrong. While his mathematical analysis has been challenged (irony!), his conclusion is widely accepted. I am beginning to think it may be necessary to have a qualified statistician on the research team to ensure research quality, but that is another issue for another day.
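For readers who want a concrete feel for the multiplicity problem, here is a small toy simulation (mine, not from Ioannidis): run 20 independent significance tests at the usual 0.05 threshold when every null hypothesis is actually true, and the chance of at least one “significant” finding is far higher than 5%.

```python
# Toy demonstration of false-positive inflation under multiple testing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_tests, n_experiments = 0.05, 20, 2_000

runs_with_false_positive = 0
for _ in range(n_experiments):
    # 20 two-sample t-tests where the true group difference is zero
    p_values = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
                for _ in range(n_tests)]
    if min(p_values) < alpha:
        runs_with_false_positive += 1

print("P(at least one false positive):", runs_with_false_positive / n_experiments)
print("Theoretical 1 - (1 - alpha)^k: ", 1 - (1 - alpha) ** n_tests)  # ~0.64
```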

You mentioned Rivera’s work, whereas I was looking at Manzi’s response to it. The notion of “coasting on the miracle of admission to a super-elite school” into a Rivera firm is repulsive to me. My preference for Manzi’s meritocracy over Rivera’s plutocracy reveals my own bias, I suppose. The truth may be here:

Hsu, a physicist at the University of Oregon, then offered a further distinction between hard and soft firms, which are looking for subtly different skills. His distinction turns almost entirely on quantitative abilities. Hard firms, like hedge and venture firms and tech startups, demand sheer mathematical brainpower and will take it where they can find it. Soft firms such as investment banks, law and consulting firms that sell services, like advice, that is more “nebulous” and harder to measure, and where “prestige” matters more, embrace the elite-school brand more readily.

http://www.thedeal.com/thedealeconomy/the-debate-over-elite-schools-and-elite-jobs.php

Statistically speaking, testing being more predictive of training success than of direct performance is just another “range restriction” problem.
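As a quick illustration of range restriction (with made-up numbers, just to show the mechanism): a score that correlates with the outcome across the whole applicant pool will show a much weaker correlation once you only look at the people who were selected on that score.

```python
# Hypothetical sketch: correlation attenuation when the sample is restricted
# to the top of the score range (e.g., only the people who got hired).
import numpy as np

rng = np.random.default_rng(2)
score = rng.normal(size=200_000)
outcome = 0.5 * score + rng.normal(size=200_000)

hired = score > 1.0   # suppose only the top ~16% on the score are hired

r_full_pool = np.corrcoef(score, outcome)[0, 1]
r_hired_only = np.corrcoef(score[hired], outcome[hired])[0, 1]

print(f"correlation in full applicant pool: {r_full_pool:.2f}")   # ~0.45
print(f"correlation among those hired:      {r_hired_only:.2f}")  # noticeably smaller, ~0.2
```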

Physical education is a good major for improving critical thinking? What is the baseline and where do those majors end up on the critical thinking scale? Do the English and IR majors start at a higher level of critical thinking and do they end up at a higher level than phys ed majors? If so, they may not show as much improvement because they did not have as far to go. That is, if there is a ceiling on the critical thinking test, do phys ed majors hit it at a higher rate than other majors? The fact that there is improvement is good, but the goal is to reach a high level of critical thinking.

^This.

It always baffles me when someone complains that the average on a certain test was 50%, without saying anything about what that 50% corresponded to. It could be anything. There is no reason why a certain percentage should correspond to a certain letter grade.

^ The first question anyone asked me in college after I stated what I got on my test was “what was the mean and what was the std dev?”
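That question is exactly the right one. A trivial sketch (exam means and SDs invented for the example) of why the raw percentage alone says so little: the same 62% can be a strong result on one exam and a weak one on another.

```python
# Convert a raw score into standing relative to the class (a z-score).
def z_score(raw, mean, sd):
    return (raw - mean) / sd

print(z_score(62, mean=50, sd=12))   # +1.0: well above average on a hard exam
print(z_score(62, mean=75, sd=10))   # -1.3: below average on an easier exam
```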

The article isn’t very clear about methods, but if they are using the Kalamazoo data, it appears to include only 140 students. How many of them were phys ed majors? Only 1 or 2? I assume the sample size is too small to be statistically meaningful. This relates to why it’s good to use results from actual studies that list things like sample size, rather than an author of an article summarizing the result he got back after asking someone to “crunch some numbers” for him.

For example, the study of 13,000 students at http://cae.org/images/uploads/pdf/Majors_Matter_Differential_Performance_on_a_Test_of_General_College_Outcomes.pdf found completely different results. Business majors had the least adjusted gains (they were on your list of majors with highest gains). Other vocationally focused majors also did poorly. The majors groups with above average gains were natural sciences, social sciences, and humanities. There was also significant variation on specific CLA tasks. For example, natural sciences were best on 4 of the 6 tasks, yet they performed below average on the other 2 – “Classify writings and artwork as representative of different themes” and “Determine the cause of a recent accident involving a young student.”

Note that Manzi is a blog writer who posted about his personal experiences. In contrast, Rivera published a study encompassing 120 persons in hiring positions. Those 120 showed a good deal of variation between them. There was not one universal hiring procedure. Some of the surveyed “elite” employers probably did have Manzi-like hiring policies, but they were not the majority of those surveyed. Maybe “hard firms” are more likely to test than “soft firms”, but I would not assume all of them have a similar hiring policy, and I certainly would not assume all firms have hiring policies like one blogger’s experiences.

One of the studies you linked offers a different explanation. They found that the bulk of the GMA job performance correlation was driven by job knowledge, rather than GMA factors that were not encompassed by job knowledge. However, GMA showed a much stronger correlation with job knowledge, independent of performance, than it did with anything related to performance or ratings… This would suggest that GMA is more correlated with acquiring job knowledge (training) than with job performance. It also suggests that if employers test for job knowledge (for example, the tech interviews I described), additional benefits of GMA testing will be much smaller than suggested by studies looking at correlations between GMA and job performance alone, without considering the effects of any other existing hiring criteria.
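One way to make that idea concrete is the standard partial-correlation formula: how much does GMA relate to performance once job knowledge is held constant? The correlations below are placeholders I picked for illustration, not numbers from the cited study.

```python
# Partial correlation of x and y controlling for z, from pairwise correlations.
import math

def partial_corr(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_gma_perf = 0.45        # GMA vs job performance (illustrative)
r_gma_knowledge = 0.70   # GMA vs job knowledge (illustrative)
r_knowledge_perf = 0.60  # job knowledge vs job performance (illustrative)

# With these inputs, GMA's relation to performance nearly vanishes (~0.05)
# once job knowledge is already accounted for.
print(partial_corr(r_gma_perf, r_gma_knowledge, r_knowledge_perf))
```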

I like the article you posted in post 326. Here are the highlights for me:

1) choice of major is associated with entering academic ability (Arum & Roksa, 2011).
2) critical thinking and writing skills can be assessed without fear of content knowledge conflation.

The article I posted has data that are more current (2011) and more granular (I don’t think it is enough to say humanities. How can I be sure there is no difference between studying philosophy vs. studying English?). Roger Benjamin heads the organization that runs the CLA, so his data are the most comprehensive as well. No, the author was not using the Kalamazoo data at all. He pretty much said as much. I agree it is better to use actual studies when they are available, though.

By the same token, Manzi is more than just a blogger. He mentioned that he left consulting because he saw great opportunity in technology. I see his qualifications as more, not less, impressive than Rivera’s:

https://en.wikipedia.org/wiki/Jim_Manzi_%28software_entrepreneur%29

My point is not about how many use the Rivera approach vs. the Manzi approach, but that the Manzi approach is better supported by empirical evidence. This also goes a long way in explaining the obsession with elite schools here on CC. With holistic admission, one can be a lucky sperm. With standardized testing? I don’t think so.

Having other existing hiring criteria can certainly reduce the effect of GMA, but it is also important to note that additional criteria are not necessarily associated with better outcomes:

In one study, college counsellors were given information about a group of high-school students and asked to predict their freshman grades in college. The counsellors had access to test scores, grades, the results of personality and vocational tests, and personal statements from the students, whom they were also permitted to interview. Predictions that were produced by a formula using just test scores and grades were more accurate.

http://www.newyorker.com/magazine/2005/12/05/everybodys-an-expert

As I often said, sometimes reality is stranger than fiction.

This also gives me another reason to be suspicious of holistic admission.

Freshman grades in college aren’t everything. I got all B’s first term freshman year. I graduated magna cum laude with highest honors in my major.

Most moderately selective colleges just use grades and/or rank and test scores for admission for most students. So nothing new here. The super-selective colleges have huge numbers of applicants with grades and test scores pressed up against the top end of the range, so they either must add other criteria or let trivial differences decide in a strict grades and test score ranking (which can create incentive to game the ranking by choosing easier courses and tests).

The article talks about the Kalamazoo data, then one sentence later asks Benjamin to “crunch some numbers” without any mention of a sample group other than the Kalamazoo data. The fact that Benjamin was involved in the CLA standardized testing initiative does not mean he has access to all the raw data for studies involving use of the CLA. However, I think the most reliable way to determine which data are reliable is to look at multiple studies and see which results are part of a trend and which are outliers, then look for a logical explanation for the conflict.

For example, the study I posted above found that among 13,000 students across many colleges, natural sciences, social sciences, and humanities had the highest gains, and business majors had the lowest adjusted CLA gains. The study of 2,300 students across many colleges described at http://files.eric.ed.gov/fulltext/ED514983.pdf also found science/math, social sciences, and humanities had the highest gains, and business majors had the lowest CLA gains. The article you posted, which was not a published study and mentions no sample group other than the 140 Kalamazoo students, found completely different results, with business majors having among the highest CLA gains instead of the lowest. Do you think the most likely explanation is that the studies of 13,000 students and 2,300 students across many colleges are wrong, and that this article, which lists unclear details and no sample group besides the 140 Kalamazoo students, is the more reliable source?

It’s not a matter of who has the most impressive resume. The difference is Rivera wrote a published and peer reviewed study that looks at 120 persons in hiring positions at “elite” firms and describes + analyzes those results. Some of those 120 probably had more impressive resumes than Manzi. She mentions a good amount of variation in hiring policy among different employers in the study, related to differences in both individual biases and company policies, so different individuals at different companies are expected to have different personal experiences involving hiring. Manzi did not write any kind of peer reviewed study. Instead he posted an article/blog about his personal experiences. Manzi’s sample size was ridiculously smaller than the 120 in the study. In short, it’s the same issue of trusting articles with minuscule (or undefined) sample size over published studies with large sample size, as with the CLA article you linked to.

The study you are referring to took place more than 50 years ago, in the 1960s, and involved students who attended college in the 1950s. Both the number of counselors and the number of students were very small. The counselors made predictions either by following a preset equation based on stats, or by not following the equation while also having access to the results of a vocational interest test, the MMPI, and biographical information. They found that when the counselors had the choice of either including the vocational interest test, MMPI, and biographical information or not including them, there was essentially no difference in predictive accuracy for freshman grades. They use the word “identical.”

How is this a reason to be suspicious of holistic admissions? College admissions has changed a lot since the 1950s and 1960s. The non-stat criteria used in modern holistic admission, such as LORs, essays, course rigor, … are very different from a set of vocational interest and MMPI test results. Vocational interest tests and personality tests probably do not have much impact on freshman grades, but that does not mean LORs, essays, course rigor, and other non-stat criteria also have little impact on grades. College admissions is also far more complex than just a counselor estimating freshman GPA without direction. Admissions officers are directed about how to select applicants, selections are generally reviewed and voted on in groups, internal studies review success in achieving institutional goals… essentially checking how well they are doing, etc. And of course holistic admission goals involve more than just selecting the class that will have the highest freshman year GPA, without considering the effects of the new curve.

@Data10 With regard to the CLA data, I guess we have to agree to disagree again. I have a lot more faith in Robert Connor and Roger Benjamin than you do. We will just have to wait for further studies to untangle the results. I have no problem with it.

As far as Rivera and Manzi go, I believe the Manzi model will win out over the Rivera model in the long run, but those who advocate the Rivera model will fight it tooth and nail. Again, I have no problem with it one way or the other.

The reason I brought up the “college counsellors” study is because I find the result delicious. The real issue is that the introduction of the human element invariably makes the prediction (decision) worse. This is what research has shown over many years. Why then do they persist, unless there is an ulterior motive?

You are a linear thinker par excellence. On the other hand, I think globally. This is the second time in this thread that I questioned holistic admission. There are many other reasons as well, such as the history of holistic admission, the re-centering and the re-making of the SAT, grade inflation, Randall Collins’ work etc.

These are all side issues to my main point- that an exit exam to measure critical thinking is not only possible, but necessary.

But an exit exam to measure which kind of thinking is necessary?

Ah, there’s the real problem.

Verbal reasoning, analytical reasoning, and numerical reasoning would be a good start.

I suggested up thread the use of the GRE and a subject test. As a parent, I am interested in the subject test. As an employer, I want to know both.

@xlmdienex

Agree with you that grade inflation can be a problem.
Agree that some students think that merely working hard entitles them.
Agree with most of the responders here that your attitude is condescending and obnoxious.

Disagree that your test had a “perfect” curve. Really – with 100+ students no one was able to score higher than a 94?? Wow – that tells me that either you have NO intelligent and hard-working students in your class, OR that the test was too hard, OR the prof was ineffective at teaching this subject matter.

Why does everything have to be compared to a bell curve anyway? Shouldn’t it be about measuring who mastered how much of the subject matter? If a test is a good test, the students who mastered the material – no matter how many of them there are – will get a high score, and those who do not know the material will get low scores. Let’s say there was a brilliant and engaging professor who inspired a large percentage of the class to push themselves and really learn the material. Was it a bad test if the average score was above your 70-80 range? Should the “average” student in that class still get a C and half the students score higher and half the students score lower?

Put another way – are grades meant to differentiate students from their peers (i.e., always score on a curve), or are grades meant to indicate whether and to what extent they have mastered the material?

I’d rather see a world where we reward and rank students in accordance with their mastery of the material, not in accordance to their performance relative to their peers. I think the same way about job performance.
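To make the contrast concrete, here is a small sketch of the two grading philosophies (the scores and cutoffs are invented for the example): mastery-based grading compares each score to fixed cutoffs, while curve-based grading ranks students against the class mean, so even a class that largely mastered the material gets pushed toward "average" grades.

```python
# Toy comparison of mastery-based vs. curve-based grading.
import statistics

scores = [91, 88, 85, 84, 82, 79, 55]   # hypothetical exam scores

def mastery_grade(score, cutoffs=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    for cutoff, letter in cutoffs:
        if score >= cutoff:
            return letter
    return "F"

def curved_grade(score, all_scores):
    z = (score - statistics.mean(all_scores)) / statistics.stdev(all_scores)
    if z >= 1.0:
        return "A"
    if z >= 0.0:
        return "B"
    if z >= -1.0:
        return "C"
    return "D"

for s in scores:
    print(s, "mastery:", mastery_grade(s), " curved:", curved_grade(s, scores))
```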

When I was supervising a team of employees, I rewarded all of the ones who performed their job responsibilities in an exceptional way, whether that happened to be one person or 2/3 of my organization. Rewarding absolute performance, not relative performance, encouraged teamwork, made for a much more pleasant work environment, and elevated overall performance. It is as absurd to suggest that my team would naturally fall into a bell curve where the average employee was a “C” performer, as it is to suggest that the average college student taking a course that could lead to med school would be a “C” performer.

@soccermomgenie very well said. Why is it important to fail or give low grades to a certain number of students who actually get it? Why is that desirable? So we can produce fewer engineers and doctors and whatever else?

Having faith in a person’s name is one thing. Ignoring multiple published studies that encompass a total of more than 15,000 students at dozens of colleges because of blind faith in an article with lots of relevant detail missing is something else. If you have faith in Benjamin, he wrote the following in a CAE report that is more recent than all of the referenced studies and articles:

“In other words, students majoring in the arts and sciences tend to do better on all of the [CLA] performance tasks than do students in applied professional fields.”

Rivera didn’t create a model of what employers should be doing. Instead she analyzed what 120 elite firm employers are doing and published the results in a peer reviewed journal. Nobody, including Rivera, has suggested that what employers are doing is desirable, so you have nobody to fight “tooth and nail” against. I brought up the study because you implied all management consulting groups have very different hiring policies.

Rivera does not describe a single hiring policy. Instead she mentions that different percentages did different things. She also emphasizes differences between individual evaluators in hiring positions within the same company. Evaluators tended to favor hiring persons with backgrounds similar to their own. For example, evaluators who attended Ivies were more likely to emphasize school name. Evaluators who received lower grades during college were more likely to discount lower grades of applicants. Evaluators who were into sports were more likely to favor applicants who were into sports. A few had hiring policies like those in Manzi’s limited personal experience, but most did not. In interviews, she has been highly critical of the hiring policies that she found were common among elite firms and had a few suggestions for improvement, but those suggestions also were quite different from your ideals. For example, she suggested elite firms should focus on rank in class over test scores to broaden SES diversity.

It’s interesting you use the phrase “ulterior motive” at the same time you mention your HS counselor study. Think about the study. A researcher has a request to do sponsored research from a company that benefits from an emphasis on scores. He decides to use a group of students who had previously attended college and have known college grades, then derives an equation to predict their grades based on stats, which is likely simply the optimal regression coefficient equation for such a prediction. He then asks HS counselors to predict the grades based on stats + 3 criteria that have far less correlation with grades: MMPI, vocational interest test, and background information. He finds that as a whole the counselors’ predictions are slightly worse than the equation when they are not given the equation. However, if the counselors are given the equation and have the choice of when to deviate from it, then their predictions are not worse than the equation. This study was from 50+ years ago. I can’t recall seeing a modern study with this degree of an ulterior motive.

One possible fear I heard from my engineering relatives (including a former professor who voluntarily left to work at a successful tech startup) is that eliminating the bell curve would mean either that more underqualified students are passed on to higher levels, where they are much more likely to flunk out or to lower the academic rigor/pacing of higher-level courses, or worse, that failure rates on professional licensure exams would rise.

This doesn’t just matter to those who don’t master the material, but also to elite academic/professional STEM graduate programs and employers who want to differentiate between the genuine top performers/geniuses and those who are above-average hard workers*. Rightly or wrongly, there’s a strong bias in favor of the former in those venues.

Another factor is that such harsh curves are seen as necessary to “test the commitment” of STEM majors to the field rather than to other career paths after graduation. It was certainly one common gripe among some older alums and former employers hiring engineering/STEM majors from certain elite engineering schools where increasing numbers of graduates ended up going off to non-STEM careers like ibanking or organizational business consulting. Since those fields require high undergrad GPAs, only the most motivated and strongest STEM students would stay, reducing the number of non-STEM-career-inclined students occupying higher-level courses/resources, which could instead go toward upperclassmen really keen on STEM careers.

  • Several engineering/programmer friends have said that the engineer/programmer inclined to be lazy tends to be much better at finding innovative ways to work more efficiently and produce a good final work product in a timely manner, whereas the "hard workers" tend to spend far too much time plodding away at a method that ultimately turns out to be inefficient, and sometimes a non-starter that the "lazy engineer/programmer" would have shied away from in the first place.