A recent test of NIH grant proposal reviews found “Low agreement among reviewers evaluating the same NIH grant applications” (http://www.pnas.org/content/early/2018/02/27/1714379115). This doesn’t come as a surprise to most working in the academic science game. I’ve often wondered if this wouldn’t be true of top-end college admissions. It has also occurred to me that universities would gain little by actually carrying out such a test.
There is no relationship or similarity between the two.
Why not? Judging essays, ECs, and letters of rec, and assessing rigor from transcripts that come in different shapes and sizes. My kids’ college applications struck me as a lot like my funding applications. Oh, and the low success rate for both (speaking of highly selective college admissions).
I agree with @billcsho. Maybe if you’re referring to applying for fellowships or the K99.
But R01s or SBIRs are just totally different. They’re more like a business proposal, with a budget, preliminary results, impact on the field, etc.
Just like this thread, everybody would have a different point of view. The same goes for manuscript reviews or college rankings, and yet they all have different processes and considerations. I don’t see how one would take inspiration from the NIH grant review process for college admissions, or from anything else. SBIR is somewhat different from R01 as it puts more emphasis on market analysis and business development.
The similarity is that across a range of “proposals,” when the selection percentage gets low, say 10%, the difference between the 10% selected by a given panel and, say, the next 10% is small enough that all sorts of random things come into play. That is why studies like the one referenced above find that different panels select a somewhat different top X%. The same result has been found in other studies in other science disciplines (I know of a couple of such studies in a physical science in Europe).
In this specific case, aren’t these NIH tests showing low repeatability aimed at the R01s? And just like college applications, there are many more serious applications than spots, where everyone has some preliminary results (e.g. HS grades), capability to do the work (e.g. ACT/SAT), and claims their proposed work will have a major impact on the field (e.g. essays). With many more serious applications than spots, how each of these gets evaluated has all sorts of dependencies on the evaluators, no matter how careful and serious they are.
Obviously the analogy breaks down at some point, but the overall question of how repeatably a small fraction of qualified applicants gets selected still seems relevant.
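To make the point concrete, here is a quick Monte Carlo sketch. Every number in it is made up purely for illustration (the pool size, the 10% selection rate, and a reviewer-noise level set to half the spread in “true quality”); it isn’t a model of NIH study sections or any admissions office. Two independent panels score the same applicants, each score being true quality plus independent noise, and each panel keeps its top 10%:

```python
import numpy as np

rng = np.random.default_rng(0)

n_apps = 1000      # hypothetical applicant pool (assumed)
top_frac = 0.10    # selection rate (assumed)
noise_sd = 0.5     # reviewer noise vs. true-quality SD of 1.0 (assumed)
n_trials = 500     # repeat to average out simulation noise

overlaps = []
for _ in range(n_trials):
    quality = rng.normal(0.0, 1.0, n_apps)                  # latent "true merit"
    score_a = quality + rng.normal(0.0, noise_sd, n_apps)   # panel A's noisy read
    score_b = quality + rng.normal(0.0, noise_sd, n_apps)   # panel B's noisy read
    k = int(top_frac * n_apps)
    top_a = set(np.argsort(score_a)[-k:])                   # panel A's top 10%
    top_b = set(np.argsort(score_b)[-k:])                   # panel B's top 10%
    overlaps.append(len(top_a & top_b) / k)                 # fraction both chose

print(f"mean overlap between two panels' top {top_frac:.0%}: "
      f"{np.mean(overlaps):.2f}")
```

With these assumed numbers, the two panels typically agree on only roughly half to two-thirds of their picks, even though both are honestly scoring the same underlying quality. That is the repeatability problem in miniature: the noisier the read relative to the real differences near the cutoff, the less two panels’ top X% overlap.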
So in the PNAS paper, they are only working with proposals that passed ranking (i.e., they were already above a certain quality threshold).
I guess you’re drawing the analogy that in highly qualified pools, evaluation by experts results in low agreement. In that case, I agree that these situations are similar.
It’s also seen in other decision-making situations. Think of all the VCs who turn down startups that later become unicorns. All the publishers who turn down best sellers. At some point, the resolution achieved by objective evaluation breaks down, and when forced to choose, people have to fall back on much more subjective and personal criteria.
I guess I just didn’t see the point of the analogy you’re drawing.
My personal view is that the statement “in highly qualified pools, evaluation by experts results in low agreement” helps provide a sense of balance and realism about extremely selective admissions. Things on CC, but also some statements and posts from admissions offices, often seem to me to be missing some of the humility that the low agreement found in pretty much any test of this type would suggest.