College admissions are not independent events

It is not uncommon for posters to make statistical arguments about getting into one of many universities that one applies to. For example:

http://talk.collegeconfidential.com/discussion/comment/19667673/#Comment_19667673

http://talk.collegeconfidential.com/discussion/comment/19668686/#Comment_19668686

However, these arguments fail in part because college admissions are not independent events: many colleges consider many of the same criteria for admission.

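http://www.mathgoodies.com/lessons/vol6/independent_events.html defines independent events:

“Two events, A and B, are independent if the fact that A occurs does not affect the probability of B occurring.”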

Here is an example:

College A and College B both consider only HS GPA and ACT score.

College A admits with HS GPA of at least 3.0 and ACT score of at least 27 and rejects otherwise.
College B admits with HS GPA of at least 2.8 and ACT score of at least 29 and rejects otherwise.

Applicants applying to both colleges (without knowing the above criteria) have uniformly distributed GPAs from 2.0 to 4.0, and uniformly distributed ACT scores from 17 to 36.

If A and B are the events of admission to College A and College B respectively for a randomly selected applicant from the above pool:

P(A) = 0.25 (probability of A)
P(A|B) = 0.833 (probability of A given B)
P(B) = 0.24 (probability of B)
P(B|A) = 0.8 (probability of B given A)
P(A and B) = 0.2 (probability of A and B)
P(A)P(B) = 0.06

Note that P(A and B) ≠ P(A)P(B), P(A|B) ≠ P(A), and P(B|A) ≠ P(B).
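
The figures above can be checked by direct calculation. The sketch below is only an illustration: it assumes the cutoffs mean "at least" (GPA ≥ 3.0, ACT ≥ 27, and so on), that ACT scores are whole numbers from 17 to 36, and that GPA and ACT are independent of each other within the applicant pool. The function and variable names are made up for this example.

```python
from fractions import Fraction

ACT_SCORES = range(17, 37)  # integer ACT scores 17..36, assumed equally likely


def p_gpa_at_least(cutoff):
    """P(GPA >= cutoff) for a GPA uniform on [2.0, 4.0]."""
    return (Fraction(4) - cutoff) / 2


def p_act_at_least(cutoff):
    """P(ACT >= cutoff) for an ACT score uniform on the integers 17..36."""
    hits = sum(1 for score in ACT_SCORES if score >= cutoff)
    return Fraction(hits, len(ACT_SCORES))


# Assumption: GPA and ACT are independent of each other within the pool.
p_a = p_gpa_at_least(Fraction(3)) * p_act_at_least(27)
p_b = p_gpa_at_least(Fraction(28, 10)) * p_act_at_least(29)
# Admission to both requires clearing the higher cutoff on each measure.
p_both = p_gpa_at_least(Fraction(3)) * p_act_at_least(29)

print("P(A)       =", float(p_a))
print("P(B)       =", float(p_b))
print("P(A and B) =", float(p_both))
print("P(A)P(B)   =", float(p_a * p_b))
print("P(A|B)     =", round(float(p_both / p_b), 3))
print("P(B|A)     =", float(p_both / p_a))
```

Since P(A and B) comes out more than three times P(A)P(B), the two admission events are far from independent in this toy setup.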

In any case, there are also other reasons why the argument does not work, such as not knowing (from outside of the admissions office) what the probability of admission actually is.

An even simpler example: if colleges A and B had the exact same criteria (e.g. HS GPA of 3.0 and ACT >= 27, reject otherwise) then A and B are completely dependent events; knowing the outcome of one gives you all the information to determine the outcome of the other.

However, assuming no communication between colleges A and B, I still feel that A and B are independent, since A doesn’t causally influence B, and vice versa. Sure, knowing A might give us evidence that B occurred, but we don’t know P(A) or P(B) to begin with, as you said. Also, we are looking at the probability that a given student is accepted to both universities, so in your example, P(A) should be either 0 or 1.

I’m also not a huge fan of introducing probabilities when it comes to college admissions, since a committee decides, and it is not a probabilistic process. Newcomb’s problem might be a similar analogy, since picking both boxes doesn’t causally influence the event that the second box contains $1000000.

@ucbalumnus: I’m glad you introduced this as a separate thread. This is relatively basic math that people should understand better and be more familiar with. And it’s relevant here. Although I wish it could be explained with better examples – but coming up with simple, intuitive, illustrative examples is not easy.

@MITer94: I don’t believe your comments are accurate. You’re using an intuitive notion of independence that doesn’t match the formal definition of statistical independence. Two events do not need to be directly causally connected for them not to be (statistically) independent. The example here of someone’s GPA and ACT score influencing their admission decisions at two different colleges illustrates that.

Further, a college admission decision is a probabilistic process. That it may be a committee decision doesn’t change that. I think what makes it hard (and not so useful) to think of this way is that it’s a very complex decision, with many factors that go into it, and that changes from year to year. Further, we almost always don’t have all the data necessary to accurately assess the probabilities. (But that’s what things like Naviance are trying to do.)

@csdad2 I understand that A and B are not independent events in ucbalumnus’s example - clearly, if I chose a random real number x ∈ [2,4] and an integer y ∈ {17,18,…,36}, and let A := x ≥ 3 and y ≥ 27, etc., then P(A and B) ≠ P(A)P(B), i.e. A and B are not independent.

However, in an actual scenario, we usually don’t know what P(A) or P(B) is, so proving whether A and B are independent is a bit trickier. Sure, the fact that I was accepted to MIT can increase my confidence that I will be accepted to Caltech (I wasn’t, actually), but in any case I don’t know the probability of acceptance at Caltech. Maybe it was higher to begin with. There’s a difference between credence (confidence) and probability, and quite a bit of debate on the topic, and lots of things I don’t understand. Of course, if you have any insights, I’d be happy to hear.

I think Newcomb’s problem illustrates the difference nicely - when it’s our turn to pick boxes, we can assign credence as to whether the opaque box contains $1000000, but the probability that the opaque box contains $1000000 is either 1 or 0.

This, of course, assumes that colleges A and B do not communicate with each other in any way - otherwise the events may as well be dependent.

I am glad that you created this thread. I acknowledge that, in my earlier post, I glossed over the dependence issue in my calculation just to make the point that it is indeed a very small probability event for an applicant with a perfect SAT score to be rejected by all ten super selective schools. However, it is acknowledged as a fact, at least on CC, that it is a crapshoot to get into these schools. This implies that admission is a very random process, which further implies that the admission decisions by different schools, if dependent, are weakly so at best. As for that Asian kid who got rejected by all ten super selective schools, his racial marker is clearly a factor that causes dependency. One way to handle this is to reduce his probability of admission at any one school from 30% to, say, 15%. Then, assuming independence, the probability of him being rejected by all ten schools is about 20%, instead of 3%. I admit this is a huge difference, but it is still a small probability event.
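
For reference, the 3% and 20% figures come from raising the per-school rejection probability to the tenth power. This is a minimal sketch that takes the independence assumption at face value, even though the rest of this thread disputes it; the function name is just for illustration.

```python
def p_rejected_by_all(p_admit, n_schools):
    """Chance of being rejected everywhere, ASSUMING independent decisions."""
    return (1 - p_admit) ** n_schools


print(round(p_rejected_by_all(0.30, 10), 3))  # 0.028, i.e. about 3%
print(round(p_rejected_by_all(0.15, 10), 3))  # 0.197, i.e. about 20%
```

If the decisions are positively dependent, the true chance of a clean sweep of rejections is higher than either figure.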

It is more random-appearing from the outsider viewpoint, where it is difficult to know how subjectively graded aspects like essays, recommendations, and extracurriculars will be graded by an admissions reader and how they will help the applicant stack up to the other applicants. It is much less random to the insiders (admissions readers).

Since almost all of the criteria used for admissions by those super-selective schools are the same as or similar to the criteria used by other super-selective schools, independence cannot be assumed. The applicant presents essays, recommendations, extracurriculars, etc. as well as courses/grades and test scores. A weak essay, recommendation, or set of extracurriculars will disadvantage the applicant at all schools that consider them, making their admission decisions non-independent of each other.

@MITer94: I’m no expert on this stuff, but I think most of what you say is irrelevant to this situation. Not knowing P(A) and P(B) doesn’t matter, and we don’t need to prove mathematically that A and B are not independent. Do you accept that the admission decisions at different schools are generally based on similar criteria? (I don’t see how you can’t; note that I’m specifically not saying the exact same criteria – all that’s necessary is that there is some overlap/similarity in the criteria.) If so, then they are not independent. That’s it, that’s all you need.

The schools don’t need to communicate with each other, because they’re both looking at the same (or very similar) data – the applications.

And do you really believe that someone who gets into one top school isn’t more likely to get into another? (I don’t.) I don’t know if there’s any data out there to confirm/deny this.

I had never heard of Newcomb’s problem before you mentioned it, so I looked it up. Again, I think it’s irrelevant. The main thing there is the Predictor, and there’s no analogous thing in this situation.

@csdad2 Yes - many schools use similar criteria (e.g. GPA, SAT/ACT, letters of recommendation).

This isn’t quite what I meant. What I meant is, if I applied to A and B, and found out I was accepted by A, then the probability that B accepts me should be the same as if I had only applied to B with the same credentials. This is assuming the information B has is constant in both cases. I might be more likely than say, the acceptance rate of B, but that would’ve been the case even if I didn’t apply to A.

My credence in B might increase, but credence and probability are not necessarily the same thing.

Another example: If a 2.0 GPA student applied to college A, then his probability of acceptance is zero, regardless of what he thinks or what his credence is, or what the acceptance rate of A is.

This is just what I believe, and there may be parts that are incorrect or debatable. Again, happy to hear your inputs.

If college admissions are not independent, then how can one get (spring) admission to UNC Wilmington despite getting denied from East Carolina?

@LBad96 that could happen regardless of independence.

Let me offer the explanation that hopefully can unify the different perceptions people have.

@hzhao2004, @Postmodern, first, I do believe about 30% of 2400 SATers can get into Harvard. I remember quotes where 2/3 or 3/4 of those are rejected? I suppose the rest get accepted.

However, the key issue with using 30% the way you guys did is that the population of 2400 SATers is not a homogeneous group. @mikemac had already mentioned that in the other thread. Thus some of those may have a 2400 and be an STS finalist. Their probability can be 80%+. Some may have a 2400 as their only calling card. Their probability may be 5% or less. That’s the hidden information (let’s denote it A) that the number 30% does not capture. Now let B be the event of a particular person in the group getting into Yale and C be the event of getting into Harvard. Then B and C will be correlated because both will be affected by A (whether the applicant is “STS” or “nobody”) to some extent.

Now there is actually a way to make B and C completely independent in the statistical analysis. How? By removing the effect of A. That is, by considering only a subpopulation that is completely homogeneous, meaning everyone in it has exactly the same credentials. Of course, then admissions can’t differentiate between you and your twins, and the decisions will be random and independent between B and C. But the probability of occurrence for the smaller pool could also be drastically different from the 30% of the bigger pool.

So as a summary, the only time the independence can be used in this analysis is when the pool consists of “identical” people.
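
To put numbers on the hidden-information point, here is a rough sketch. The split is purely an assumption chosen so the pool averages out to 30%: one third of the 2400 scorers are "STS"-type applicants with an 80% chance at each school, and two thirds are "nobody"-type applicants with a 5% chance. Within each group the two decisions are treated as independent.

```python
from fractions import Fraction

# Hypothetical pool: (share of pool, admit probability at each school).
groups = [
    (Fraction(1, 3), Fraction(80, 100)),  # "STS"-type applicants
    (Fraction(2, 3), Fraction(5, 100)),   # 2400 is the only calling card
]

# Pool-wide chance of getting into one given school (B alone, or C alone).
p_one = sum(share * p for share, p in groups)
# Pool-wide chance of getting into both, independent only WITHIN a group.
p_both = sum(share * p * p for share, p in groups)

print("P(B) = P(C)       =", float(p_one))
print("P(B and C)        =", float(p_both))
print("P(B)P(C)          =", float(p_one * p_one))
print("P(C | B)          =", round(float(p_both / p_one), 3))
```

Under these made-up numbers, B and C are independent inside each group, yet across the whole pool P(B and C) is well above P(B)P(C), and knowing B happened raises the chance of C from 30% to roughly 72%.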

This is confusing the colloquial expression “random” that means unpredictable with true randomness. You seem to believe that admission decisions are actually akin to rolling dice, that the outcome after evaluation of letters of rec, gpa, essays, test scores, transcript is indistinguishable from putting the applicants names into a hat and drawing them out.

Furthermore you continue to make the mistake of trying to apply a population measure (eg. of the total number of applicants with perfect scores, how many get in?) to a particular individual. This is a well-known statistical fallacy called the “ecological fallacy”.

Lastly, there is the pronouncement that race is “clearly a factor.” Did you read his essays? His letters of rec? The interviewer comments? His transcript? Other factors known to influence admission include where the applicant lives, nature of ECs and leadership/awards, relationship of the HS with the college, alumni status, financial donations made by parents and relatives, the courses taken while in HS. Without knowing anything about this for either the applicant in question or how he compares to the other kids with 2400’s, you conclude that it must be race. This seems to say more about you than about the state of the world.

I agree that the probabilities should be the same. And I don’t think it matters, whether one actually applied to the other school.

The issue is that we don’t know P(A) or P(B) (I’m using A and B to mean getting into the two schools), mostly because we don’t know all the factors that go into them. But if one of those were to happen, it is reasonable to assume that the probability of the other one happening is higher (than a more general initial assessment might have estimated).

This is like saying A and B don’t influence each other, but what goes into A and B is similar/related. Which, again, is to say that they’re not statistically independent.

One example that at first glance seems slightly unlikely proves absolutely nothing.

@csdad2 Exactly!

This is one’s credence, as I stated above, which is not necessarily the same as probability. Suppose I applied to colleges A and B (using ucbalumnus’ example, not knowing the requirements) with a 3.0 GPA and 27 ACT. I find out that I’m accepted by A. However, my probability of being accepted to B is still zero - that is, it didn’t increase. My confidence in B likely increased, though, but I will still end up rejected with 100% probability.

The fact that A and B depend on similar information doesn’t prove they are not independent. Suppose I roll two fair six-sided dice. Define A := the first die shows 6, and B := the sum of the two dice is 7. Both A and B involve the result of the first die, but they are in fact independent:

P(A and B) = 1/36 (happens iff the first die shows 6 and the second die shows 1)
P(A) = 1/6
P(B) = 1/6
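
The dice numbers can be confirmed by listing all 36 outcomes. This is just a quick check of the arithmetic; the helper names are invented for the example.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely outcomes


def prob(event):
    """Exact probability of an event over the 36 outcomes."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


def first_is_six(r):
    return r[0] == 6


def sum_is_seven(r):
    return r[0] + r[1] == 7


p_a = prob(first_is_six)
p_b = prob(sum_is_seven)
p_ab = prob(lambda r: first_is_six(r) and sum_is_seven(r))

print(p_a, p_b, p_ab, p_ab == p_a * p_b)  # 1/6 1/6 1/36 True
```

So P(A and B) = P(A)P(B) here, even though both events involve the first die.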

@mikemac I like this. I still don’t believe admissions is a randomized or probabilistic process, even though it may appear so to us. Unless they were torn between two applicants and decided which one to accept by coin flip.

If admissions was at all randomized, then they might as well flip fair coins. It is provably true that any probability p event can be simulated by flipping fair coins (maybe infinitely many of them).
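
One standard way to do the coin-flip simulation, sketched here as an illustration rather than as anything an admissions office does: read off the binary digits of p one at a time, flip a fair coin for each digit, and stop at the first flip that disagrees with the digit. The function name and the `flip` parameter are made up for this example.

```python
import random


def event_with_probability(p, flip=lambda: random.getrandbits(1)):
    """Return True with probability p (0 <= p <= 1) using only fair coin flips."""
    while True:
        p *= 2
        p_bit = 1 if p >= 1 else 0  # next binary digit of p
        p -= p_bit
        u_bit = flip()              # next binary digit of a uniform random number
        if u_bit != p_bit:
            # First disagreement decides whether the uniform number is below p.
            return u_bit < p_bit


random.seed(0)
trials = 100_000
hits = sum(event_with_probability(0.3) for _ in range(trials))
print(hits / trials)  # should land close to 0.3
```

There is no fixed cap on the number of flips, but each flip ends the loop with probability 1/2, so on average only two flips are needed.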

A quarter will not fit into a nickel slot no matter how many random attempts to insert one.

@MITer94: OK, again, I’ll admit I don’t understand this stuff well enough to be completely certain how everything fits together. But I’d say these things are not relevant, and just confuse things.

For instance, we don’t know the exact admissions probabilities, so to some degree, they’re all our best guesses. There can always be factors we’re not aware of that make the probabilities higher or lower. That doesn’t mean they’re credences/confidences and not probabilities.

And regarding your dice example, one of the background criteria, A, is also one of the propositions (that you’re trying to determine the probabilities of). In my situation, all the criteria are known beforehand, none of them are propositions. So I don’t think the example is relevant.

All of this gets away from the central point of this thread, that admissions decisions to different colleges are not independent. (Although, re-reading things, I’m not certain you agree with that.)

LOL: Math Goodies wrote:
“Two events, A and B, are independent if the fact that A occurs does not affect the probability of B occurring.”

If the factors associated/driving/leading to A are dissimilar to those leading to B.

This thread seems to be about semantics and word definitions - but aren’t we focused on the wrong word?

I think what people want to know is if acceptances at different schools are correlated. The underlying question is something like “if I apply to two schools for which I have a 50% chance, do I have a 75% chance of getting in to at least one?” Assuming access to the data, you could take the population of students with a 2400 SAT, and see which Ivies they get in to. Maybe about 30% would get in to each Ivy. Then you would want to compute the correlation between those n data sets of accepted students per Ivy. They would obviously be partially correlated. So the chance of getting in to either of 2 with a 2400 wouldn’t be 1 - 70%^2 = 51% - it would be less.
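
Here is a rough sketch of how much the correlation matters. It assumes both schools have the same admit probability p and that the two yes/no outcomes have correlation rho between 0 and 1; in that case P(both) = p² + rho·p·(1 − p). The rho values below are arbitrary, since nobody in this thread has the actual data.

```python
def p_at_least_one(p, rho):
    """P(admitted to at least one of two schools), each with admit probability p
    and with correlation rho (0 <= rho <= 1) between the two yes/no outcomes."""
    p_both = p * p + rho * p * (1 - p)
    return 2 * p - p_both


for rho in (0.0, 0.25, 0.5, 1.0):
    print(rho, round(p_at_least_one(0.3, rho), 3))
```

With rho = 0 this gives the 51% figure, and with rho = 1 (identical decisions) it drops all the way to 30%, so any positive correlation puts the answer somewhere in between.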