So we know the answer to the first part of that question. According to Dartmouth’s white paper, they used “a specialized subsample of applicants in the test-optional cohorts who initially submitted scores but then asked Admissions not to consider them in the admissions decision.” In a note, they explain that Admissions did not, in fact, use the scores in those circumstances.
In terms of scale, their notes to Figure 6 say that “we have SAT scores both for students who submitted scores (blue lines) and for a small sample (19%) of applicants who chose to exclude their score from the admission decision but for whom we observe their scores ex post (red lines).”
I am honestly not sure what that 19% is supposed to mean–19% of what? But take the generous reading: of the relevant applicants who sent in scores, 19% then asked for them not to be considered (which sounds high to me, but who knows). Even that interpretation implies a very small sample size in each of their 50-point bins. Applicants with test scores were only a fraction of the overall pool; those with scores in a given 50-point range were a fraction of that fraction; the disadvantaged applicants among them were a fraction of that fraction of a fraction; and only a fraction of that group chose to have the score not considered. Each red-line bucket, in other words, is a fraction of a fraction of a fraction of a fraction of the applicant pool.
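To make the shrinkage concrete, here is a back-of-the-envelope sketch in Python. Every number in it except the 19% is a made-up placeholder (the white paper does not report per-bin counts), so treat the output as an illustration of the arithmetic, not as an estimate of Dartmouth’s actual data:

```python
# Hypothetical illustration of how fast the red-line buckets shrink.
# Only the 19% comes from the white paper; everything else is assumed.

applicants = 28000            # assumed size of the applicant pool
frac_with_scores = 0.60       # assumed share who had scores at all
frac_in_bin = 0.08            # assumed share of those in one 50-point bin
frac_disadvantaged = 0.15     # assumed share of those who are disadvantaged
frac_withheld = 0.19          # the paper's 19%, under the generous reading

n = (applicants * frac_with_scores * frac_in_bin
     * frac_disadvantaged * frac_withheld)
print(f"applicants left in one red-line bucket: {n:.0f}")
```

Under those (entirely assumed) inputs, a single bucket ends up with roughly 38 people–the kind of sample size where a couple of admits either way moves the estimated rate a lot.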
And in fact, if you compare the blue lines to the red lines in Figure 6, the blue lines are mostly smooth, expected curves, whereas the red lines are pretty choppy. That is consistent with the idea that the sample sizes for the red-line buckets are so low that there is a lot of statistical noise.
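You can see why small buckets produce choppy lines with a quick simulation. This is not Dartmouth’s data; the 7% true admission rate and both bucket sizes below are assumptions chosen just to show how binomial noise scales with sample size:

```python
import random

random.seed(0)

# Simulate the estimated admission rate across five 50-point bins,
# once with a small (red-line-sized) bucket and once with a large one.
# The 7% true rate and both bucket sizes are assumptions.
TRUE_RATE = 0.07

for n in (40, 4000):
    estimates = []
    for _ in range(5):  # five hypothetical bins, same true rate in each
        admits = sum(random.random() < TRUE_RATE for _ in range(n))
        estimates.append(round(admits / n, 3))
    print(f"bucket size {n}: estimated rates per bin = {estimates}")
```

With buckets of 4,000 the five estimates cluster tightly around 7%; with buckets of 40 they bounce around even though the underlying rate is identical in every bin–which is exactly the choppy-red-line pattern.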
That being said, my two cents is that I don’t really doubt a disadvantaged applicant would usually have been well-advised to submit a 1450 to Dartmouth. That is a pretty reasonable hypothesis, and the data does support it (imperfectly).
But then they do things like calculate, “Consider students with a score of 1450-1490 from less-advantaged backgrounds. These students increased their admission probability by a factor of 3.7x (from .02 to .074) by revealing their score.” I am very skeptical that the data actually supports a conclusion remotely that precise (3.7x). And in any case, you would need to control for other factors that might correlate with test scores, with the decision to submit or not submit, and so on.
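As a rough sanity check on how precise a ratio like 3.7x can be at these sample sizes, here is a small simulation. It assumes, hypothetically, about 40 applicants per bucket and takes the paper’s point estimates (.02 and .074) as the true rates; none of these inputs comes from the paper’s actual bucket counts:

```python
import random

random.seed(1)

# How much would a "3.7x" ratio bounce around if each bucket held ~40 people?
# True rates match the paper's point estimates; the bucket size is assumed.
P_WITHHELD, P_SUBMITTED, N = 0.02, 0.074, 40

ratios = []
zero_denominators = 0
for _ in range(10000):
    admits_w = sum(random.random() < P_WITHHELD for _ in range(N))
    admits_s = sum(random.random() < P_SUBMITTED for _ in range(N))
    if admits_w == 0:
        zero_denominators += 1  # ratio undefined: no admits among withholders
    else:
        ratios.append(admits_s / admits_w)

ratios.sort()
idx = len(ratios) // 20
print(f"runs with an undefined ratio: {zero_denominators / 10000:.0%}")
print(f"middle 90% of defined ratios: {ratios[idx]:.1f}x to {ratios[-idx - 1]:.1f}x")
```

Under those assumptions, nearly half the simulated runs cannot even compute a ratio (zero admits among the withholders), and the runs that can span a wide range. Larger assumed buckets would tighten this up, but the point stands: a headline number like 3.7x carries a lot of uncertainty at plausible sample sizes.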
Long story short, this is an old issue in this sort of research. Often the direction of a relationship is not so hard to figure out, but the magnitude is much harder to nail down. And I believe submitting a 1450-1490 had a positive relationship with admissions chances for disadvantaged applicants, but exactly what magnitude we are really talking about–meh, I don’t know.
And just to reiterate–even if you took that 3.7x as gospel going forward, you are still only talking about a tiny marginal group of applicants. The difference between .074 and .02 is .054, so submitting would change the outcome for only 5.4% of an already small group (disadvantaged applicants with a 1450-1490 deciding whether to submit). The other 94.6% of this small group would not, by this theory, have a different outcome.
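For what it’s worth, the arithmetic behind that framing is just the gap between the two reported probabilities, which you can then scale against a guess at the bucket size. The 200-applicant figure below is purely an assumption for illustration:

```python
# The 5.4% is just the gap between the two reported probabilities.
p_submitted, p_withheld = 0.074, 0.02
marginal = p_submitted - p_withheld   # 0.054, i.e. 5.4 percentage points

bucket = 200  # assumed count of disadvantaged applicants with a 1450-1490
print(f"share with a changed outcome: {marginal:.1%}")
print(f"extra admits if all {bucket} submitted: {marginal * bucket:.0f}")
```

Even taking the paper’s numbers at face value, that works out to roughly eleven additional admits under this assumed bucket size.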
So even a generous estimate of the magnitude still ends up very small in the greater scheme. Although of course it could mean a lot to the few people who end up in that fraction of a fraction of a fraction of a fraction (I may be missing a fraction).