<p>
To clarify: the book uses two different sets of data, which are discussed separately. One set consists of the complete, raw admissions data for 14 schools over a five-year span. These schools provided their admissions data to the authors on the condition that they not be named. The other set is the survey data. Presumably the authors are aware of the limitations of survey data and have applied the usual techniques for mitigating them.</p>
<p>The point is, the analysis in the book is not based entirely on self-reported data; the authors had complete data for at least some schools, which they could not name. Obviously, the discussion specific to MIT is based only on the survey data. I did not want to get into such subtleties, since the complete picture is enormously complicated, but since you brought it up, I wanted to respond.</p>