I started this post to refute some specific arguments, but I changed my mind midstream and decided to add a lot more material than I initially envisioned. It is best viewed as something akin to a FAQ (Frequently Refuted Objections – FRO?) about standardized tests and their use in higher education.

SAT and income are not perfectly correlated

The SAT is certainly modestly correlated with parental income, but it is simply not true that the SAT is nothing more than a measure of family income.

I will briefly plot the 2011 SAT reading scores by income level to illustrate that the r**2 is considerably less than one.

There is significant overlap across the entire income distribution:

satv_boxwhiskers.png
Box and whisker plot of simulated test scores
satv_distribution_by_count.png
Simulated distribution by actual test taker counts

satv_density.png
Density plot with simulated test scores

 

satv_stacked_area.png
stacked area (unscaled)

 

satv_percent_simulation.png
Income level proportions by score (simulated)
Note: These are simulated distributions assuming the data are approximately normally distributed at each income level. This is close enough to the truth to approximate the proportions of each group we are apt to find at particular score levels, or vice versa (it likely under-estimates the amount of overlap and outliers to some degree). You might also note that some of these simulated scores exceed the maximum (800) or fall below the minimum (200) scores the College Board reports. That is actually a function of arbitrary score ceilings and floors. The true distribution of “ability” would resemble this if they didn’t set arbitrary floors and ceilings (and evidence suggests that it would continue to predict in a similar fashion).
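
For anyone who wants to try this kind of simulation, here is a minimal sketch of the approach described in the note: draw normally distributed scores for each income bracket, then check how much of the score variance the bracket explains. The bracket means, standard deviations, and counts below are hypothetical placeholders, not College Board figures, and this is not the exact script behind the plots above.

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL bracket parameters: (mean, SD, number of test takers).
# Placeholders for illustration only, not College Board data.
brackets = {
    "low": (460.0, 100.0, 3000),
    "middle": (500.0, 100.0, 5000),
    "high": (540.0, 100.0, 2000),
}

levels, scores = [], []
for level, (mean, sd, n) in enumerate(brackets.values()):
    levels.append(np.full(n, level))         # ordinal income level
    scores.append(rng.normal(mean, sd, n))   # unclipped normal draws

levels = np.concatenate(levels)
scores = np.concatenate(scores)

# Share of score variance explained by income level (r squared).
r = np.corrcoef(levels, scores)[0, 1]
r_squared = r ** 2

# Share of simulated scores outside the reported 200-800 range.
share_outside = np.mean((scores < 200) | (scores > 800))

print(f"r^2 = {r_squared:.3f}, share outside 200-800 = {share_outside:.4f}")
```

With group means this close relative to the within-group spread, the r squared stays far below one, and a small share of the draws lands outside the 200-800 range because nothing here imposes a floor or ceiling.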

 


Income correlates well with lots of tests

Similarly sized income gaps are found on essentially all decent tests (including, notably, the ACT).

Google Chrome.png
SAT vs ACT income-level comparison

 

Google Chrome.png
10th-90th percentile gap in standard deviations for OTHER reading scores
Google Chrome.png
10-90th percentile gap for OTHER math scores

Likewise, the linear SES-test score relationship is NOT unique to the college standardized tests either.  Similar patterns are found in NAEP scores.

Google Chrome.png

Google Chrome.png


These empirical regularities have been noted by others too

Google Chrome.png

Google Chrome.png

source

If you aggregate this sort of data at a high level (or really any reasonably well linearly correlated variables), you can, of course, produce much stronger correlations, as noise, measurement error, and other weakly correlated sources of variance tend to get averaged out.
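
The aggregation point is easy to demonstrate with simulated data. The sketch below assumes a modest individual-level correlation of 0.3 between two standardized variables (a made-up value) and compares it with the correlation between the means of ten equal-sized bins.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
true_r = 0.3  # HYPOTHETICAL individual-level correlation

# Standardized stand-ins for, e.g., income (x) and test score (y).
x = rng.normal(size=n)
y = true_r * x + rng.normal(scale=np.sqrt(1 - true_r ** 2), size=n)

r_individual = np.corrcoef(x, y)[0, 1]

# Sort people by x, split them into 10 equal-sized bins, and
# correlate the bin means instead of the individual values.
order = np.argsort(x)
bins = np.array_split(order, 10)
x_means = np.array([x[idx].mean() for idx in bins])
y_means = np.array([y[idx].mean() for idx in bins])
r_aggregated = np.corrcoef(x_means, y_means)[0, 1]

print(f"individual r = {r_individual:.2f}, bin-mean r = {r_aggregated:.2f}")
```

The individual-level noise mostly cancels inside each bin, so the bin means line up almost perfectly even though the underlying relationship is weak.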


Mean GPA, academic rigor, and the significance of HSGPA vary with SES

Low SES students typically take less rigorous courses and they tend to be graded more easily (even within the “same” subject).  To wit, see the relationship between GPA and NAEP test scores by school SES  (percent of school eligible for free or reduced price lunch–it’s widely used as a proxy for SES).

Microsoft Excel.png

This despite the fact that mean GPAs are significantly lower in low SES schools.

Microsoft Excel.png

In short, the gaps the SAT is measuring across income groups are both real and significant.

The same patterns hold true across racial groups, i.e., differences in GPA proportions and differences in the expected NAEP given the same GPA level.

Microsoft Excel.png

Blacks earning a 3.75-4.00 GPA in 12th grade obtain mean NAEP scores below the mean score of whites earning 3.00-3.75 (about halfway between that and 2.50-2.99).

Likewise, blacks are considerably less likely to earn high grades.

Microsoft Excel.png

Whites and Asians/Pacific Islanders are more than 300% as likely to obtain a 12th grade GPA greater than 3.75. This excludes dropouts and doesn’t even account for academic rigor (in other words, it’s generally harder for the same person to earn an A in an advanced calculus course than in remedial algebra).


Even controlling for SES, different racial groups see different average scores

If the SAT truly only measured income, we would also expect the differences across racial groups to be equalized once we control for income.   We certainly do not find this.  To the contrary, low income whites typically perform about as well as high income blacks (holding income constant we find a bit less than a one standard deviation gap between whites and blacks).

Google Chrome.png

Google Chrome.png

source

Google Chrome.png

source

SES & B-W cognitive gaps start very young and keep growing

Significant predictive differences in cognitive ability are found as young as 18 months of age by SES.

Google Chrome.png

Likewise, the black-white gap is found at least as early as 36 months of age.

Google Chrome.png

The income and the black-white gaps are well established by 3rd grade and they grow a bit by 11th grade.

Google Chrome
Average district scale score for W and B by each group’s median family income in the district (11th grade at top, 3rd grade at the bottom)

This is yet one more reason why test prep, fancy tutors, and the like are extremely unlikely to explain much of anything here.


The B-W test score gaps are not well explained by schools

Nor are the B-W gaps apt to be well explained by differences in “school quality” because the gaps are large within the very same elementary schools and they are even larger in higher performing schools.

Google Chrome.png
California: in-school B-W 2nd grade math gap

 

These state tests are, incidentally, strong predictors of SAT and ACT scores.

Google Chrome.png

(note: you can also read this study by Roland Fryer for more confirmation of B-W gaps even controlling for school, classroom, SES, birthweight, etc)


The differences in “school quality” are generally overstated

There is more variation in test scores within the very same classrooms than between schools, districts, and states.

Google Chrome.png

Google Chrome.png

source

If one can merely “purchase” test scores through income or wealth, we would expect much less variance within the school or classroom relative to broader (geographic) measures like district, state, or nation. That we don’t find this ought to raise questions in the minds of people who argue the SAT and the like “only measure income”.
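
The within-versus-between comparison amounts to a simple variance decomposition. Here is a minimal sketch using simulated classrooms; the variance components are hypothetical and only illustrate how the shares are computed, not what the cited studies found.

```python
import numpy as np

rng = np.random.default_rng(2)
n_classrooms, n_students = 200, 25

# HYPOTHETICAL variance components: classroom averages differ a little,
# students inside a classroom differ a lot.
classroom_effect = rng.normal(0.0, 0.4, size=(n_classrooms, 1))
student_noise = rng.normal(0.0, 1.0, size=(n_classrooms, n_students))
scores = classroom_effect + student_noise

total_var = scores.var()
between_var = scores.mean(axis=1).var()   # variance of classroom means
within_var = scores.var(axis=1).mean()    # average variance inside classrooms

between_share = between_var / total_var
within_share = within_var / total_var

print(f"between classrooms: {between_share:.0%}, within classrooms: {within_share:.0%}")
```

With equal-sized groups the two components add up to the total variance exactly, so the shares sum to one. If scores could simply be bought with the resources that differ between schools or districts, the between share would dominate.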


The SAT’s association with family income is almost entirely mediated by family education

This has been found in multiple studies.

Parents’ income has a significant association with SAT scores, but parents’ education is consistently stronger, and regression with effective controls for race, education, and other factors, usually suppresses the income variable to insignificance. The income variable achieved significance when the education threshold was high school diploma most likely because so few parents were dropouts that education was no longer effectively controlled, and parents’ income became a proxy variable for parents’ education…. Part of this dominance could result from heritability in test performance corresponding to parents’ educational attainment, given the high heritability estimates from twins studies for high-stakes standardized exams in the UK and the Netherlands (Bartels et al, 2002;  et al, 2013).

source

Or see this report from the University of California system.

Google Chrome.png

Note that in all cases parents’ education level is a much stronger predictor. For instance, SAT-V correlates with income at 0.16, whereas it correlates with education at 0.39. Parent education also correlates much better with key outcomes: 1st year GPA, cumulative GPA, 4-year graduation rates, etc. When people talk about socio-economic status (SES), keep in mind that it usually involves parent education and/or occupational status (which is also a proxy for education) to a significant degree, and that these components are doing most of the work when it comes to predicting outcomes like these. [Also parents’ education and income correlate at 0.33 in this dataset]
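
To see what these three correlations imply together, here is a rough sketch that computes partial correlations and two-predictor standardized coefficients from them. It uses only the 0.16, 0.39, and 0.33 quoted above and ignores every other covariate in the report, so treat it as a back-of-the-envelope check rather than a reproduction of their models.

```python
import math

# Correlations quoted in the text for this dataset.
r_sat_income = 0.16
r_sat_education = 0.39
r_income_education = 0.33


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after partialling z out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> tuple[float, float]:
    """Standardized coefficients for y regressed on predictors 1 and 2."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# SAT-V with income, holding parent education constant.
income_given_education = partial_corr(r_sat_income, r_sat_education, r_income_education)
# SAT-V with parent education, holding income constant.
education_given_income = partial_corr(r_sat_education, r_sat_income, r_income_education)

beta_income, beta_education = standardized_betas(
    r_sat_income, r_sat_education, r_income_education
)

print(f"partial r: income {income_given_education:.3f}, education {education_given_income:.3f}")
print(f"betas:     income {beta_income:.3f}, education {beta_education:.3f}")
```

Under these simplifying assumptions the income association shrinks to roughly 0.04 once education is held constant, while education keeps roughly 0.36 after holding income constant.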

Google Chrome.png
Correlations within and across high schools
source
Google Chrome.png
PA school districts, correlation between white income and average education levels
source

SES does not significantly mediate the predictive power of the SAT

Google Chrome.png

source

Column 3 here shows that the residual attributable to SES is very small, -0.01 adjusting for national range restriction, and slightly positive within most schools (implying that high SES are somewhat under-predicted by SAT alone at an institution level).

This same analysis shows that SES and HSGPA are correlated nationally (column one, adjusted for range restriction) and that, in fact, HSGPA alone tends to under-predict high SES people (which goes to my earlier evidence concerning NAEP at given GPA levels).

Google Chrome.png

Most competitive schools use something akin to an academic index, an equally weighted average of HS GPA and SAT (and sometimes SAT II), to ball-park estimate students’ academic prospects, meaning that on average high SES people are under-predicted in practical terms. It’s a sure bet that if they don’t re-weight for the known academic rigor of the school and the curriculum the student took, they’ll systematically under-predict high SES students’ outcomes.


Some have misconstrued the Univ. of Calif. data to argue otherwise

Similar results were found by that much misconstrued University of California study. However, few of those that talked it up knew or took care to mention that the SAT was only weakened by the inclusion of the SAT II in the analysis (which is also well correlated with SES and the SAT I).

Google Chrome.png

source

Without including SAT II and parent SES (income and education) in the analysis, their beta weights (standardized regression coefficients) for SAT I would have been much higher. It’s also worth pointing out that these beta weights imply that they’d be giving high SES applicants extra weight! This is the result of HS GPA under-predicting high SES people.

A subsequent reanalysis of the same California university data makes my points clearer (see models 2 and 4).

Google Chrome.png

source

Even controlling for parent SES and California high school rank (API) the SAT I has a total beta weight of about .38 in model 2 (0.28 + .10) and a beta of .23 in model 4 with the addition of HS GPA…. but again, this is controlling for HS GPA.  Unless the anti-test people are willing to take many points off the HS GPA of high SES people or people that attend less competitive schools it’s nonsensical to argue that HSGPA is an appreciably stronger predictor!  Without controlling for SES/school quality measures HSGPA loses much of its validity due to its statistical bias (especially systematic under-prediction of high SES).

The whole point of using standardized tests is that they are relatively unbiased predictors that allow for reasonable apples-to-apples comparisons, i.e., they don’t require adjustments for major systematic error.  Their  advocacy analysis really should have presented what this would look like without any adjustments for parent income, parent education, or school quality because most people want the admissions criteria to be at least neutral.

Someone else re-ran this analysis actually:

Google Chrome.png

source

This multivariate regression summary table strongly suggests that a simple model using SAT I and HS GPA (model “D”) is a good predictor and that adding family income, parent education, and SAT II into the mix does little to improve the predictive validity. Model “C” also suggests that, contrary to the “income measurement” people, adding income and education does little to weaken the strength of the SAT I (compare to “D”).


Nor are these tests biased against minorities

Google Chrome.png
University of California
source

According to an independent analysis of the University of California’s data, blacks and hispanics are somewhat over-predicted by SAT I and SAT II. They are over-predicted a good deal more by HSGPA though.

This is generally consistent with analysis at a national level:

Google Chrome

source

Although the national residuals are quite a bit larger (probably the result of relatively less range restriction).


The predictive power of the SAT is vastly under-estimated

Because students apply to different institutions based on their ability, because schools reject less qualified applicants, and because students tend to sort into different majors and take different courses based, in large part, on their academic strength (or lack thereof), the nominally reported correlations tend to seriously downplay the strength of this predictor in the national admissions strategy context. Many of these effects fall under the category of range restriction and can be adjusted for fairly easily. Others, like differential course selection behaviors, require more sophisticated methods to estimate their true effects.

If we look within institutions we typically find that SAT scores correlate with GPA at about 0.36.  However, after adjusting for range restriction and course difficulty (within and across schools) the correlation coefficient increases to .67.   Adding in HS-GPA increases the prediction to 0.78 (correcting for range restriction and course difficulty).

Google Chrome

Google Chrome.png

source
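
For readers unfamiliar with range restriction adjustments, here is a minimal sketch of one standard textbook correction (Thorndike's Case II, for direct selection on the predictor). I am not claiming this is the exact procedure used in the study above, which also adjusts for course difficulty; the standard deviations below are hypothetical placeholders.

```python
import math


def correct_range_restriction(
    r_restricted: float, sd_unrestricted: float, sd_restricted: float
) -> float:
    """Thorndike Case II correction for direct restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    ru = r_restricted * u
    return ru / math.sqrt(1 - r_restricted ** 2 + ru ** 2)


# 0.36 is the within-institution correlation quoted in the text.
# The SD values are HYPOTHETICAL: they assume the applicant pool is twice
# as variable on the SAT as the enrolled students at a given school.
corrected = correct_range_restriction(0.36, sd_unrestricted=200.0, sd_restricted=100.0)

print(f"observed r = 0.36, corrected r = {corrected:.2f}")
```

The more selection narrows the spread of scores inside an institution, the larger the gap between the observed and the corrected correlation. With no restriction at all (equal SDs) the function returns the observed value unchanged.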

 


The strength of this prediction does not weaken past freshman year

That is to say, it predicts freshman, sophomore, junior, and senior years equally well for all intents and purposes.

Google Chrome.png

source

Google Chrome.png

source

Holding HSGPA constant the SAT offers significant incremental validity

Google Chrome.png

Google Chrome.png

Google Chrome.png

source

The SAT is well correlated with IQ tests

Google Chrome.png

Google Chrome.png

source

The SAT correlates about as well with IQ tests as one IQ test correlates with other IQ tests (or as the PSAT correlates with the SAT). In fact, I couldn’t find any statistically significant income effect controlling for IQ scores when I analyzed NLSY97.
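
The kind of check described here can be sketched as an ordinary least squares regression of SAT on IQ and income. The data below are simulated under a made-up structure in which income is related to IQ and the SAT depends on IQ only; this is not NLSY97 and not my original analysis script, just an illustration of what "no income effect controlling for IQ" looks like.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

# HYPOTHETICAL standardized data (not NLSY97): income tracks IQ,
# and the SAT depends on IQ plus noise, with no direct income effect.
iq = rng.normal(size=n)
income = 0.3 * iq + rng.normal(scale=np.sqrt(1 - 0.3 ** 2), size=n)
sat = 0.8 * iq + rng.normal(scale=0.6, size=n)

# Raw (zero-order) association between income and SAT.
raw_r_income = np.corrcoef(income, sat)[0, 1]

# OLS of SAT on an intercept, IQ, and income.
X = np.column_stack([np.ones(n), iq, income])
coef, *_ = np.linalg.lstsq(X, sat, rcond=None)

# Classical standard errors and t statistics for each coefficient.
residuals = sat - X @ coef
sigma2 = residuals @ residuals / (n - X.shape[1])
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / std_err

print(f"raw r(income, SAT) = {raw_r_income:.2f}")
print(f"IQ coefficient = {coef[1]:.2f}, income coefficient = {coef[2]:.3f}")
```

In this setup income shows a clear raw correlation with the SAT, yet its coefficient collapses toward zero once IQ is in the model, which is the pattern the paragraph above describes.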

 

Google Chrome.png
SAT composite by IQ score, grouped by parents income level

I found more indication of an income “bias” in high school GPA:

Google Chrome.png
HS GPA by IQ score, grouped by parents income level
source

Lumosity’s cognitive tests show strong correlations with SAT across universities

Do brain games essentially function as IQ tests? A recent analysis suggests they do.

Data scientist Daniel Sternberg conducted an interesting analysis using Lumosity data. In his article titled Lumosity’s Smartest Colleges, he analyzed the scores of 89,699 users between the ages of 17 and 25 who attended a college or university and played the game for the very first time. He then examined the correlations between the median SAT and ACT scores (from the universities they attended) with performance on the aggregate score on Lumosity’s tests, which include the areas of Speed, Attention, Flexibility, Memory, and Problem Solving. So just like traditional intelligence and IQ tests, Lumosity has different measures of cognitive function.

The correlation between the SAT and Lumosity score (r = .85) and the ACT and Lumosity score (r = .84) were both reasonably high. Here is the graph:

Google Chrome.png
University brain game (IQ-proxy) by estimated SAT score (r=0.84)
source

Research shows that some video games can be used as good measures of general intelligence (if we extract the general factor):

Google Chrome.png

It is likely that Lumosity’s games function in a similar way (even if their product is unlikely to change general intelligence). This evidence is at least highly suggestive.


SAT test prep has little effect

SAT test prep generally has very modest effects (at best). Multiple studies have demonstrated this point.

By far the largest effect sizes belong to those preparation activities involving either a commercial course or private tutor [NEVERTHELESS THE SCORE CHANGES ARE NOT LARGE], and the effects differ for each section of the SAT. On average students with private tutors improve their math scores by 19 points more than those students without private tutors. The effect is less on the verbal section, where having a private tutor only improves scores on average by seven points. Taking a commercial course has a similarly large effect on math scores, improving them on average by 17 points, and has the largest effect on verbal scores, improving them on average by 13 points. With the exception of studying with a book, no other activity analyzed in this manner has an effect on test score changes that is statistically different from zero at a .05 significance level.

… Does test preparation help improve student performance on the SAT and ACT? For students that have taken the test before and would like to boost their scores, coaching seems to help, but by a rather small amount. After controlling for group differences, the average coaching boost on the math section of the SAT is 14 to 15 points. The boost is smaller on the verbal section of the test, just 6 to 8 points. The combined effect of coaching on the SAT for the NELS sample is about 20 points.

 source

20 combined points is equal to about 0.09 standard deviations.  These are really modest effects.
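
The conversion is just the point gain divided by the standard deviation of combined scores. A minimal sketch, assuming a combined (math plus verbal) SD of about 220 points; that SD is my assumption for illustration and is not stated in the quoted study.

```python
def effect_size_in_sd(points: float, sd: float) -> float:
    """Express a raw score change in standard deviation units."""
    return points / sd


coaching_gain = 20.0         # combined coaching effect quoted above
assumed_combined_sd = 220.0  # ASSUMPTION: combined-score SD, not from the source

d = effect_size_in_sd(coaching_gain, assumed_combined_sd)
print(round(d, 2))  # prints 0.09
```

Plugging in a somewhat smaller or larger SD changes the second decimal place but not the conclusion that the effect is small.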

 

Google Chrome.png
SAT by PSAT scores

Test prep rates do not vary all that much

Test prep varies little with income levels:

Google Chrome.png

They do vary somewhat with race, but whites are the least likely of any major group to take it, and when they do, they see smaller gains.

Google Chrome.png

 

source

Blacks are significantly more likely to do test prep (according to several studies) and they see somewhat larger gains!  Regardless, given the minimal differences in test prep and ample evidence that test prep has small effects even when used, it’s extremely unlikely to explain much of the systematic patterns we find nationwide with respect to SES or race.


SATs predict graduation rates between schools too

Using IPEDS it is possible to estimate the effect of the school’s (estimated) median SAT score on graduation rates and other outcomes.

Google Chrome.png
6-year graduation rate by institution median SAT score

source

Correlation coefficients

  • White: 0.78
  • Asian: 0.69
  • Black: 0.70
  • Hispanic: 0.71
  • Women: 0.79
  • Men: 0.82
  • Total: 0.82

When schools systematically discount standardized tests the effects are very obvious

For instance, US law schools grant admissions preferences of approximately two standard deviations to blacks across the entire pecking order (save HBCUs):

Google Chrome.png

As a consequence of these policies trickling down nationally, approximately 50% of black law students are clustered in the bottom decile of their classes and most of them aren’t much higher than that.

Google Chrome.png
Elite law schools class rank
Google Chrome.png
Sub-elite law school class rank

This is a direct result of misguided policies like affirmative action and the results are highly predictable (like the SAT, the LSAT is a relatively unbiased predictor). Moreover, these effects carry over to graduation, bar passage rates, and even, amongst those that graduate and pass the bar, their employment success. They actually do somewhat worse than expected due to mismatch.

Similar outcomes are seen at the undergraduate level and in other competitive graduate and professional school programs (there is a reason why affirmative action doesn’t stop in undergrad….)


Heritability explains a great deal of these systematic SES relationships

There is a large and growing body of evidence that many behavioral/personality traits (phenotypes) are highly heritable and that, contrary to popular imagination and many social scientists, the so-called “shared environment”, i.e., that which siblings share in common (parents, housing, schools, neighborhoods, etc), explains very little of the variance on average.

Intelligence, one of the key traits here, is estimated to be more than 50% heritable according to many studies in rich western countries. The shared environment is pretty consistently close to zero (the remainder being “unshared”, i.e., measurement error or other influences that siblings do not share systematically). There are other relevant traits that affect outcomes like academic achievement and many of them are also heritable.

See this study on GCSE heritability in the UK (one of the main measures they use for admissions). They estimate that the GCSE score itself, which is almost certainly more “study-able” than our less curriculum-laden tests like the SAT and ACT, is more than 60% heritable (although the shared environment effect appears to be non-trivial).

Google Chrome.png

Meanwhile, we know that intelligence is a decent predictor of adult SES:

Google Chrome.png

In fact, intelligence is a better predictor of (future) SES than parent SES.  Intelligence also predicts education and occupational status better than it does income.

I do not want to get into the weeds too much with this particular topic, but I do want to try (briefly) to open your mind to the possibility that the parent SES association with academic, occupational, and other forms of “success” is mostly explained through the heritability of intelligence and other phenotypes of interest (e.g., conscientiousness, motivation, personality/extraversion, etc). Most people that attain high SES are significantly more intelligent than average and we know that intelligence is heritable. Even allowing for some regression towards the mean (which is *not* 100%), we should well expect that child SES would be quite well associated with the intelligence and other phenotypes that helped shape their parents’ success, especially over the course of several generations. We do not need to be a perfect “meritocracy” to find that heritability explains a great deal of these observed relationships.

If you look closely at SES or income mobility studies you will find that they resemble more commonly accepted heritable traits like height:

child SES by parent SES

Note: If there were zero SES mobility, i.e., if children inherited their parents’ SES on average (no systematic bias in either direction), this slope should be approximately 1 (much steeper than the regression line in this plot). To the contrary, we find that the highest SES are significantly more downwardly mobile (relative to their parents) than those at the 50th percentile (~0 change) and those at the bottom are significantly more upwardly mobile (an average increase of ~30 percentile points from bottom). Put differently, most of the relative “immobility” is happening in the middle, not the top or the bottom.

source
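
The pattern in the note above (a slope well below 1, downward drift at the top, upward drift at the bottom, little change in the middle) is what plain regression towards the mean produces. Here is a minimal simulation sketch, assuming a parent-child correlation of 0.4 on the underlying trait; that value is hypothetical and not an estimate from the cited mobility data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
r = 0.4  # HYPOTHETICAL parent-child correlation

# Standardized parent and child values with correlation r.
parent = rng.normal(size=n)
child = r * parent + rng.normal(scale=np.sqrt(1 - r ** 2), size=n)


def percentile_rank(values: np.ndarray) -> np.ndarray:
    """Rank each value within its own generation on a 0-100 scale."""
    ranks = values.argsort().argsort()
    return 100.0 * ranks / (len(values) - 1)


parent_pct = percentile_rank(parent)
child_pct = percentile_rank(child)

# Slope of child percentile on parent percentile (1.0 would mean no mobility).
slope = np.polyfit(parent_pct, child_pct, 1)[0]

# Average change in percentile rank relative to the parent.
change = child_pct - parent_pct
bottom_change = change[parent_pct < 10].mean()
middle_change = change[(parent_pct >= 45) & (parent_pct <= 55)].mean()
top_change = change[parent_pct > 90].mean()

print(f"slope = {slope:.2f}")
print(f"bottom decile: {bottom_change:+.1f}, middle: {middle_change:+.1f}, top decile: {top_change:+.1f}")
```

Any trait that is transmitted imperfectly, whether height or the traits behind SES, will show this shape, so the shape alone says nothing about whether the transmission is environmental or genetic.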

Google Chrome.png

Although there is clearly some regression towards the mean, (white) males with tall fathers tend to be quite a bit taller than average and males with short fathers tend to be shorter than average (even if taller than their fathers).

Google Chrome.png

source