To build on some of my prior work with California’s test score data, I decided to extend the analysis to SAT, ACT, and AP scores in the state, i.e., to compare how these different tests relate to one another within and between schools.

Notes/Caveats:

  • To get more stable results for schools with small numbers of test takers, I averaged 2-4 years’ worth of test score data together.
  • Unfortunately, AP results are not broken out by subject (some subjects are much harder than others, and test-taking patterns are apt to differ at different sorts of schools).
  • SAT/AP/ACT data is not available by race/ethnicity
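
As a rough illustration of the multi-year averaging in the first caveat, here is a minimal sketch. It assumes a mean weighted by the number of test takers in each year; a simple unweighted mean across years would be another reasonable reading. The record layout, the function name `pool_years`, and all numbers are hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (school_id, year, number_tested, mean_score).
# These values are invented for illustration, not taken from the state data.
records = [
    ("A", 2010, 10, 1500.0),
    ("A", 2011, 30, 1600.0),
    ("B", 2010, 200, 1400.0),
    ("B", 2011, 200, 1450.0),
]

def pool_years(rows):
    """Return {school_id: (total_tested, weighted_mean_score)}.

    Each year's mean is weighted by its number of test takers, so a year
    with very few testers moves the pooled mean less than a large year.
    """
    total_n = defaultdict(int)
    total_points = defaultdict(float)
    for school, _year, n, mean in rows:
        total_n[school] += n
        total_points[school] += n * mean
    return {
        school: (total_n[school], total_points[school] / total_n[school])
        for school in total_n
        if total_n[school] > 0  # skip schools with no test takers at all
    }

pooled = pool_years(records)
```

Weighting by test takers keeps a single small, noisy year from dominating a school's pooled score, which is the point of averaging in the first place.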

ACT vs SAT (r=0.97)

sat_vs_act

[It’s almost like they’re testing the same construct…. :-)]
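
For readers who want to reproduce figures like the r values in these headings, here is a minimal sketch of a school-level correlation. It assumes the r values are ordinary (unweighted) Pearson correlations computed across schools, which the post does not state explicitly. The function name and the sample scores are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of school means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical school-level means (one entry per school), for illustration only.
sat_means = [1380.0, 1450.0, 1520.0, 1610.0, 1700.0]
act_means = [19.5, 20.8, 22.1, 23.9, 25.6]
r = pearson_r(sat_means, act_means)
```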

SAT by English grade 11 (r=0.84)

sat_composite_by_ela_g11

SAT by Science Grade 10 (r=0.81)

sat_by_scig10

SAT by World History (r=0.79)

sat_by_world_hist

SAT by Physics (r=0.73)

sat_by_physics

SAT by Algebra I (r=0.56)

SAT_by_AlgebraI

SAT math by Algebra I (r=0.57)

satm_by_algebraI

SAT by Algebra II (r=0.75)

sat_by_algebraII

SAT math by Algebra II (r=0.79)

satm_by_algebraII

AP Average score by Algebra II (r=0.70)

ap_by_algebraII

AP average score by English Grade 11 (r=0.72)

ap_by_ela_g11

AP average score by Science Grade 10 (r=0.70)

ap_by_scig10

AP by SAT composite (r=0.82)

ap_by_sat

AP by ACT average score (r=0.81)

ap_by_act

AP percent scoring 5 by ACT

ap_pct5_by_act

AP percent scoring 4 or higher by ACT

ap_pct_4plus_by_act

AP percent scoring 3 or higher by ACT

ap_pct_3plus_by_act

AP percent scoring 1 by ACT

ap_pct_1_by_act

AP percent scoring 2 or lower by ACT

ap_pct_2_or_lower_by_act


Other correlates

Average number of AP tests per test taker by (school) ACT score

ap_number_of_tests_per_tester_by_act_score

Average number of tests per senior by ACT score

ap_avg_tests_per_senior_by_act

Average percent of seniors tested by ACT score

avg_pct_seniors_tested_by_act

Note: I am simply dividing the number of test takers by the number of students enrolled in grade 12.  Since juniors can and do take these tests, the totals can exceed 100%.
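
The calculation in the note above amounts to something like the following sketch. The function name and the example counts are hypothetical.

```python
def percent_tested(test_takers, grade12_enrollment):
    """Test takers as a percentage of reported grade 12 enrollment.

    Returns None when enrollment is missing or zero. The result is not
    capped at 100, because juniors who test are counted in the numerator
    but not in the denominator.
    """
    if not grade12_enrollment or grade12_enrollment <= 0:
        return None
    return 100.0 * test_takers / grade12_enrollment

# Hypothetical school: 120 test takers against 100 enrolled seniors.
example = percent_tested(120, 100)  # 120.0, i.e. above 100%
```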

Percent of seniors taking SAT (test takers / # seniors) by English Grade 11 scores

sat_pct_tested_by_ela_g11

Percent of seniors taking ACT tests (test takers / # seniors) by English Grade 11 scores

act_pct_tested_by_ela_g11

Percent of seniors taking SAT by ACT

SAT_vs_ACT_percent_tested

AP percent tested by SAT percent tested (test takers / reported senior enrollment)

AP_pct_tested_by_SAT_pct_tested

SAT composite by percent of “seniors” taking SAT

SAT by Percent Tested


Some quick and dirty modeling exercises

Model 1 (ELA grade 12 + percent seniors tested): r=0.88

sat_pred1_any_n

sat_pred1_n200plus

Model 2 (Grade 10 Science + Algebra II + percent SAT test takers), r=0.85

sat_pred2_any_n

I’m sure I could do better than this (especially if I dealt with some of those outliers).  My point here is simply this: these tests are highly correlated, and accounting for differences in test taking rates or aggregating a few of the tests together improves the relationships further.
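
For anyone who wants to try this sort of model, here is a minimal sketch. It assumes ordinary least squares on school-level data and takes the reported r to be the correlation between predicted and actual SAT scores; the post does not spell out either choice. All function names and data values are hypothetical, and the solver is plain Python so the example has no dependencies.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, rows):
    """Apply fitted coefficients to rows of predictor values."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical schools: (mean ELA score, percent of seniors tested) and mean SAT.
predictors = [(320.0, 30.0), (340.0, 55.0), (355.0, 40.0), (370.0, 70.0), (390.0, 60.0)]
sat = [1350.0, 1440.0, 1480.0, 1590.0, 1640.0]

coefs = fit_ols(predictors, sat)
model_r = pearson_r(predict(coefs, predictors), sat)
```

With real data one would also want to look at the outliers and consider weighting schools by their number of test takers, neither of which this sketch attempts.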


Scatterplot matrices

matrix_including_outliers

matrix_excluding_outliers