Issues with the CBO income distribution data

The Congressional Budget Office periodically produces income distribution and effective tax rate data for households by income group.

One interesting but little-known fact is that they build their “income categories” (quintiles, top 1%, etc.) by adjusting each household’s income for household size.

Here is their definition:

Income categories are defined by ranking all people by their income adjusted for household size—that is, divided by the square root of a household’s size. (A household consists of the people who share a housing unit, regardless of their relationships.) Quintiles, or fifths, contain equal numbers of people, as do percentiles, or hundredths. Households with negative income (business or investment losses larger than other income) are excluded from the lowest income category but are included in totals.

What this means is that a household with 1 person and 50K of income would be ranked identically to a household with 100K of income and 4 people, as would a household with 150K in income and 9 people, and so on.
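To make the arithmetic concrete, here is a minimal Python sketch of the adjustment described in the CBO definition above. The function name `adjusted_income` is my own label, not the CBO's, and the three households are the ones from the example.

```python
import math


def adjusted_income(household_income, household_size):
    """Income divided by the square root of household size.

    This mirrors the ranking rule quoted above; the function name is
    an illustrative choice, not something taken from CBO materials.
    """
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return household_income / math.sqrt(household_size)


# The three households from the example: (income, number of people).
households = [(50_000, 1), (100_000, 4), (150_000, 9)]

for income, size in households:
    print(f"income={income:>7,} size={size} adjusted={adjusted_income(income, size):,.0f}")
```

All three households come out with the same adjusted income of 50,000, so they land at the same point in the ranking even though their raw incomes differ by a factor of three.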

Although I think this is, in some respects, a useful and perhaps necessary way of approximating the welfare of each household, I suspect it unintentionally misleads a lot of people about both effective tax rates and the actual distribution of income. Few people probably know that the CBO does this in the first place, and fewer still understand its implications.

Consider, for instance, what would happen if the wealthiest 0.5% of households (unadjusted for size) each adopted one child. It would surely produce a more “unequal” distribution, since the highest reaches of the income distribution would account for that much more of the population (CBO income groups always account for similar shares of the entire population), despite the fact that those households would have less discretionary income and would not (for the sake of argument) have increased their incomes by one dime.

The CBO further confuses the issue by then quoting the average income, pre- and post-tax, across these adjusted-income groups without actually quoting the adjusted incomes (which seems very strange to me indeed). Thus, say, the households in the middle 20% may shrink dramatically in size and may include people with very different raw income levels, but the published results do not give you any hint of this at all.

This is not an academic argument since, in fact, the households have never been identically sized and there has been a significant shift in the distribution of population (and, in fact, earners) amongst the households.

Below I have calculated the approximate size using their household count data (they round the numbers so there is a small amount of error between years).

Average number of people per CBO household income group (scaled to 1979)

Observation: Households in the very richest and very poorest groups grew or stayed roughly the same size, whereas households in the middle income groups dropped dramatically in size. (Remember: this is after their weighting method, so “middle” can mean very different pre-weighted incomes… the effects are probably even more dramatic without this weighting.)
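For anyone who wants to redo this kind of calculation, here is a rough Python sketch of how an average household size per income group can be backed out and indexed to 1979. It relies on the fact that each quintile holds an equal number of people. The population and household counts below are made-up placeholders, not CBO figures, and taking total population as a separate input is my own assumption about how to set the calculation up.

```python
def people_per_household(total_people, households_in_group, people_share=0.20):
    """Average household size in one income group.

    people_share is the group's share of all people (0.20 for a quintile,
    0.01 for a percentile), since the groups hold equal numbers of people.
    """
    return total_people * people_share / households_in_group


def scaled_to_base(series, base_year=1979):
    """Index a {year: value} series so that the base year equals 1.0."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}


# Placeholder inputs in millions; NOT actual CBO or Census numbers.
population = {1979: 220.0, 2005: 290.0}
middle_quintile_households = {1979: 16.0, 2005: 22.0}

sizes = {
    year: people_per_household(population[year], middle_quintile_households[year])
    for year in population
}
index = scaled_to_base(sizes)

for year in sorted(sizes):
    print(f"{year}: {sizes[year]:.2f} people per household, index {index[year]:.3f}")
```

With real inputs, an index below 1.0 for a group means its households have become smaller relative to 1979.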

Some figures relating to US household income inequality

The change in household income inequality is real, but it’s often overstated because the usual figures do not account for household size or the number of earners per household.

Put bluntly, actual work is far from evenly distributed across households, far more so than most people realize or admit.

Below are some statistics from the most recent US census [very similar numbers are found pre-recession too]


Domestic corporate profits are NOT at record levels, nor have they fully recovered

Certain people have made the claim that corporate profits are at record levels and that this fact combined with high unemployment proves that there’s been some kind of fundamental shift in the economy.

The reality is that this is mostly a misreading of the data. Most measures of corporate profits include foreign-produced profits (e.g., Apple shipping product to Europe from China), and foreign profits constitute a much larger part of corporate profits than they once did.



Though this statistic might be relevant for some things, it doesn’t tell us a whole lot about the relationship between US profits and US labor. Further, even if you compare this broader measure to the 50s and 60s, corporate profits are not at “record” levels (not once you account for inventory and capital depletion).

Trouble with progressive estimates of the impact of consumption taxes

Citizens for Tax Justice (CTJ) and the Institute on Taxation and Economic Policy (ITEP), amongst other progressive organizations, have put forth the claim that our tax code is nearly flat, on the basis that “regressive” consumption taxes in state and local tax codes offset the progressivity of federal and (to a lesser extent) state income taxes.

The problem with almost all of these analyses is that they invariably hinge on a reported ratio of consumption to income that is greater than 1 at lower income levels. In other words, they are counting consumption taxes in the numerator that are paid out of resources not included in the income base (the denominator).

ITEP claims, in their description, that they’re correlating consumption patterns in the BLS’s Consumer Expenditure Survey (CEX) to reported income. The problem with this approach is that the BLS CEX consistently indicates that the ratio of consumption to “income” exceeds 1 from just shy of the 50th percentile on down (at the bottom it is more than 2x).

see here for quick summary

ITEP further papers over these flaws by effectively hiding non-linearity at negative numbers in their models (e.g., consumption 20x negative income).

Some money quotes on their model:

Our procedure for imputing consumption onto individual tax records can be thought of as involving two distinct steps: (i) econometrically estimating the necessary relationships for each of the desired consumption items from the Consumer Expenditure Survey (CES); and (ii) using the resulting regression coefficients to simulate consumption on the merged data file for non-dependents. Implicit in this approach is reliance on the strong separability of a utility function over different categories of consumption; i.e., we used a “utility tree” approach to estimate several systems of share equations.

Next, total non-durable consumption expenditures were imputed in a similar manner: separate ordinary least squares (OLS) regressions were estimated from the CES on both samples with a similar set of predictor variables. Coefficients from these equations were then used to impute mean (non-durable) consumption expenditures to each household and a normally distributed error term with a mean of zero and a standard deviation equal to the standard error of the regression was added to each imputed amount. Two sets of adjustments were then made to the imputed amounts.

First, the particular functional form used was unstable at very low levels of income resulting in extraordinary amounts of imputed consumption for several records. For nondurable consumption, our OLS specification included two terms, 1/Y and 1/Y^2, where Y is total family income, that presented problems at both ends of the income distribution. For very low incomes, the nonlinearity introduced by 1/Y and 1/Y^2 caused estimates of mean consumption to approach infinity. This was handled by constraining consumption for these records to be no more than 1.5 times income. This limit was based on analysis of the CES data independent of the imputation process.

Second, the tax return data that formed the basis of the income information for filers contained income amounts far outside the range observed on the CES and caused problems when our regression coefficients were used. Our approach was to assume that the estimated equation was valid for incomes within the range of the CES and to fit a spline function for the portion of income in excess of this amount for those households (about 2.5%) with reported incomes outside the range of that reported in the CES.
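The instability they describe is easy to see in a toy version. The Python sketch below is NOT their model: the functional form is only a guess at a specification with 1/Y and 1/Y^2 terms, and every coefficient is a made-up placeholder, since the actual regression is not disclosed. The only number taken from the quoted text is the cap at 1.5 times income.

```python
def imputed_consumption(income, a=8_000.0, b=0.6, c=5.0e6, d=1.0e9):
    """Toy consumption equation with 1/Y and 1/Y^2 terms.

    The form a + b*Y + c/Y + d/Y^2 and all four coefficients are
    hypothetical, chosen only to show the behavior near zero income.
    """
    if income <= 0:
        raise ValueError("this toy form is only defined for positive income")
    return a + b * income + c / income + d / income ** 2


def capped_consumption(income, cap_multiple=1.5):
    """Apply the constraint from the quoted text: at most 1.5 times income."""
    return min(imputed_consumption(income), cap_multiple * income)


for income in (100.0, 1_000.0, 10_000.0, 50_000.0):
    raw = imputed_consumption(income)
    capped = capped_consumption(income)
    print(f"Y={income:>9,.0f} raw={raw:>12,.0f} capped={capped:>10,.0f} ratio={capped / income:.2f}")
```

In this toy version the raw estimate explodes as income approaches zero, and the cap simply pins the lowest-income records at a consumption-to-income ratio of exactly 1.5, which is still well above 1.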

Without getting into the weeds with respect to how their model works (they don’t disclose nearly enough information or data to do this), I can reproduce their consumption tax numbers very closely by simply applying a flat consumption tax to the BLS CEX ratios of consumption to pre-tax income. In other words, you don’t need to assume that the poor are paying higher effective rates as a proportion of their consumption (e.g., on “sin” taxed goods) to get higher effective taxes on the poor as a percentage of this very limited definition of “income”. It’s quite clearly almost entirely a byproduct of methodological flaws that vastly overstate spending-to-income at low incomes and vastly understate spending-to-income at upper incomes.
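The mechanism can be sketched in a few lines of Python. The ratios and the 5% rate below are illustrative placeholders, not actual BLS CEX or ITEP figures. They are only shaped to match the pattern described above: above 2 at the bottom, around 1 near the middle, below 1 at the top.

```python
def tax_as_share_of_income(consumption_to_income, flat_rate):
    """Consumption tax paid, as a share of reported income.

    Tax paid is flat_rate * consumption, so dividing by income gives
    flat_rate * (consumption / income).
    """
    return flat_rate * consumption_to_income


# Hypothetical consumption-to-income ratios by quintile (NOT real CEX data).
ratios = {
    "bottom 20%": 2.2,
    "second 20%": 1.3,
    "middle 20%": 1.0,
    "fourth 20%": 0.8,
    "top 20%": 0.6,
}
flat_rate = 0.05  # the same rate on every dollar of consumption (assumed)

rates = {group: tax_as_share_of_income(r, flat_rate) for group, r in ratios.items()}

for group, rate in rates.items():
    print(f"{group:<11} {rate:.1%} of income")
```

Every group pays the identical 5% on what it spends, yet the tax looks steeply regressive as a share of income, purely because of the ratios fed in.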

Effective Tax Rates 1960-2001, derived from Piketty and Saez

Since the CBO and most other data sources do not take their analysis of effective tax rates back before 1979, here is an analysis similar to my last one, using data from Piketty and Saez (two well-known liberal economists who are very much in favor of much higher taxes).

their data in excel

their paper in PDF format

My first chart here is simply their own computed effective tax rates (less the estate tax component).


Note that, even here, you need to go beyond the 99.5th percentile to find any sign of cuts in the effective tax rate.

Effective Federal Tax Rates using CBO data 1979-2005

Many people believe that the rich once paid much more in taxes as a percentage of their income because top marginal rates were once much higher. The reality, however, is that a combination of relatively higher brackets, larger deductions, tax avoidance, and the like reduced effective rates to MUCH less than is popularly believed.

The CBO published some data a few years ago to break down effective tax rates amongst higher income groups (they usually aggregate the top 1% as one big group), so here is a chart to actually reveal the truth.

Historical Effective Tax Rates, 1979 to 2005: Supplement with Additional Data on Sources of Income and High-Income Households
Below is a simple illustration of their data.

According to their calculations, the rich did pay a bit more in 1979 (the late 70s probably had abnormally high ETRs due to bracket creep and the like). That said, they also imputed 100% of corporate income taxes to shareholders (mostly the rich), and they added both these and the employer portion of payroll taxes to the numerator and the denominator [in other words, they assume that the taxpayer would otherwise have had correspondingly more income, and thus add these amounts to both the numerator and the denominator of the calculation].

Since many of these same progressives have trouble with this concept (e.g., they wish to assert that Romney only paid a ~15% ETR), I thought it’d be helpful to illustrate what this would look like if we subtracted both of these imputed income sources/tax burdens. This, in other words, would more closely resemble one’s ETR if one divided total federal taxes paid by AGI [although this also includes the minuscule burden added by federal excise taxes].
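As a rough illustration of the adjustment, here is a small Python sketch. The dollar amounts are invented for a single hypothetical high-income household, and the function and variable names are mine. The only thing taken from the text is the logic: the imputed corporate income tax and employer payroll tax sit in both taxes and income, so removing them means subtracting the same amount from both.

```python
def effective_rate(taxes, income):
    """Effective tax rate: taxes divided by income."""
    return taxes / income


def strip_imputations(total_taxes, total_income,
                      imputed_corporate_tax, imputed_employer_payroll_tax):
    """Remove the imputed amounts from both the numerator and the denominator."""
    imputed = imputed_corporate_tax + imputed_employer_payroll_tax
    return total_taxes - imputed, total_income - imputed


# Hypothetical household; these are NOT CBO numbers.
cbo_income = 1_000_000.0          # income including the imputed amounts
cbo_taxes = 300_000.0             # all federal taxes including imputed ones
corporate_tax = 80_000.0          # corporate income tax imputed to the household
employer_payroll_tax = 10_000.0   # employer share of payroll tax

cbo_rate = effective_rate(cbo_taxes, cbo_income)
adj_taxes, adj_income = strip_imputations(
    cbo_taxes, cbo_income, corporate_tax, employer_payroll_tax
)
adjusted_rate = effective_rate(adj_taxes, adj_income)

print(f"ETR with imputations:    {cbo_rate:.1%}")
print(f"ETR without imputations: {adjusted_rate:.1%}")
```

Whenever taxes are smaller than income, subtracting the same amount from both lowers the ratio, so the adjusted rate always comes out below the headline one.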