Wednesday, February 1, 2012

Revised Estimates of Intergenerational Income Mobility in the United States by Bhashkar Mazumder


Abstract: Solon’s (1992) landmark study estimated the intergenerational elasticity (IGE) in income
between fathers and sons to be 0.4 or higher. This dramatically changed the consensus view of the U.S.
as a highly mobile society. In this comment, I show both analytically and empirically how Solon and
others have actually underestimated this parameter by about 30 percent, suggesting that the IGE is
actually close to 0.6 and that the U.S. appears to be among the least mobile countries. There are two key
measurement issues that lead researchers to underestimate the IGE. First, short-term averages
of fathers’ earnings are a poor proxy for lifetime economic status due to highly persistent transitory shocks.
Second, the variance of transitory fluctuations in earnings varies considerably by age, causing a
“lifecycle” bias when samples include measures of fathers’ earnings when they are especially young or
old. In this comment, Solon’s results are replicated and then re-estimated using a new technique that is
able to address these issues using the same PSID sample. The results confirm that the intergenerational
elasticity is likely to be around 0.6.

In a highly influential study in the American Economic Review, Gary Solon presents compelling
evidence that the U.S. exhibits substantially less income mobility than had been previously thought
(Solon 1992). Before Solon’s article, researchers typically estimated the intergenerational correlation in
income between fathers and sons in the U.S. to be 0.2 or less. These studies appeared to confirm the
widely held view that the U.S. is an exceptionally mobile society and prompted Gary Becker to conclude
that “…low earnings as well as high earnings are not strongly transmitted from fathers to sons…”.
Solon demonstrates how these previous estimates were sharply biased downwards by using only a
single year of income as a proxy for permanent economic status, and by using non-representative
samples. Solon then constructs an intergenerational sample containing as many as five years of income
for fathers using the Panel Study of Income Dynamics (PSID) and estimates the intergenerational
elasticity (IGE) in income to be “at least 0.4 and possibly higher”. In a separate analysis published in the
same issue, David Zimmerman finds a similar result using panel data from the National Longitudinal
Surveys (NLS).
As a result of their careful analyses of the measurement issues and the use of superior data, these
studies led to a rethinking of the degree of intergenerational mobility in the U.S. In particular, they called
into question the ideal of America as a highly mobile society. For example, Solon shows that an IGE of
0.4 implies that a son whose father is at the fifth percentile has only a 0.17 chance of rising above the
median. On the other hand, an IGE of 0.4 also implies that on average, 60 percent of earnings
differences between two families are eliminated in a generation. So observers might still disagree as to
whether we should view the glass as “half-full” or “half-empty”.
In this paper, I argue that despite the dramatic improvement over previous work, Solon still
underestimates the IGE in the U.S. by about 30 percent or more, suggesting that the true value of the
parameter is about 0.6. As a point of comparison, recent studies using a methodology similar to Solon’s
have estimated the IGE to be only about 0.2 in Canada and Finland and 0.3 in Germany. Clearly, an IGE
of 0.6 suggests that the U.S. may be exceptional for its relative lack of mobility.
Still, how much difference does it make if the IGE is 0.4 or 0.6? To illustrate the implications in
practical terms, consider a family whose earnings are half the mean. If the true IGE is 0.6, then it would
require, on average, 5 generations instead of just 3 before the family substantially closed the gap with the
mean. Obviously a difference of 2 generations, or about 50 years, is quite significant and strongly
suggests that the glass is more than half-empty.
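The "5 generations instead of just 3" comparison can be illustrated with a short calculation. The sketch below is not the author's own computation: it assumes that the expected proportional gap to the mean shrinks by a factor equal to the IGE in each generation, and it assumes that "substantially closed" means the gap has fallen below 5 percent. Both the function name and the 5 percent threshold are hypothetical choices made for illustration.

```python
def generations_to_close_gap(ige, initial_gap=0.5, threshold=0.05):
    """Count generations until the expected gap to the mean drops below `threshold`.

    Assumption (not from the paper): the proportional gap is multiplied
    by `ige` once per generation, so gap_n = initial_gap * ige ** n.
    """
    if not 0 <= ige < 1:
        raise ValueError("ige must lie in [0, 1) for the gap to shrink")
    gap = initial_gap
    generations = 0
    while gap >= threshold:
        gap *= ige          # one more generation of regression to the mean
        generations += 1
    return generations


if __name__ == "__main__":
    for ige in (0.4, 0.6):
        print(f"IGE = {ige}: {generations_to_close_gap(ige)} generations")
```

Under these assumptions a family starting at half the mean needs 3 generations when the IGE is 0.4 and 5 generations when it is 0.6. A different threshold would give different counts.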
If the IGE represents a causal relationship, then it also has powerful implications for the
intergenerational impact of government policies. For example, Chay (1995) estimates that the Civil
Rights Act of 1964 reduced the earnings gap between blacks and whites born in the 1920s by about 30
percent. An IGE of 0.6 suggests that the black-white gap for children of these families might have been
reduced by as much as 18 percent (0.6 × 30 percent) simply due to the elimination of racially based
employment discrimination for the previous generation.
There are two reasons why Solon’s study leads to estimates that are too low, and both reasons
reflect problems inherent in using the PSID or the NLS for this type of analysis. First, owing to small
samples and high rates of attrition in panel data, Solon and other researchers are forced to measure
fathers’ permanent income using just a few years of data. Solon’s sample is reduced to just 290 father-son
pairs when he averages five years of income for fathers to obtain his highest estimate of 0.41.
However, studies of earnings dynamics suggest that the transitory component of earnings is highly
persistent so that even a five-year average might still provide a rather poor measure of “permanent” or
lifetime economic status. Second, several studies have also shown that there may be substantial
differences in the variance of transitory fluctuations in earnings by age, causing a “lifecycle bias”. In
particular, the income of fathers who are especially young or especially old, even if averaged over several
years, is not likely to produce an accurate proxy for lifetime economic status.
Solon was certainly aware that transitory shocks might be highly correlated but given the state of
the literature on earnings dynamics at the time, he preferred not to make any strong assumptions and
instead provided bounds for the results. As Solon put it, “If the process governing earnings dynamics
were known, that knowledge could be exploited to achieve consistent estimation of the intergenerational
correlation in long-run earnings…because considerable uncertainty still clouds the current understanding
of earnings dynamics and because the data set used in the present study could not possibly resolve the
issues, the present study settles for inconsistent estimators and discussing the likely direction of the
inconsistency.” Recent studies on earnings dynamics using much richer models and significantly better
data (e.g. Baker and Solon, 2003; Mazumder 2001a) have made great strides in resolving some of the
issues that were unsettled at the time of Solon’s study. Given these methodological advances it makes
sense to reexamine previous studies on intergenerational mobility.
In this comment I first explore analytically how incorporating serially correlated transitory shocks
into Solon’s measurement framework affects the analysis. Using a simple model of earnings and
incorporating parameter estimates from previous studies on earnings dynamics, I run simulations on the
expected bias from using time averages of various lengths as a proxy for lifetime earnings. The results
suggest that Solon’s estimate of 0.4 based on a five-year average is biased down by just under 30 percent
due solely to the persistence of transitory shocks.
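The mechanism behind this bias can be sketched with the textbook errors-in-variables result. Assume, purely for illustration, that a father's log earnings in each year equal a permanent component plus a stationary AR(1) transitory shock. The OLS coefficient on a T-year average then converges to the true IGE multiplied by an attenuation factor: the permanent variance divided by the total variance of the average. The variance values and the autocorrelation used below are hypothetical and are not the parameter estimates used in the paper.

```python
def mean_transitory_variance(var_v, delta, T):
    """Variance of the T-year mean of a stationary AR(1) shock.

    var_v is the shock's stationary variance and delta its autocorrelation:
    Var(mean) = var_v * (T + 2 * sum_{k=1}^{T-1} (T - k) * delta**k) / T**2
    """
    cov_sum = sum((T - k) * delta ** k for k in range(1, T))
    return var_v * (T + 2 * cov_sum) / T ** 2


def attenuation_factor(var_perm, var_v, delta, T):
    """Share of the variance of the T-year average that is permanent."""
    noise = mean_transitory_variance(var_v, delta, T)
    return var_perm / (var_perm + noise)


if __name__ == "__main__":
    # Hypothetical values: equal permanent and transitory variances.
    for delta in (0.0, 0.8):
        for T in (1, 5, 16):
            factor = attenuation_factor(1.0, 1.0, delta, T)
            print(f"delta={delta}, T={T}: attenuation factor {factor:.3f}")
```

With uncorrelated shocks (delta = 0) averaging removes noise quickly. With persistent shocks the factor rises much more slowly as T grows, which is why a five-year average can still understate the IGE noticeably.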
I also replicate the results from Solon’s article and then re-estimate the IGE on the same sample
using a new econometric method, the Heteroskedastic Errors in Variables (HEIV) estimator developed by
Sullivan (2001). With this procedure I am able to take into account measurement problems due to both
persistent transitory fluctuations in earnings and lifecycle bias using data. The HEIV estimator is a two-step
process. First, estimates of the reliability ratio of each data point are needed. I do this by estimating a
highly structured earnings dynamics model using a different dataset containing lifetime earnings histories
drawn from social security earnings records. The parameter estimates from this model are then used to
construct reliability ratios for each observation in Solon’s PSID sample. In the second step, the HEIV
estimator directly incorporates these estimated reliability ratios to produce an unbiased estimate of the
IGE. The HEIV estimate is actually larger than 0.6.
These results are also consistent with the empirical findings in Mazumder (2001b) which uses a
much larger intergenerational sample containing the social security earnings records of fathers and their
children. In that study the IGE between fathers and sons is estimated to be around 0.4 when using just
four-year averages of fathers’ earnings, tracking the findings from earlier studies. However, the estimates
rise to more than 0.6, when 16-year averages of father’s earnings are used.

...

II. Re-estimation using the HEIV estimator
I now empirically attempt to quantify the effects of both measurement problems on estimates of
the IGE using a new econometric method. The basic idea is that, rather than simply applying a single
correction factor for the entire sample, an attempt is made to directly address the fact that each observation
has a different degree of measurement error due to the differences in the fathers’ age range. A very large
dataset containing the social security earnings histories of over 20,000 men is used to infer the reliability
ratio for each of the fathers in Solon’s PSID sample. This information is then incorporated in the most
efficient manner possible to derive a new estimate of the IGE. As part of the analysis, Solon’s results are
almost exactly replicated.
The HEIV estimator
In many economic studies it is known that a right hand side variable is measured with error. If
the reliability ratio can be determined, then a simple solution is to divide the regression coefficient by the
reliability ratio to scale up the estimate. This “errors in variables” or EIV estimator is commonly
implemented in statistical packages such as Stata. However, in many situations there is reason to think
that the reliability ratios might vary within the sample. A common situation where this occurs is when
using individual-level data and an explanatory variable is an average of some characteristic of the
population taken at a particular geographic level (e.g. state or county). In this case, the sampling variance
for the right hand side variable and the degree of measurement error will vary across individuals
depending on the size of the sample for each geographic area.
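The simple EIV correction with a single, known reliability ratio can be illustrated on simulated data. The sketch below is only an illustration under assumed values: the true slope of 0.6, the variances, and the function names are all hypothetical, and the measurement error is classical (independent of the true regressor and of the regression error).

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_eiv(beta=0.6, var_true=1.0, var_noise=0.5, n=100_000, seed=0):
    """Return (OLS slope, EIV-corrected slope, reliability ratio)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
    # Observed regressor = true regressor + classical measurement error.
    x_obs = [x + rng.gauss(0.0, var_noise ** 0.5) for x in x_true]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]
    reliability = var_true / (var_true + var_noise)  # signal / total variance
    b_ols = ols_slope(x_obs, y)
    return b_ols, b_ols / reliability, reliability


if __name__ == "__main__":
    b_ols, b_eiv, reliability = simulate_eiv()
    print(f"reliability ratio: {reliability:.3f}")
    print(f"OLS slope: {b_ols:.3f}, EIV-corrected slope: {b_eiv:.3f}")
```

With these assumed variances the reliability ratio is two thirds, so the OLS slope should land near 0.4 and the corrected slope near the true value of 0.6, up to sampling noise.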
One approach to address this problem is to average the reliability ratios across the observations,
use this as an estimate for an overall reliability ratio and then implement the EIV estimator. While such
an approach is consistent, Sullivan (2001) shows that the most efficient estimator will utilize the
reliability ratio for each observation. He presents an alternative estimator, the Heteroskedastic Errors-In-
Variables (HEIV) estimator that does exactly this. The HEIV estimator is the OLS regression of the
dependent variable on the best linear predictor of the right hand side variable. In the case of a single
mismeasured variable, this amounts to regressing the dependent variable on the
right hand side variable multiplied by the observation-specific reliability ratio. The HEIV estimator is
...
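The idea of using an observation-specific reliability ratio can be illustrated on simulated data. This is a minimal sketch of the description given in this section (OLS of the dependent variable on the mismeasured regressor scaled by its own reliability ratio), not Sullivan's (2001) implementation. It assumes mean-zero variables, so the regressions omit the intercept, and it assumes two hypothetical groups of observations with different measurement-error variances. All names and parameter values are illustrative.

```python
import random


def slope_through_origin(z, y):
    """OLS slope of y on z without an intercept (variables are mean zero)."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * a for a in z)


def simulate_heiv(beta=0.6, n=100_000, seed=0):
    """Return (naive OLS slope, HEIV-style slope) on simulated data."""
    rng = random.Random(seed)
    x_obs, x_pred, y = [], [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, 1.0)          # true regressor, variance 1
        noise_var = rng.choice([0.2, 1.0])    # hypothetical low/high noise groups
        x = x_true + rng.gauss(0.0, noise_var ** 0.5)
        reliability = 1.0 / (1.0 + noise_var)  # observation-specific ratio
        x_obs.append(x)
        x_pred.append(reliability * x)         # best linear predictor of x_true
        y.append(beta * x_true + rng.gauss(0.0, 1.0))
    return slope_through_origin(x_obs, y), slope_through_origin(x_pred, y)


if __name__ == "__main__":
    b_ols, b_heiv = simulate_heiv()
    print(f"naive OLS slope: {b_ols:.3f}")
    print(f"HEIV-style slope: {b_heiv:.3f}")
```

In this setup the naive slope is attenuated well below the true value of 0.6, while the regression on the scaled regressor recovers it up to sampling noise. In the paper the reliability ratios are not known in advance; they are estimated from a separate earnings dynamics model.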

III. Conclusion
Solon’s (1992) landmark study on intergenerational mobility presents powerful evidence that the
U.S. is not nearly as mobile as previous researchers thought. Using better data and methodology, Solon
estimates the intergenerational elasticity (IGE) in earnings to be about 0.4 or higher. Solon argued 0.4
was a lower bound and did not attempt to incorporate the time series properties of earnings into the
measurement framework. However, recent studies on earnings dynamics that use larger and richer
datasets provide strong evidence that proxies for permanent economic status based on short-term averages
of earnings lead to substantial attenuation bias. Several studies have also shown that there is a lifecycle
bias in studies of intergenerational mobility that also leads to underestimates of the IGE.
This comment presents an analytical framework that demonstrates that the problem of persistent
transitory fluctuations alone is likely to lead researchers to underestimate the IGE by about 30 percent
when using a five-year average of fathers’ earnings as a proxy for lifetime earnings.
This comment also replicates Solon’s analysis and applies a new econometric estimator that
addresses the problem of transitory fluctuations and lifecycle bias. The results of the replication are
nearly identical to what Solon found. Using a different dataset containing the social security earnings
histories of a very large sample of men, I estimate an earnings dynamics model and construct reliability
ratios for the fathers in Solon’s PSID sample. These reliability ratios are used to implement the HEIV
estimator. The estimate of the IGE is 0.62 and is consistent with the results from the analytical exercise.
Finally, this finding is bolstered by the results in Mazumder (2001b), which uses a large intergenerational
sample containing the lifetime earnings histories of fathers and sons derived from social security earnings
histories to estimate the IGE. The study finds that four-year averages of fathers’ earnings produce results
similar to Solon’s but that the use of a 16-year average of fathers’ earnings results in an estimate of the
IGE greater than 0.6.
The implications of this revised view of intergenerational mobility are quite substantial. If 60
percent of earnings differences in society persist across generations, then it will require many more
decades before historical inequities in American society are likely to be alleviated. Such a high degree of
persistence also suggests that the recent rise in cross-sectional inequality is likely to remain a feature of
the U.S. economy for some time.
In the final analysis, estimates of intergenerational mobility are most useful as descriptive
statistics: they tell us something about the nature of inequality in the U.S. So far, the literature has not
pointed to any particular policy recommendations. Given the rising evidence from studies in other
countries, it appears that the U.S. may be among the most immobile countries. This comparative view
suggests that there might be some important institutional features about the U.S. that create such a high
level of persistence of income. Simply measuring this descriptive parameter is only the first step in
understanding the economics of intergenerational mobility. The important and difficult task of
understanding the underlying mechanisms by which earnings capacity is transmitted from parents to
children remains a key area for future research.


