For many, higher education is a driving force toward greater social mobility and economic freedom. Students whose parents’ incomes already leave them financially comfortable gain from higher education on top of what they already have. For them, the cost of education may not limit which college or university they can attend, which gives them better options in schools and in post-graduate outcomes. For students from lower-socioeconomic households, by contrast, financial factors such as tuition and living expenses may restrict the schools they can attend, which in turn affects the quality of the school and how much it benefits them upon graduating.
In this paper, our goal is to identify how much a parent’s income influences the tier of college and the income outcomes of children attending 4-year universities in California. The costs of tuition, housing, and other expenses have risen sharply over time, and this affects students’ decisions about where to attend. To investigate, we examine mean parental income along with college tier in the hope of identifying a significant relationship between the two. We compare the schools attended by children of parents with lower mean incomes against the institutions attended by children of parents with higher mean incomes. Thus, the question that drives our analysis is: What is the influence of a parent’s income and tier of college on the income outcomes for their kids in California’s 4-year colleges?
We investigate this question using the dataset behind the paper Income Segregation and Intergenerational Mobility Across Colleges in the United States by Chetty et al., available on Harvard’s Dataverse platform. It provides anonymized data from the federal government, giving statistics for each college and university in the U.S. on the distribution of students’ earnings in their thirties and their parents’ incomes (Chetty, Friedman and Saez). That work found that students from low-income families had strong long-term outcomes after attending selective schools, but that very few low-income students attended such schools. After exploring the data, we found it more relevant to our question to focus on 4-year institutions within California and to see how strong the relationship is between mean parental income, tier of school, and the income outcomes of each child.
There is clear evidence that family income plays a large role in a student’s college decisions and outcomes. Hotz, Wiemers, Rasmussen, and Koegel find in their 2018 paper that parental income can influence multiple aspects of this decision, including college attendance, graduation, and quality of institution (Hotz, Wiemers and Rasmussen). This effect is not restricted to the United States: Lin and Lv report similar findings for China in their 2017 paper (Lin and Lv). The phenomenon therefore extends beyond a single country and is worth studying because it can inform educational reform policies (Lin and Lv), not only in China but in other countries where the issue remains prominent. Both of these papers base their results on empirical models, an approach similar to the one we plan to use in our research and analysis.
In addition, we find quasi-experimental evidence in Akee et al.’s 2010 paper (Akee, Copeland and Keeler). The authors isolate the causal effect of parental income on children’s educational and income outcomes using quasi-experiments in the United States. Exploiting exogenous changes to parental income, they find improvements in long-run outcomes for students (Akee, Copeland and Keeler). These findings are distinctive, and while we are unable to recreate such a scenario with our resources, the paper is a useful source of potential control variables we may use to improve our models over time.
While many of these papers suggest that more research is needed, McPherson and Schapiro write in their 1994 paper (McPherson and Schapiro) that more attention has been given to the question of “access” to education than to the question of “choice”. They find that, over time, the gulf that family income creates in students’ income outcomes has narrowed, partly because of financial aid systems and rising incomes overall (McPherson and Schapiro). Many of the papers mentioned above also point to the lack of research into the “choice” aspect. Hotz et al. discuss how little is known about the effect of parents financing their children’s education on both parties’ income outcomes (Hotz, Wiemers and Rasmussen). The best piece we found on this discussion is Susan E. Mayer’s review of the existing literature, which finds many works with conflicting results (Mayer). She claims that the best guesses suggest only small increases in future economic outcomes for students, but notes the uncertainty and lack of research surrounding the topic. This motivates our own research, as we want to help fill some of the gaps that remain in the literature. We understand that “choice” is difficult for us to study, since we do not have data on other factors influencing the decision, such as academic ability and proximity to home. However, we may be able to capture a relationship between our three variables, which can motivate further research in this direction.
We investigate our question using a dataset from Harvard’s Dataverse platform. It is the replication data for the paper Income Segregation and Intergenerational Mobility Across Colleges in the United States, authored by Chetty and others (Chetty, Friedman and Saez). It provides numerous variables, but we use only the three that pertain to our research: mean parental income, university tier, and mean kid earnings. Mean parental income is the average income of the parents of children attending a university, while mean kid earnings is the average income of those children after they graduate and find a job. The tier refers to a university’s selectiveness and is divided into 14 categories. They are:
We plan to recode this variable to exclude 2-year colleges and to have each category include an equal number of colleges. This will let us divide our sample better for our regression method, discussed in the next section. We plan to simply drop any observations with missing data, as this is the safest and least time-consuming option while still allowing us to reach significant conclusions. Below is how we plan to recode our data:
We then plan to run five regression models. The first regresses kid income on parent income for the entire country. The second does the same for California only. We narrow the sample in this way to reduce noise. Additionally, California has some of the more elite schools in the country and is an expensive state to live in, and both of these features can make our results more interesting. The third model adds tier for the narrowed sample, so that we can look at how each of these variables individually relates to kid incomes. Our fourth model adds an interaction term between college tier and parent income, to see whether our two independent variables interact, that is, whether the relationship between parent income and kid income varies with tier. This may add explanatory power to our model. Our fifth model has the same specification but expands the sample back to the whole country.
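For reference, the five models can be written out as equations. This is a sketch under stated assumptions: the symbols are our own shorthand, with K_i the mean kid income at college i, P_i the mean parent income, and T_i the recoded tier, and we assume tier enters as a single numeric regressor, which matches the way the text speaks of one slope on tier. Models 1 and 5 use the full U.S. sample; models 2, 3, and 4 use California only.

```latex
\begin{align}
\text{Models 1 and 2:}\quad & K_i = \beta_0 + \beta_1 P_i + \varepsilon_i \\
\text{Model 3:}\quad & K_i = \beta_0 + \beta_1 P_i + \beta_2 T_i + \varepsilon_i \\
\text{Models 4 and 5:}\quad & K_i = \beta_0 + \beta_1 P_i + \beta_2 T_i
    + \beta_3 \,(P_i \times T_i) + \varepsilon_i \\
\text{Slope on parent income at tier } T_i:\quad &
    \frac{\partial K_i}{\partial P_i} = \beta_1 + \beta_3 T_i
\end{align}
```

The last line shows why the interaction matters: with the interaction term included, the association between parent income and kid income is no longer a single number but shifts by \beta_3 for each one-step rise in the tier value.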
The regression coefficient on parent income gives the change in kid income for a one-unit increase in parent income within a given tier. Similarly, the coefficient on college tier tells us how kid income changes as the university’s quality falls. Although tier is a categorical variable, a lower number indicates a more selective and competitive university, and these universities also tend to be more expensive. The coefficient on the interaction term will show how our two independent variables combine, that is, how the relationship between parent income and kid income changes across tiers. From this, we will obtain a clearer and more concise answer to our research question.
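To make the interpretation of these coefficients concrete, the following is a minimal Python sketch of an ordinary least squares fit with an interaction term. Everything in it is an assumption for illustration: the data are synthetic and noise-free, parent income is expressed in tens of thousands of dollars, the "true" coefficients are invented, and the helper names (`solve`, `ols`, `design`, `slope_at_tier`) are hypothetical. It is not the code or the data used for the results in this paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def design(parent, tier):
    """Columns: intercept, parent income, tier, parent income x tier."""
    return [[1.0, p, float(t), p * t] for p, t in zip(parent, tier)]


def slope_at_tier(coefs, tier):
    """Association between parent and kid income at a given tier: b1 + b3 * tier."""
    return coefs[1] + coefs[3] * tier


# Synthetic, noise-free data (assumed values, for illustration only).
parent = [4.0, 6.0, 8.0, 10.0, 12.0, 5.0, 9.0, 11.0]
tier = [5, 4, 3, 2, 1, 3, 1, 4]
TRUE_COEFS = (2.0, 0.5, -0.3, -0.02)  # invented: intercept, parent, tier, interaction
kid = [
    TRUE_COEFS[0] + TRUE_COEFS[1] * p + TRUE_COEFS[2] * t + TRUE_COEFS[3] * p * t
    for p, t in zip(parent, tier)
]

coefs = ols(design(parent, tier), kid)
print("coefficients:", [round(c, 4) for c in coefs])
print("slope on parent income at tier 1:", round(slope_at_tier(coefs, 1), 4))
print("slope on parent income at tier 5:", round(slope_at_tier(coefs, 5), 4))
```

In this toy setup the interaction coefficient is negative, so the fitted slope on parent income is smaller at tier 5 than at tier 1. In practice a statistics package would also report standard errors and p-values, which this sketch omits.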
The first bar graph shows the number of colleges in California by tier. The sample here is smaller, and the counts do not follow a bell shape: the largest values sit at the lower-numbered tiers on the left, with smaller values toward the right. The maximum is 25 colleges in Tier 2 (selective public universities in California), and the minimum is 5 colleges in Tier 4 (non-selective 4-year public and private not-for-profit universities in California).
The second bar graph shows the number of colleges and universities in the entire United States by tier, and here the counts follow a roughly bell-shaped pattern. The maximum is a little over 575 schools in Tier 3 (selective private institutions), while the minimum is about 75 schools in Tier 5 (four-year for-profit schools).
As a result, California shows a larger number of selective public and selective private schools, whereas the United States as a whole shows a larger number of private institutions.
Additionally, we calculated some correlations between our variables, listed below:
The negative correlations between tier and both income measures suggest that more elite universities have both richer parents and richer students. This connects to the positive correlation between parent and kid incomes, which implies that richer kids are associated with richer families.
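For readers who want to see how such pairwise correlations are computed, here is a minimal Python sketch of the Pearson correlation coefficient. The five data points are made up purely to reproduce the sign pattern described above (incomes in thousands of dollars are an assumed unit), and the function name `pearson` is our own; none of the numbers come from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up values for illustration only: a higher tier number means a less selective school.
tier = [1, 2, 3, 4, 5]
parent_income = [150.0, 110.0, 90.0, 70.0, 60.0]
kid_income = [80.0, 60.0, 50.0, 45.0, 40.0]

print("tier vs parent income:", round(pearson(tier, parent_income), 3))
print("tier vs kid income:", round(pearson(tier, kid_income), 3))
print("parent vs kid income:", round(pearson(parent_income, kid_income), 3))
```

With these toy values the first two correlations are negative and the third is positive, which is the same sign pattern the paragraph above interprets.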
We ran five linear regression models, as described in our Methods section. To recap, model 1 regresses mean kid income on mean parent income for the entire country. Model 2 is the same but only for California. Model 3 adds the tier of a college to the regression. Model 4 includes an interaction term between mean parent income and college tier. Model 5 has the same specification but returns to the full country sample. Below are our results:
Our results are statistically significant, with very small p-values; the one exception, discussed below, is the interaction term in model 4. In all five models, mean parent income has a positive slope, consistent with the positive correlation we hypothesized between it and mean kid income. This makes sense, since richer parents tend to be able to support their children better financially, possibly giving them a better chance of finding high-paying jobs. This could be by funding better education, or simply by supporting them as they take their time in finding a job. Poorer families are more likely to need their kids to begin working early to help support the household, so kids can be pushed into lower-paying jobs. Additionally, as we mentioned with education, going to better universities can afford better opportunities, which can result in better incomes. There are many more factors at play, but the correlation between the two is unlikely to be a coincidence.
Adding tier in the third model clearly improved the fit, with the R² value rising noticeably. Tier has a negative slope in every model that includes it. As discussed earlier, a rise in the tier value means a drop in college selectivity. Thus, as the tier value “rises”, i.e., the university quality drops, the mean kid incomes also drop. Once again, this aligns with our hypothesis. A lower quality university is not trying to reduce your income, but may simply lack the facilities, opportunities, and reputation of some more elite institutions. One could argue that students at these universities have lower ability, but since we do not have data on students’ grades and academic performance, we cannot make this claim, and it would be outside the scope of this paper.
The fourth model, which includes our interaction term, yields a negative interaction coefficient, which we read as reflecting the negative relationship between parent income and college tier, that is, a fall in mean parent incomes as university quality drops. In the California sample, however, this result is not statistically significant, so we cannot draw a firm conclusion from it. When we expand the sample back to the full country in the fifth model, the result becomes significant. With more data, the negative relationship between parent income and college tier appears to hold. This is what we would expect, since better universities can be much more expensive, and lower income groups, who may value financial stability much more in the long run, might not seek the gain from attending very elite universities.
Overall, each new model added to our understanding of the relationship between mean kid income, mean parent income, and college tier, with highly significant F-statistics and improving R² values throughout.
Our project revealed an unfortunate but real correlation between parent and child incomes: children from richer families tend to have better income outcomes in the future, and part of this can be attributed to the fact that they tend to attend better schools. We cannot claim a causal effect here; that is, our results do not show that richer parents cause richer children solely because they are richer. Intuitively, however, one could argue that richer parents help kids afford higher tier universities, which tend to be associated with improved income outcomes for their children.
We noticed that this effect tended to be stronger in lower tier universities. These universities also had lower income levels in general, and a diminishing returns effect could be behind this. We saw graphical evidence of this when we plotted regression lines split by tier.
We know there is a lot of room for improvement in this project, especially since we cannot make any causal inference. Numerous factors affect a student’s college attendance decision, including distance from home, relevant course and program offerings, and university facilities. We can also improve the study by bringing in more variables, for example by comparing California to other states or by including variables such as test scores and gender. Additionally, Ivy League schools by themselves could be an interesting study. One of the outliers in our data is Stanford University, seen in the California models as the single data point in the top right-hand corner. For most universities, the mean income is higher than the median income, but the gap is not large. For Stanford, however, the mean is about twice the median. We left the data point in because it opens scope for further study into why elite schools are still “elite”.
In conclusion, we were able to reveal a potential factor in long-run income inequality in the USA. Education costs have been soaring recently, and many question whether a college degree is still worth what it costs, financially and otherwise. Given rising living and tuition costs across the country, federal student loans need to keep pace so that students are not caught in debt traps, while still encouraging higher education.
Akee, Randall K.Q., et al. “Parents’ Incomes and Children’s Outcomes: A Quasi-Experiment.” American Economic Journal: Applied Economics (2010): 35. Online Document. February 2023. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2891175/pdf/nihms-129450.pdf.
Hotz, V. Joseph, et al. “The Role of Parental Wealth and Income in Financing Children’s College Attendance and its Consequences.” NBER Working Paper Series (2018): 48. Online Document. February 2023. https://www.nber.org/system/files/working_papers/w25144/w25144.pdf.
Lin, Tao and Han Lv. “The effects of family income on children’s education: An empirical analysis of CHNS data.” Asian Academic Press (2017): 6. Online Document. February 2023. https://www.researchgate.net/publication/321317848_The_effects_of_family_income_on_children%27s_education_An_empirical_analysis_of_CHNS_data.
Mayer, Susan E. “Revisiting an old question: How much does parental income affect child outcomes?” Focus (2010): 6. Online Document. February 2023. https://www.irp.wisc.edu/publications/focus/pdfs/foc272e.pdf.
McPherson, Michael S. and Morton Owen Schapiro. “College Choice and Family Income: Changes over Time in the Higher Education Destinations of Students from Different Income Backgrounds.” Research. 1994. Online Document. February 2023. https://files.eric.ed.gov/fulltext/ED380024.pdf.
Chetty, Raj, et al. “Income Segregation and Intergenerational Mobility Across Colleges in the United States.” Quarterly Journal of Economics 135.3 (2020): 66. Online Document. January 2023. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/RYLJKZ.