Part 1: Analysis with Galton’s original data set
Galton’s work on children and parents’ height was published in: Galton, F. (1886): “Regression
towards mediocrity in hereditary stature”, Journal of the Anthropological Institute, 15: 246-63. In
this first part of the project you are asked to reconstruct the original data from this original article
and replicate his analysis.
• Question 1. Find Galton’s original article (on jstor.org or LEARN). Table I of his article
summarizes the data used. You need to create a STATA data set that contains the 928
observations that Galton collected. It is recommended that you first type the data into an Excel
file and then have STATA read that file. Some versions of the Galton data set are available
online. You are advised NOT to use them. Part of this project is to show that you
understand how to build a data set from such a table. There are important conceptual issues
that you will miss if you borrow the data from somewhere else.
For those observations reported in Table I of Galton’s article as “below” or “above” the
minimum and maximum height values, you need to assume some particular values. Please state
these explicitly in a table and provide a one-sentence justification. Define “tall parents”
and “short parents” according to your data. Then divide your sample into these two groups
and report relevant statistics for the adult children and for the parents in each group. Report this
information in a table and comment on it.
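A minimal sketch of the mechanics in Question 1, written in Python rather than STATA and using a made-up miniature table, NOT Galton's figures. It turns a frequency table into one row per observation (the same idea as STATA's `expand` command), replaces the open-ended "below"/"above" cells with assumed values, and summarizes two parent groups. The cell values, the stand-in heights in `PARENT_OPEN` and `CHILD_OPEN`, and the median cutoff for "tall" are all hypothetical choices for illustration; your own choices need their own justification.

```python
from statistics import mean, median

# Hypothetical miniature frequency table (NOT Galton's data).
# Each entry: (parent height label, child height label, number of children).
toy_table = [
    ("below", "below", 1),
    ("below", "66.2", 2),
    ("68.5", "66.2", 3),
    ("68.5", "68.2", 4),
    ("above", "68.2", 2),
    ("above", "above", 1),
]

# Assumed numeric stand-ins for the open-ended cells (illustrative only).
PARENT_OPEN = {"below": 63.5, "above": 73.5}
CHILD_OPEN = {"below": 61.2, "above": 74.2}


def to_inches(label, open_values):
    """Convert a table label to a height, using the assumed value for open cells."""
    return open_values[label] if label in open_values else float(label)


def expand(table):
    """Repeat each (parent, child) cell 'count' times: one row per child."""
    rows = []
    for parent_label, child_label, count in table:
        row = (to_inches(parent_label, PARENT_OPEN),
               to_inches(child_label, CHILD_OPEN))
        rows.extend([row] * count)
    return rows


def summarize_groups(rows):
    """Split on the median parent height (a hypothetical rule) and summarize."""
    cutoff = median(parent for parent, _ in rows)
    rules = (("tall", lambda p: p > cutoff), ("short", lambda p: p <= cutoff))
    summary = {}
    for name, keep in rules:
        group = [(p, c) for p, c in rows if keep(p)]
        summary[name] = {
            "n": len(group),
            "mean_parent": mean(p for p, _ in group),
            "mean_child": mean(c for _, c in group),
        }
    return summary


rows = expand(toy_table)
summary = summarize_groups(rows)
for name, stats in summary.items():
    print(name, stats)
```

With the real table, the number of expanded rows should equal the 928 observations mentioned above, which is a useful check on the data entry.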
• Question 2. Galton was the first to describe and explain the phenomenon of “regression
towards the mean”. Being concerned about the height of the English aristocracy, he interpreted
his results as “regression to mediocrity” (hence the name “regression”).
Regress the height of adult children against the height of parents. Report your results in a table
and interpret the estimated parameters.
What can you say about the relationship between the
height of parents and their children? Are children of tall (short) parents as tall (short) as
their parents?
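As a rough illustration of what the regression in Question 2 computes, the sketch below fits a one-regressor least-squares line by hand in Python (in STATA the `regress` command does this for you). The five data points are invented for the example and lie exactly on a line, so they say nothing about Galton's actual estimates; `ols_fit` is a hypothetical helper name.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sample covariance of x and y divided by sample variance of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented heights in inches (NOT Galton's data).
parent = [64.0, 66.0, 68.0, 70.0, 72.0]
child = [66.0, 67.0, 68.0, 69.0, 70.0]

intercept, slope = ols_fit(parent, child)
predicted_for_tall = intercept + slope * 72.0
print(intercept, slope, predicted_for_tall)
```

In this toy example the slope is below one, so the predicted child of a 72-inch parent is shorter than 72 inches. Whether the same holds in your data is what the question asks you to check.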
• Question 3. Taking your regression results from question 2, and using your definition of
“tall parents” and “short parents” from question 1:
Calculate the predicted height of adult children whose parents are “tall” after 1, 2, 3, …, Z
generations. Do the same for adult children of “short” parents. Report your results in a
table. Is there convergence in heights? If so, how many generations does it take? Is Galton’s
prediction correct?
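One way to organize the calculation in Question 3 is to feed each generation's predicted height back into the fitted line as the next generation's parental height. The Python sketch below does this under that assumption, with an invented intercept and slope (34 and 0.5, not estimates from Galton's data) and hypothetical function names. When the slope is between 0 and 1, the sequence approaches the fixed point intercept / (1 - slope).

```python
def iterate_generations(start_height, intercept, slope, generations):
    """Predicted heights after 1, 2, ..., 'generations' generations."""
    heights = []
    height = start_height
    for _ in range(generations):
        # The predicted child becomes the parent of the next generation.
        height = intercept + slope * height
        heights.append(height)
    return heights


def generations_to_converge(start_height, intercept, slope, tolerance):
    """Generations needed to come within 'tolerance' of the fixed point."""
    if not 0 <= slope < 1:
        raise ValueError("this sketch assumes a slope in [0, 1)")
    fixed_point = intercept / (1 - slope)
    height = start_height
    count = 0
    while abs(height - fixed_point) >= tolerance:
        height = intercept + slope * height
        count += 1
    return count


# Invented coefficients and starting heights, for illustration only.
A, B = 34.0, 0.5
tall_path = iterate_generations(72.0, A, B, 3)
short_path = iterate_generations(64.0, A, B, 3)
n_tall = generations_to_converge(72.0, A, B, 0.1)
print(tall_path, short_path, n_tall)
```

The answer to "how many generations" depends on the tolerance you choose, so it is worth stating that tolerance alongside your table.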
• Question 4. Using the same data set,
regress the height of parents against the height of adult children. Report your results in a
table. Is this regression equivalent to that in question 2? Are the estimated parameters the
same? Why or why not?
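For intuition on Question 4, the Python sketch below computes both slopes on a small invented sample (NOT Galton's data; `ols_slope` is a hypothetical helper). It relies on a standard least-squares fact: the slope of y on x is S_xy / S_xx and the slope of x on y is S_xy / S_yy, so their product equals the squared correlation, and the reverse slope is the reciprocal of the forward slope only when the fit is perfect.

```python
def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Invented heights in inches, deliberately not on a straight line.
parent = [64.0, 66.0, 68.0, 70.0, 72.0]
child = [67.0, 66.0, 68.0, 70.0, 69.0]

slope_child_on_parent = ols_slope(parent, child)
slope_parent_on_child = ols_slope(child, parent)
product = slope_child_on_parent * slope_parent_on_child  # squared correlation

print(slope_child_on_parent, slope_parent_on_child, product)
print(1 / slope_child_on_parent)  # differs from the reverse slope here
```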