Research Questions
1. (1 point) Is the data set a balanced panel? Explain. Hint: Look through the data set
or use the is.pbalanced() function.
2. (6 points) The index of political democracy/political freedom is labeled dem_ind.
(a) What is the value of dem_ind for the United States in 2000 (1 point)? What is the
average of dem_ind for the United States over all years in the data set (1 point)?
(b) How many missing values are there for dem_ind (1 point)? List all countries with the
lowest average value of dem_ind (i.e., = 0) (2 points). How many countries have
the highest average value of dem_ind (i.e., = 1) (1 point)? Hint: You can identify
missing values using the is.na() function and compute the sample average for each
country using the aggregate() function.
1 These data were provided by Daron Acemoglu of M.I.T. and were used in his paper with Simon Johnson,
James Robinson, and Pierre Yared, “Income and Democracy,” American Economic Review, 2008, 98:3, 808–842.
3. (11 points) The logarithm of per capita income is labeled log_gdppc.
(a) Regress dem_ind on log_gdppc using standard errors that are clustered by country.
Report your estimation results in a table similar to Table 10.1 on page 378 of the SW
textbook, without the test part (2 points).
(b) Interpret the estimated coefficient on log_gdppc (1 point). Is the coefficient statistically significant (1 point)?
(c) If per capita GDP in a country increases by 20%, by how much is dem_ind predicted
to increase (1 point)?
Construct a 95% confidence interval for the prediction (2 points).
(d) Why is it important to use clustered standard errors for the regression (2 points)?
Do the results change if you do not use clustered standard errors (2 points)?
4. (27 points) Consider panel data regressions.
(a) Suggest a variable that varies across countries but plausibly varies little (or not at
all) over time and that could cause omitted variable bias in the regression in Question
3 (Q3) above (3 points).
(b) Estimate the regression in Q3, allowing for country fixed effects. Add the estimation
results to the table in 3(a) (3 points). How do your answers to 3(b) and 3(c) change
(2 points)?
(c) Exclude the data for Azerbaijan and re-run the regression. Do the results change (2
points)? Why or why not (2 points)?
(d) Suggest a variable that varies over time but plausibly varies little (or not at all)
across countries and that could cause omitted variable bias in the regression in Q3
(3 points).
(e) Estimate the regression in Q3, allowing for both time and country fixed effects. Add
the estimation results to the table in 3(a) (3 points). How do your answers to 3(b)
and 3(c) change (4 points)?
(f) There are additional demographic controls in the data set. Should these variables
be included in the regression? If so, re-run the regression, including these controls.
Report the regression results in the table in 3(a) (3 points). How do the results
change when they are included (2 points)?
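
The hints in Questions 1 and 2 name is.pbalanced(), is.na(), and aggregate(). A minimal sketch of how these calls might be used follows. It runs on a made-up toy panel, not the assignment data: the data frame toy, its values, and the column names country and year are assumptions for illustration only. is.pbalanced() comes from the plm package, so that call is guarded in case plm is not installed, and a base R check is shown alongside it.

```r
# Toy panel with hypothetical values (NOT the assignment data).
# The column names country and year are assumed for illustration.
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(1990, 1995, 2000), times = 3),
  dem_ind = c(0, 0, 0,  1, NA, 1,  0.5, 0.7, 0.9)
)

# Count the missing values of dem_ind.
n_missing <- sum(is.na(toy$dem_ind))

# Average of dem_ind by country. With the formula interface,
# aggregate() drops rows containing NA by default (na.action = na.omit).
avg <- aggregate(dem_ind ~ country, data = toy, FUN = mean)

# Countries whose average equals the lowest possible value, 0.
lowest <- as.character(avg$country[avg$dem_ind == 0])

# Base R balance check: every country-year cell appears exactly once.
balanced <- all(table(toy$country, toy$year) == 1)

# The hinted function, used only if the plm package is available.
if (requireNamespace("plm", quietly = TRUE)) {
  balanced_plm <- plm::is.pbalanced(toy, index = c("country", "year"))
}
```

Note that a panel can be balanced in the sense of having one row per country-year pair while still containing missing values of dem_ind, which is why the two checks are kept separate here.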