
Tax Compliance

Problem description
This assignment is based on Merriman, David (2010): The Micro-geography of Tax Avoidance: Evidence from Littered Cigarette Packs in Chicago, American Economic Journal: Economic Policy 2(2), 61-84. The author derives a strategy to measure tax compliance by analyzing tax stamps on littered cigarette packs.
Use the dataset cig_data.dta provided.
The tables created do not need to look exactly like the tables in the paper, but they should look like tables that could be published.

Question 1: Replication of Table 2
Estimate all numbers provided in Table 2 of the paper (including the standard errors). The note below Table 2 explains which observations are not included in the table. You will need to generate some additional dummy variables to create the complete table. Report weighted means using 1/os_obs_weight as sampling weights.
The table note also explains the tests conducted to decide whether numbers are printed in bold typeface or not. Instead of using bold typeface, add the p-values from these tests below the standard errors.
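As a rough illustration of what the weighted means involve, the following Python sketch computes a mean with sampling weights 1/os_obs_weight and a linearized standard error. The function name weighted_mean_se and the toy data are made up for illustration, and the variance formula is one common choice for sampling weights, so check the result against the output of your statistical software.

```python
import math


def weighted_mean_se(values, os_obs_weight):
    """Weighted mean and linearized standard error with weights 1/os_obs_weight."""
    weights = [1.0 / k for k in os_obs_weight]
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    n = len(values)
    # Linearization variance of a ratio estimator (assumed: one stratum, no
    # finite-population correction).
    variance = (
        n / (n - 1)
        * sum((w * (x - mean)) ** 2 for w, x in zip(weights, values))
        / total_weight ** 2
    )
    return mean, math.sqrt(variance)


# Toy data, NOT taken from cig_data.dta: a 0/1 indicator and its os_obs_weight.
indicator = [1, 0, 1, 1, 0]
os_obs_weight = [1.0, 0.5, 2.0, 1.0, 0.5]
mean, se = weighted_mean_se(indicator, os_obs_weight)
print(round(mean, 4), round(se, 4))
```

With equal weights the function reduces to the ordinary sample mean and its usual standard error, which is a quick sanity check.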

Question 2: Replication of Table 3
Estimate all numbers reported in Table 3 (including robust standard errors). The note below the table explains which observations are included in the sample; in addition to the observations excluded in Table 1, you will need to exclude 17 further observations. Estimate marginal effects at the mean values of the independent variables. For dummy variables, estimate the discrete change in the estimated probability when the variable switches from zero to one (you may need to tell the program which variables are dummy variables). There is one restriction: do not use the outdated command “dprobit” in Stata.
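The difference between the two kinds of marginal effects can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a replacement for the estimation: the coefficient values, the variable names dist_border and is_downtown, and the function marginal_effects_at_means are all hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def marginal_effects_at_means(intercept, coefs, means, dummies):
    """Probit marginal effects evaluated at the means of the regressors."""
    effects = {}
    for name, beta in coefs.items():
        # Index contribution of all OTHER regressors, held at their means.
        base = intercept + sum(coefs[k] * means[k] for k in coefs if k != name)
        if name in dummies:
            # Discrete change: Pr(y=1 | d=1) - Pr(y=1 | d=0).
            effects[name] = STD_NORMAL.cdf(base + beta) - STD_NORMAL.cdf(base)
        else:
            # Continuous regressor: density at the mean index times the coefficient.
            effects[name] = STD_NORMAL.pdf(base + beta * means[name]) * beta
    return effects


# Hypothetical probit coefficients and sample means (not estimates from the paper).
coefs = {"dist_border": -0.05, "is_downtown": 0.5}
means = {"dist_border": 10.0, "is_downtown": 0.3}
effects = marginal_effects_at_means(0.2, coefs, means, dummies={"is_downtown"})
for name, value in effects.items():
    print(name, round(value, 4))
```

Treating a dummy as continuous would apply the density formula at the dummy's mean instead of comparing the two probabilities, which is why the program has to know which variables are dummies.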

Question 3: Exploratory Analysis
After having estimated the model in column 3 of Table 3 in the paper, plot the relationship between the distance to the Indiana border (on the x-axis) and the predicted probability of a local stamp (on the y-axis). Use increments of one mile for the distance over a meaningful range and evaluate the probabilities at the mean values of the other independent variables.
(Hint: In Stata, you can use margins followed by marginsplot.)
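The quantity being plotted can be sketched as follows. This Python fragment evaluates the probit probability on a one-mile grid while the other regressors are held at their means; the coefficient values, the 30-mile range, and the function name predicted_probabilities are assumptions for illustration only.

```python
from statistics import NormalDist


def predicted_probabilities(intercept, beta_distance, other_index, max_miles):
    """Probit probabilities on a one-mile grid of distances.

    other_index is the sum of coefficient * mean over all other regressors,
    so those regressors are held at their mean values.
    """
    normal = NormalDist()
    return [
        (miles, normal.cdf(intercept + other_index + beta_distance * miles))
        for miles in range(0, max_miles + 1)
    ]


# Hypothetical values, not estimates from the paper.
curve = predicted_probabilities(
    intercept=-0.5, beta_distance=0.05, other_index=0.1, max_miles=30
)
for miles, prob in curve[:5]:
    print(miles, round(prob, 3))
```

The resulting (miles, probability) pairs are what goes on the x-axis and y-axis of the requested plot.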
Now add a squared term for the distance to the Indiana border to the probit model in column 3 of Table 3 and re-estimate it. Report the probit coefficients (not marginal effects) and robust standard errors in a table. Is there evidence of a nonlinear effect of the distance on the probability of a local stamp? Plot the predicted relationship using this estimated model. What is your interpretation of the results? 
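One way to think about the squared term is through the slope of the predicted probability, which can change sign when the index is quadratic in distance. The Python sketch below uses hypothetical coefficients (intercept, b_dist, b_dist_sq) and made-up function names; the intercept is assumed to already absorb the other regressors at their means.

```python
from statistics import NormalDist


def quadratic_probit(miles, intercept, b_dist, b_dist_sq):
    """Predicted probability and its slope with respect to distance."""
    normal = NormalDist()
    index = intercept + b_dist * miles + b_dist_sq * miles ** 2
    probability = normal.cdf(index)
    # Chain rule: density at the index times the derivative of the index.
    slope = normal.pdf(index) * (b_dist + 2 * b_dist_sq * miles)
    return probability, slope


def turning_point(b_dist, b_dist_sq):
    """Distance at which the index, and hence the probability, peaks or bottoms out."""
    return -b_dist / (2 * b_dist_sq)


# Hypothetical coefficients, not estimates from the paper.
params = dict(intercept=-1.0, b_dist=0.12, b_dist_sq=-0.003)
for miles in (10, 20, 30):
    prob, slope = quadratic_probit(miles, **params)
    print(miles, round(prob, 3), round(slope, 4))
print("turning point (miles):", turning_point(0.12, -0.003))
```

Whether a turning point matters for interpretation depends on whether it falls inside the range of distances observed in the data and on how precisely the squared term is estimated.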