
1           Is COMPAS fair?  (50pt)

The first task is to analyze the fairness of the COMPAS algorithm. As the algorithm is proprietary, you cannot use it to make your own predictions. But you do not need to predict anything anyway: the COMPAS predictions are already done and included as the decile_score variable!

1.1          Load and check (2pt)

Your first tasks are the following:

  1. (1pt) Load the COMPAS data and perform the basic sanity checks.
  2. (1pt) Filter the data to keep only Caucasians and African-Americans.

All the tasks below are about these two races only; there are too few offenders of other races.


1.2          Aggregate analysis (20pt)

COMPAS categorizes offenders into 10 different categories, from 1 (least likely to recidivate) to 10 (most likely). But for simplicity, we scale this down to just two categories (low risk/high risk).

  1. (2pt) Create a new dummy variable based on the COMPAS risk score (decile_score), indicating whether an individual was classified as low risk (score 1-4) or high risk (score 5-10).

Hint: you can do this in different ways, but for technical reasons related to the tasks below, the best way is to create a variable “high score” that takes value 1 (decile score 5 and above) and 0 (decile score 1-4).
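A minimal pandas sketch of the loading and recoding steps. The file name and the column names race and decile_score are assumptions; adjust them to match your copy of the data:

    import pandas as pd

    # File name is an assumption; use the path to your copy of the COMPAS data
    compas = pd.read_csv("compas-score-data.csv")

    # Keep only the two racial groups analyzed below
    compas = compas[compas["race"].isin(["African-American", "Caucasian"])]

    # high_score = 1 if decile_score is 5-10, 0 if it is 1-4
    compas["high_score"] = (compas["decile_score"] >= 5).astype(int)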


  2. (4pt) Now analyze the offenders across this new risk category:
    • What is the recidivism rate (percentage of offenders who re-offend) for low-risk and high-risk individuals?
    • What are the recidivism rates for African-Americans and Caucasians? Hint: 39% for Caucasians.
  3. (7pt) Now create a confusion matrix comparing COMPAS predictions for recidivism (the low risk/high risk variable you created above) and the actual two-year recidivism, and interpret the results. In order to be on the same page, let’s call recidivists “positives”.

Note: you do not have to predict anything here. COMPAS has made the prediction for you; this is the high score variable you created in item 1. See the referenced articles about the controversy around COMPAS’s methodology.

Note 2: Do not just output a confusion matrix with accompanying text like “accuracy = x%, precision = y%”. Interpret your results, e.g. “z% of recidivists were falsely classified as low-risk”, “COMPAS accurately classified k% of individuals”, etc.

  4. (7pt) Find the accuracy of the COMPAS classification, and also how its errors (false negatives and false positives) are distributed: compute precision, recall, the false positive rate (FPR), and the false negative rate (FNR); see the sketch below.

We did not talk about FPR and FNR in class, but you can consult Lecture Notes, section 6.1.1, “Confusion matrix and related concepts”.
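A sketch of these computations, assuming the actual outcome is stored in a 0/1 column named two_year_recid (an assumption; rename to match your data):

    from sklearn.metrics import confusion_matrix

    # For 0/1 labels, sklearn orders the matrix [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(
        compas["two_year_recid"], compas["high_score"]
    ).ravel()

    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # predicted recidivists who actually re-offended
    recall    = tp / (tp + fn)  # actual recidivists flagged as high risk
    fpr       = fp / (fp + tn)  # non-recidivists wrongly flagged as high risk
    fnr       = fn / (fn + tp)  # recidivists wrongly classified as low risk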

Would you feel comfortable having a judge use COMPAS to inform sentencing guidelines? How well do you think judges can perform the same task without COMPAS’s help? At what point would the error/misclassification risk be acceptable to you? Do you think the acceptable error rate should be the same for human judges and for algorithms?

Remember: human judges are not perfect either!


1.3          Analysis by race (28pt)

  1. (2pt) Compute the recidivism rate separately for high-risk and low-risk African-Americans and Caucasians.

Hint: the recidivism rate for high-risk African-Americans is 65%.
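For instance, a single pandas group-by (with the column names assumed above) gives all four rates at once:

    # Recidivism rate = mean of the 0/1 outcome within each race x risk group
    print(compas.groupby(["race", "high_score"])["two_year_recid"].mean())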

  2. (6pt) Comment on the results in the previous point. How similar are the rates for the two race groups among low-risk and high-risk individuals? Do you see a racial disparity here? If yes, which group is it favoring? Based on these figures, do you think COMPAS is fair?
  3. (6pt) Now repeat your confusion matrix calculation and analysis from 1.2, item 3. But this time do it separately for African-Americans and for Caucasians:
    • How accurate is the COMPAS classification for African-Americans and for Caucasians?
    • What are the false positive rates (false recidivism rates) FPR?
    • The false negative rates (false no-recidivism rates) FNR?

Hint: FPR for Caucasians is 0.22, FNR for African-Americans is 0.28.
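One way to get the per-race numbers is to repeat the confusion-matrix computation within each race group, e.g.:

    from sklearn.metrics import confusion_matrix

    for race, grp in compas.groupby("race"):
        tn, fp, fn, tp = confusion_matrix(
            grp["two_year_recid"], grp["high_score"]
        ).ravel()
        print(race,
              "accuracy:", round((tp + tn) / len(grp), 3),
              "FPR:", round(fp / (fp + tn), 3),
              "FNR:", round(fn / (fn + tp), 3))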


  4. (6pt) If you have done this correctly, you will find that COMPAS’s percentage of correctly categorized individuals (accuracy) is fairly similar for African-Americans and Caucasians, but that false positive rates and false negative rates are different. In your opinion, is the COMPAS algorithm “fair”? Justify your answer.
  5. (8pt) Does your answer in point 4 align with your answer in point 2? Explain!

Hint: This is not a trick question. If you read the first two recommended readings, you will find that people disagree how you define fairness. Your answer will not be graded on which side you take, but on your justification.


2           Can you beat COMPAS? (50pt)

The COMPAS model has created quite a bit of controversy. One issue frequently brought up is that it is “closed source”, i.e. its inner workings are available neither to the public nor to the judges who actually make the decisions. But is that a big problem? Maybe you can devise a model as good as COMPAS at predicting recidivism? Maybe you can do even better? Let’s try!

2.1          Create the model (30pt)

Create such a model. We want to avoid explicit race and gender bias, hence do not include race or gender among the predictors. Finally, let’s analyze the performance of the model by cross-validation.

More detailed tasks are here:

  1. (6pt) Before we start: what do you think is an appropriate model performance measure here? Accuracy (A), precision (P), recall (R), F-score (F), or something else? Maybe you want to report multiple measures? Explain!
  2. (6pt) You should not use the variable decile_score, which originates from the COMPAS model. Why?
  3. (8pt) Now it is time to do the modeling. Create a logistic regression model that contains all the explanatory variables you have in the data (some of these you have to convert to dummies). Do not include the variables discussed above; do not include race and gender in this model either, to avoid explicit gender/racial bias.

Use 10-fold CV to compute the relevant performance measure(s) you discussed above (see the sketch after this list).

  4. (10pt) Experiment with different models to find the best model according to your performance measure(s). Try trees and k-NN; you may also include other types of models. Include/exclude different variables. You may also do feature engineering, e.g. create a different set of age groups, include variables like age², interaction effects, etc. But do not include race and gender.

Report what you tried (no need to report the full results of all of your unsuccessful attempts) and your best model’s performance. Did you get better or worse results than COMPAS?
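A sketch of the cross-validation workflow referred to above. The feature list is an assumption; substitute whatever explanatory variables your data contains (minus race, gender, and decile_score), and swap the scoring argument for the measure you argued for in item 1:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    # Assumed feature columns; convert categoricals to dummies
    features = ["age", "priors_count", "c_charge_degree"]
    X = pd.get_dummies(compas[features], drop_first=True)
    y = compas["two_year_recid"]

    # Hyperparameters below are starting points, not tuned values;
    # for k-NN you may also want to scale the features first
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "tree":     DecisionTreeClassifier(max_depth=5),
        "k-NN":     KNeighborsClassifier(n_neighbors=25),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        print(f"{name}: mean 10-fold CV accuracy = {scores.mean():.3f}")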


2.2          Is your model more fair?  (20pt)

Finally, is your model any better (or worse) than COMPAS in terms of fairness? Let’s use your model to predict recidivism for everyone (i.e. on all data, ignoring the training-testing split), and see whether the FPR and FNR for African-Americans and Caucasians are now more similar.

  1. (6pt) Now use your model to compute the two-year recidivism rates by race and your risk prediction (replicating 1.3, item 1). Is your model more or less fair than COMPAS?
  2. (6pt) Compute FPR and FNR by race (replicating 1.3, item 3, the FPR/FNR question). Is your model more or less fair than COMPAS?
  3. (8pt) Explain what you get and why you get it.
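A sketch, continuing with the assumed names (compas, X, y) from the previous section: fit your best model on all the data, predict for everyone, and recompute the per-race rates:

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix

    # Substitute your best model from 2.1; fit on all data, ignoring any split
    best = LogisticRegression(max_iter=1000).fit(X, y)
    compas["my_high_score"] = best.predict(X)

    for race, grp in compas.groupby("race"):
        tn, fp, fn, tp = confusion_matrix(
            grp["two_year_recid"], grp["my_high_score"]
        ).ravel()
        print(race, "FPR:", round(fp / (fp + tn), 3),
                    "FNR:", round(fn / (fn + tp), 3))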


Finally, tell us how many hours you spent on this PS.


References

Kleinberg, J., Mullainathan, S. and Raghavan, M. (2016) Inherent trade-offs in the fair determination of risk scores. arXiv:1609.05807.