-
Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets
Authors:
Richard A. Berk,
Arun Kumar Kuchibhotla,
Eric Tchetgen Tchetgen
Abstract:
In the United States and elsewhere, risk assessment algorithms are being used to help inform criminal justice decision-makers. A common intent is to forecast an offender's "future dangerousness." Such algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we use counterfactual reasoning to consider the prospects for improved fairness when members of a less privileged group are treated by a risk algorithm as if they are members of a more privileged group. We combine a machine learning classifier trained in a novel manner with an optimal transport adjustment for the relevant joint probability distributions, which together provide a constructive response to claims of bias-in-bias-out. A key distinction is between fairness claims that are empirically testable and fairness claims that are not. We then use confusion tables and conformal prediction sets to evaluate achieved fairness for projected risk. Our data are a random sample of 300,000 offenders at their arraignments for a large metropolitan area in the United States during which decisions to release or detain are made. We show that substantial improvement in fairness can be achieved consistent with a Pareto improvement for protected groups.
Submitted 9 August, 2022; v1 submitted 17 November, 2021;
originally announced November 2021.
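In one dimension, the optimal transport map between two distributions reduces to quantile matching: a value at the u-th quantile of the source group is sent to the u-th quantile of the target group, T(x) = F_target^{-1}(F_source(x)). The sketch below shows only that univariate case for one hypothetical covariate, with an illustrative helper name `quantile_transport`; the paper adjusts joint distributions, which requires a genuinely multivariate transport method.

```python
import numpy as np

# Illustrative sketch; `quantile_transport` is a hypothetical helper, not code from the paper.


def quantile_transport(source, target):
    """Map each source value onto the target's empirical distribution.

    Uses the monotone (one-dimensional optimal transport) map
    T = F_target^{-1} o F_source, with empirical CDFs and quantiles.
    """
    source = np.asarray(source, dtype=float)
    target_sorted = np.sort(np.asarray(target, dtype=float))
    n, m = len(source), len(target_sorted)
    # Rank of each source value (0 = smallest); ties are broken by position.
    ranks = np.argsort(np.argsort(source, kind="stable"), kind="stable")
    # Mid-point CDF level of each rank, converted to an index into the sorted target.
    idx = np.floor((ranks + 0.5) * m / n).astype(int)
    return target_sorted[idx]


# Hypothetical covariate values: less privileged group (source), more privileged group (target).
print(quantile_transport([3.0, 1.0, 2.0], [10.0, 30.0, 20.0]))  # [30. 10. 20.]
```

Because the map is monotone, it preserves each offender's rank on the covariate while giving the transported values the target group's distribution.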
-
Post-Model-Selection Statistical Inference with Interrupted Time Series Designs: An Evaluation of an Assault Weapons Ban in California
Authors:
Richard A. Berk
Abstract:
There have been many claims in the media and a bit of respectable research about the causes of variation in firearm sales. The challenges for causal inference can be quite daunting. This paper reports an analysis of daily handgun sales in California from 1996 through 2018 using an interrupted time series design and analysis. The design was introduced to social scientists in 1963 by Campbell and Stanley, analysis methods were proposed by Box and Tiao in 1975, and more recent treatments are easily found (Box et al., 2016). But this approach to causal inference can be badly overmatched by the data on handgun sales, especially when the causal effects are estimated. More important for this paper are fundamental oversights in the standard statistical methods employed. Test multiplicity problems are introduced by adaptive model selection built into recommended practice. The challenges are computational and conceptual. Some progress is made on both problems that arguably improves on past research, but the take-home message may be to reduce aspirations about what can be learned.
Submitted 21 May, 2021;
originally announced May 2021.
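A minimal segmented-regression sketch of an interrupted time series, assuming a single known intervention date and an immediate level shift. The helper name and the simulated series are hypothetical, and unlike Box-Tiao style analyses the sketch ignores serial correlation in the errors.

```python
import numpy as np

# Hypothetical helper for illustration only.


def its_level_shift(y, intervention_index):
    """OLS fit of y_t = b0 + b1 * t + b2 * step_t, where step_t = 1 from the intervention on.

    Returns (b0, b1, b2); b2 is the estimated level shift. Autocorrelated
    errors, which Box-Tiao style analyses model explicitly, are ignored here.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    step = (t >= intervention_index).astype(float)
    X = np.column_stack([np.ones_like(t), t, step])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Noise-free hypothetical series: baseline 2, trend 0.5 per day, jump of 3 at t = 10.
t = np.arange(20)
y = 2 + 0.5 * t + 3 * (t >= 10)
print(its_level_shift(y, 10))
```

Trying several intervention dates, trend terms, or noise models and reporting the best-looking fit is the kind of adaptive model selection that, as the abstract notes, undermines conventional tests unless the selection is accounted for.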
-
Nested Conformal Prediction Sets for Classification with Applications to Probation Data
Authors:
Arun K. Kuchibhotla,
Richard A. Berk
Abstract:
Risk assessments to help inform criminal justice decisions have been used in the United States since the 1920s. Over the past several years, statistical learning risk algorithms have been introduced amid much controversy about fairness, transparency and accuracy. In this paper, we focus on accuracy for a large department of probation and parole that is considering a major revision of its current, statistical learning risk methods. Because the content of each offender's supervision is substantially shaped by a forecast of subsequent conduct, forecasts have real consequences. Here we consider the probability that risk forecasts are correct. We augment standard statistical learning estimates of forecasting uncertainty (i.e., confusion tables) with uncertainty estimates from nested conformal prediction sets. In a demonstration of concept using data from the department of probation and parole, we show that the standard uncertainty measures and uncertainty measures from nested conformal prediction sets can differ dramatically in concept and output. We also provide a modification of nested conformal called the localized conformal method to match confusion tables more closely when possible. A strong case can be made favoring the nested and localized conformal approach. As best we can tell, our formulation of such comparisons and consequent recommendations is novel.
Submitted 13 April, 2021;
originally announced April 2021.
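For orientation, the sketch below builds ordinary split-conformal prediction sets for a classifier, using the score 1 minus the estimated probability of the true class. It is the simplest member of the conformal family, not the nested or localized method the paper proposes, and the helper names and probabilities are hypothetical. Under exchangeability of calibration and new cases, the resulting set contains the true label with probability at least 1 - alpha.

```python
import math

import numpy as np

# Hypothetical helpers: plain split conformal, not the paper's nested/localized method.


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for the score 1 - p_hat(true label | x)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration cases: every label is kept
    return float(np.sort(scores)[k - 1])


def prediction_set(probs, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical held-out cases: columns are P(no re-offense), P(re-offense); all re-offended.
cal = [[1 - p, p] for p in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3)]
thr = conformal_threshold(cal, [1] * 7, alpha=0.25)
print(prediction_set([0.3, 0.7], thr))  # [1]
print(prediction_set([0.5, 0.5], thr))  # [0, 1]
```

A set containing both labels is the conformal way of saying that, at the requested coverage, the forecast cannot distinguish re-offense from no re-offense for that case.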
-
Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Conformal Prediction Sets
Authors:
Richard A. Berk,
Arun Kumar Kuchibhotla
Abstract:
Risk assessment algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we adopt a framework from conformal prediction sets to remove unfairness from risk algorithms themselves and the covariates used for forecasting. From a sample of 300,000 offenders at their arraignments, we construct a confusion table and its derived measures of fairness that are effectively free of any meaningful differences between Black and White offenders. We also produce fair forecasts for individual offenders coupled with valid probability guarantees that the forecasted outcome is the true outcome. We see our work as a demonstration of concept for application in a wide variety of criminal justice decisions. The procedures provided can be routinely implemented in jurisdictions with the usual criminal justice datasets used by administrators. The requisite procedures can be found in the scripting language R. However, whether stakeholders will accept our approach as a means to achieve risk assessment fairness is unknown. There are also legal issues that would need to be resolved, although we offer a Pareto improvement.
Submitted 21 May, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
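One standard way to make a conformal guarantee hold within each protected group, rather than only on average over all offenders, is to calibrate separately by group (often called Mondrian or group-conditional conformal prediction). The sketch below illustrates that idea, assuming nonconformity scores for held-out cases are already computed; it is not claimed to be the paper's procedure, and the helper name and scores are hypothetical.

```python
import math

import numpy as np

# Hypothetical helper illustrating group-conditional (Mondrian) conformal calibration.


def group_thresholds(scores, groups, alpha):
    """Split-conformal threshold computed separately within each group.

    scores: nonconformity scores of held-out calibration cases.
    groups: group label of each calibration case.
    Returns {group: threshold}; inf means the group has too few
    calibration cases for the requested alpha.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        k = math.ceil((len(s) + 1) * (1 - alpha))
        thresholds[g.item()] = float(s[k - 1]) if k <= len(s) else math.inf
    return thresholds


# Hypothetical scores for two groups of seven calibration cases each.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7] + [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
print(group_thresholds(scores, [0] * 7 + [1] * 7, alpha=0.25))  # {0: 0.6, 1: 1.2}
```

Each group's threshold depends only on that group's calibration scores, so, assuming exchangeability within each group, every group receives at least 1 - alpha coverage of its own, at the cost of needing enough calibration cases in every group.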
-
Almost Politically Acceptable Criminal Justice Risk Assessment
Authors:
Richard A. Berk,
Ayya A. Elzarka
Abstract:
In criminal justice risk forecasting, one can prove that it is impossible to optimize accuracy and fairness at the same time. One can also prove that it is impossible to optimize all of the usual group definitions of fairness at once. In the policy arena, one is left with tradeoffs about which many stakeholders will adamantly disagree. In this paper, we offer a different approach. We do not seek perfectly accurate and fair risk assessments. We seek politically acceptable risk assessments. We describe and apply to data on 300,000 offenders a machine learning approach that responds to many of the most visible charges of "racial bias." Regardless of whether such claims are true, we adjust our procedures to compensate. We begin by training the algorithm on White offenders only and computing risk with test data separately for White offenders and Black offenders. Thus, the fitted algorithm structure is exactly the same for both groups; the algorithm treats all offenders as if they are White. But because White and Black offenders can bring different predictor distributions to the White-trained algorithm, we provide additional adjustments if needed. Insofar as conventional machine learning procedures do not produce the accuracy and fairness that some stakeholders require, it is possible to alter conventional practice to respond explicitly to many salient stakeholder claims even if they are unsupported by the facts. The result can be politically acceptable risk assessment tools.
Submitted 24 October, 2019;
originally announced October 2019.
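The core design choice, fitting on one group and scoring everyone with the same fitted structure, can be sketched with the simplest possible classifier: a table of outcome rates by a single discrete predictor. The data and helper names are hypothetical; the paper uses machine learning rather than a rate table.

```python
from collections import defaultdict

# Hypothetical helpers: a cell-proportion "classifier" standing in for a learned model.


def fit_rate_table(xs, ys):
    """Estimate P(y = 1 | x) for a discrete predictor by cell proportions."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number of positives, number of cases]
    for x, y in zip(xs, ys):
        counts[x][0] += y
        counts[x][1] += 1
    return {x: pos / total for x, (pos, total) in counts.items()}


def score(table, xs, default=None):
    """Apply a fitted table; predictor values never seen in training get `default`."""
    return [table.get(x, default) for x in xs]


# Train on the reference group only, then score both groups with the same table.
reference_table = fit_rate_table(xs=[0, 0, 1, 1], ys=[0, 1, 1, 1])
print(score(reference_table, [1, 0]))     # reference-group test cases: [1.0, 0.5]
print(score(reference_table, [0, 0, 2]))  # other-group test cases: [0.5, 0.5, None]
```

The `None` marks a predictor value seen only in the second group, a small version of the problem that arises when groups bring different predictor distributions to the reference-trained algorithm and further adjustment is needed.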
-
An Algorithmic Approach to Forecasting Rare Violent Events: An Illustration Based in IPV Perpetration
Authors:
Richard A. Berk,
Susan B. Sorenson
Abstract:
Mass violence, almost no matter how defined, is (thankfully) rare. Rare events are very difficult to study in a systematic manner. Standard statistical procedures can fail badly, and usefully accurate forecasts of rare events often are little more than an aspiration. We offer an unconventional approach for the statistical analysis of rare events illustrated by an extensive case study. We report research whose goal is to learn about the attributes of very high risk intimate partner violence (IPV) perpetrators and the circumstances associated with their IPV incidents reported to the police. Very high risk is defined as having a high probability of committing a repeat IPV assault in which the victim is injured. Such individuals represent a very small fraction of all IPV perpetrators; these acts of violence are relatively rare. To learn about them nevertheless, we apply in a novel fashion three algorithms sequentially to data collected from a large metropolitan police department: stochastic gradient boosting, a genetic algorithm inspired by natural selection, and agglomerative clustering. We try to characterize not just perpetrators who on balance are predicted to re-offend, but those who are very likely to re-offend in a manner that leads to victim injuries. With this strategy, we learn a lot. We also provide a new way to estimate the importance of risk predictors. There are lessons for the study of other rare forms of violence, especially when instructive forecasts are sought. In the absence of sufficiently accurate forecasts, scarce prevention resources cannot be allocated where they are most needed.
Submitted 1 March, 2019;
originally announced March 2019.
-
Using Recursive Partitioning to Find and Estimate Heterogenous Treatment Effects In Randomized Clinical Trials
Authors:
Richard Berk,
Matthew Olson,
Andreas Buja,
Aurelie Ouss
Abstract:
Heterogeneous treatment effects can be very important in the analysis of randomized clinical trials. Heightened risks or enhanced benefits may exist for particular subsets of study subjects. When the heterogeneous treatment effects are specified as the research is being designed, there are proper and readily available analysis techniques. When the heterogeneous treatment effects are inductively obtained as an experiment's data are analyzed, significant complications are introduced. There can be a need for special loss functions designed to find local average treatment effects and for techniques that properly address post selection statistical inference. In this paper, we tackle both while undertaking a recursive partitioning analysis of a randomized clinical trial testing whether individuals on probation, who are low risk, can be minimally supervised with no increase in recidivism.
Submitted 11 July, 2018;
originally announced July 2018.
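The search for subgroups with different treatment effects can be sketched as a single split on one covariate in a randomized trial. The criterion below, n_left * tau_left^2 + n_right * tau_right^2, is a simplified version of criteria used in causal trees and is an assumption here, not the special loss function from the paper; helper names and data are hypothetical.

```python
import numpy as np

# Hypothetical helpers: one-split search for treatment-effect heterogeneity.


def effect(y, w):
    """Difference in mean outcome, treated (w = 1) minus control (w = 0)."""
    return y[w == 1].mean() - y[w == 0].mean()


def best_effect_split(x, y, w, min_per_arm=2):
    """Search thresholds on one covariate for a split into child nodes with
    large and different treatment effects.

    Criterion: n_left * tau_left**2 + n_right * tau_right**2.
    Returns (criterion, threshold, [tau_left, tau_right]) or None.
    """
    x, y, w = np.asarray(x), np.asarray(y, dtype=float), np.asarray(w)
    best = None
    for c in np.unique(x)[:-1]:
        left = x <= c
        children = [(y[m], w[m]) for m in (left, ~left)]
        # Require enough treated and control cases in each child node.
        if any(min((wm == 1).sum(), (wm == 0).sum()) < min_per_arm for _, wm in children):
            continue
        taus = [effect(ym, wm) for ym, wm in children]
        crit = sum(len(ym) * tau ** 2 for (ym, _), tau in zip(children, taus))
        if best is None or crit > best[0]:
            best = (crit, c.item(), taus)
    return best


# Hypothetical trial: no effect when x = 0, an effect of 3 when x is 1 or 2.
x = [0] * 4 + [1] * 4 + [2] * 4
w = [1, 0] * 6
y = [5, 5, 5, 5, 8, 5, 8, 5, 8, 5, 8, 5]
print(best_effect_split(x, y, w))
```

Effects estimated on the same data used to choose the split tend to be exaggerated, which is the post-selection problem the abstract raises; one simple remedy is to choose the split on a random half of the trial and estimate the two effects on the other half.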
-
Assumption Lean Regression
Authors:
Richard Berk,
Andreas Buja,
Lawrence Brown,
Edward George,
Arun Kumar Kuchibhotla,
Weijie J. Su,
Linda Zhao
Abstract:
It is well known that models used in conventional regression analysis are commonly misspecified. A standard response is little more than a shrug. Data analysts invoke Box's maxim that all models are wrong and then proceed as if the results are useful nevertheless. In this paper, we provide an alternative. Regression models are treated explicitly as approximations of a true response surface that can have a number of desirable statistical properties, including estimates that are asymptotically unbiased. Valid statistical inference follows. We generalize the formulation to include regression functionals, which broadens substantially the range of potential applications. An empirical application is provided to illustrate the paper's key concepts.
Submitted 26 June, 2018; v1 submitted 23 June, 2018;
originally announced June 2018.
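The idea that a regression coefficient remains a meaningful approximation target when the linear model is wrong can be illustrated with a tiny noise-free calculation: fit the best line to a quadratic response surface and note that its slope depends on where the regressor values lie. The helper names are hypothetical, and X is taken to be uniform over the listed points.

```python
import numpy as np

# Hypothetical helper: population least-squares slope under a uniform X distribution.


def best_linear_slope(xs, f):
    """Slope of the least-squares line for E[Y | X = x] = f(x),
    with X uniform over the points in xs."""
    x = np.asarray(xs, dtype=float)
    y = f(x)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def quadratic(x):
    """A nonlinear response surface that a straight line can only approximate."""
    return x ** 2


print(best_linear_slope([0, 1, 2], quadratic))
print(best_linear_slope([1, 2, 3], quadratic))
```

The best-fitting slope is 2 when X is spread over {0, 1, 2} and 4 when it is spread over {1, 2, 3}, although the response surface is identical, so the approximation that OLS estimates depends on the regressor distribution as well as on the response surface.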
-
A Convex Framework for Fair Regression
Authors:
Richard Berk,
Hoda Heidari,
Shahin Jabbari,
Matthew Joseph,
Michael Kearns,
Jamie Morgenstern,
Seth Neel,
Aaron Roth
Abstract:
We introduce a flexible family of fairness regularizers for (linear and logistic) regression problems. These regularizers all enjoy convexity, permitting fast optimization, and they span the range from notions of group fairness to strong individual fairness. By varying the weight on the fairness regularizer, we can compute the efficient frontier of the accuracy-fairness trade-off on any given dataset, and we measure the severity of this trade-off via a numerical quantity we call the Price of Fairness (PoF). The centerpiece of our results is an extensive comparative study of the PoF across six different datasets in which fairness is a primary consideration.
Submitted 7 June, 2017;
originally announced June 2017.
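Because each term is a convex quadratic in the coefficients, squared-error regression with an individual-fairness penalty has a closed-form solution. The sketch assumes a penalty of the form (1/(n1 n2)) times the sum over cross-group pairs of d_ij (w·x_i - w·x_j)^2, with weights d_ij = exp(-(y_i - y_j)^2), an optional ridge term, and no intercept; the helper names are hypothetical and this is not the authors' code.

```python
import numpy as np

# Hypothetical helpers: convex fairness-penalized least squares (assumed penalty form).


def individual_fairness_matrix(X1, y1, X2, y2):
    """Matrix M with w @ M @ w equal to the cross-group penalty
    (1 / (n1 * n2)) * sum_ij d_ij * (x_i @ w - x_j @ w) ** 2,
    where d_ij = exp(-(y_i - y_j) ** 2) up-weights pairs with similar outcomes."""
    diffs = X1[:, None, :] - X2[None, :, :]         # (n1, n2, p) feature gaps
    d = np.exp(-(y1[:, None] - y2[None, :]) ** 2)   # (n1, n2) pair weights
    return np.einsum("ij,ijk,ijl->kl", d, diffs, diffs) / (len(y1) * len(y2))


def fair_least_squares(X, y, group, lam, gamma=0.0):
    """Minimize mean squared error + lam * fairness penalty + gamma * ||w||^2.

    All terms are convex quadratics in w, so the minimizer solves a linear system.
    """
    X, y, group = np.asarray(X, float), np.asarray(y, float), np.asarray(group)
    g0, g1 = group == 0, group == 1
    M = individual_fairness_matrix(X[g0], y[g0], X[g1], y[g1])
    n, p = X.shape
    A = X.T @ X / n + lam * M + gamma * np.eye(p)
    return np.linalg.solve(A, X.T @ y / n)
```

Solving over a grid of `lam` values and recording accuracy against the penalty traces an accuracy-fairness frontier of the kind the abstract uses to define the Price of Fairness.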
-
Fairness in Criminal Justice Risk Assessments: The State of the Art
Authors:
Richard Berk,
Hoda Heidari,
Shahin Jabbari,
Michael Kearns,
Aaron Roth
Abstract:
Objectives: Discussions of fairness in criminal justice risk assessments typically lack conceptual precision. Rhetoric too often substitutes for careful analysis. In this paper, we seek to clarify the tradeoffs between different kinds of fairness and between fairness and accuracy.
Methods: We draw on the existing literatures in criminology, computer science and statistics to provide an integrated examination of fairness and accuracy in criminal justice risk assessments. We also provide an empirical illustration using data from arraignments.
Results: We show that there are at least six kinds of fairness, some of which are incompatible with one another and with accuracy.
Conclusions: Except in trivial cases, it is impossible to maximize accuracy and fairness at the same time, and impossible simultaneously to satisfy all kinds of fairness. In practice, a major complication is different base rates across different legally protected groups. There is a need to consider challenging tradeoffs.
Submitted 27 May, 2017; v1 submitted 27 March, 2017;
originally announced March 2017.
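Several of the fairness definitions compared in the paper are equalities, across groups, of quantities read off a confusion table, such as false positive rates, false negative rates, and positive predictive values. A minimal sketch that computes those quantities by group; the helper names and toy data are hypothetical.

```python
from collections import Counter

# Hypothetical helpers: confusion-table summaries per group (1 = positive outcome).


def _ratio(num, den):
    return num / den if den else float("nan")


def confusion_measures(y_true, y_pred):
    """Fairness-relevant summaries of a 2 x 2 confusion table."""
    c = Counter(zip(y_true, y_pred))
    tp, fp, fn, tn = c[(1, 1)], c[(0, 1)], c[(1, 0)], c[(0, 0)]
    total = tp + fp + fn + tn
    return {
        "base_rate": _ratio(tp + fn, total),
        "predicted_positive_rate": _ratio(tp + fp, total),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
        "positive_predictive_value": _ratio(tp, tp + fp),
    }


def measures_by_group(y_true, y_pred, groups):
    """Compute the confusion-table summaries separately for each group."""
    out = {}
    for g in sorted(set(groups)):
        rows = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = confusion_measures([y_true[i] for i in rows], [y_pred[i] for i in rows])
    return out


# Hypothetical outcomes and forecasts for two groups of four cases each.
result = measures_by_group([1, 1, 0, 0, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0, 0, 1], ["A"] * 4 + ["B"] * 4)
for g, m in result.items():
    print(g, m)
```

In this toy example the positive predictive value is 0.5 in both groups while base rates (0.5 versus 0.25) and false positive rates (0.5 versus 1/3) differ, a small instance of how different base rates pull fairness definitions apart.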
-
Calibrated Percentile Double Bootstrap For Robust Linear Regression Inference
Authors:
Daniel McCarthy,
Kai Zhang,
Lawrence Brown,
Richard Berk,
Andreas Buja,
Edward George,
Linda Zhao
Abstract:
We consider inference for the parameters of a linear model when the covariates are random and the relationship between response and covariates is possibly non-linear. Conventional inference methods such as z-intervals perform poorly in these cases. We propose a double bootstrap-based calibrated percentile method, perc-cal, as a general-purpose CI method which performs very well relative to alternative methods in challenging situations such as these. The superior performance of perc-cal is demonstrated by a thorough, full-factorial design synthetic data study as well as a real data example involving the length of criminal sentences. We also provide theoretical justification for the perc-cal method under mild conditions. The method is implemented in the R package 'perccal', available through CRAN and coded primarily in C++, to make it easier for practitioners to use.
Submitted 16 January, 2017; v1 submitted 1 November, 2015;
originally announced November 2015.
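perc-cal adds an inner bootstrap loop to calibrate the nominal level of a percentile interval. The sketch below shows only the single-level starting point: a percentile interval for an OLS slope built from resampled (x, y) pairs, so that randomness in the covariates is reflected. It is not the perccal package, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: uncalibrated pairs-bootstrap percentile interval for a slope.


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


def pairs_percentile_ci(x, y, level=0.90, n_boot=2000, seed=0):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (x, y) pairs jointly
        xb = x[idx]
        if np.ptp(xb) == 0:
            continue  # degenerate resample: slope undefined
        stats.append(slope(xb, y[idx]))
    a = (1 - level) / 2
    lo, hi = np.quantile(stats, [a, 1 - a])
    return float(lo), float(hi)


# Hypothetical nonlinear data: the linear model is deliberately misspecified.
x = np.arange(20)
print(pairs_percentile_ci(x, x ** 2, n_boot=500))
```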
-
Using Regression Kernels to Forecast A Failure to Appear in Court
Authors:
Richard Berk,
Justin Bleich,
Adam Kapelner,
Jaime Henderson,
Geoffrey Barnes,
Ellen Kurtz
Abstract:
Forecasts of prospective criminal behavior have long been an important feature of many criminal justice decisions. There is now substantial evidence that machine learning procedures will classify and forecast at least as well as, and typically better than, logistic regression, which has to date dominated conventional practice. However, machine learning procedures are adaptive. They "learn" inductively from training data. As a result, they typically perform best with very large datasets. There is a need, therefore, for forecasting procedures with the promise of machine learning that will perform well with small to moderately-sized datasets. Kernel methods provide precisely that promise. In this paper, we offer an overview of kernel methods in regression settings and compare such a method, regularized with principal components, to stepwise logistic regression. We apply both to a timely and important criminal justice concern: a failure to appear (FTA) at court proceedings following an arraignment. A forecast of an FTA can be an important factor in a judge's decision to release a defendant while awaiting trial and can influence the conditions imposed on that release. Forecasting accuracy matters, and our kernel approach forecasts far more accurately than stepwise logistic regression. The methods developed here are implemented in the R package kernReg currently available on CRAN.
Submitted 5 September, 2014;
originally announced September 2014.
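One way to regularize kernel regression with principal components is to keep only the leading eigenvectors of the kernel matrix and regress the response on them. The sketch below assumes a Gaussian kernel and treats a 0/1 outcome such as FTA as a numeric response whose fitted value serves as a risk score; these choices and the helper names are assumptions, not the kernReg package or the paper's exact method.

```python
import numpy as np

# Hypothetical helpers: kernel principal-components regression with a Gaussian kernel.


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def fit_kernel_pcr(X, y, gamma=1.0, k=3):
    """Kernel regression regularized by keeping the k leading eigenvectors of K."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    K = rbf_kernel(X, X, gamma)
    vals, vecs = np.linalg.eigh(K)       # eigenvalues in ascending order
    V, lam = vecs[:, -k:], vals[-k:]     # the k largest components
    mu = y.mean()
    beta = V.T @ (y - mu)                # coefficients on orthonormal components
    alpha = V @ (beta / lam)             # dual weights, so K @ alpha = V @ beta
    return {"X": X, "alpha": alpha, "mu": mu, "gamma": gamma}


def predict(model, X_new):
    """Fitted values (risk scores) for new cases."""
    K_new = rbf_kernel(np.asarray(X_new, dtype=float), model["X"], model["gamma"])
    return model["mu"] + K_new @ model["alpha"]


# Hypothetical one-predictor data with a 0/1 outcome.
X = np.linspace(0, 1, 8).reshape(-1, 1)
y = np.array([0, 0, 1, 0, 1, 1, 1, 1], dtype=float)
print(predict(fit_kernel_pcr(X, y, gamma=5.0, k=2), [[0.1], [0.9]]))
```

The number of components `k` plays the role of the regularization parameter: small `k` gives a smooth fit, and larger `k` lets the fit follow the training data more closely.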
-
Evaluating the Effectiveness of Personalized Medicine with Software
Authors:
Adam Kapelner,
Justin Bleich,
Alina Levine,
Zachary D. Cohen,
Robert J. DeRubeis,
Richard Berk
Abstract:
We present methodological advances in understanding the effectiveness of personalized medicine models and supply easy-to-use open-source software. Personalized medicine involves the systematic use of individual patient characteristics to determine which treatment option is most likely to result in a better outcome for the patient on average. Why is personalized medicine not done more in practice? One of many reasons is that practitioners do not have an easy way to holistically evaluate whether their personalization procedure does better than the standard of care. Our software, "Personalized Treatment Evaluator" (the R package PTE), provides inference for improvement out-of-sample in many clinical scenarios. We also extend current methodology by allowing evaluation of improvement in the case where the endpoint is binary or survival. In the software, the practitioner inputs (1) data from a single-stage randomized trial with one continuous, incidence or survival endpoint and (2) a functional form of a model for the endpoint constructed from domain knowledge. The bootstrap is then employed on data unseen during model fitting to provide confidence intervals for the improvement for the average future patient (assuming future patients are similar to the patients in the trial). One may also test against a null scenario where the hypothesized personalization is not more useful than a standard of care. We demonstrate our method's promise on simulated data as well as on data from a randomized comparative trial investigating two treatments for depression.
Submitted 21 November, 2020; v1 submitted 30 April, 2014;
originally announced April 2014.
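The evaluation idea can be sketched with a randomized trial and a personalization rule built on other data: average the outcomes of subjects whose randomized treatment happens to match the rule's recommendation. With 1:1 randomization that match is a coin flip for every subject, so the average estimates the mean outcome if everyone were treated according to the rule. This is a simplified estimator under that assumption, with hypothetical names; it is not the PTE package's interface.

```python
# Hypothetical helpers; assumes 1:1 randomization and that higher outcomes are better.


def rule_value(y, assigned, recommended):
    """Mean outcome among subjects whose randomized treatment matches the
    rule's recommendation; estimates the mean outcome under the rule."""
    matched = [yi for yi, a, r in zip(y, assigned, recommended) if a == r]
    return sum(matched) / len(matched)


def improvement_over_average(y, assigned, recommended):
    """Estimated gain of the rule over the trial's overall mean outcome."""
    return rule_value(y, assigned, recommended) - sum(y) / len(y)


# Hypothetical held-out subjects: outcomes, randomized arms, and rule recommendations.
y = [1, 3, 2, 6]
assigned = [0, 1, 0, 1]
recommended = [0, 1, 1, 1]
print(improvement_over_average(y, assigned, recommended))
```

Resampling subjects and recomputing `improvement_over_average` on each bootstrap sample would give the kind of confidence interval the abstract describes.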
-
Models as Approximations I: Consequences Illustrated with Linear Regression
Authors:
Andreas Buja,
Richard Berk,
Lawrence Brown,
Edward George,
Emil Pitkin,
Mikhail Traskin,
Linda Zhao,
Kai Zhang
Abstract:
In the early 1980s Halbert White inaugurated a "model-robust" form of statistical inference based on the "sandwich estimator" of standard error. This estimator is known to be "heteroskedasticity-consistent", but it is less well known to be "nonlinearity-consistent" as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence can't be treated as fixed.
The consequences are deep: (1) population slopes need to be re-interpreted as statistical functionals obtained from OLS fits to largely arbitrary joint $(\vec{X}, Y)$ distributions; (2) the meaning of slope parameters needs to be rethought; (3) the regressor distribution affects the slope parameters; (4) randomness of the regressors becomes a source of sampling variability in slope estimates; (5) inference needs to be based on model-robust standard errors, including sandwich estimators or the $(\vec{X}, Y)$ bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test.
Submitted 6 July, 2019; v1 submitted 6 April, 2014;
originally announced April 2014.
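Point (5) can be made concrete by computing both kinds of standard error for the same fit. The sketch fits a line with an intercept to a noise-free quadratic, so every residual reflects nonlinearity rather than noise; it uses the basic HC0 form of the sandwich estimator, and the helper name is hypothetical.

```python
import numpy as np

# Hypothetical helper: classical versus sandwich (HC0) standard errors for OLS.


def ols_standard_errors(x, y):
    """OLS with an intercept; returns coefficients, classical (model-trusting)
    standard errors, and sandwich (model-robust, HC0) standard errors."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    classical = np.sqrt(np.diag(bread) * (resid @ resid) / (n - p))
    meat = X.T @ (X * resid[:, None] ** 2)  # sum_i r_i^2 x_i x_i^T
    sandwich = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, classical, sandwich


x = np.arange(6)
beta, classical, sandwich = ols_standard_errors(x, x ** 2)
print(beta, classical, sandwich)
```

For this example the fitted slope is 5 and its two standard errors disagree (about 0.73 classical versus about 0.69 sandwich); neither direction of disagreement is guaranteed in general.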
-
Improved Precision in Estimating Average Treatment Effects
Authors:
Emil Pitkin,
Richard Berk,
Lawrence Brown,
Andreas Buja,
Ed George,
Kai Zhang,
Linda Zhao
Abstract:
The Average Treatment Effect (ATE) is a global measure of the effectiveness of an experimental treatment intervention. Classical methods of its estimation either ignore relevant covariates or do not fully exploit them. Moreover, past work has considered covariates as fixed. We present a method for improving the precision of the ATE estimate: the treatment and control responses are estimated via a regression, and information is pooled between the groups to produce an asymptotically unbiased estimate; we subsequently justify the random X paradigm underlying the result. Standard errors are derived, and the estimator's performance is compared to the traditional estimator. Conditions under which the regression-based estimator is preferable are detailed, and a demonstration on real data is presented.
Submitted 1 November, 2013;
originally announced November 2013.
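One common form of regression adjustment fits a separate regression in each arm and averages the difference in predicted outcomes over the covariates of all subjects, treating the covariates as random rather than fixed. The sketch below uses that form with a single hypothetical covariate; it is not necessarily the estimator, or the standard errors, derived in the paper, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: regression-adjusted versus unadjusted ATE estimates.


def regression_adjusted_ate(x, y, w):
    """Fit separate OLS regressions in the treatment (w = 1) and control (w = 0)
    arms, predict both potential outcomes for every subject, and average the
    difference over the pooled covariate distribution."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    y, w = np.asarray(y, dtype=float), np.asarray(w)
    b1, *_ = np.linalg.lstsq(X[w == 1], y[w == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(X[w == 0], y[w == 0], rcond=None)
    return float(np.mean(X @ b1 - X @ b0))


def difference_in_means(y, w):
    """The traditional, unadjusted estimator."""
    y, w = np.asarray(y, dtype=float), np.asarray(w)
    return float(y[w == 1].mean() - y[w == 0].mean())


# Hypothetical noise-free data: true effect 3, with chance imbalance in x across arms.
x = np.arange(8)
w = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = 1 + 2 * x + 3 * w
print(difference_in_means(y, w), regression_adjusted_ate(x, y, w))
```

In this example the treated arm happens to have larger covariate values, so the simple difference in means is 11, while the adjusted estimate recovers the true effect of 3.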
-
Small area estimation of the homeless in Los Angeles: An application of cost-sensitive stochastic gradient boosting
Authors:
Brian Kriegler,
Richard Berk
Abstract:
In many metropolitan areas efforts are made to count the homeless to ensure proper provision of social services. Some areas are very large, which makes spatial sampling a viable alternative to an enumeration of the entire terrain. Counts are observed in sampled regions but must be imputed in unvisited areas. Along with the imputation process, the costs of underestimating and overestimating may be different. For example, if precise estimation in areas with large homeless counts is critical, then underestimation should be penalized more than overestimation in the loss function. We analyze data from the 2004–2005 Los Angeles County homeless study using an augmentation of $L_1$ stochastic gradient boosting that can weight overestimates and underestimates asymmetrically. We discuss our choice to utilize stochastic gradient boosting over other function estimation procedures. In-sample fitted and out-of-sample imputed values, as well as relationships between the response and predictors, are analyzed for various cost functions. Practical usage and policy implications of these results are discussed briefly.
Submitted 12 November, 2010;
originally announced November 2010.
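An $L_1$ loss that weights underestimates and overestimates differently is the familiar asymmetric absolute (quantile) loss. The sketch below shows the loss, the pseudo-residuals a gradient boosting step would fit, and the fact that the best constant fit is a c_under / (c_under + c_over) quantile of the counts. The helper names are hypothetical, and the paper's exact loss and boosting details may differ.

```python
# Hypothetical helpers: asymmetric L1 (quantile-type) loss for gradient boosting.


def asymmetric_l1(y, f, c_under, c_over):
    """Total loss: underestimates (f < y) cost c_under per unit, overestimates c_over."""
    return sum(c_under * (yi - fi) if yi > fi else c_over * (fi - yi) for yi, fi in zip(y, f))


def pseudo_residuals(y, f, c_under, c_over):
    """Negative (sub)gradient of the loss with respect to each fitted value,
    the quantity a gradient boosting step fits its next tree to."""
    return [c_under if yi > fi else -c_over for yi, fi in zip(y, f)]


def best_constant(y, c_under, c_over):
    """Constant fit minimizing the loss; a minimizer always lies at a data value,
    namely a c_under / (c_under + c_over) quantile of y."""
    return min(sorted(y), key=lambda c: asymmetric_l1(y, [c] * len(y), c_under, c_over))


# Hypothetical tract counts: penalizing underestimates 3 times as much pushes the fit upward.
counts = [1, 2, 3, 4, 5]
print(best_constant(counts, c_under=1, c_over=1))  # 3 (the median)
print(best_constant(counts, c_under=3, c_over=1))  # 4 (a 75th percentile)
```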
-
Counting the homeless in Los Angeles County
Authors:
Richard Berk,
Brian Kriegler,
Donald Ylvisaker
Abstract:
Over the past two decades, a variety of methods have been used to count the homeless in large metropolitan areas. In this paper, we report on an effort to count the homeless in Los Angeles County, one that employed the sampling of census tracts. A number of complications are discussed, including the need to impute homeless counts to areas of the County not sampled. We conclude that, despite their imperfections, estimated counts provided useful and credible information to the stakeholders involved.
Submitted 19 May, 2008;
originally announced May 2008.