This is Team MattMarifelSora’s submission to the National Institute of Justice’s (NIJ’s) Recidivism Forecasting Challenge.
The team applied data processing and machine learning techniques to predict how likely it was that individuals would recidivate. This included applying hierarchical Bayesian target encoding and training models that are known to perform well on binary and multiclass classification problems involving tabular data. Following the industry standard in machine learning competitions, the team combined predictions from many models into an ensemble to boost its score. The team used gradient boosted decision trees via the XGBoost and LightGBM libraries and created a custom multilayer perceptron (MLP) with skip connections using the PyTorch library. In addition, the team used the dreamquark implementation of TabNet, a modern neural network architecture that uses attention mechanisms to selectively focus on input features. The team also tried NODE and SVM models; however, these performed comparatively worse and were not included in the team’s pipeline. Efforts to reduce racial bias in predicting recidivism were complicated by bias in initial arrests, because arrest data persist in the data that inform recidivism predictions.
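To make two of the ideas above concrete, the following is a minimal sketch, not the team’s actual code. It uses a simple smoothed (empirical Bayes style) target encoding as a stand-in for the hierarchical Bayesian target encoding named in the abstract, and plain averaging as one possible way to combine model predictions. The function names, the `prior_weight` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, prior_weight=10.0):
    """Map each category to its positive rate, shrunk toward the global rate.

    Assumption: this is a simplified stand-in for hierarchical Bayesian
    target encoding. prior_weight (hypothetical) acts like a number of
    pseudo-observations drawn at the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + prior_weight * global_mean) / (counts[cat] + prior_weight)
        for cat in counts
    }


def average_ensemble(prediction_lists):
    """Average per-row probabilities from several models (equal weights)."""
    n_models = len(prediction_lists)
    return [sum(row) / n_models for row in zip(*prediction_lists)]


# Toy data (hypothetical): a categorical feature and a binary outcome.
categories = ["A", "A", "A", "A", "B", "C"]
targets = [1, 1, 0, 1, 1, 0]

encoding = smoothed_target_encoding(categories, targets, prior_weight=2.0)

# Two hypothetical models' predicted probabilities for the same two rows.
blended = average_ensemble([[0.2, 0.8], [0.4, 0.6]])
```

In this sketch, rare categories such as "B" and "C" are pulled toward the overall rate rather than being encoded as exactly 1 or 0, which is the general motivation for Bayesian-style target encoding. In practice the encoding would be fit on training folds only to avoid leaking the target, and ensemble weights might be tuned rather than equal.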