An official website of the United States government, Department of Justice.

NCJRS Virtual Library

The Virtual Library houses over 235,000 criminal justice resources, including all known OJP works.

Detection and Characterization of Subpopulations and the Study of Algorithmic Bias in Forensic Identification of Source Problems

NCJ Number
308220
Author(s)
Semhar Michael; Andrew Simpson; Dylan Borchert; Christopher Saunders; Liansheng (Larry) Tang
Date Published
June 2023
Length
11 pages
Annotation

The report identifies potential algorithmic bias towards minority populations.

Abstract

This report finds that a substantial amount of algorithmic bias against a minority population can arise, as measured by rates of misleading evidence. The authors propose a mixture-based solution for modeling subpopulations in hierarchically structured data. The work focuses on identifying and characterizing subpopulations in the relevant population when the data are hierarchically structured, using semi-supervised finite mixture models adjusted for the hierarchical sampling procedure. The authors also study the systematic algorithmic biases that can occur, as measured by rates of misleading evidence for each subpopulation, when the subpopulation structure is not accounted for.

The authors illustrate the approach with a simulation study using synthetic data and classical glass datasets. The semi-supervised model was more accurate under random train-test split validation. It also performed better at assigning the same subpopulation membership to technical replicates of the same fragment. The smaller variability of its results supports the conclusion that the semi-supervised approach gives a more reliable model.

The forensic source identification problem involves summarizing the forensic evidence for a decisionmaker via the value of that evidence. This can be done with the forensic likelihood ratio, which in turn requires a model of the relevant background population. Some commonly used methods assume that the background population is normally distributed. However, there may be a latent variable representing an underlying subpopulation structure, which a single normal model does not capture.
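The relationship between the background model, the likelihood ratio, and the rate of misleading evidence can be made concrete with a small sketch. The code below is a univariate toy illustration only and is not the authors' hierarchical, semi-supervised model: every parameter value, function name, and the two-subpopulation setup is an assumption introduced here. It compares a background density that accounts for subpopulations (a two-component normal mixture) with a moment-matched single normal, and estimates, for each subpopulation, how often a trace from a different source still yields a likelihood ratio above 1.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical toy population (all values are assumptions for illustration).
WEIGHTS = [0.9, 0.1]   # proportions of the majority and minority subpopulations
MEANS = [0.0, 6.0]     # centre of the source means in each subpopulation
BETWEEN_SD = 1.0       # spread of source means within a subpopulation
WITHIN_SD = 0.3        # measurement spread within a single source
# Marginal spread of one measurement from a random source in a subpopulation.
MARGINAL_SD = math.sqrt(BETWEEN_SD ** 2 + WITHIN_SD ** 2)


def mixture_background(x):
    """Background density that models the subpopulation structure."""
    return sum(w * normal_pdf(x, m, MARGINAL_SD) for w, m in zip(WEIGHTS, MEANS))


# Single normal with the same overall mean and variance as the mixture.
POOLED_MEAN = sum(w * m for w, m in zip(WEIGHTS, MEANS))
POOLED_VAR = MARGINAL_SD ** 2 + sum(
    w * (m - POOLED_MEAN) ** 2 for w, m in zip(WEIGHTS, MEANS)
)


def pooled_background(x):
    """Background density that ignores the subpopulation structure."""
    return normal_pdf(x, POOLED_MEAN, math.sqrt(POOLED_VAR))


def likelihood_ratio(x, source_mean, background):
    """Specific-source LR: density under the source over density under the background."""
    return normal_pdf(x, source_mean, WITHIN_SD) / background(x)


def rate_of_misleading_evidence(k, background, n_trials=20000, seed=0):
    """Fraction of different-source traces from subpopulation k with LR > 1.

    The specific source is placed at the centre of subpopulation k; each trace
    comes from another, randomly drawn source in the same subpopulation.
    """
    rng = random.Random(seed)
    misleading = 0
    for _ in range(n_trials):
        other_source_mean = rng.gauss(MEANS[k], BETWEEN_SD)
        trace = rng.gauss(other_source_mean, WITHIN_SD)
        if likelihood_ratio(trace, MEANS[k], background) > 1.0:
            misleading += 1
    return misleading / n_trials


if __name__ == "__main__":
    for k, label in enumerate(["majority", "minority"]):
        rme_mix = rate_of_misleading_evidence(k, mixture_background)
        rme_pooled = rate_of_misleading_evidence(k, pooled_background)
        print(f"{label}: mixture={rme_mix:.3f}, single normal={rme_pooled:.3f}")
```

Because both background models are evaluated on the same simulated traces (same seed), the two rates for a subpopulation can be compared directly. The absolute values depend entirely on the assumed parameters and should not be read as estimates of the rates reported by the authors.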