The report identifies potential algorithmic bias towards minority populations.
This report finds that there could be an alarming amount of algorithmic bias towards a minority population, as measured by rates of misleading evidence. The authors propose a mixture-based solution for modeling subpopulations in hierarchically structured data. They focus on identifying and characterizing subpopulations in the relevant population, when the data are hierarchically structured, through semi-supervised finite mixture models adjusted for the hierarchical sampling procedure. They also study the systematic algorithmic biases, measured by rates of misleading evidence for each subpopulation, that can occur when the subpopulation structure is not accounted for. The authors illustrate this with a simulation study using synthetic data and classical glass datasets. The semi-supervised model was more accurate under random train-test split validation. The semi-supervised approach also performed better at assigning the same membership to technical replicates of the same fragment. Its smaller variability supports the conclusion that the semi-supervised approach gives a more reliable model. The forensic source identification problem involves summarizing the forensic evidence for a decision maker via the value of that evidence. This can be done via the forensic likelihood ratio, which in turn requires modeling a relevant background population. Some commonly used methods assume normality. However, there might exist a latent variable representing an underlying subpopulation structure.
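To make the bias measure concrete, the following is a minimal sketch of how rates of misleading evidence could be tabulated per subpopulation. It is not the authors' implementation. It assumes the common convention that evidence is misleading when the likelihood ratio falls below 1 although the items share a source, or rises above 1 although they do not. The function name `rates_of_misleading_evidence` and its arguments (`log_lr`, `same_source`, `subpop`) are hypothetical names chosen for illustration.

```python
import numpy as np


def rates_of_misleading_evidence(log_lr, same_source, subpop):
    """Tabulate rates of misleading evidence for each subpopulation.

    Hypothetical helper, not taken from the report.
    log_lr      : log likelihood ratios, one per comparison
    same_source : True where the compared items truly share a source
    subpop      : subpopulation label for each comparison

    Returns {label: (rate_same, rate_diff)} where
    rate_same = share of same-source comparisons with log LR < 0
    rate_diff = share of different-source comparisons with log LR > 0
    """
    log_lr = np.asarray(log_lr, dtype=float)
    same_source = np.asarray(same_source, dtype=bool)
    subpop = np.asarray(subpop)

    rates = {}
    for label in np.unique(subpop):
        in_group = subpop == label
        same = log_lr[in_group & same_source]
        diff = log_lr[in_group & ~same_source]
        # An empty group has no defined rate, so report NaN for it.
        rate_same = float(np.mean(same < 0)) if same.size else float("nan")
        rate_diff = float(np.mean(diff > 0)) if diff.size else float("nan")
        rates[label.item()] = (rate_same, rate_diff)
    return rates
```

Under these assumptions, a systematic bias would appear as a clear gap between the rates reported for one subpopulation label and those reported for another when the same fitted model produces all of the likelihood ratios.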