NCJRS Virtual Library

Evaluation of supervised machine-learning methods for predicting appearance traits from DNA

NCJ Number
302809
Journal
Forensic Science International: Genetics, Volume 53, July 2021
Author(s)
Maria-Alexandra Katsara; Wojciech Branicki; Susan Walsh; Manfred Kayser; Michael Nothnagel
Date Published
July 2021
Annotation

To identify a classifier that might outperform the existing trait-specific models, this study systematically compared the widely used multinomial logistic regression (MLR) with three popular machine learning (ML) classifiers: support vector machines (SVM), random forest (RF), and artificial neural networks (ANN), all of which have shown good performance outside the prediction of externally visible characteristics (EVCs).

Abstract

The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. Although predictive models for a large set of EVCs have already been established using multinomial logistic regression (MLR), the prediction performance of other possible classification methods has not yet been thoroughly investigated. As examples, the current study used eye, hair, and skin color categories as phenotypes, with genotypes based on the previously established IrisPlex, HIrisPlex, and HIrisPlex-S DNA markers. The performance of the four methods (MLR, SVM, RF, and ANN) was assessed and compared, complemented by detailed hyperparameter tuning applied to some of the methods to maximize their performance. Overall, all four classification methods performed similarly, with no method substantially superior to the others for any of the traits, although performance varied slightly across traits and more so across trait categories. Hence, based on these findings, none of the ML methods applied in this study provided any advantage for appearance prediction, at least for the categorical pigmentation traits and the selected DNA markers used here. (publisher abstract modified)
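
As a rough illustration of the kind of comparison the abstract describes, the sketch below fits the four classifier families to simulated data with scikit-learn and reports cross-validated accuracy. It is not the study's pipeline. The simulated genotypes, the hypothetical helper names (make_toy_genotypes, compare_classifiers), the hyperparameter values, and the use of accuracy as the metric are all assumptions made for illustration, and no hyperparameter tuning is performed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def make_toy_genotypes(n_samples=300, n_snps=6, seed=0):
    """Simulate SNP genotypes (0/1/2 allele counts) and a 3-category trait.

    Purely synthetic: these are not IrisPlex, HIrisPlex, or HIrisPlex-S data.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_samples, n_snps))
    # Assumed additive effect of two SNPs plus Gaussian noise.
    score = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, n_samples)
    y = np.digitize(score, bins=[1.0, 2.0])  # trait categories 0, 1, 2
    return X, y


def compare_classifiers(X, y, cv=5):
    """Return mean cross-validated accuracy for each of the four model families."""
    models = {
        # With the default lbfgs solver, recent scikit-learn versions fit a
        # multinomial model for multiclass targets.
        "MLR": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="rbf", C=1.0),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    X, y = make_toy_genotypes()
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: mean CV accuracy = {acc:.3f}")
```

On real data, each model would typically be wrapped in a tuning step (for example, a grid search inside the cross-validation loop) before the comparison, in line with the hyperparameter tuning mentioned in the abstract.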