Machine learning (ML) methods for regression and classification, along with the bootstrap, have revolutionized data analysis through resampling. The resulting simulated data sets are used to select the best-fitting models and to estimate the precision and accuracy of predictions. These two tasks are especially important in forensic analyses, which call for predictive rather than descriptive data analysis because their results will be applied to new cases. Naturally, we want to use the methods that are expected to be the most accurate and precise for new cases; however, as the great Zen master Berra noted, "It's tough to make predictions, especially about the future." Predictive methods must therefore incorporate the "Known Unknowns" (Rumsfeld, 2002) and avoid overfitting by analyzing multiple independent training and test samples, each of which should ideally be large. Bootstrap and Monte Carlo methods mimic the sampling variability that would be present in future cases, and both are incorporated into numerous routines for estimating prediction accuracy. No routine is perfect, owing to bias and variance issues and to the nature of the data and the analytical method, and new routines are always being explored. This article reports on a project demonstrating that the consequences of supposed overfitting may be relatively small in classification, and that predicting age using TA3 is far more accurate than using previous methods, even given the underestimated prediction error of those methods. (publisher abstract modified)
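As a rough illustration of how a bootstrap routine can estimate prediction error for new cases, the sketch below compares the "apparent" error of a model evaluated on its own training data with an out-of-bag bootstrap estimate. It is a minimal sketch under stated assumptions, not the project's procedure: the data are synthetic, the model is a simple least-squares line, and the names `fit_line`, `rmse`, and `oob_bootstrap_rmse` are hypothetical helpers introduced only for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(a, b, xs, ys):
    """Root-mean-square error of the line y = a + b*x on the given cases."""
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) ** 0.5


def oob_bootstrap_rmse(xs, ys, n_boot=200, seed=0):
    """Out-of-bag bootstrap estimate of prediction RMSE.

    Each replicate resamples the cases with replacement, fits the line to
    the resample, and scores it only on cases left out of that resample.
    """
    rng = random.Random(seed)
    n = len(xs)
    squared_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        # Skip degenerate resamples: nothing held out, or no spread in x.
        if not oob or len({xs[i] for i in chosen}) < 2:
            continue
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        squared_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in oob)
    return statistics.fmean(squared_errors) ** 0.5


# Synthetic data (an assumption for illustration only): one predictor and a
# noisy linear response with noise standard deviation 4.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [20 + 5 * x + data_rng.gauss(0, 4) for x in xs]

a_hat, b_hat = fit_line(xs, ys)
apparent = rmse(a_hat, b_hat, xs, ys)   # error on the training cases themselves
oob = oob_bootstrap_rmse(xs, ys)        # error on cases unseen by each fit
print(f"apparent RMSE: {apparent:.2f}, out-of-bag RMSE: {oob:.2f}")
```

The apparent error tends to be optimistic because the same cases are used for fitting and scoring, whereas the out-of-bag figure scores each fitted model only on cases it did not see, which is closer to the situation of a new case.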