Machine learning refers to the development of systems that can learn from data. After exposure to an initial set of data, a machine learning algorithm can evaluate new, previously unseen examples and relate them to the initial "training" data. Machine learning is ideally suited to classification problems that involve implicit patterns, and it is most effective when used with large amounts of data. Although machine learning has not previously been used in DNA mixture analysis, it is well suited to such analysis because of two key characteristics of the problem. First, there is a large repository of human DNA mixture data in electronic format. Second, patterns in such data are often obscure and beyond the capability of manual analysis; however, they can be evaluated statistically using one or more machine learning algorithms. The system was trained, tested, and validated using electronic data obtained from 1,405 non-simulated DNA mixture samples composed of 1 to 4 contributors and generated from a combination of 16 individuals. This report concludes that the proposed method for DNA mixture deconvolution, including determining the number of contributors, is a robust and reproducible method that was developed using an expansive set of AmpFlSTR Identifiler PCR Amplification Kit data. The description of materials and methods covers data acquisition and exportation, the locus-sample-specific threshold (LSST) calculation, data partitioning, feature scaling, feature selection, and machine learning algorithms. A more detailed discussion of the optimized system will be provided in the Final Report. 10 figures, 8 tables, and 21 references.
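The partitioning, scaling, and classification steps named in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the data are synthetic, the two features (a maximum and a mean allele count per sample) are hypothetical, and a simple nearest-centroid classifier stands in for whichever machine learning algorithms the report actually evaluated. The LSST calculation and feature selection are not shown.

```python
import random
from statistics import mean

# Synthetic, hypothetical data: each row is ([max_alleles, mean_alleles], contributors).
# These features are assumptions for illustration, not the report's feature set.
rng = random.Random(42)
rows = []
for contributors in (1, 2, 3, 4):
    for _ in range(40):
        max_alleles = 2.0 * contributors + rng.uniform(-0.3, 0.3)
        mean_alleles = 1.5 * contributors + rng.uniform(-0.3, 0.3)
        rows.append(([max_alleles, mean_alleles], contributors))

def split(data, test_fraction=0.25, seed=0):
    """Data partitioning: shuffle a copy, then hold out a test fraction."""
    data = data[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

def fit_scaler(features):
    """Feature scaling: learn per-column (min, max) from training data only."""
    return [(min(col), max(col)) for col in zip(*features)]

def scale(features, bounds):
    """Apply min-max scaling with previously learned bounds."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in features
    ]

def fit_centroids(features, labels):
    """Nearest-centroid stand-in classifier: one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centroids[k])),
    )

train, test = split(rows)
bounds = fit_scaler([f for f, _ in train])
train_x = scale([f for f, _ in train], bounds)
test_x = scale([f for f, _ in test], bounds)
model = fit_centroids(train_x, [y for _, y in train])

hits = sum(predict(model, x) == y for x, (_, y) in zip(test_x, test))
accuracy = hits / len(test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The scaler is fitted on the training partition alone so that no information from the held-out samples leaks into training. The accuracy printed here reflects only how cleanly the synthetic classes separate and says nothing about performance on real mixture data.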
Similar Publications
- Determining the Precision of High-Throughput Sequencing and Its Influence on Aptamer Selection
- A DNA Barcoding Strategy for Blow and Flesh Flies Encountered during Medicolegal Casework
- Large-scale Selection of Highly Informative Microhaplotypes for Ancestry Inference and Population Specific Informativeness