Whole genome sequencing (WGS) has powerful potential to aid forensic investigations. The scale and complexity of WGS, however, present sizable barriers to its widespread adoption. In particular, some form of standardized pipeline is needed to provide certainty in, and an accounting of, the final genotypes. To that end, we introduce Tapir (two-step analysis pipeline for investigative reporting). Tapir is a reproducible end-to-end scientific workflow that ingests raw WGS data from Illumina platforms (i.e., BCL format) and produces a GEDmatch-compatible genotyping result. Tapir combines many extant (and some custom) tools for forensic assessments of WGS data, including two modern and powerful probabilistic genotyping algorithms: one that relies on genotype refinement/imputation (GLIMPSE) and another that uses maximum likelihood (BCFtools). Additionally, Tapir provides summary and inferential statistics relevant to the forensic audience (e.g., breadth and depth of coverage, estimation of mixture status). Tapir comes with both mamba- and conda-compatible YAML environment files and has been packaged in a virtual machine image that allows it to run on commodity x86-64 computers, including those running Microsoft Windows.
(Publisher abstract provided.)
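The breadth and depth of coverage mentioned in the abstract can be illustrated with a small calculation. The sketch below is not taken from Tapir; it assumes per-site read depths are already available as a plain list, and the function name `coverage_summary` and the `min_depth` threshold are hypothetical choices made for illustration only.

```python
from statistics import mean


def coverage_summary(depths, min_depth=1):
    """Return (breadth, mean_depth) for a list of per-site read depths.

    Illustrative only: not Tapir's implementation.
    breadth    = fraction of sites with depth >= min_depth
    mean_depth = average depth over all sites, including uncovered ones
    """
    if not depths:
        raise ValueError("depths must be non-empty")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = covered / len(depths)
    return breadth, mean(depths)


if __name__ == "__main__":
    # Hypothetical depths at eight sites (made-up values).
    example_depths = [0, 3, 5, 0, 12, 7, 1, 0]
    breadth, depth = coverage_summary(example_depths)
    print(f"breadth={breadth:.3f} mean_depth={depth:.2f}")
```

Raising `min_depth` gives a stricter notion of breadth, counting only sites with enough reads to be considered usable; the appropriate threshold depends on the application and is not specified in the abstract.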
Similar Publications
- Raman spectroscopic signature of vaginal fluid and its potential application in forensic body fluid identification
- Is the Gender Gap in Overdose Deaths (Still) Decreasing? An Examination of Opioid Deaths in Delaware, 2013–2017
- Spectroscopic Differentiation and Regioisomeric Indole Aldehydes: Synthetic Cannabinoids Precursors