Whole genome sequencing (WGS) has considerable potential to aid forensic investigations. The scale and complexity of WGS, however, present sizable barriers to its widespread adoption. In particular, a standardized pipeline is needed to provide confidence in, and an accounting of, the final genotypes. To that end, we introduce Tapir (two-step analysis pipeline for investigative reporting). Tapir is a reproducible end-to-end scientific workflow that ingests raw WGS data from Illumina platforms (i.e., BCL format) and produces a GEDmatch-compatible genotyping result. Tapir combines many extant (and some custom) tools for forensic assessment of WGS data, including two modern probabilistic genotyping algorithms: one that relies on genotype refinement/imputation (GLIMPSE) and another that uses maximum likelihood (BCFtools). Additionally, Tapir provides summary and inferential statistics relevant to a forensic audience (e.g., breadth and depth of coverage, estimation of mixture status). Tapir comes with both mamba- and conda-compatible YAML environment files and has been packaged in a virtual machine image that allows it to run on commodity x86-64 computers, including those running Microsoft Windows.
(Publisher abstract provided.)
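As an illustration of the two coverage statistics named in the abstract, the following is a minimal sketch and not Tapir's implementation. It assumes per-position read depths are already available as a list of integers. The function name `coverage_summary` and the `min_depth` threshold are hypothetical, introduced only for this example. Depth of coverage is taken as the mean depth over all positions, and breadth of coverage as the fraction of positions covered by at least `min_depth` reads.

```python
def coverage_summary(depths, min_depth=1):
    """Return (mean depth, breadth) for a list of per-position read depths.

    Hypothetical helper for illustration only; not part of Tapir.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    # Depth of coverage: average number of reads per position.
    mean_depth = sum(depths) / len(depths)
    # Breadth of coverage: fraction of positions with at least min_depth reads.
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = covered / len(depths)
    return mean_depth, breadth


# Toy example: eight positions, three of which have no reads.
example_depths = [0, 3, 5, 0, 2, 10, 0, 4]
mean_depth, breadth = coverage_summary(example_depths)
print(f"mean depth = {mean_depth:.2f}, breadth = {breadth:.3f}")
```

In practice, per-position depths for a real sample would come from an alignment file rather than a hand-written list; the toy input here only shows how the two statistics differ.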