This study aimed to characterize the impact of different sequencing workflows on the detection and interpretation of length heteroplasmy (LHP), a particularly complicated aspect of mtDNA analysis.
Advancements in sequencing technologies allow rapid and efficient analysis of mitochondrial DNA (mtDNA) in forensic laboratories, which is particularly beneficial for specimens with limited nuclear DNA. Next generation sequencing (NGS) offers higher throughput and sensitivity than traditional Sanger-type sequencing (STS), as well as the ability to analyze the data quantitatively. However, the changes in sample preparation, sequencing method, and data analysis that NGS requires may alter mtDNA haplotypes relative to previously generated STS data. In the current study, whole mtDNA genome (mitogenome) data were generated for 16 high-quality samples using well-established Illumina and Ion methods, and the NGS data were compared with previously generated STS mtDNA control region data. The mitogenome haplotypes were concordant apart from length variants and low-level variants (<30 percent variant frequency); however, LHP in the polycytosine regions (C-tracts) of the hypervariable segments (HVS) differed across sequencing methods. Consistent with previous studies, the STS data showed LHP in HVS1 for samples with nine or more consecutive cytosines (Cs) and in HVS2 for samples with eight Cs. The Illumina data produced an LHP pattern similar to that of the STS data, whereas the Ion data were noticeably different. The Ion data showed more complex LHP (i.e., more length variant molecules) because length variation occurred in multiple homopolymer stretches within the targeted HVS regions. Further, the STS dominant or major molecule (MM) differed from the Ion MM in 11 (37 percent) of the 30 regions evaluated and from the Illumina MM in six (20 percent). This is of particular interest because many forensic laboratories use the MM to report the HVS C-tract in the mtDNA haplotype. In general, the STS MMs were longer than the Illumina MMs, and the Ion MMs were the shortest. The higher rate of homopolymer indels in the Ion data likely contributed to these differences.
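To make the idea of a major molecule concrete, the following is a simplified sketch, not the study's workflow. It assumes each read fully spans one C-tract and treats the MM as the most frequent C-run length among those reads. The function names and the 10 percent minor-molecule threshold in `shows_lhp` are hypothetical choices for illustration. Because `longest_c_run` measures only the longest uninterrupted C stretch, it would not capture length variation spread across several homopolymer stretches, as reported for the Ion data.

```python
from collections import Counter


def longest_c_run(seq: str) -> int:
    """Return the length of the longest run of consecutive 'C' bases in seq."""
    best = run = 0
    for base in seq.upper():
        run = run + 1 if base == "C" else 0
        best = max(best, run)
    return best


def summarize_length_variants(reads: list[str]) -> tuple[int, dict[int, float]]:
    """Tally the C-run length of each read spanning a C-tract.

    Returns the most frequent length (treated here as the major molecule, MM)
    and the fraction of reads observed at each length.
    """
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(longest_c_run(r) for r in reads)
    total = sum(counts.values())
    fractions = {length: n / total for length, n in sorted(counts.items())}
    # Ties go to the length seen first; real workflows may resolve ties differently.
    major_length = counts.most_common(1)[0][0]
    return major_length, fractions


def shows_lhp(major: int, fractions: dict[int, float],
              min_minor_fraction: float = 0.1) -> bool:
    """Flag LHP when any non-major length reaches a hypothetical minimum fraction."""
    return any(f >= min_minor_fraction
               for length, f in fractions.items() if length != major)


# Toy reads: 6 with 9 Cs, 3 with 10 Cs, 1 with 8 Cs.
reads = (["A" + "C" * 9 + "TA"] * 6
         + ["A" + "C" * 10 + "TA"] * 3
         + ["A" + "C" * 8 + "TA"])
major, fractions = summarize_length_variants(reads)
print(major, fractions, shows_lhp(major, fractions))
# prints: 9 {8: 0.1, 9: 0.6, 10: 0.3} True
```

Under this toy model, a platform with more homopolymer deletions would shift read counts toward shorter lengths and could change which length is reported as the MM, which is the kind of discrepancy the comparison above describes.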
Supplemental analyses with alternative approaches showed that the LHP pattern can also be affected by the bioinformatic tool and workflow used for data interpretation. The broader application of NGS in forensic laboratories will undoubtedly result in the use of varying sample preparation and sequencing methods. Based on these findings, minor LHP differences are expected across sequencing workflows, and C-tract indels should continue to be ignored for forensic queries and comparisons. (publisher abstract modified)
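The practice of ignoring C-tract indels during comparisons can be sketched as a filter applied to both haplotypes before they are compared. This is an illustrative sketch only. The windows in `C_TRACT_WINDOWS` are the rCRS positions of the HVS1 (16184-16193) and HVS2 (303-315) C-tracts, used here as an assumption rather than any laboratory's validated exclusion list. The variant notation accepted by the regular expression (for example `16193.1C` for an insertion, `310-` or `310DEL` for a deletion) is likewise an assumed convention. Substitutions inside a C-tract, such as `16189C`, are kept because they are not length variants.

```python
import re

# Hypothetical C-tract windows in rCRS coordinates (HVS1, HVS2), for illustration.
C_TRACT_WINDOWS = [(16184, 16193), (303, 315)]

# Position, optional insertion suffix (".1", ".2", ...), then base or deletion mark.
_VARIANT = re.compile(r"^(\d+)(\.\d+)?([ACGT]|-|DEL)$")


def is_c_tract_indel(variant: str) -> bool:
    """True if the variant is an insertion or deletion inside a C-tract window."""
    m = _VARIANT.match(variant.upper())
    if m is None:
        raise ValueError(f"unrecognized variant: {variant}")
    pos = int(m.group(1))
    is_indel = m.group(2) is not None or m.group(3) in ("-", "DEL")
    return is_indel and any(lo <= pos <= hi for lo, hi in C_TRACT_WINDOWS)


def comparable_profile(variants: list[str]) -> frozenset[str]:
    """Drop C-tract indels so that only comparable differences remain."""
    return frozenset(v.upper() for v in variants if not is_c_tract_indel(v))


def concordant_ignoring_c_tracts(a: list[str], b: list[str]) -> bool:
    """Compare two haplotypes while ignoring C-tract length variants."""
    return comparable_profile(a) == comparable_profile(b)


# Toy haplotypes that differ only in C-tract length variants.
sts = ["263G", "315.1C", "16189C", "16193.1C", "16193.2C"]
ion = ["263G", "309.1C", "315.1C", "16189C"]
print(concordant_ignoring_c_tracts(sts, ion))  # prints: True
```

With this filter, the differing HVS1 and HVS2 insertion calls drop out, and the two toy profiles agree on the remaining substitutions.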
