Hidden Markov Models for DNA Sequencing

4 pages

This paper proposes Hidden Markov Models (HMMs) as an approach to the DNA basecalling problem. The authors model the state emission densities using Artificial Neural Networks and provide a modified Baum-Welch re-estimation procedure for training. They also develop a method that exploits consensus sequences to label training data, thus minimizing the need for hand-labeling. The results demonstrate the potential of these models. The authors also perform a careful study of the basecalling errors, propose alternative HMM topologies that might further improve performance, and conclude by suggesting further research directions.
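
To make the basecalling-as-HMM framing concrete, the following is a minimal sketch of how a state sequence could be decoded once per-frame emission scores are available. It is not the authors' method: the abstract does not name a decoding algorithm, so standard Viterbi decoding is assumed here, and the one-state-per-base topology, the transition probabilities, the toy emission scores (standing in for neural-network outputs), and the names `viterbi` and `call_bases` are all hypothetical.

```python
import math

BASES = "ACGT"  # assumption: one HMM state per base (toy topology)


def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state path under a first-order HMM.

    log_emit[t][s]  : log emission score of frame t under state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_init[s]     : log probability of starting in state s
    """
    n_states = len(log_init)
    # Best log score of any path ending in each state at frame 0.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def call_bases(log_emit):
    """Decode frames into a base string using hypothetical toy parameters."""
    # Assumed transitions: stay in the same state with 0.7, else 0.1 each.
    log_trans = [[math.log(0.7 if r == s else 0.1) for s in range(4)]
                 for r in range(4)]
    log_init = [math.log(0.25)] * 4
    return "".join(BASES[s] for s in viterbi(log_emit, log_trans, log_init))


def toy_frames(favoured):
    """Build made-up emission scores that strongly favour the given bases."""
    return [[math.log(0.97 if b == f else 0.01) for b in BASES]
            for f in favoured]


print(call_bases(toy_frames("AACG")))  # AACG
```

In a hybrid system of the kind the abstract describes, the rows of `log_emit` would come from the trained emission model rather than from `toy_frames`, and the state topology would be considerably richer than four fully connected states.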

Date Published: January 1, 2002