Deconvolving Sequence Variation in Mixed DNA Populations

by A. Wildenberg, S. Skiena and P. Sumazin, 6th Annual International Conference on Research in Computational Molecular Biology (RECOMB02), Washington DC, April 2002.


Introduction

The need for DNA sequencing did not end with the successful public and private projects to sequence the human genome. Indeed, attention is shifting from de novo sequencing of new organisms to analyzing sequence variation for research and diagnostic purposes.

Contemporary electrophoresis-based sequencing machines produce trace curves that record the amount of each of the four nucleotide bases as a function of sequence position. For homogeneous DNA samples, the largest peak at each position defines the underlying sequence. For inhomogeneous samples, however, more careful analysis of the trace data promises to reveal both the presence and the frequency of mutations.

In this paper, we study the problem of using sequence trace data to identify sequence variants in mixed DNA populations. Our work is motivated by a new line of capillary electrophoresis sequencing machines being developed by BioPhotonics Corporation. Using advanced single-photon detectors and other technologies, these machines can not only detect each base at each position but also determine its relative frequency to within 10%, and BioPhotonics expects to reduce this error to 1% in the near future.
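To make the contrast between homogeneous and mixed samples concrete, here is a minimal sketch, assuming each position of a trace has already been reduced to a relative frequency for each of A, C, G and T. It applies the largest-peak rule to call a consensus base and flags a position as possibly mixed when the second-largest frequency exceeds a measurement tolerance (defaulting to the 10% figure quoted above). The function name `call_positions`, the dictionary-per-position input format and the `tolerance` parameter are hypothetical illustrations; this simple threshold heuristic is not the deconvolution method developed in the paper.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def call_positions(
    profile: List[Dict[str, float]], tolerance: float = 0.10
) -> Tuple[str, List[int]]:
    """Call the dominant base at each position and flag likely mixed positions.

    `profile` (hypothetical format): one dict per position mapping a base to its
    measured relative frequency. `tolerance` (hypothetical): the largest
    minority-base frequency we attribute to measurement error alone.
    """
    consensus: List[str] = []
    mixed: List[int] = []
    for i, freqs in enumerate(profile):
        # Normalize so the four frequencies at this position sum to 1.
        total = sum(freqs.get(b, 0.0) for b in BASES)
        if total <= 0:
            consensus.append("N")  # no signal: leave the base uncalled
            continue
        norm = {b: freqs.get(b, 0.0) / total for b in BASES}
        ranked = sorted(BASES, key=lambda b: norm[b], reverse=True)
        # Largest-peak rule, as for a homogeneous sample.
        consensus.append(ranked[0])
        # A minority base above the tolerance suggests a second variant.
        if norm[ranked[1]] > tolerance:
            mixed.append(i)
    return "".join(consensus), mixed


if __name__ == "__main__":
    example = [
        {"A": 0.95, "C": 0.05},  # homogeneous A with a little noise
        {"G": 0.60, "T": 0.40},  # a G/T mixture
        {"C": 1.00},
        {"T": 0.92, "A": 0.08},  # minority base within tolerance
    ]
    print(call_positions(example))
```

Run on the example, this prints `('AGCT', [1])`: only position 1 carries a minority base whose frequency is too large to be explained by a 10% measurement error. With a tighter tolerance such as 0.01, position 0 and position 3 would also be flagged, which illustrates why improved frequency accuracy matters for detecting low-frequency variants.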

This motivates a variety of questions concerning how accurately we can sequence mixed populations from a single sample using relative frequency information. Possible applications of this technology include: