DNA Damage: The Major Cause of Missing Pieces from the DNA Puzzle

DNA is nature’s most widely used long-term information storage system. The elegant simplicity of the double helix belies the complex pathways that have evolved to copy, modify, and maintain the integrity of the genome. Over the last 50 years, insights into the structure of DNA and the enzymes that act upon it have led to powerful tools for genetic analysis and engineering, including cloning and subcloning with restriction enzymes, DNA sequencing, and PCR amplification. As DNA methodologies have matured, they have been applied to more diverse problems, such as deriving genetic information from degraded samples, including blood and tissue samples. Gathering sequence information from these samples is critical in forensic identification (1), in the Consortium for the Barcode of Life initiative (2), in deducing the evolutionary relationships of living and extinct species (3), and in association studies of tissue biopsy collections (4), to name a few. Such studies are experimentally limited by the quality and quantity of the DNA extracted from biological samples. In addition, compounds that inhibit amplification of the small amounts of extracted DNA often co-purify with it, further complicating analysis. Damaged DNA has therefore become an important experimental issue in many areas of research.

Many common types of DNA damage impair accurate replication by DNA polymerases (5). Furthermore, the degree and spectrum of DNA damage depend on the sample source and the environment to which it was exposed. Some types of damage are ubiquitous and can be present in any extracted DNA, while others result from exposure to a specific agent (see Table 1).

Under physiological conditions, the most labile bond in DNA is the N-glycosyl bond that attaches the base to the deoxyribose backbone. This contrasts with RNA, in which the phosphodiester bond of the backbone is the least stable under the same conditions. Hydrolysis of the N-glycosyl bond releases the base, leaving an apurinic/apyrimidinic (AP) site that eventually decomposes into a nick. Because the reactant is H2O, AP sites are expected in all stored DNA samples. This includes lyophilized samples, because it is very difficult to remove the final shell of H2O molecules immediately adjacent to the DNA.

Under metabolically active conditions, an estimated 2,000–10,000 AP sites form in the genome of a single human cell each day (5). The rate in stored samples will vary from sample to sample, especially for samples taken from a crime scene, because the type of environmental exposure varies.

The presence of AP sites in a DNA sample is problematic for two main reasons. First, genetic information is lost, because an AP site cannot form a base pair with an incoming nucleotide during DNA replication. Second, typical PCR polymerases stall at AP sites, preventing further replication (6). If enough AP sites are present, amplification or sequencing reactions simply fail. The breakdown of AP sites into nicks compounds the problem, because it eventually fragments the DNA.
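A simple probability model helps show why a modest density of AP sites can make an amplification fail. The sketch below is our simplification, not a result from the cited work: it assumes that blocking lesions are scattered randomly and independently along the template (so their number in a region is Poisson-distributed) and that a single lesion anywhere in the region stops the polymerase. Under those assumptions, the fraction of template strands that are intact over an amplicon of L nucleotides is exp(-λL), where λ is the lesion density. The function name `fraction_amplifiable` and the example density of one lesion per kb are hypothetical.

```python
import math

def fraction_amplifiable(amplicon_nt: int, lesions_per_kb: float) -> float:
    """Expected fraction of template strands with no blocking lesion in the
    amplified region (hypothetical model).

    Assumes lesions are independent and Poisson-distributed along the strand,
    and that one lesion anywhere in the region stops the polymerase.
    """
    if amplicon_nt < 0 or lesions_per_kb < 0:
        raise ValueError("length and lesion density must be non-negative")
    expected_lesions = lesions_per_kb * amplicon_nt / 1000.0
    # Poisson probability of observing zero lesions in the region
    return math.exp(-expected_lesions)

# Compare amplicon lengths at an assumed density of 1 lesion per kb
for length in (100, 300, 1000, 3000):
    print(length, round(fraction_amplifiable(length, lesions_per_kb=1.0), 3))
```

Under these assumptions, about 90% of template strands are intact over 100 nt but only about 5% over 3,000 nt, which illustrates why shorter targets are expected to amplify more reliably from damaged DNA.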

Another common type of DNA damage that occurs under physiological conditions is the hydrolytic deamination of cytosine to uracil (5). Sequencing studies on DNA extracted from very old samples, termed ancient DNA, have found that this is the major type of damage complicating data analysis (7,8). Like AP site formation, cytosine deamination is caused by hydrolysis and is probably present in DNA extracted from many sources. Interestingly, double-stranded DNA protects cytosine from deamination much more effectively than it protects against depurination: the deamination rate is far lower in double-stranded than in single-stranded DNA.

The effect of deaminated cytosine in an amplification or sequencing reaction depends on the polymerase. Some polymerases (e.g., Taq DNA Polymerase) extend efficiently past the deaminated cytosine (i.e., uracil), inserting an adenine opposite the uracil instead of a guanine. This generates a mutated daughter strand even though the polymerase copied the template faithfully. In contrast, common proofreading polymerases, including archaeal polymerases (e.g., Vent, Pfu, and 9°N DNA Polymerases), stall when they encounter deoxyuracil in DNA templates (9). These polymerases contain a binding pocket that specifically recognizes uracil in the template (10), which prevents the damage from becoming a permanent mutation in the daughter strand. Because cytosine deamination can either inhibit PCR or produce mutated DNA products, it is a particularly important issue in methods where the exact DNA sequence is crucial. Methods that rely on amplicon length rather than exact sequence (e.g., short tandem repeat analysis used in human identification) are not affected by the mutagenic effect of cytosine deamination.
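The difference between the two classes of polymerase can be illustrated with a toy model of template copying. This is a hypothetical sketch, not a description of any enzyme's real mechanism: strand orientation is ignored, uracil is written as `U`, a bypassing polymerase (as described above for Taq) inserts A opposite U, and a stalling polymerase (as described for the archaeal enzymes) is simplified to stop exactly when it reaches U. The function `copy_strand` and the example sequences are invented for illustration.

```python
# Hypothetical toy model: strand orientation is ignored and the product is
# written position by position opposite the template.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}  # U pairs like T

def copy_strand(template: str, stalls_at_uracil: bool) -> str:
    """Return the daughter strand synthesized opposite `template`.

    stalls_at_uracil=True mimics a polymerase that halts at uracil
    (simplified: it stops right at the U); False mimics one that
    inserts A opposite U and keeps going.
    """
    product = []
    for base in template:
        if base == "U" and stalls_at_uracil:
            break  # synthesis halts; the product is truncated at the lesion
        product.append(PAIRING[base])
    return "".join(product)

original = "GACGT"  # undamaged sequence
damaged = "GAUGT"   # the C at position 3 has deaminated to U

first = copy_strand(damaged, stalls_at_uracil=False)
second = copy_strand(first, stalls_at_uracil=False)
stalled = copy_strand(damaged, stalls_at_uracil=True)
print(original, damaged, first, second, stalled)
```

Two rounds of bypass copying turn GACGT into GATGT, the C-to-T change described above, whereas the stalling model yields only the truncated product CT.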

A third common type of DNA damage is oxidation. As with hydrolytic damage, most DNA samples are susceptible to oxidation, because they are exposed to oxygen throughout storage. Oxidation creates many types of base modifications, but the conversion of guanine to 8-oxo-guanine is one of the most common (5). 8-oxo-guanine can base pair with adenine and is therefore mutagenic. Oxidative damage is prevalent in mitochondria and may be one of the factors in the aging process (11). Studies that quantify oxidative damage have shown that the DNA extraction process itself can introduce this modification, so the extraction method must be chosen carefully (12).
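The mutagenic outcome of 8-oxo-guanine can be traced in the same spirit. In this hypothetical sketch, `8` marks an 8-oxo-guanine in the template, and the caller chooses whether the polymerase places C (correct) or A (mispaired) opposite it; real insertion preferences depend on the polymerase and are not modeled. Strand orientation is again ignored, and the function name and sequences are invented for illustration.

```python
# Hypothetical toy model: "8" marks 8-oxo-guanine in a template strand.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def copy_strand(template: str, partner_for_8oxoG: str = "A") -> str:
    """Return the strand made opposite `template` (orientation ignored).

    partner_for_8oxoG is the base placed opposite 8-oxo-G: "C" is the
    correct partner, "A" is the mutagenic mispair.
    """
    if partner_for_8oxoG not in ("A", "C"):
        raise ValueError("8-oxo-G is modeled as pairing with A or C only")
    return "".join(
        partner_for_8oxoG if base == "8" else PAIRING[base] for base in template
    )

original = "TAGCA"  # undamaged sequence
damaged = "TA8CA"   # the G at position 3 has been oxidized

mispaired = copy_strand(copy_strand(damaged, "A"))  # two rounds, A inserted
correct = copy_strand(copy_strand(damaged, "C"))    # two rounds, C inserted
print(original, mispaired, correct)
```

If A is inserted, the next round places T where the original strand had G (TAGCA becomes TATCA); if C is inserted, the original sequence is recovered.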

Other types of damage become prevalent only under certain circumstances. DNA-protein and DNA-DNA crosslinks are a specialized but important type of damage that blocks the genetic investigation of an enormous number of stored samples. In both the museum and medical research communities, a large number of samples are either stored in formalin (formaldehyde) or exposed to formalin at some point. Formalin-induced crosslinking effectively preserves structural morphology, but it is extremely detrimental to subsequent DNA analysis, because crosslinked bases stall polymerases and DNA-DNA crosslinks can inhibit denaturation. In addition, the pH of formalin solutions drops over time due to the formation of formic acid, increasing the rate of AP site formation and subsequent fragmentation (13).

Pyrimidine dimers are another well-studied lesion that occurs only under certain circumstances (14). They form when DNA is exposed to UV light and are very effective at stalling DNA polymerases.

In conclusion, degraded samples contain a wealth of DNA sequence information, but extracting it can be difficult. Whether the major obstacle is the efficiency of DNA extraction, the presence of PCR inhibitors, or the extent of DNA damage has not been fully determined. Most likely, the dominant problem varies with the sample, and techniques that address all three are needed. Studies that identify the types of damage present in degraded samples, together with new methods to overcome them, will hopefully make previously hidden information accessible.

Table 1: Types of DNA Damage

Note: The extent of damage caused by exposure to different reagents can vary, and its importance will depend on how the DNA is being used.



  1. Butler, J.M. (2006) J. Forensic Sci., 51, 253–265.
  2. Hajibabaei, M., et al. (2007) Trends Genet., in press.
  3. Noonan, J.P., et al. (2006) Science, 314, 1113–1118.
  4. Thompson, E.R., et al. (2005) Hum. Mutat., 26, 384–389.
  5. Lindahl, T. (1993) Nature, 362, 709–715.
  6. Sikorsky, J.A., et al. (2007) Biochem. Biophys. Res. Comm., 355, 431–437.
  7. Stiller, M., et al. (2006) Proc. Natl. Acad. Sci. USA, 103, 13578–13584.
  8. Gilbert, M.T. (2007) Nucleic Acids Res., 35, 1–10.
  9. Lasken, R.S., et al. (1996) J. Biol. Chem., 271, 17692–17696.
  10. Fogg, M.J., et al. (2002) Nat. Struct. Biol., 9, 922–927.
  11. Weissman, L., et al. (2007) Neuroscience, in press.
  12. Hofer, T., et al. (2006) Biol. Chem., 387, 103–111.
  13. Kelly, K. (2006) Path to Effective Recovering of DNA from Formalin-Fixed Samples in Natural History Collection, (pp. 5–14). Washington, DC: The National Academies Press.
  14. Sinha, R.P., et al. (2002) Photochem. Photobiol. Sci., 1, 225–236.

From NEB expressions, Spring 2007, vol. 2.1
By Thomas C. Evans, Jr., New England Biolabs, Inc.