Anatomy of a Polymerase - How Structure Affects Function

Accurate genome replication is critical for the viability of an organism. The general concept for copying DNA became evident upon the elucidation of DNA’s double-helical structure and the identification of base-pair complementarity (1). Within a decade of these discoveries, an agent that catalyzed DNA strand duplication was purified from E. coli (2). This agent was termed a “polymerase”. E. coli DNA Polymerase I, the first DNA polymerase discovered, was not the primary replicative polymerase but one involved in Okazaki fragment processing and DNA repair. This foreshadowed the later discovery of many DNA polymerase families, each serving specific cellular requirements.

Polymerases play a key role in the life sciences for the same reason that they are critical in nature: they copy DNA. Their laboratory applications include DNA labeling, sequencing and amplification. One specific amplification protocol, the polymerase chain reaction (PCR), is a widely used technique that employs thermophilic polymerases to exponentially amplify specific DNA segments (3).
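“Exponential” amplification can be made concrete with the idealized relation N = N0(1 + E)^n, where n is the number of cycles and E is the per-cycle efficiency (1.0 for perfect doubling). The sketch below assumes a constant efficiency in every cycle, which real reactions only approximate before primers, dNTPs and enzyme become limiting; the function name and the efficiency parameter are illustrative, not part of any standard protocol.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the target by (1 + efficiency).

    efficiency = 1.0 means every template is copied once per cycle (perfect
    doubling). The plateau seen in real reactions is ignored by this model.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# 100 starting templates, 30 cycles
print(pcr_copies(100, 30))        # perfect doubling: about 1.07e11 copies
print(pcr_copies(100, 30, 0.9))   # 90% efficiency per cycle
```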


PCR places the same basic demands on a polymerase as a cell places on its replicative polymerase: essentially, it should be reliable, accurate and fast. Polymerase “accuracy” or “fidelity” refers to the propensity to incorporate the correct nucleotide as specified by the template strand. Not surprisingly, the standard PCR enzymes are quite accurate. Even Thermus aquaticus (Taq) DNA polymerase, a relatively low-fidelity PCR polymerase, makes only about one mistake per 100,000 nucleotides incorporated, depending on reaction conditions (4).
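To see what an error rate of roughly 1 in 100,000 means for a PCR product, a common back-of-the-envelope model treats every incorporated nucleotide as an independent chance to err. A product of an L bp target that has been through d doublings has a lineage of about L·d incorporations, so the fraction of products carrying at least one error is 1 − (1 − e)^(L·d), which is approximately e·L·d when that number is small. This is a simplified sketch: the independence assumption, the helper name and the example lengths and doubling count are illustrative, not measured values.

```python
import math

def fraction_with_error(error_rate: float, amplicon_bp: int, doublings: int) -> float:
    """Fraction of final PCR products expected to carry >= 1 misincorporation.

    Simplified model (an assumption): every nucleotide incorporated along a
    product's lineage is an independent chance to err, and a lineage spans
    amplicon_bp * doublings incorporations.
    """
    exposures = amplicon_bp * doublings
    # log1p keeps the calculation accurate for very small error rates
    return 1.0 - math.exp(exposures * math.log1p(-error_rate))


taq_like = 1e-5  # ~1 error per 100,000 nt, the approximate figure quoted above
for length in (500, 1000, 4000):
    print(length, round(fraction_with_error(taq_like, length, 20), 3))
```

Because the fraction grows with both amplicon length and the number of doublings, the same enzyme looks far less accurate on long targets amplified from few starting copies.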

The basis of polymerase accuracy is an exciting area of investigation. Although most work on the mechanism of DNA polymerization has been performed with mesophilic polymerases, it is believed that the general trends can be extrapolated to related thermophilic enzymes.

Geometric Selection

At first thought, a reasonable way to discriminate between correct and incorrect nucleotides would be through formation of the correct hydrogen bonds with the template base. However, hydrogen bonding between complementary bases is not enough to explain the high fidelity displayed by DNA polymerases. Studies using a deoxynucleoside triphosphate carrying a non-standard difluorotoluene base (dF) demonstrated that base-pair size and shape, rather than hydrogen bonding, are what matter during incorporation (5). Difluorotoluene is isosteric with thymine, but its carbonyl oxygens are replaced by fluorine atoms, so dF cannot form effective hydrogen bonds in water. In synthetic oligonucleotides, dF was found to pair equally poorly with all four standard bases. If hydrogen bonding were the most critical parameter dictating incorporation accuracy, dF would be expected to be incorporated poorly and with low fidelity. Instead, it was incorporated only 40-fold less efficiently than dT. This finding can be explained if the polymerase active site accommodates only base pairs of the correct size and shape. This “geometric selection” mechanism (6) is an elegant and extremely effective way of ensuring accurate incorporation, but polymerases also have more active ways to ensure high-fidelity replication.
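One way to put the 40-fold figure in perspective is to convert a fold difference in incorporation efficiency into an apparent free-energy penalty, ΔΔG = RT ln(ratio). The sketch below applies this to dF versus dT and, as a rough comparison only, to a 100,000-fold discrimination like that implied by an error rate of 1 in 100,000. The 37 °C temperature is an assumption for illustration, and the result is an apparent transition-state energy difference, not a measured binding energy.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def apparent_ddg(efficiency_ratio: float, temp_c: float = 37.0) -> float:
    """Apparent free-energy penalty (kcal/mol) for a fold difference in
    incorporation efficiency: ddG = R * T * ln(ratio)."""
    if efficiency_ratio <= 0:
        raise ValueError("efficiency_ratio must be positive")
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(efficiency_ratio)


print(round(apparent_ddg(40), 2))    # dF vs dT, the 40-fold figure above
print(round(apparent_ddg(1e5), 2))   # a 100,000-fold discrimination, for comparison
```

Under these assumptions, the shape-matched but non-hydrogen-bonding dF pays only about a third of the energetic penalty associated with 100,000-fold discrimination.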


A perhaps better-known way to increase fidelity is 3´→5´ exonuclease activity, termed “proofreading”. Considering the complexity of DNA polymerization, Taq DNA polymerase is remarkably accurate, but proofreading enzymes can have even higher fidelity. They achieve this by “checking” whether the correct nucleotide has been inserted opposite the template base. If a mismatch is detected, the primer terminus is transferred from the polymerization domain to an N-terminal 3´→5´ exonuclease domain of the polymerase. The incorrectly incorporated nucleotide is excised, and the DNA moves back into the polymerization domain so that copying can resume (Figure 2).

Figure 2: Polymerases that have a 3´→5´ exonuclease activity are able to excise mismatched base pairs when an error is encountered, thereby increasing enzyme fidelity.
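The insert, check, excise and resume cycle described above can be expressed as a small Monte Carlo sketch. It assumes, hypothetically, a fixed misinsertion probability and a fixed probability that a mismatch is detected and excised before the next nucleotide is added; real enzymes partition between these pathways kinetically, and the function and parameter names are illustrative.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def copy_with_proofreading(template, misinsertion_prob, excision_prob, seed=0):
    """Copy `template`, returning (new_strand, uncorrected_errors).

    At each position the polymerase inserts a nucleotide; with probability
    `misinsertion_prob` it is wrong. A wrong nucleotide is excised with
    probability `excision_prob` (primer end shuttled to the exonuclease
    domain) and insertion is retried; otherwise the mismatch is extended
    and becomes a permanent error.
    """
    if not 0.0 <= misinsertion_prob < 1.0:
        raise ValueError("misinsertion_prob must be in [0, 1)")
    rng = random.Random(seed)
    strand, errors = [], 0
    for base in template:
        correct = COMPLEMENT[base]
        while True:
            if rng.random() >= misinsertion_prob:
                strand.append(correct)   # correct insertion: move to next base
                break
            wrong = rng.choice([b for b in "ACGT" if b != correct])
            if rng.random() < excision_prob:
                continue                 # mismatch excised: retry this position
            strand.append(wrong)         # mismatch extended: error is fixed
            errors += 1
            break
    return "".join(strand), errors


template = "ACGT" * 25_000  # hypothetical 100,000-nt template
for p_exc in (0.0, 0.9, 0.99):
    _, errs = copy_with_proofreading(template, 1e-4, p_exc, seed=1)
    print(p_exc, errs)
```

Raising the excision probability lowers the number of surviving errors roughly in proportion to (1 − excision_prob), which is the intuition behind the N/P ratio discussed next.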

Bacteriophage T4 proved to be a useful experimental system for evaluating the importance of 3´→5´ exonuclease activity for accurate DNA replication (6). Mutations in T4 gene 43, which encodes the phage DNA polymerase, were identified that either decreased or increased fidelity. By defining an exonuclease/polymerase (N/P) activity ratio for each mutant enzyme, it was found that polymerases with low N/P ratios were more error-prone than those with high N/P ratios. One explanation is that, in an enzyme with a higher N/P ratio, a misincorporated nucleotide is more likely to be excised by the exonuclease before the polymerase extends past it. Interestingly, the effectiveness of proofreading can be sequence dependent: AT-rich sequences are proofread more effectively than GC-rich regions. This is thought to be because AT-rich stretches are less stable and melt more readily, which facilitates proofreading.
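The N/P argument is a kinetic competition at the mismatched primer terminus. As a simplifying assumption, excision and extension of a mismatch can be treated as competing processes whose rates stand in the ratio r : 1, with r set equal to the N/P ratio. The fraction of mismatches excised is then r/(1 + r), and the residual error rate falls from e to e/(1 + r). This is a generic partitioning sketch, not the analysis used in the T4 studies, and the helper names are hypothetical.

```python
def fraction_excised(np_ratio: float) -> float:
    """Fraction of mismatched termini removed before extension, assuming
    excision and extension compete with rates in the ratio np_ratio : 1."""
    if np_ratio < 0:
        raise ValueError("np_ratio must be non-negative")
    return np_ratio / (1.0 + np_ratio)


def residual_error_rate(insertion_error_rate: float, np_ratio: float) -> float:
    """Error rate left after proofreading under the same competition model."""
    return insertion_error_rate * (1.0 - fraction_excised(np_ratio))


for r in (0.1, 1.0, 10.0, 100.0):
    print(r, residual_error_rate(1e-5, r))
```

In this model a tenfold higher N/P ratio buys close to a tenfold lower error rate once r is well above 1, consistent with the trend described for the T4 mutants.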

The absence of 3´→5´ exonuclease activity may have ramifications for PCR beyond fidelity. The lack of proofreading activity in Taq DNA Polymerase has been proposed to limit the amplicon size achievable with this enzyme (7). As a generality, Taq performs best on DNA fragments < 2 kb but works on fragments up to 3–4 kb. Within this size range, Taq is a robust, easily optimized enzyme; above ~3 kb, however, its effectiveness drops quickly. During PCR, Taq DNA Polymerase misincorporates nucleotides at a characteristic frequency, termed its “error rate”, creating mismatched 3´ ends. Taq, like polymerases in general, stalls at these mismatched 3´ ends and is more likely to dissociate from them without extending than from correctly base-paired 3´ ends. Therefore, above a certain combination of amplicon size and error rate, enough mismatched 3´ ends may accumulate to effectively inhibit PCR. These mismatched 3´ ends are particularly problematic for Taq because it lacks the 3´→5´ exonuclease activity needed to remove them. By adding a small amount of a proofreading enzyme such as Deep VentR™ DNA Polymerase, fragments ≥ 20 kb can be amplified (Figure 3). Since the vast majority of the enzyme in the blend is Taq DNA Polymerase, Taq is probably doing the bulk of the primer extension, while the proofreading polymerase most likely removes the inhibitory 3´ mismatches generated by Taq.

Figure 3: LongAmp Taq DNA Polymerase, a unique blend of Taq and Deep VentR™ DNA Polymerases, enables amplification of larger PCR products with higher fidelity than Taq DNA Polymerase alone. Amplicon sizes are indicated below the gel. Marker M is the NEB 1 kb DNA Ladder (NEB #N3232).
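Why mismatched 3´ ends limit Taq on long amplicons, and why a small amount of proofreading enzyme helps, can be illustrated with a deliberately crude model. It assumes that each extension of an L-nt strand ends in a stalled, mismatched 3´ end with probability 1 − (1 − e)^L, that a stalled strand yields no full-length copy in that cycle, and that a hypothetical `rescue_fraction` of stalled ends is repaired by the proofreading enzyme so extension can finish. The numbers are qualitative illustrations, not measurements of LongAmp or any other product.

```python
def p_stalled(amplicon_nt: int, error_rate: float = 1e-5) -> float:
    """Probability that one extension creates at least one mismatched 3' end."""
    return 1.0 - (1.0 - error_rate) ** amplicon_nt


def relative_yield(amplicon_nt: int, cycles: int, rescue_fraction: float = 0.0,
                   error_rate: float = 1e-5) -> float:
    """Full-length product after `cycles`, relative to perfect doubling.

    Toy model: each template makes a full-length copy unless its extension
    stalls at a mismatch that is not rescued by proofreading.
    """
    blocked = p_stalled(amplicon_nt, error_rate) * (1.0 - rescue_fraction)
    return ((2.0 - blocked) / 2.0) ** cycles


for kb in (2, 5, 20):
    taq = relative_yield(kb * 1000, 30)
    blend = relative_yield(kb * 1000, 30, rescue_fraction=0.9)  # hypothetical rescue
    print(f"{kb} kb: Taq alone {taq:.2f}, Taq + proofreader {blend:.2f}")
```

Because the per-cycle loss compounds over 30 cycles, even a modest stall probability on long templates erodes yield sharply, and removing most stalled ends restores much of it.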


The importance of proofreading activity to PCR has been widely known for nearly two decades, but another property, processivity, has recently gained attention. “Processivity” refers to the number of nucleotides a polymerase incorporates before it dissociates. Taq DNA polymerase adds approximately 50 nucleotides per binding event (8). Why does this matter? A distributive polymerase extends a population of 1 kb templates in a noticeably different manner than a processive polymerase. The distributive polymerase binds a template, adds a few nucleotides, and dissociates, so the whole population of templates is extended gradually and evenly over time. The processive polymerase binds a template and extends it to the end. Extension is therefore all-or-nothing: a fraction of the templates is fully copied and the remainder is unextended. Given enough time, either a processive or a distributive polymerase would produce a population of fully copied templates. However, in certain circumstances the processive polymerase may perform better. The α subunit of E. coli DNA polymerase III, the main replicative polymerase, has a processivity of < 10 nucleotides and a speed of < 20 nucleotides/second (nt/s). However, when the subunit associates with the other replisome subunits, particularly the sliding clamp, the effective processivity and replication speed increase to > 50 kb and 1,000 nt/s, respectively (9). The term “effective processivity” is used because there are data indicating that the polymerase subunit can exchange within the replisome while the replisome as a whole maintains fast, processive DNA replication (10). To take advantage of processivity in PCR, researchers have fused a DNA-binding domain to an archaeal polymerase (11).
This chimeric enzyme has a number of improved properties; notably, it can amplify DNA with shorter extension times, shortening overall thermocycling times. This fusion is the basis of Phusion™ High-Fidelity DNA Polymerase and Phire™ Hot Start High-Fidelity DNA Polymerase, two polymerases available from NEB (Figure 4).
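The contrast between distributive and processive extension of a population of 1 kb templates can be simulated by giving each type of polymerase the same total number of nucleotides to add. In this idealized sketch (the names are hypothetical), every binding event picks a random unfinished template and adds `processivity` nucleotides before the enzyme dissociates; a processivity of 50 loosely mirrors the Taq figure quoted above, while a processivity equal to the template length represents a fully processive enzyme.

```python
import random

def extend_population(n_templates, template_nt, processivity, total_nt, seed=0):
    """Return how many nucleotides have been added to each template.

    Each binding event chooses a random unfinished template, adds up to
    `processivity` nucleotides, then dissociates. Events continue until
    `total_nt` nucleotides have been added or all templates are finished.
    """
    rng = random.Random(seed)
    extended = [0] * n_templates
    added = 0
    while added < total_nt:
        unfinished = [i for i, x in enumerate(extended) if x < template_nt]
        if not unfinished:
            break
        i = rng.choice(unfinished)
        step = min(processivity, template_nt - extended[i], total_nt - added)
        extended[i] += step
        added += step
    return extended


budget = 50_000  # enough nucleotides to copy half of 100 x 1 kb templates
distributive = extend_population(100, 1000, 50, budget)
processive = extend_population(100, 1000, 1000, budget)
print("distributive: full copies =", distributive.count(1000),
      "untouched =", distributive.count(0))
print("processive:   full copies =", processive.count(1000),
      "untouched =", processive.count(0))
```

With the same nucleotide budget, the processive enzyme leaves half the templates fully copied and half untouched, whereas the distributive enzyme leaves most templates partially extended.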

Figure 4: Phire Hot Start DNA Polymerase is constructed by fusing a DNA polymerase (orange) and a small dsDNA-binding protein (yellow). This technology increases the processivity of the polymerase and improves its overall performance.
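The practical payoff of shorter extension times is simple arithmetic: the extension portion of each cycle scales with amplicon length times the per-kb extension time. The sketch below ignores temperature ramps, initial denaturation and final extension. The 60 s/kb and 20 s/kb values are assumed, commonly used starting points for a non-fused polymerase and a processivity-enhanced fusion polymerase respectively, and the 10 s denaturation and 30 s annealing defaults are likewise illustrative; each product's own protocol should be consulted for real values.

```python
def cycling_time_min(amplicon_kb: float, cycles: int, extension_s_per_kb: float,
                     denature_s: float = 10.0, anneal_s: float = 30.0) -> float:
    """Approximate time (minutes) spent in the cycling steps of a PCR run.

    Ignores temperature ramps, initial denaturation and final extension.
    """
    per_cycle_s = denature_s + anneal_s + amplicon_kb * extension_s_per_kb
    return cycles * per_cycle_s / 60.0


# 3 kb amplicon, 30 cycles, assumed per-kb extension times
print(cycling_time_min(3, 30, 60))   # 110.0
print(cycling_time_min(3, 30, 20))   # 50.0
```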

Future Directions

Many properties affect the efficacy and utility of a PCR polymerase. Polymerase active-site architecture and proofreading activity affect the accuracy of the final product. Polymerase blends and fusion to a DNA-binding protein confer superior PCR performance with respect to amplicon length and, in the case of the chimera, reaction speed. Other important advances in PCR, such as hot-start polymerases that increase reaction specificity, multiplex PCR (Figure 5) and qPCR, have also revolutionized many aspects of the life sciences. As engineered blends and chimeras demonstrate, properties of the polymerase itself can be modulated to improve PCR performance. In the future, polymerase properties will likely be tailored increasingly to specific PCR applications, which makes this an important area of research at NEB.

Figure 5: The Multiplex PCR 5X Master Mix includes Taq DNA Polymerase and buffer that has been optimized for amplification of multiple targets. Its performance is illustrated in a 15-plex reaction with various amounts of human genomic DNA as template (shown below gel).


  1. Watson, J. D. & Crick, F. H. (1953) Nature 171, 964–967
  2. Lehman, I. R. (2003) J. Biol. Chem. 278, 34733–34738
  3. Saiki, R. K., et al. (1988) Science 239, 487–491
  4. Eckert, K. A. & Kunkel, T. A. (1990) Nucleic Acids Res. 18, 3739
  5. Liu, D., Moran, S. & Kool, E. T. (1997) Chem. Biol. 4, 919–926
  6. Goodman, M. F. & Fygenson, D. K. (1998) Genetics 148, 1475–1482
  7. Barnes, W. M. (1994) Proc. Natl. Acad. Sci. USA 91, 2216–2220
  8. Merkens, L. S., Bryan, S. K. & Moses, R. E. (1995) Biochim. Biophys. Acta 1264, 243–248
  9. Pomerantz, R. T. & O’Donnell, M. (2007) Trends Microbiol. 15, 156–164
  10. Lovett, S. T. (2007) Mol. Cell 27, 523–526
  11. Wang, Y., et al. (2004) Nucleic Acids Res. 32, 1197–1207
From NEB Expressions Summer 2008, vol. 3.2
By Thomas C. Evans, Ph.D., New England Biolabs, Inc.