Behind the paper: Examining Sources of Error in PCR by Single-Molecule Sequencing

Jennifer Ong explains her recent NEB publication demonstrating how single-molecule PacBio sequencing was used to better understand sources of error introduced by PCR.


Hello, my name is Jennifer Ong. I'm a senior scientist in the Research Department at NEB, in the DNA Enzymes Division. I'll be talking about our recent paper called Examining Sources of Error in PCR by Single-Molecule Sequencing.

We wanted to study the accuracy of DNA amplification. PCR is upstream of many NGS sample prep workflows, and errors that arise during PCR can show up in your sequencing data. And so, we wanted to get a better understanding of what types of errors occur.

We used PacBio single-molecule real-time sequencing to sequence individual molecules that are produced after amplification. PacBio sequencing is very accurate because each molecule is read multiple times, allowing true replication errors to be distinguished from sequencing errors. We tested a variety of different DNA polymerases in PCR and generated SMRTbell libraries from those PCR products. After sequencing, we could compare the sequencing reads to the reference sequence and catalog the different types of errors that we saw.
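As a rough illustration of what cataloging errors against a reference can look like, here is a minimal Python sketch. It is not the software described in the paper. It assumes the read has already been aligned to the reference with no insertions or deletions, so the two strings have equal length, and the function name `tally_substitutions` and the example sequences are hypothetical.

```python
from collections import Counter


def tally_substitutions(reference: str, read: str) -> Counter:
    """Count base substitutions by type (e.g. "C>T") between a reference
    and a read that is already aligned to it position by position.

    Assumption: no indels, so both strings have the same length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")

    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        # Skip matches and any non-ACGT symbols (e.g. N).
        if ref_base == read_base:
            continue
        if ref_base in "ACGT" and read_base in "ACGT":
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


# Hypothetical example: one C-to-T change at the fifth position.
reference = "ACGTCCGA"
read = "ACGTTCGA"
subs = tally_substitutions(reference, read)
print(dict(subs))
```

Summing such tallies over many reads, and dividing by the total number of bases compared, would give a per-base substitution rate for each polymerase.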

One of the challenges that we faced was deriving as much accuracy as possible from our PacBio sequencing data. And so, we developed custom software tools to achieve this, and we've made these publicly available for the research community. Throughout this process, we learned a lot about the accuracy of PacBio sequencing. For instance, we were able to confidently call base substitution errors. But because high-fidelity polymerases make very few insertion and deletion mistakes and PacBio sequencing has a higher background rate for these types of mistakes, it was much more challenging for us to distinguish replication indels from sequencing indels.

We measured the base substitution error rate for several engineered and wild-type DNA polymerases. One of the most accurate polymerases that we measured was Q5 DNA polymerase. Q5 makes a base substitution error about once in every 1.9 million bases. Compared with Taq polymerase, which makes an error about once every 6,000 bases, Q5 is about 280-fold more accurate.
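The relationship between "one error per N bases" and a fold difference in accuracy can be worked through with a short Python sketch. The two inputs are the rounded figures quoted in this talk. With those rounded inputs the ratio comes out at roughly 300-fold, the same order as the 280-fold figure quoted, which presumably reflects the unrounded measured rates (an assumption on my part, not something stated here).

```python
# Rounded figures quoted in the talk: bases copied per substitution error.
Q5_BASES_PER_ERROR = 1_900_000
TAQ_BASES_PER_ERROR = 6_000

# Per-base substitution error rate is the reciprocal of the spacing.
q5_rate = 1 / Q5_BASES_PER_ERROR
taq_rate = 1 / TAQ_BASES_PER_ERROR

# Fold difference in accuracy: how many times more often Taq errs per base.
fold = taq_rate / q5_rate

print(f"Q5 error rate:  {q5_rate:.2e} per base")
print(f"Taq error rate: {taq_rate:.2e} per base")
print(f"Fold difference from rounded inputs: {fold:.0f}")
```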

We wanted to determine the amount of DNA damage that occurs during thermocycling and whether or not this can introduce mutations into PCR products. In one experiment, we took plasmid libraries and heated and cooled them as you would during PCR. After sequencing, we found that heating and cooling DNA introduces an elevated level of mutations. Almost all of these mutations were C to T substitutions. Using an enzyme repair cocktail called PreCR, which specifically recognizes and repairs DNA damage, we determined that all of the C to T mutations that we observed were actually the result of cytosine deamination that occurs during thermocycling.

Template switching is another error pathway that has been observed in PCR. We were able to observe template-switching events during amplification of lacZ, and we also set up a specific assay to detect template-switching events.

If you'd like to learn more about our research, we've made our paper open-access, and it can be found at the link below.
