alt

Improvements in Library Quality

While sufficient yield of a library is required for successful sequencing, quantity alone is not enough. The quality of a library is also critical, regardless of the input amount or GC content of the sample DNA. A high quality library will have uniform representation of the original sample, as well as even coverage across the GC spectrum.

Uniform GC Coverage
Libraries from varying input amounts of three microbial genomic DNAs with low, medium and high GC content (H. influenza, E. coli and H. palustris) were prepared using the NEBNext® Ultra™ II DNA Library Prep Kit. In all cases, uniform coverage was obtained, regardless of GC content and input amount (A). GC coverage of libraries prepared using other commercially available kits was also analyzed using the same trio of genomic DNAs. Again, NEBNext Ultra II provided good GC coverage (B).

  1. NEBNext Ultra II DNA Library Prep Protocol

    This video walks you through DNA Library Preparation using the NEBNext Ultra II DNA Library Prep Kit. The video also includes tips for optimization as well as safe stopping points.

  2. 12 Quick Tips for NGS Library Preparation

    Using beads to clean up your DNA prior to NGS library prep can be quick and easy, if you follow these few, simple tips.

NEBNext Ultra II provides uniform GC coverage for microbial genomic DNA over a broad range of GC composition and input amounts
Libraries were made using 500 pg, 1 ng and 100 ng of the genomic DNAs shown and the Ultra II DNA Library Prep Kit (A) or using 100 ng of the genomic DNAs and the library prep kits shown (B), and sequenced on an Illumina MiSeq®. Reads were mapped using Bowtie 2.2.4 and GC coverage information was calculated using Picard’s CollectGCBiasMetrics (v1.117). Expected normalized coverage of 1.0 is indicated by the horizontal grey line, the number of 100 bp regions at each GC% is indicated by the vertical grey bars, and the colored lines represent the normalized coverage for each library.
When amplification is required to obtain sufficient library yields, it is important to ensure that no bias is introduced, and that representation of GC-rich and AT-rich regions is not skewed in the final library. Comparison with libraries produced without amplification (“PCR-free”) is a useful measure (1). In this example, a library was prepared from human genomic DNA using NEBNext Ultra II as well as other commercially available kits, and coverage was compared to a PCR-free library. Results demonstrate that the Ultra II library coverage is most similar to the PCR-free library, and also covers the range of GC content (see below).

NEBNext Ultra II provides GC coverage of human DNA comparable to PCR-free libraries

Libraries were made using 100 ng of Human NA19240 genomic DNA and the kits shown, following manufacturers’ recommendations. Libraries were sequenced on an Illumina NextSeq® 500 (A) or MiSeq (B). Reads were mapped to GRCh37 using Bowtie 2.2.4 and GC coverage information was calculated using Picard’s CollectGCBiasMetrics (v1.117). Expected normalized coverage of 1.0 is indicated by the horizontal grey line, the number of 100 bp regions at each GC% is indicated by the vertical grey bars, and the colored lines represent the normalized coverage for each library. Ultra II provides GC coverage most similar to PCR-free libraries and enables coverage across the range of GC content.
Ultra II libraries provide the highest quality sequencing data

DNA INPUT LIBRARY KIT TOTAL READS % MAPPED % DUPLICATION % CHIMERAS
 100 ng Ultra II 419,093,838 96 1.87 0.48
Kapa Hyper 419,097,926 96 2.00 0.60
TruSeq Nano 419,086,546 97 1.91 0.53
1 ng Ultra II 226,860,968 96 3.96 0.44
Kapa Hyper 226,857,578 96 11.40 0.53
TruSeq Nano 226,857,754 97 34.80 0.41

Libraries were prepared from Human NA19240 genomic DNA using the input amounts and library prep kits shown, following manufacturers’ recommendations. Libraries were sequenced on the Illumina NextSeq 500. Reads were mapped to the GRCh37 reference using Bowtie 2.2.4. This data illustrates that the NEBNext Ultra II DNA Library Prep Kit enables high quality sequence data, even with very low input amounts.

% Mapped: The percentage of reads mapped to Human GRCh37 reference.
% Duplication: The percentage of mapped sequence that is marked as duplicate.
% Chimeras: The percentage of reads that map outside of a maximum insert size or that have the two ends mapping to different chromosomes.


Coverage of Known Low-Coverage Regions of the Human Genome
Regions of the human genome typically covered at a relatively low level have been identified (2), and the majority of these regions have high GC content. Library preparation can contribute to low and uneven sequence coverage, or even drop-outs, of these challenging regions. Depending on the polymerase used, PCR amplification of a library can result in under-representation of GC-rich regions, and libraries constructed by PCR-free workflows can provide more uniform coverage than amplified libraries (1). Improvements in efficiency and reduction in bias at each step in library preparation, including improved uniformity of library amplification over the full range of GC content improves the evenness of sequence coverage of these regions. Here we show a comparison of sequencing data from human genomic DNA libraries prepared with NEBNext Ultra II and other commercially available kits. Ultra II provided the highest and most uniform coverage of difficult sequence regions, as well as the coverage most similar to the PCR-free library (see below).

NEBNext Ultra II provides the highest and most uniform coverage of difficult sequence regions
alt

A: Indexed libraries were prepared from 100 ng of Human NA19240 genomic DNA using a PCR-free workflow or the library prep kits shown, following manufacturers’ recommendations. The PCR-free library was prepared using NEBNext Ultra II. Libraries were sequenced on the Illumina NextSeq 500. 420 million reads were randomly extracted from each dataset, to produce an average coverage of 10X. Reads were mapped to the GRCh37 reference genome using Bowtie 2.2.4. Reads on each region were counted using bedtools v2.19.1. The number of reads overlapping distinct difficult, low-coverage regions of the human genome (2) are shown for each library. Ultra II provides the highest and most uniform coverage of these difficult regions, and provides the coverage closest to that obtained with a PCR-free protocol.

B: From the 420 million 75 bp reads randomly extracted from each dataset, 10X coverage was expected. The percentage of difficult regions covered at ≥ 10X is shown for each library prep kit and for the PCR-free workflow. Ultra II provides the highest percentage of reads at ≥ 10X coverage and also provides the coverage closest to that obtained with a PCR-free protocol.
Sequence Coverage
As described above, an ideal library will represent completely and proportionally the sequence of the input DNA. When library preparation is inefficient or when input amounts for a library are very low, there is a risk that the resulting library will lack this diversity, and that some sequences will be over- or under-represented. Comparison of the level of sequence coverage, in 10 kb intervals, achieved with libraries from different input amounts is a useful measure to determine the effect of input amounts on coverage. The increased efficiency of each step in the NEBNext Ultra II library workflow improves the library diversity. Here we show comparisons of libraries prepared for 100 ng, 1 ng and 500 pg human genomic DNA prepared using NEBNext Ultra II. The results demonstrate consistently even coverage for the range of input amounts indicated (see below).

Read depth correlation shows consistently high coverage for 500 pg–100 ng input amounts

alt

Libraries were prepared with 100 ng, 1 ng and 500 pg of human NA19240 genomic DNA and sequenced on the Illumina NextSeq 500. Each library was downsampled (sambamba view -s) to include 423 M reads and mapped to GRCh37 using Bowtie 2.2.4. Coverage of each 10 kb region of GRCh37 (as determined by bedtools coverage) was compared between low (500 pg and 1 μg) and 100 ng input. Most regions are covered by ~1,000 reads, as expected. Low and high coverage regions are well correlated.

References:
  1. Kozarewa, I. et al. (2009). Amplification-free Illumina sequencing – library preparation facilitates improved mapping and assembly of (G+C) – biased genomes. Nat. Methods 6:291–295.
  2. Aird, D. et al. (2011). Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology 12(2), R18.