Megabase Mapping

Guidelines for Choosing Restriction Endonucleases

Because base pairs are not randomly arranged in a genome, certain restriction enzyme recognition sequences may be substantially over- or underrepresented. The base composition and/or sequenced DNA from a genome can be used to predict which recognition sequences will be rare. Summarized in this table are predicted average fragment sizes generated by commonly used restriction enzymes in various genomes. Basic guidelines for predicting the frequency of cleavage in other genomes of interest are presented below.
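As a rough illustration of how base composition alone can be turned into a predicted average fragment size, the sketch below assumes that bases occur independently and that G and C (and likewise A and T) are equally frequent. The function name and the treatment of N as "any base" are assumptions made for this example. Because real genomes are not random, this is only a first approximation.

```python
def expected_fragment_size(site: str, gc_fraction: float) -> float:
    """Estimate the average spacing (bp) between occurrences of a site.

    Assumes independent bases, with P(G) = P(C) = gc_fraction / 2 and
    P(A) = P(T) = (1 - gc_fraction) / 2. 'N' matches any base.
    """
    p_gc = gc_fraction / 2.0
    p_at = (1.0 - gc_fraction) / 2.0
    probability = 1.0
    for base in site.upper():
        if base in "GC":
            probability *= p_gc
        elif base in "AT":
            probability *= p_at
        elif base == "N":
            continue  # any base matches, so the probability is unchanged
        else:
            raise ValueError(f"unsupported base code: {base}")
    return 1.0 / probability


if __name__ == "__main__":
    # NotI (GCGGCCGC) and PacI (TTAATTAA) in a genome with 38% G + C
    for name, site in [("NotI", "GCGGCCGC"), ("PacI", "TTAATTAA")]:
        print(name, round(expected_fragment_size(site, 0.38)))
```

Under this model, a G + C rich site such as NotI is predicted to cut far less often in an A + T rich genome than an A + T rich site such as PacI, which matches the qualitative guidance in this section.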

Bacterial Genomes:

CCG and CGG are the rarest trinucleotides in most A + T rich bacterial genomes, so endonuclease recognition sequences that contain these trinucleotides are correspondingly rare. Similarly, CTAG is the rarest tetranucleotide in most G + C rich bacterial genomes, so endonuclease recognition sequences that contain CTAG are correspondingly rare. Suitable endonucleases for three categories of bacterial genomic G + C content are presented below.
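Where some sequence from the genome of interest is available, the rarest trinucleotides or tetranucleotides can be tallied directly. The following is a minimal sketch; the function names are hypothetical, only the given strand is counted, and windows containing characters other than A, C, G, or T are skipped.

```python
from itertools import product


def kmer_counts(sequence: str, k: int) -> dict:
    """Count every possible k-mer over ACGT, including those never observed."""
    seq = sequence.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases
            counts[kmer] += 1
    return counts


def rarest_kmers(sequence: str, k: int, n: int = 5) -> list:
    """Return the n least frequent k-mers as (kmer, count), ties broken alphabetically."""
    counts = kmer_counts(sequence, k)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]
```

For example, `rarest_kmers(genome_sequence, 4)` would list the five least frequent tetranucleotides in `genome_sequence` (a placeholder for sequence data you supply). With only a few thousand base pairs of input, many k-mers will have zero or near-zero counts by chance, so the ranking should be read with caution.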

Yeast Genomes:

The Saccharomyces cerevisiae genome is very A + T rich (38% G + C) (3), so G + C rich restriction endonuclease recognition sequences are rare. Among these G + C rich recognition sequences, those that do not occur in dispersed repeats, such as the Ty elements or tRNAs, are particularly rare (2,4).

Mammalian Genomes:

The nuclear genomes of mammals are all approximately 41% G + C, and the dinucleotide CG is five-fold rarer than expected from the G + C content (3). Restriction endonuclease recognition sequences that contain CG are therefore very rare in mammalian genomes (2,5,6,7). However, most CG sequences are methylated in mammals, and almost all the enzymes with CG in their recognition sequence cannot cleave if CG is methylated (8). Nevertheless, certain CG sequences in the genome of a particular cell type are either completely methylated or completely unmethylated. This differential methylation results in 'complete' digests at sites that are unmethylated in the cell type. Given these facts, one can select endonucleases that give discrete cleavage patterns despite the fact that they are mCG sensitive. The average fragment sizes that result are quite large. The genome is divided into large A + T rich regions with very few CG dinucleotides and "islands" of a few hundred or thousand base pairs that are about 50% G + C, with CG occurring at almost expected frequencies (2,5,6,7). The islands are often located 5´ to genes (5,6,7). There is reason to believe that the unmethylated CG sequences are most often found in the G + C rich islands.
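The CG depletion described above can be checked on any stretch of sequence by comparing the observed number of CG dinucleotides with the number expected from the C and G content. The sketch below uses one common form of this observed/expected ratio; the function name and the choice to return 0.0 when no expectation can be computed are assumptions of this example.

```python
def cpg_observed_over_expected(sequence: str) -> float:
    """Ratio of observed CG dinucleotides to the number expected from C and G counts.

    Expected count is taken as (number of C) * (number of G) / length.
    A value near 1 suggests CG occurs at about the expected frequency;
    a value near 0.2 corresponds to a five-fold depletion.
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if length < 2 or c_count == 0 or g_count == 0:
        return 0.0  # no meaningful expectation can be computed
    observed = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    expected = c_count * g_count / length
    return observed / expected
```

Applied in windows along a mammalian sequence, a ratio well below 1 would be typical of the bulk A + T rich regions, while values approaching 1 would be consistent with the G + C rich islands.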

Other Genomes:

Using the G + C content, nearest neighbor data (dinucleotide frequencies) (3) and a few thousand base pairs of nucleotide sequence data, it is often possible to predict which restriction endonuclease recognition sequences will occur least frequently in the genome of interest. For instance, both Drosophila and Caenorhabditis are A + T rich (~40% G + C), and the rarest dinucleotide in both species is CG. However, CG is not as rare in these species as it is in mammals, so recognition sequences that contain CG are not as rare. Furthermore, these genomes are not methylated at CG, so all recognition sequences can be expected to be cleaved to completion. Thus, endonucleases very similar to those used with mammalian DNA are suitable for these species (4), but the fragment sizes produced are somewhat less than half the size of those produced from mammalian genomic DNA.
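One way to combine base composition with nearest neighbor data is a first-order Markov estimate, in which the probability of each base depends on the base before it. The sketch below derives the frequencies from a sample sequence; the function name is hypothetical, ambiguity codes in the recognition sequence are not handled, and no pseudocounts are applied, so a dinucleotide absent from a small sample gives a probability of zero.

```python
from collections import Counter


def markov_site_probability(site: str, sample: str) -> float:
    """Estimate the per-position probability of `site` from a sample sequence.

    Uses P(first base) * product of P(next base | previous base), with
    conditional probabilities taken from dinucleotide counts in `sample`.
    """
    site = site.upper()
    sample = sample.upper()
    if not site or len(sample) < 2:
        raise ValueError("need a non-empty site and at least 2 bases of sample")
    base_counts = Counter(sample)
    dinucleotides = Counter(sample[i:i + 2] for i in range(len(sample) - 1))
    leading = Counter(sample[:-1])  # bases that start a dinucleotide
    probability = base_counts[site[0]] / len(sample)
    for previous, current in zip(site, site[1:]):
        if leading[previous] == 0:
            return 0.0
        probability *= dinucleotides[previous + current] / leading[previous]
    return probability
```

When the result is non-zero, `1 / markov_site_probability(site, sample)` gives a predicted average fragment size in base pairs. Sites containing a depleted dinucleotide such as CG receive a lower probability here than they would from base composition alone.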


  1. McClelland, M. et al. (1987) Nucl. Acids Res. 15, 5985-6005. PMID: 2819819
  2. McClelland, M. and Nelson, M. (1987) Gene Amplification and Analysis 5, 257-282. PMID: 2851534
  3. Normore, W. M., Shapiro, H. S., and Setlow, P. (1976) CRC Handbook of Biochemistry and Molecular Biology. (Ed. G.D. Fasman), CRC Press.
  4. McClelland, M. (unpublished results)
  5. McClelland, M. and Ivarie, R. (1982) Nucl. Acids Res. 10, 7865-7877. PMID: 7155899
  6. Brown, W.R. and Bird, A.P. (1986) Nature 324, 477-481. PMID: 3016554
  7. Lindsay, S. and Bird, A. P. (1987) Nature 327, 336-338. PMID: 2438557
  8. McClelland, M. et al. (1994) Nucl. Acids Res. 22, 3640-3659. PMID: 7937074
  9. Suwanto, A. and Kaplan, S. (1989) J. Bacteriol. 171, 5850-5859. PMID: 2808300

Restriction Endonuclease Cleavage of Chromosomal DNA

Agarose-embedded E. coli chromosomal DNA digested by NotI (d), SfiI (e), PmeI (f), PacI (g), and AscI (h). Lanes (a) and (i) are Low Range PFG Markers. Lanes (b) and (c) are Mid Range PFG Markers I and II, respectively. Electrophoresis was performed in a 1% agarose gel at 170 V and 15°C for 20 hours. Switch times were ramped from 5 to 20 seconds.