Sequencing Technologies and Bioinformatics: From data to workflow improvements
Posted on Friday, August 4, 2023
Topic: Career advice, What is Trending in Science
Bioinformatics and computational biology both deal with the analysis, interpretation and application of data collected through biological research. These specialties emerged as the volume of biological data grew. Their definitions overlap and differ mainly in emphasis: bioinformatics leans toward biological interpretation, while computational biology leans toward data and technology. Bioinformatics is specifically concerned with extracting, distilling, and understanding the information contained in diverse scientific data, so that research findings are better understood, subsequent experiments improve, and scientific goals advance. It can be applied to many subfields (molecular biology, structural biology, biochemistry, genomics, etc.), adapting to the technologies and methodologies in use in each area. Successful applications improve the quality and quantity of data generated, increase workflow efficiency, and boost the accuracy of detection methods. In this blog post, we explore bioinformatics applications in genetic sequencing and the ways proximity between molecular biologists and bioinformatic scientists accelerates progress, amplifying the impact of research and product development.
Scientists and Bioinformaticians: Why proximity matters
Having bioinformaticians in close proximity to the bench scientists they collaborate with can have a profound impact on the effectiveness and applicability of bioinformatics findings. Sharing physical space fosters collaboration across the cyclical lifespan of a research question, from experimental design to data acquisition to interpretation and back. Ideally, bioinformaticians and bench scientists share an understanding of the biological questions at play, as well as of the approaches to experimental design, data generation, and result interpretation.
NEB’s Brad Langhorst is a Development Group Leader whose team focuses mostly on NEBNext projects for sequencing applications. Brad says, “We have a bunch of computational folks who are working closely with the lab folks. They actually sit together with the lab people, so we don't have all the computational people in one place. I think it's important that everybody be in physical proximity. They work really closely together to design experiments, analyze results, and figure out what the next iteration of the experiments is, as we're trying to figure out how to make better RNA-Seq products or how we are going to evaluate the next pathogenic virus that's coming along.”
Bioinformatics tools for better diagnostics of rapidly evolving viral genomes
Genetic sequencing and bioinformatics were made for each other. A single sequencing run can produce megabases to terabases of data, all of which need to be ordered and analyzed to be useful. Computational biology tries to make use of that data to inform us about underlying genomics.
During the COVID-19 pandemic, NEB computational biologists closely watched the constant output of SARS-CoV-2 genomes as the virus spread and mutated quickly. They also worked closely with colleagues who were refining the lab methods used to identify SARS-CoV-2 infections, and they wondered how they could help. Langhorst observed, “We've got qPCR assays and LAMP assays and sequencing methods that all depend on primers to sit on specific places of the SARS-CoV-2 genome. But SARS-CoV-2 was modifying itself really quickly over time, and we were worried whether the primer sites we were trying to use to detect the virus were there, in the viral sequence, or not. If the virus is changing and the primer can't bind, we won't know if a detection method is saying there's no virus there or we just can't see a virus there.”
Langhorst and his colleagues were surprised to find that no tool was available to confirm that the primer sites used for detection were still present in current strains of the SARS-CoV-2 virus. So, they set out to build one. They began by collecting publicly released SARS-CoV-2 genome sequences, compiling them, and developing visualization methods and notifications for users. The group released the Primer Monitor Tool, which lets those running diagnostic methods around the world check that the primers they use for detection still recognize variants of the virus as it continues to change and adapt.
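The core idea, checking whether a primer still has a matching site in each newly released genome, can be illustrated with a small Python sketch. This is not the Primer Monitor Tool's actual implementation. The function names, the toy sequences, and the mismatch threshold are all assumptions made for illustration, and a real pipeline would work from alignments rather than the brute-force scan used here.

```python
# Illustrative sketch only: names, sequences, and threshold are hypothetical.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def min_mismatches(primer: str, genome: str) -> int:
    """Smallest number of mismatches between the primer and any
    same-length window of the genome, checking both strands.
    An N in the genome is simply counted as a mismatch."""
    primer, genome = primer.upper(), genome.upper()
    k = len(primer)
    if len(genome) < k:
        raise ValueError("genome is shorter than the primer")
    best = k
    # A forward primer matches the genome as written; a reverse primer
    # matches it after reverse-complementing, so try both orientations.
    for target in (primer, reverse_complement(primer)):
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            distance = sum(a != b for a, b in zip(target, window))
            best = min(best, distance)
    return best


def flag_at_risk(primer: str, genomes: dict, max_mismatches: int = 1) -> dict:
    """Return {genome_name: mismatches} for genomes whose best primer site
    has more than max_mismatches differences (an assumed cutoff)."""
    flagged = {}
    for name, sequence in genomes.items():
        distance = min_mismatches(primer, sequence)
        if distance > max_mismatches:
            flagged[name] = distance
    return flagged


# Toy data: the variant carries two substitutions inside the primer site.
genomes = {
    "reference": "ACGTTGCAAGGCTTACCGATAGGCTA",
    "variant": "ACGTTGCAAGGCATACTGATAGGCTA",
}
primer = "GGCTTACCGA"
at_risk = flag_at_risk(primer, genomes)
print(at_risk)
```

Here the reference genome contains the primer site exactly, while the variant's best site differs at two positions and is therefore flagged under the assumed threshold of one mismatch.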
The basics of this analysis methodology aren’t just applicable to COVID. Langhorst explains, “We just designed something that would be able to take in sequences and compare whether those sequences are changing, and what kind of differences we are seeing. We want to extend it to encompass other viral sequences.”
The NEB Bioinformatics team plans to expand the utility of the Primer Monitor Tool, but their primary motivation is to inform the research, product development, and production efforts throughout the company. They collaborate on a multitude of projects with NEB research teams, formulating initial analysis methods, sharing that information with researchers worldwide, and welcoming feedback that helps them improve upon what they have built. Another project harnesses the power of unique molecular identifiers for high-throughput sequencing methodologies.
Unique Molecular Identifiers, Indexes, and De-multiplexing
As sequencers get bigger and produce even more data, multiple individual samples (bacterial, viral, or human) are often combined in a single run for efficiency. But how do we tease an individual data set apart from the mélange? To do that, each sample must be labeled with a DNA barcode (also called an “index” sequence) that marks it as unique. More complex labeling schemes, which use more than one unique identifier per sample, can improve confidence that there has been no mixing between samples.
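As a rough illustration of de-multiplexing, the Python sketch below sorts reads into per-sample bins by comparing each read's index sequence with the known sample barcodes. The barcodes, sample names, and the one-mismatch tolerance are assumptions chosen for the example. The sketch is not a description of any particular sequencer's software.

```python
# Illustrative sketch only: barcodes, sample names, and tolerance are hypothetical.
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: dict, max_mismatches: int = 1):
    """Return the single sample whose barcode is within max_mismatches of
    the index read, or None if no sample (or more than one) qualifies."""
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if hamming(index_read, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes: dict, max_mismatches: int = 1) -> dict:
    """Group (index_read, sequence) pairs by sample; reads that cannot be
    assigned unambiguously go into an 'undetermined' bin."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(sequence)
    return dict(bins)


barcodes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
reads = [
    ("ACGTAC", "TTGACCGATA"),  # exact match to sample_A
    ("ACGTAA", "GGATCCATGC"),  # one sequencing error, still sample_A
    ("TGCATG", "CCATAGGTCA"),  # exact match to sample_B
    ("GGGGGG", "ATATATCGCG"),  # matches no barcode
]
bins = demultiplex(reads, barcodes)
print(bins)
```

Requiring an unambiguous match is a deliberate design choice in this sketch: a read whose index is equally close to two barcodes is set aside rather than guessed at, which is the same concern that motivates using more than one identifier per sample.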
Barcode sequences enable deconvolution of the samples on one sequencing run, but many scientists want to go further and account for the multiple copies of each molecule created during PCR amplification. Short sequences of random bases, known as molecular indices or unique molecular identifiers (UMIs), have been used since the 1990s [1] to link multiple observations as coming from the same original molecule. Langhorst explains, “If we have a small sample we want to amplify that signal first. We have to make a bunch of copies, but the copies aren't perfect. Sometimes if one gets started quickly, you might get a lot of copies of that one and if another one gets started more slowly, just by chance, you get fewer. But we really want to know the original abundance of the two things we're trying to evaluate. A UMI is a nice little trick allowing us to stick a random piece of sequence onto each of those two things we want to study. If we see many copies of that same random sequence, we have a good chance of guessing correctly that they're actually from the same original molecule.” This enables count correction, and ultimately more accurate data interpretation.
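The count correction can be sketched in a few lines of Python. In this hypothetical example, each read is reduced to a (target, UMI) pair, and reads that share both are treated as PCR copies of one original molecule. The target names and UMI sequences are made up, and the sketch assumes UMIs are read without error, whereas practical tools also merge UMIs that differ by a sequencing error.

```python
# Illustrative sketch only: targets and UMIs are hypothetical, and UMIs
# are assumed to be error-free (no merging of near-identical UMIs).
from collections import Counter, defaultdict


def count_molecules(reads):
    """Given (target, umi) pairs, return two dicts:
    raw read counts per target, and UMI-corrected counts per target
    (the number of distinct UMIs observed for that target)."""
    raw_counts = Counter()
    umis_seen = defaultdict(set)
    for target, umi in reads:
        raw_counts[target] += 1
        umis_seen[target].add(umi)
    corrected = {target: len(umis) for target, umis in umis_seen.items()}
    return dict(raw_counts), corrected


# Two original molecules of each target, but one gene_X molecule
# happened to amplify much more efficiently than the others.
reads = (
    [("gene_X", "AACGT")] * 5
    + [("gene_X", "GTTCA")]
    + [("gene_Y", "CCATG")]
    + [("gene_Y", "TGACC")]
)
raw, corrected = count_molecules(reads)
print("raw reads:      ", raw)
print("UMI-corrected:  ", corrected)
```

The raw read counts suggest gene_X is three times as abundant as gene_Y, but counting distinct UMIs shows two original molecules for each target.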
Is bioinformatics a good career path?
Bioinformatics and computational biology could be a great career choice if you have strong data analysis skills and scientific curiosity. Langhorst points out, “Even high school and middle school students are learning all kinds of data skills and they don't think of themselves as bioinformatics people. They have a toolkit that they can use to answer questions or solve problems.” Growth in data analysis skills earlier in education means that students will be able to bring those tools to all their future studies, deciding where to focus based on their interest in a field without being intimidated by the challenges of dealing with lots of data.
[1] Wagner, A., Blackstone, N., Cartwright, P., Dick, M., Misof, B., Snow, P., Wagner, G.P., Bartels, J., Murtha, M. and Pendleton, J. “Surveys of Gene Families Using Polymerase Chain Reaction: PCR Selection and PCR Drift.” Systematic Biology 43(2): 250–261.