NEB Podcast #51 -
Interview with Brad Langhorst: How bioinformatics informs sequencing

Transcript

Interviewer: Lydia Morrison, Marketing Communications Writer & Podcast Host, New England Biolabs, Inc.
Interviewee: Brad Langhorst, Development Group Leader, New England Biolabs, Inc.


Lydia Morrison:
Thanks for joining us for this episode of the Lessons from Lab and Life Podcast, brought to you by New England Biolabs. I'm your host, Lydia Morrison, and I hope this podcast offers you some new perspective. Today I'm joined by New England Biolabs Development Group Leader Brad Langhorst. Brad's with me today to talk about how bioinformatics informs sequencing and its analysis as well as how it can help design successful primers. Brad, thanks so much for joining me today.

Brad Langhorst:
Well, I'm happy to be here. This is, it's fun.

Lydia Morrison:
Tell us about your role at NEB and your team, I'd love to hear about your team and how you support research and development for the NEBNext portfolio (for next generation sequencing).

Brad Langhorst:
Yeah, so that's one of my favorite things about working at NEB is I get to work on a lot of stuff, including NEBNext, and so on my team, most of the people are working on NEBNext projects. So there's a lot of data there and people are moving pretty quickly in different areas of sequencing. So we have a bunch of computational folks who are working closely with the lab folks. They actually sit together with the lab people, so we don't have all the computational people in one place. I think it's important that everybody be in sort of physical proximity and they work really closely together to design experiments, analyze results, figure out what the next iteration of the experiments are as we're trying to figure out how to make better RNA-Seq products or how are we going to evaluate the next pathogenic virus that's coming along?
Can we really get the sequencing in place? How do we design that and test it and design and test? So that's really the pattern is be really close to it. There are a few people on my team who work in other places of NEB. So Jake (Miller) and myself, we both work on e-base, which is a system for keeping track of all the quality information from NEB. So all the stuff that goes through our QC department. Really, if there's anything science and data related, I'm interested and we can kind of work on projects with people. Super fun.

Lydia Morrison:
That's great. So one of the things you and the team did with all that data was develop the primer monitor tool during the COVID pandemic. So can you share a little bit about that and how you envision it growing or changing in the future?

Brad Langhorst:
Sure. Yeah. So during COVID we were building all kinds of different ways to try to help. Between the proteins that we're using to make mRNA vaccines and the test kits and methods we're using to amplify SARS-CoV-2, we need to understand the details of exactly how that's working. So we've got qPCR assays and LAMP assays and sequencing methods that all depend on primers to sit in specific places on the SARS-CoV-2 genome. But it's moving really fast, that this is not a human strain, so it's kind of modifying itself really quickly over time, at least certainly in the early pandemic, it seemed like it was moving really fast and we were worried about all these primers we were trying to use to detect whether the virus is there or not. If the virus is changing and the primer can't bind, we won't know if it's saying there's no virus there or we just can't see that there's a virus there.
So we really needed to sort of stay up on that and I was surprised to find that there was not really a resource in the world where I could just say, okay, here are the primers that we're using. Tell us if something breaks. So I decided to just build it. So Matt (Angel) and I, one of my teammates, he built the backend pieces, so we had to go fetch all the new viral sequences, so other people were sequencing viruses in Southeast Asia and Japan and in Africa and Europe and California, wherever, and submitting all those viral sequences to central databases. So every day we just pull down the latest set of viral sequences, put them in our database and evaluate where they're different, how are they changing so we can understand mostly we want to know where the primers are sitting, but other people are interested in is the virus changing?
Is it getting more pathogenic, is it going to get around our immune response? All those questions became interesting for people as well. So we just kind of stayed focused on that narrow piece of are our primers going to keep working? I think that helped us not get overwhelmed by the difficult problem. We just said like, let's solve that problem. So we collect that information and then we build some visualizations and a notification scheme to let people know. They can say, okay, I'm using this primer in this location. If we see a new variant arrive there and it starts to grow, or maybe it seeds in a particular region and then increases in frequency, we can flag it. Like fairly recently we saw one in Singapore where there's this extra variant that showed up. There was already one there on the probe that most people are using to detect SARS-CoV-2, and we were worried that this might be a problem.
So we were watching it in Singapore and saw it starting to increase, and we started seeing it in other places and said, okay, we need to do some experimental testing here, and colleagues Guoping Ren and Greg Patton in the qPCR group used that information from primer monitor to evaluate, is this really going to cause a huge problem? Fortunately, no. That extra mutation that was there, we found it and saw it early, with enough time for us to do an experiment to evaluate whether we needed to modify the assay. And it turns out they still get good detection even though there are two variants underneath that probe in that qPCR set. So we just kind of saw a problem and decided we needed to fix it.
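The core check Brad describes, whether a variant that is growing in frequency sits underneath a primer or probe, can be sketched in a few lines of Python. This is only a minimal illustration, not how primer monitor itself is implemented: the `Primer` and `Variant` classes, the coordinates, the frequencies, and the 1% reporting threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    name: str
    start: int  # 1-based, inclusive genome coordinate (hypothetical)
    end: int    # 1-based, inclusive genome coordinate (hypothetical)


@dataclass(frozen=True)
class Variant:
    position: int     # 1-based genome coordinate (hypothetical)
    frequency: float  # fraction of recent sequences carrying the variant


def flag_primers(primers, variants, min_frequency=0.01):
    """Map each primer name to the variants that fall inside it.

    Only variants at or above min_frequency are reported, and primers
    with no such variants are left out of the result.
    """
    flagged = {}
    for primer in primers:
        hits = [
            v for v in variants
            if primer.start <= v.position <= primer.end
            and v.frequency >= min_frequency
        ]
        if hits:
            flagged[primer.name] = hits
    return flagged


# Made-up assay: two primers and a probe, plus four observed variants.
primers = [
    Primer("fwd", 100, 120),
    Primer("probe", 150, 175),
    Primer("rev", 200, 222),
]
variants = [
    Variant(160, 0.04),   # under the probe, common enough to report
    Variant(168, 0.02),   # a second variant under the same probe
    Variant(210, 0.001),  # under the reverse primer, but too rare
    Variant(500, 0.30),   # common, but outside every primer
]

flagged = flag_primers(primers, variants)
for name, hits in flagged.items():
    print(name, [(v.position, v.frequency) for v in hits])
```

In this toy data only the probe is flagged, with two variants underneath it, which mirrors the situation described above. A flag like this says an experiment is worth running; it does not say the assay has stopped working.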

Lydia Morrison:
Really interesting and so important, I think that we be able to monitor whether our detection methods are actually going to be able to detect the new variants. Any plans to grow the primer monitor tool or plans for future changes to it?

Brad Langhorst:
Absolutely. We've had lots of requests from different groups at CDC and other places in the world who are interested in viral sequence X. So antivirus or Zika or Ebola or whatever it is, people want to know, could we monitor this? Monkey Pox was a hot one for a while there too. And so we didn't really design it for, specifically for SARS-CoV-2. We just designed something that would be able to take in sequences and compare whether those sequences are changing and what kind of differences are we seeing. So yes, we want to extend it to do those other viral sequences. And I have a student now who's a great student working with me on a different project, but he's kind of in his spare time, he got interested in this project and he's been extending things to make SARS-CoV-2 work a little better and be able to do a better job at looking at the really complicated primer schemes we use for sequencing instead of just the detection ones.
Those are pretty simple and that's what we really built it for. There's kind of an add-on that does the big sequencing schemes where we have hundreds of primers instead of just three or four or six or whatever it is for a qPCR or a LAMP assay. So those ones it wasn't really designed for, so we're extending it for that. Once we've done that, I think we could think about generalizing it for other organisms, assuming we can find the data. We really need to go find the sequences that people are using for those, and many of these viruses are just not sequenced in the same way we did SARS-CoV-2. I think flu is a good one for us just to start with because there's enough sequencing for flu to populate the data.

Lydia Morrison:
I was going to ask about flu. Yeah, that would be a good one. I'm sure there'd be lots of people sort of interested in increasing detection methods around influenza, so.

Brad Langhorst:
Yeah, it's been fun learning about new viruses. I know a little bit about viruses from just academic training, but didn't really know very much about viruses and how they work and what their packaging schemes are and all the places where they're conserved and not conserved. Just really interesting biology to be learned along the path. So I've been having some kind of fun to look at this from the biology perspective, even if you ignore the serious aspect of it or the impact on people's lives from the biology side of it, it's definitely some interesting things happening there.

Lydia Morrison:
I wanted to switch gears a little bit and talk about indices, UMIs and de-multiplexing, which seem to cause I think a lot of confusion for people who are new to next generation sequencing. Can you share a bit about how you envision it growing or changing in the future?

Brad Langhorst:
Sure. As the sequencers get really big and they can produce tons and tons of data, we want to be able to study more than one individual at a time, more than one virus, more than one bacteria, more than one person, and try to understand what sequences that we see are coming through. So in order to do that, we need to combine samples together to run them together on the same sort of execution of the instrument. So if we want to look on our newest instruments, maybe we want to look at 50 people all at once to screen them for a particular cancer; maybe we've got cancer cells from 50 people. We want to understand is there some additional disease there or what's the mutations, which is the right drug to use? Somebody might want to be able to run that in a more efficient way.
So to do that, we need to label every sample with a sequence that we know. We can look up that sequence and say, oh, well that sample belongs to Lydia, or this one belongs to Brad, so we know we can deconvolute it and we don't mix up a little bit of Lydia's sample with my sample. That would be really bad. Right. We really don't want to do that.

Lydia Morrison:
Seems important.

Brad Langhorst:
Yeah. So super important for these clinical situations. So we've got some complex schemes to match up, to label things twice basically, and look at both labels and say, oh, we want to be sure it's Lydia's, that both of the things that we put in have to match. If it's Brad's, they both have to match. If there is a conflict, then we just throw it away, say, wait a minute, we can't really tell what this is. So we put that in its own bucket. Hopefully that bucket stays small. We want to measure that, and that's part of the performance we're trying to look at and design into these sequences. So that's kind of the purpose of these barcodes and UMIs. The UMIs are another layer to that. We can look at some of the technical effects if we add amplification, if we have a small sample, let's say it's a blood sample and we want to look at just the circulating DNA that's in blood, very little tiny amount of it. We want to evaluate how much is there or what the properties of it might be.
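The two-label matching described in this answer can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not NEB's tooling: the index sequences and the sample sheet are made up, the `i7`/`i5` names are just the conventional names for the two index reads, and only exact matches are accepted, whereas production demultiplexers usually tolerate a small number of sequencing errors in each index.

```python
from collections import Counter

# Hypothetical sample sheet: each sample is labeled with a unique PAIR of indices.
SAMPLE_SHEET = {
    ("ACGTACGT", "TTGCAAGC"): "Lydia",
    ("GGATCCAA", "CAGTTGAC"): "Brad",
}


def assign_sample(i7, i5, sample_sheet):
    """Return the sample whose two indices BOTH match, else 'undetermined'.

    A read carrying one of Lydia's indices and one of Brad's is a
    conflict, so it goes into the undetermined bucket rather than being
    guessed into either sample.
    """
    return sample_sheet.get((i7, i5), "undetermined")


# Made-up index reads for four sequenced fragments.
reads = [
    ("ACGTACGT", "TTGCAAGC"),  # both labels match Lydia
    ("GGATCCAA", "CAGTTGAC"),  # both labels match Brad
    ("ACGTACGT", "CAGTTGAC"),  # Lydia's first label with Brad's second: conflict
    ("ACGTACGT", "TTGCAAGC"),  # both labels match Lydia
]

counts = Counter(assign_sample(i7, i5, SAMPLE_SHEET) for i7, i5 in reads)
undetermined_fraction = counts["undetermined"] / len(reads)

print(dict(counts))
print(f"undetermined fraction: {undetermined_fraction:.2f}")
```

The `undetermined_fraction` here plays the role of the bucket that should stay small. Tracking it per run is one simple way to measure how well a set of labels is performing.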
We want to amplify that signal first. So we have to make a bunch of copies, but the copies aren't perfect. Sometimes if one gets started quickly, you might get a lot of copies of that one and another one gets started more slowly just by chance you get way fewer. But we really want to know what's the original abundance of the two things that we're trying to evaluate. So a UMI is a nice little trick where we can stick on a random piece of sequence onto each of those two things we want to study, and if we see many copies of that same random sequence, we have a good chance of guessing that they're actually from the same original molecule. So we can kind of collapse it down and correct our counts to get a more accurate picture of how much is A and how much is B in a sample.
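As a rough sketch of the collapsing step described above: reads that share both a target and a UMI are treated as copies of one original molecule, so counting distinct UMIs per target corrects for uneven amplification. The read data and UMI sequences below are invented for illustration, and the sketch assumes UMIs are read without errors and never collide by chance. Real pipelines generally have to handle both cases.

```python
from collections import Counter, defaultdict


def count_original_molecules(reads):
    """Collapse (target, umi) reads to the number of distinct UMIs per target."""
    umis_by_target = defaultdict(set)
    for target, umi in reads:
        umis_by_target[target].add(umi)
    return {target: len(umis) for target, umis in umis_by_target.items()}


# Made-up data: A and B each start as two molecules, but one molecule
# of A happened to amplify much more than the others.
reads = (
    [("A", "AACGT")] * 6
    + [("A", "GGTCA")] * 1
    + [("B", "TTAGC")] * 2
    + [("B", "CCATG")] * 1
)

raw_counts = Counter(target for target, _ in reads)
molecule_counts = count_original_molecules(reads)

print("raw read counts:     ", dict(raw_counts))
print("UMI-collapsed counts:", molecule_counts)
```

The raw counts suggest there is more than twice as much A as B, while the collapsed counts recover the equal starting amounts.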
So all that stuff requires a lot of detail. We need to design those really well and we need to evaluate them over time. So we've been building a lot of tooling to help both our customers understand what we have and how those can be applied, but also to understand ourselves every batch, every lot of new material. We're evaluating those things in big databases, pulling that information together to understand what they're doing. So I think one, as customers are getting more and more sophisticated, people want to do these more complicated bar coding and UMI schemes, they need more sort of examples of how this could be done.
So that's one of the things my team is going to be working on over this next year, is building tools to help people get a starting point where they can apply it in other labs. And it's great to be able to share that kind of information. So we kind of figure it out and then build a starting point analysis method or a starting point sort of way to prove that things are working correctly and then we can hand that out to the world and let people make it better, give us feedback, and hopefully give people a good starting point to be able to use these products to the best.

Lydia Morrison:
Yeah. Those sound like really powerful tools and being able to evaluate the success and the progress of an experiment or a sequencing event. What do you see as the evolving role of bioinformatics as more data is generated, which will then need to be analyzed and managed much like the data from all the labels,

Brad Langhorst:
Absolutely.

Lydia Morrison:
The UMI and such?

Brad Langhorst:
It just gets bigger and bigger, and one of the things I'm really encouraged by is the young people joining our team on the NEBNext team and other places, everybody seems to have really good data skills. Even high school and middle school students are learning all kinds of data skills and they don't think of themselves as bioinformatics people. They have a toolkit that they can use to answer questions or solve problems. That was much more unusual when I came up as a scientist working in biophysics, I was interested in proteins and how those work, and I needed some data tools to be able to understand the experiments that I was looking at. So I think that same motivation is there for lots of other people, lots of other scientists and maybe who haven't specialized specifically in the data things, but they have enough tools to be able to get where they're going.
So I think there's, we'll probably continue to have specialists, people who work primarily with data and we'll have a bunch of those. But I think also we're seeing most of the scientists who are primarily working at the bench designing experiments, laying out experiments, those folks are also doing a lot of data analysis and trying to understand. I think things go a little quicker when it's in one person's head instead of going back and forth between two people's heads. You can kind of put together the information a little more quickly and move forward. So it's great to see the people kind of working together, but also gaining skills and doing their own bioinformatics as they go. So we want to build tools to make that easier and easier for people as well.

Lydia Morrison:
Yeah, I love that. I mean, bioinformatics is definitely a specialized area of focus, although as you mentioned, I think it is becoming more popular, the ability to work with those large data sets. How did your previous roles prepare you to become a bioinformatician?

Brad Langhorst:
I guess it's one of those terms. It's sort of a loose thing. It means a lot of stuff to different people. For me, I was interested in the biology. That's where I came from. So I think there are some bioinformatics folks or computational biologists, lots of different words for this. Basically the same kind of people who want to use a lot of data to try to understand what's happening in the science. And from my perspective, it was always biological focus. So I was interested in proteins and how they work, and I heard about the Human Genome Project and I really wanted to work on that a little bit. I didn't want to miss that. So before I went to grad school, I went and worked a little bit on the Human Genome Project to try to understand, we thought we were going to figure out all human disease really quickly and this was going to be great, and look at this work.
We're getting all this information, it's going to be so exciting. So I went to go work on that project, didn't want to miss it. I thought it would be over by the time I had finished grad school, so I didn't want to miss out on that. So that was really fun to sort of start working with that data as it was kind of being generated. Got me excited about DNA, but also kind of gave me a reality check, that this is way more complicated than we think it is when the simple view of, oh, we've got G, C, A, T, we can understand, know exactly what's going on. We just have to count up a lot of them. It'll be fine. Actually connecting that to does this person have diabetes or is this person going to have schizophrenia or what, like these complex traits turned out to be a little more complicated and took a long time to figure out.
So I moved back away from that. Those data skills definitely helped me, and I moved back to doing protein work where I still needed a lot of data skills, different kinds of data skills to be able to answer the questions that I was interested in, which proteins are sticking together. But I think that diverse experience, working with DNA and trying to understand disease and then giving up on that and moving back to proteins and trying to say, okay, well proteins are closer to what's actually happening in biology. Let me get a little closer to that, the different set of tools that I built for those different scientific challenges, made me much more adaptable and able to sort of swing into the early days of NEBNext when we had very initial questions about, is this library actually representing the sequencing? What we're trying to do, is it representing the whole thing in this organism or are we missing pieces?
So that kind of question kind of drew me in and it continues to be exciting because we are always changing stuff and looking at more detailed aspects of biology. Whether it's immune biology or it's RNA-Seq, or it's chromatin structure and state. There's always something, a new kind of sequencing. It's such a powerful tool we can apply in so many places. It never gets boring.

Lydia Morrison:
Yeah, definitely. Your breadth of knowledge and experience, I'm sure has been really helpful in sort of helping develop these tools for scientists in various fields to be able to identify what's happening in their sequencing experiments. I wanted to thank you so much for sharing your perspective today. I think it's been really educational for me and hopefully for others listening who are new to NGS. And I think also it's been really interesting to hear about the primer monitor tool, which I think is a really amazing resource and I can't wait to see what happens with that in the future.

Brad Langhorst:
Yeah, I'm excited about it too. I think there's so much stuff to do, and one of my favorite things about working at a place like NEB is that we can do this more in the open than we could in other companies. So we can share exactly how we're building the primer monitor, all the details of exactly how it works are just on the web so other people can collaborate with us, give us ideas and work together on these kind of problems. So super fun to be able to work in that kind of environment.

Lydia Morrison:
Yeah, I think you're right. I enjoy it as well. Thanks so much for joining me today, Brad.

Brad Langhorst:
Sure. Thanks Lydia.

Lydia Morrison:
Thanks for joining us today. In our next episode, we'll continue our series on next generation sequencing. Hope you'll tune in for some new perspective.

