Sam Siljee – Passion in Science Award winner 2024

Sam Siljee, M.D. shares how he thinks about data interpretation and how he was led to develop a sound-based representation of experimental data. 

Script

In the introduction, I mentioned that I'm working on lung cancer. That's what I'm studying. I'm very lucky in that our lab is right next to the hospital, so we have patients who very kindly donate some of their lung tissue that would otherwise be binned after surgery. We take it into the lab, and I extract cells from it: normal lung cells from tissue adjacent to the tumor, or adjacent to whatever the piece of lung was removed for. Then I grow these in the lab in cell culture, in ways that mimic the human body as closely as possible, so the findings from this research can be applied to the human body. I split the cells in half, and in half of them I block a very important protein, nicknamed the guardian of the genome, which protects us from mutation and has a lot of implications in cancer. I use that to model cancer. Then I plug the samples into my favorite machine in the lab, the mass spectrometer, and I get too much data out at the other end.

So here's the problem I run into: the data is so big, it is literally too big for Excel. It will not fit in Excel. So I had to teach myself coding in order to manage the data, and it is very much a wrestling process. I see that as actually engaging with the data and getting meaning out of it, because the data that comes out is good for computers. Computers can read it; they're fine with endless lists of millions of numbers. But as people, we find it very hard to understand. Back in the day, when you ran a mass spectrometry experiment, you'd get a handful of spectra and you could annotate them by hand. It would take multiple lifetimes to analyze by hand the kind of data that I generate in half an hour.

So then you need to resort to tools like this. This is MaxQuant, which is not very accessible or approachable; a lot of people would be intimidated by it. So, to the baking. Asking myself why I felt such an enormous drive to do this, I wondered, "Where is this coming from?" And I think it comes from school. My schooling was slightly different: we'd have the same topic for a period of a few weeks, whether architecture, history, mathematical patterns, et cetera, and at the end of that period we had to do what was called a creative response. So we'd study books, and I'd write songs and skits in response, including in response to mathematical patterns.

So these are Fibonacci sequence biscuits, and I have to admit, this is not a picture from when I made them back in school. These ones are a lot nicer and came out a lot cleaner; it's actually really hard to get that right, and I didn't take pictures then. So, Fibonacci sequence biscuits: I think that might be one of the origins of why I'm doing this.

So I will now teach you about mass spectrometry. I've been learning about mass spectrometry for at least three years, and every time I sit at the machine I learn something completely new, so my challenge is going to be to explain it to you in two minutes. First, the light spectrum. I remember very clearly the magical moment when I realized how people worked out what stars are made of, the elemental composition of stars, having thought, "How can you possibly know that without going there and taking a sample?" The way they do it is to take the light from the stars and pass it through a glass prism. I expect most of you will have seen the rainbows that get produced; that is called a light spectrum. You split the light up into all its different colors, and each element has peaks at different colors. And that's how you work out what stars are made of.

So this is a mass spectrometer, and this is the expression of a guy who hasn't yet learned how much data comes out. Just kidding, I still love the data. This part is the machine that feeds the sample in. It's a big black box with lots of very, very expensive stuff on the inside that you don't want to touch. And this is how we get mass spectra: instead of light, we have our sample here. There are many, many different variants and flavors of this, but essentially we turn the sample into ions, charged particles in the gas state. And then, instead of a glass prism, we have magnets and very complicated electromagnetic fields that move those ions.

And as with the prism, you can get magnetic fields that will split up these ions based on their mass, so how heavy they are. Now, a caveat for anyone who's into mass spectrometry: I will say mass, though technically it's the mass-to-charge ratio, because it's easier. So our sample gets turned into ions, those ions get moved by these magnetic fields, and then we get spectra out, which is similar to splitting up the light and looking at all the different peaks.

So this is what the data actually looks like. We've got m/z, the symbol for mass, along this axis here. And then we've got intensity, so how much of the ion of that particular mass is present, up on the Y-axis. But remember, and we'll come back to this later, there is a third dimension to this data which is very important: time.
My samples are so complex, a biological mixture is so complex, that analyzing it all at once would be like trying to look through thousands of microscope slides at the same time; you can't see any detail. That's why we have a machine that drip-feeds the sample in over time. That spreads it out, and that lets us get a little more depth. So there is a time dimension to this data as well, but the screen only lets me show two at a time. Yeah?
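To make those three dimensions concrete, here is a minimal sketch of how one spectrum might be represented. This is illustrative Python (the speaker's own tooling is in R), and all the names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Spectrum:
    """One mass spectrum: peaks of (m/z, intensity), observed at one point in time."""
    retention_time_s: float            # the third dimension: when it was observed
    peaks: List[Tuple[float, float]]   # (m/z, intensity) pairs along the two plotted axes

def base_peak(spec: Spectrum) -> Tuple[float, float]:
    """The tallest peak in the spectrum, i.e. the most intense ion."""
    return max(spec.peaks, key=lambda p: p[1])
```

A full run is then just a long list of such spectra ordered by retention time.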

Okay, coercion. It really was one of those flashes of inspiration you hear about: I was sitting at my desk, thinking, "Is there a way to present this data non-visually?" I had been assuming that the only way to show this data is through graphs. Then it occurred to me: what about sound? We have many other senses; why are we not using them? I looked into other projects turning scientific data into sound, so I'm not the first to do this.

What I noticed, though, is that a lot of other projects, in a very dry, scientific way, reduced the data down to a simple string of numbers, which was then used to make a piano note go up and down. This picture is MIDI, the Musical Instrument Digital Interface, which is what we use to tell computers to make music. We've got the different notes of the piano going up and down, and time along here. This coerces the data in a few ways. Firstly, the tone is generated by a synthesizer in the computer, often a default piano. The tuning, the way in which we break the octaves into the notes that we know, is a western cultural construct, and that's a coercion too. Breaking the timing into square pieces is a coercion as well. I wanted to honor the original data: I wanted the tone, the pitch and the timing to come from the data itself instead of being somewhat arbitrarily applied on top.

Right. So what makes tones different? I've got some sounds here for you. This first one is a pure sine wave, which is mathematically described by the motion of a point around a circle.

Yeah. So we can hear it. So that's a sine wave. That's the wave form that you're seeing at the top, so the shape of that wave. Then we have a guitar.

And the piano's last. We might be missing the piano. You know what a piano sounds like.

What we can see is that there's a repetition in the wave, and the rate at which that cycle repeats is the pitch. So the frequency is the same for all three, but the shape of the wave is different, and it's the shape that gives each its unique tone. This is where I remembered a really amazing piece of math that I cannot do justice to, because I only know about that much of it: the Fourier theorem. In my understanding, it says that you can make any complex wave form by adding together simpler wave forms of different frequencies. So we can use that to make complex wave forms, which gives us unique tones.

So, adding together wave forms. I've got two pure sine waves, similar to the one you heard before, of different frequencies, a fast one and a slow one. When we add them together, you get a more interesting shape. Now, here's what I do with my spectrum: for each of the peaks, and a spectrum in my data typically has thousands of peaks, I make a sine wave. The mass, so how heavy the ion is, I map to frequency, and the intensity I map to amplitude, so how tall I make that sine wave. Add them all together and you get wave forms that look like this. You're probably curious what this sounds like; most people are, so we'll have a listen.
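The mapping just described can be sketched in a few lines. This is a simplified Python illustration, not the speaker's actual R pipeline; it assumes the m/z values are used directly as frequencies in hertz, and the function name is made up:

```python
import math

def spectrum_to_tone(peaks, duration=0.5, sample_rate=44100):
    """One sine wave per peak: m/z maps to frequency, intensity to amplitude.
    peaks: list of (mz, intensity) pairs. Returns samples normalized to [-1, 1]."""
    n = int(duration * sample_rate)
    samples = [0.0] * n
    for mz, intensity in peaks:
        freq = mz  # assumption: m/z values already fall in the audible range
        for i in range(n):
            t = i / sample_rate
            samples[i] += intensity * math.sin(2 * math.pi * freq * t)
    loudest = max(abs(s) for s in samples)
    return [s / loudest for s in samples] if loudest else samples
```

A real spectrum would pass in thousands of peaks; summing them produces the complex, unique tone per spectrum that the talk describes.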

My test experiment had more than 58,000 spectra in it. So, 58,000 unique tones later, I needed to put them all together, and this is where that third dimension of the data, time, is really important. The experiment runs over 90 minutes, and each spectrum was observed at a particular point in time; I use that point in time to play back its tone. Each spectrum also has an overall intensity, and I use that to determine the loudness. I can put them together, and this is just a basic visualization I made to go along with it, but what I will show is that you've got two sides here, the left and the right, and that is to show that we actually take two types of spectra.
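The assembly step above might be sketched like this, again as illustrative Python with hypothetical names, assuming each spectrum's tone has already been rendered:

```python
def assemble_timeline(spectra, total_seconds, sample_rate=44100):
    """Mix pre-rendered tones onto a single timeline.
    spectra: list of (time_seconds, overall_intensity, tone_samples);
    each tone starts at its spectrum's observation time, scaled by loudness."""
    out = [0.0] * int(total_seconds * sample_rate)
    for t, loudness, tone in spectra:
        start = int(t * sample_rate)
        for i, s in enumerate(tone):
            if start + i < len(out):
                out[start + i] += loudness * s
    return out
```

For the talk's experiment, `total_seconds` would be the full 90-minute run, giving the 90-minute soundscape mentioned later.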

We've also got two ears, a left and a right, so the overview spectra I use for the left audio channel, and the spectra that we drill into in more detail, to actually identify the proteins in the sample, I play on the right channel. This is what they sound like. You won't have to listen to all 58,000, because that would take 90 minutes, and I don't have that long.
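The channel assignment is simple to sketch: one mixed-down channel per spectrum type, paired into stereo frames. Again illustrative Python with made-up names:

```python
def to_stereo(left_ms1, right_ms2):
    """Pair the overview (MS1) mix on the left channel with the detailed
    (MS2) mix on the right, padding the shorter channel with silence."""
    n = max(len(left_ms1), len(right_ms2))
    pad = lambda ch: ch + [0.0] * (n - len(ch))
    return list(zip(pad(left_ms1), pad(right_ms2)))
```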

I've listened to it in the lab; I played the full 90 minutes as I did my work, so I've heard it multiple times, and a lot of things come up when you listen. Beyond the left and right audio I mentioned, another real challenge in mass spectrometry is what we call dynamic range: the difference between the very intense signals and the very weak ones. At the very start it was very quiet, you could barely hear it, and there are parts which are just so loud that I had to keep changing the volume in the lab. So it makes the challenge much more tangible, and it opens up conversations with the people I play this for about some of the challenges we see in science. Dynamic range is a big one.
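Dynamic range is usually quantified in decibels, the ratio of the loudest to the quietest signal. A minimal sketch (hypothetical function, illustrative Python):

```python
import math

def dynamic_range_db(samples):
    """Ratio of the loudest to the quietest nonzero amplitude, in decibels;
    a large value is exactly why the playback volume needed constant adjusting."""
    mags = [abs(s) for s in samples if s != 0]
    return 20 * math.log10(max(mags) / min(mags))
```

A 1000:1 amplitude ratio, for example, is 60 dB of dynamic range; mass spectrometry signals can span far more than that.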

So I've got another one, because I can, and this is a slightly different acquisition method, and we'll have a listen to this one too.

In the first one, the detailed, in-depth scans, which we call MS2 scans, are where we actually identify the different proteins. You select little peaks and specify: I'm going to pick that one, that one, and that one, and it's different every cycle. So the repetition you're hearing reflects what we call the cycle time in the machine, where you repeat a cycle of certain types of scans in a certain order. With this second method, instead of picking specific peaks each cycle, you scan across the entire range in separate windows, starting low and moving high, and that's what we're hearing. This is something that took me quite a bit of time to understand properly, and it comes out in listening to sound like this. Some other things come out too. I've, of course, experimented more with these sounds, because the mass-to-charge ranges that we look at happen to be very similar to the frequencies that we can hear ourselves.
But I played with mapping it across a wider spectrum of frequencies, moving them beyond what we can even hear as people, and you still get audible output, which is really interesting. The sine waves I'm making are well beyond what we can hear as humans, and we're still hearing stuff. That tells us something about complexity, and about these things called emergent properties, things that come out of complexity. I believe that a lot of the sound we're hearing is actually interference between those peaks, so that even if you scale it beyond what we can hear, there's still audible signal there from the interference of those peaks. So that's a really interesting look at complexity, which is itself a really interesting challenge in science at the moment.
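One way to see how two peaks can interfere is the sum-to-product identity: sin(2πf₁t) + sin(2πf₂t) = 2 cos(π(f₁−f₂)t) sin(π(f₁+f₂)t), so the combined loudness swells and fades at the difference frequency |f₁ − f₂|, which can be far lower than either component. This is only one plausible account of the audible output the talk describes, sketched here in Python with a made-up function name:

```python
import math

def beat_envelope(f1, f2, t):
    """Amplitude envelope of sin(2*pi*f1*t) + sin(2*pi*f2*t).
    By the sum-to-product identity the pair equals
    2*cos(pi*(f1-f2)*t) * sin(pi*(f1+f2)*t), so the envelope
    oscillates at the difference frequency |f1 - f2|."""
    return abs(2 * math.cos(math.pi * (f1 - f2) * t))
```

Two ultrasonic peaks 100 Hz apart, for instance, produce a swell-and-fade pattern at an audible 100 Hz even though neither component is audible on its own.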

So, emergent properties. Now, AI. I find that AI can sometimes be a bit of a black box where you put something in and get something out. This, for me, is much more direct; it's much more hands-on, practical exploration and engagement with the data. That might just be because I made it, so I know what's going on; hopefully, by now you understand a little bit as well. Methodology versus biology: people ask me, "Okay, what am I hearing? How different do the normal cells sound from the cancerous cells?" which is a really interesting question. At this point, I can say it doesn't sound very different, which is really interesting in and of itself. If we look at all of the proteins that are different between cancer and normal cells, it's actually only a handful that account for all of the changes we see. What these sounds reflect more is the acquisition method, so hopefully I was able to demonstrate a little of the differences in sound between the different acquisition methods.

That is because this is very much the raw data; it is literally what comes out of the machine before any kind of processing. To get biological meaning in a more standard analysis, the data undergoes many, many layers of statistical summarization, with graphical presentation at the end.

So, implications, because as a scientist I feel it's very important that there is a practical side to everything we do. There are vision-impaired scientists, and this is a way for them to engage with the data more directly and more meaningfully. I have a friend who has ADHD, and she feels very seen by presenting data in different ways, different ways of looking at, or rather listening to, data. It's good for teaching mass spectrometry; if I had two years, I'd be able to teach you a lot more about mass spectrometry as a wonderful ... hopefully, you were able to learn a little. And approachability: I find that with my non-scientific friends and family, if I show them a graph, they're intimidated. If I show them that screen from the very start, with the colored heat map in green and yellow, they will instantly assume they don't know what's going on, switch off, and disengage.

If I say, "Do you want to listen to some alien sounds?" it's a lot easier to get them into conversations about things like dynamic range, complexity, and how different cancer really is from normal. A big challenge in science at the moment is that we are great at producing data, but the rate at which we produce it is outstripping our ability to understand it. I think we need to take a different approach if we want to face this challenge, and while this won't replace conventional analysis and graphing anytime soon, new ideas have to start somewhere.

So where to next? I'd like to bring the biology back, so I'm planning to take the processed data and use that to compose the piece: the raw data as the source of tone and sound, and the processed biology driving the overall composition. At the moment, the composition is driven by the acquisition method, so a 90-minute experiment gives you an epic 90-minute soundscape. I'm presenting this at the Australasian Computer Music Conference at the end of the month; the title is What is a Biologist Doing in a Music Conference?

I also found a publication called Leonardo, about the intersection of art, science and technology, where I think this work really fits; that's another way I'd like to spread it. And then there's working with artists and scientists: a lot of people are interested in this, in both the arts and the sciences, and working together with them is really exciting.

Now, this bottom bit has come out a bit funny. That's because I wrote this presentation in code, and I must have saved it incorrectly. I'm using R, which is a language for data analysis. It is not designed for making presentations, and yet you can use it to make presentations; sometimes they come out funny. You can use it to make websites, and you can use it to turn your raw data into sound. So I made a very basic website, up there, where you can upload your own data and play with it yourself, and it gives you lots of fun little sliders and things to experiment with.

The code you can see here on my GitHub. And if you have any further questions, or if you just want to chat, because I love talking about this, thank you for letting me, you can use my email there. Finally, I'd like to thank the Gillies, where I do all of my research; without samples, there'd be nothing to put into the mass spectrometer and no sound to get out. I'd like to thank Victoria University, where I'm doing my PhD, for their doctoral scholarship, and my supervisors there. At the Gillies, I'd like to thank Wellington Hospital for providing the infrastructure and, of course, the patients for generously donating their samples for me to work with. And finally, thank you to NEB for this wonderful opportunity. Thank you.

