Notes on Genia’s new paper – nanopore SBS

[Figure: Genia nanopore SBS platform]

Genia have released a new paper showing recent data from their “nanopore SBS” platform.

Summary: The best data in this paper is a 20 bp read on a synthetic template with no homopolymers. This has long dwells (multiple seconds) and the levels look clearly differentiated. The second dataset has short dwells (~100 ms?) under different experimental conditions, which they say give better resolution on homopolymers.

The first dataset looks like reasonable progress; the second I’m not sure I buy, and it is very low complexity in any case (just 3 homopolymer runs).

Overall this is an R&D level system. It’s interesting progress, but not useful for any application at present.

Genia’s nanopore SBS technology is shown in the figure above. To a computational scientist like myself it seems like an interesting system. Genia have had modified nucleotides created such that each nucleotide has an oligo tag hanging off it. That seems pretty amazing, but it appears the polymerase still incorporates these tagged nucleotides. During the incorporation process, the tag breaks off and passes through the pore. The cartoon below shows the basic idea:

[Figure: cartoon of the tagged-nucleotide incorporation process]

The diagram above shows each nucleotide tagged with a longer oligo which goes down the pore. When the tags sit in the pore they block the flow of ionic current through it. While in the diagram above I show polyN tags, Genia have selected tags to give a good spread of current blockages (and have included modified bases in the tags). Using oligo tags has two benefits over competing systems. Firstly, each tag provides a signal from a single base. In some competing nanopore systems multiple template bases are in the pore at the same time, so more than one base affects the readout. This results in a convoluted signal from which it can be difficult to extract the original template sequence [1]. The second advantage is that you can optimise the spread of the tags so that each tag can be easily differentiated.
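To make that second point concrete, here’s a toy sketch (mine, not Genia’s pipeline) of calling a base from a blockage current by picking the nearest expected tag level. The four current levels are invented for illustration; the point is simply that the wider the spread between levels, the more noise you can tolerate before miscalling:

```python
# Toy base caller: assign a measured blockage current to the nearest tag level.
# These four levels are hypothetical; a real system would calibrate them per pore.
TAG_LEVELS = {"A": 5.0, "C": 10.0, "G": 15.0, "T": 20.0}  # picoamps (made up)

def call_base(current_pa: float) -> str:
    """Return the base whose expected tag level is closest to the measurement."""
    return min(TAG_LEVELS, key=lambda base: abs(TAG_LEVELS[base] - current_pa))

for measured in [4.2, 11.3, 19.0]:
    print(f"{measured} pA -> {call_base(measured)}")
```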

 

There seems to be one other trick in the system, described by this sentence: “The applied voltage is adjusted to ensure that, in a majority of cases, one and only one pore is inserted into the membranes of each well.” My understanding was that the number of pores in each membrane is Poisson limited (see my new post on this). But if they’re able to control pore insertion with an applied voltage that’s pretty neat (perhaps someone who understands this can comment). The paper discusses a 264 pore chip, which struck me as odd, as I believe they’ve talked about chips with many more pores.
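For reference, under simple Poisson loading the fraction of wells with exactly one pore is λe^(−λ), which tops out at about 37% when the mean is one pore per well. A quick back-of-envelope sketch (my numbers, just to show the limit that voltage-controlled insertion would be beating):

```python
import math

def p_single_pore(lam: float) -> float:
    """Poisson probability of exactly one pore per well, given mean loading lam."""
    return lam * math.exp(-lam)

# Sweep the mean loading; the single-pore fraction never exceeds ~36.8%.
for lam in [0.5, 1.0, 1.5, 2.0]:
    print(f"mean pores/well = {lam:.1f}: P(exactly one) = {p_single_pore(lam):.3f}")
```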

Data

[Figure: trace from the first dataset (20 bp synthetic template, no homopolymers)]

The first dataset is shown to the right. This is the dataset that contains no homopolymer runs. To my mind it’s the most convincing dataset in the paper. Raw data for this plot isn’t available (why is that still OK in 2016?), so we’re forced to draw our conclusions from eyeballing the plot and their analysis.

The data however looks quite clean. I’d assume this is the best data they’ve seen on the chip, and it’s a shame there aren’t more examples of this read. The base dwells seem to be all over the place, and I’d assume, much like in other nanopore systems, they are exponentially distributed.
[Figure: trace from the second dataset (homopolymer experiments)]

The second dataset describes their experiments optimising the system for homopolymer detection. I find this less convincing. It’s a short run, and it’s hard to tell how much longer the ‘T’ calls are than the noise spikes that appear to be at almost the same level. The following statement also gives me some concern:

“Base calling was carried out by manual inspection of the current level of each deflection, ignoring ones with dwell times less than 10 ms.”

I guess this is effectively thresholding the data, but in that case why not say so? Regardless, the fact that an automated base caller wasn’t used most likely means that the datasets are very small at the moment.
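If the dwells are exponentially distributed, as I suspect, a hard 10 ms cutoff discards a predictable fraction of genuine incorporation events: P(dwell < t) = 1 − e^(−t/τ). A small sketch of that calculation; the mean dwell τ below is a guess on my part, since the raw data isn’t available:

```python
import math
import random

TAU_MS = 100.0    # assumed mean dwell time (my guess, not from the paper)
CUTOFF_MS = 10.0  # the paper's manual threshold

# Analytic fraction of true events lost to the cutoff.
print(f"analytic fraction discarded: {1.0 - math.exp(-CUTOFF_MS / TAU_MS):.3f}")

# Quick simulation as a cross-check.
random.seed(0)
dwells = [random.expovariate(1.0 / TAU_MS) for _ in range(100_000)]
discarded = sum(d < CUTOFF_MS for d in dwells) / len(dwells)
print(f"simulated fraction discarded: {discarded:.3f}")
```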

Overall this is interesting progress and represents a solid milestone in their development. It’s not clear that this actually represents the state-of-the-art Genia system. It may be that this is an older platform that doesn’t reflect the current system, as the low pore count might indicate. However, it’s common for every vendor to say this when a new paper is released, and it’s difficult to discern the truth without further disclosures.

 

[1] In the pore used here this is particularly important. You have about 15 picoamps between the maximal and minimal blockage. This isn’t a huge amount of signal. Before even considering thermal noise, if we were to sample at 1 MHz, 1 picoamp would be roughly 6 electrons per timestep. As a colleague used to say… so few electrons that you could name them.
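For those who want to check the arithmetic, a quick sketch (the 1 MHz rate is just the hypothetical from above):

```python
E_CHARGE = 1.602e-19  # coulombs per electron

def electrons_per_sample(current_amps: float, sample_rate_hz: float) -> float:
    """Electrons flowing during one sample period at the given current."""
    return current_amps / (E_CHARGE * sample_rate_hz)

print(electrons_per_sample(1e-12, 1e6))   # 1 pA at 1 MHz: ~6.2 electrons
print(electrons_per_sample(15e-12, 1e6))  # full ~15 pA range: still under 100
```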

Disclosure/Disclaimer: I have worked and continue to work in the DNA sequencing industry. I own stock in DNA sequencing companies. While I have tried to be unbiased this represents my opinion and speculation only. I recommend reading the publicly available paper for yourself.

PunkSeq10 Schematics and Gerbers

[Image: PunkSeq10 board]

This post contains the current gerbers, schematic and layout files for the PunkSeq10. The currently shipping version is r2. You can buy this from my shop.

Schematic as pdf: punkseq10

KiCad files (including gerbers in gerbers.zip): BabySeq.tar

Are you sure this isn’t horse? – DNA Sequencing is Universal Sensing

 Today DNA sequencing is dominated by research applications. This is likely to continue in the short term, but in the medium and long term the future of DNA sequencing is likely to look very different.

Fundamentally I see sequencing as a new class of sensor. People generally discuss sequencing in the context of human health. But as sequencing decreases in cost and becomes easier to use, it becomes more like a general purpose sensor. Like CMOS imaging chips, for example, it will have research, clinical, and consumer applications.

The Medium Term

In the medium term it’s clear that DNA sequencing will be making its way into the clinic. Companies like 23andMe, which screen for inherited genetic traits, are one obvious application. The genetic screening of every child at birth is somewhat attractive, and could provide everyone with a dataset they could draw upon throughout their lifetime. With 4 million births a year in the US, this is a pretty big market, but it’s perhaps not the biggest.

There are 1.7 million new cancers reported each year in the US. Cancer is an inherently genetic disease, and each of these cases would ideally be sequenced in full to better understand its genetic cause.
But better than understanding a cancer once you know it exists is detecting it early, so you can do something about it. Companies like Illumina’s newly founded GRAIL seek to use DNA sequencing to regularly screen for cancer. For various reasons both complete and fragmented cancer cells end up in a patient’s blood. By taking a simple blood sample you should therefore be able to screen for cancer using DNA sequencing. Screening every US adult every 5 years gives you a market of 50 million tests a year.

The screening market is much bigger than this. There are more than a million sepsis infections a year in the US; screening for early detection of sepsis and other infectious diseases would be of huge value. In fact it seems obvious that when bulk sequencing for <$100 becomes generally available, blood serum would be sequenced as a matter of course. That single test could detect cancers and infectious diseases, and provide a genetic profile of any unborn children as well as the patients themselves.
If we sequenced every patient admitted to hospital, that would result in ~30 million genetic tests in the US alone.
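Adding these rough figures up gives a sense of the scale; the 250 million adult population number below is my assumption, implied by the 50-million-tests-a-year screening estimate above:

```python
# Rough US medium-term market, summing the (very approximate) estimates above.
births_per_year      = 4_000_000         # newborn screening
new_cancers_per_year = 1_700_000         # full tumour sequencing
adult_screening      = 250_000_000 / 5   # every adult screened every 5 years
hospital_admissions  = 30_000_000        # sequencing on admission

total = births_per_year + new_cancers_per_year + adult_screening + hospital_admissions
print(f"~{total / 1e6:.0f} million tests per year in the US alone")
```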

The Long Term

In the longer term it’s clear that sequencing costs will drop even further. My guess is that the basic sensing technology will drop as low in cost as cheap CMOS imaging sensors are today. These cost almost nothing, on the order of a dollar. Unlike CMOS sensors though, I expect DNA sequencers to always be single- (or few-) use per sample. But overall it seems reasonable to expect that the cost of sequencing will drop to the $10 mark.
Combined with on-chip extraction techniques this could create a simple consumer grade platform. But what exactly would a consumer do with cheap sequencing?

There are a few ideas I’ve heard thrown around. One is routine sequencing at borders. This would allow quick detection and containment of viral outbreaks. This might be feasible at the $10 mark, though it may have significant legal and moral implications.

I think it’s more likely that at that price people would be routinely sequencing themselves anyway. Every time you have a cold, or feel a bit under the weather, you’d sequence some samples and find out exactly what was wrong with you. It’s possible that this could lead to targeted medication, but reassurance that “there’s nothing serious wrong” is probably worth $10 to most people, and is a lot more comforting than a doctor saying “it’s probably nothing, come back if you still feel bad in a week”.

You’d likely be using DNA sequencing for QC in agriculture and food processing too, to track contamination, trace sources of infection, or monitor supply chains. At that price it makes sense to do this on a per-batch basis.

And in agriculture, it will be used to monitor the health of livestock, in much the same way as it would be used to monitor human health.
Some have even suggested that DNA sequencing will be integrated directly into toilets, which will no doubt cheerily report on your wellbeing, or set up a doctor’s appointment for further tests.

DNA sensors in public spaces might continuously monitor for airborne viruses.

Ultimately I think everyone will have a DNA sequencer in their home, either continuously monitoring the occupants’ health, or used as required. Perhaps enabling them to answer questions like “is this really not horse meat?”
If we can beat a $10 sequencing run and head toward $1, even more applications open up. “What’s this plant?” You could try and look it up, but why not just sequence it, and as a bonus get a complete genotype? How clean are my work surfaces? Sequence the bacterial population.
The eventual global market for sequencing is likely to be in the tens of billions of tests per year.

Disclosure: I always try to be as unbiased as I can. However, I have worked for DNA sequencing companies. I own stock in DNA sequencing companies. And I continue to work in the industry.

Road to the $1 Genome

There’s been much talk of the $1000 genome. But it’s clear that the price will continue to drop even further. Ultra-cheap sequencing (and sample prep) would open up entirely new applications. The route to ultra-cheap sequencing may hold some surprises, and it’s interesting to run the numbers. Let’s begin by imagining our ideal sequencing platform.

DNA sequencers are often characterised in terms of throughput: that is, how much DNA they can sequence per unit of time. The human genome is about 3 billion basepairs, and you’d be forgiven for thinking that you’d only ever want a sensor that delivers that much sequence. It turns out there are lots of applications where much more sequencing would be useful. Often you’re sequencing larger populations of organisms, or you’re looking for something that doesn’t happen very often (low abundance fragments of cancer DNA in blood plasma).

As a convenient guesstimate I’d say we want to be able to sequence 1000x the size of a human genome. Oh, and I’d like to be able to run this in 5 minutes. For fear of being accused of overkill I’ll leave the specs at that (this much sequencing currently costs thousands of dollars and takes days).

How many sensing elements, and what kind of throughput, would be needed? Sensing DNA at more than 100 bases per second (bps) is likely to be tough. Nanopore approaches generally generate signals in the picoamp range, and amplifying these signals at anything more than a few tens of kilohertz gets hard. SBS approaches will certainly also have issues running at speeds faster than this.

How many sensing elements do we need? 3×10⁹ bases / (5 × 60 s) ≈ 10⁷, or 10 million bases per second for a single genome. At 100 bps per sensor, that gives us 100,000 sensing elements (the 1000x target above would scale this up proportionally).

An iPhone 6s camera can reliably sustain these kinds of data rates: 120 fps at 1080p (~2 megapixels). That camera module regularly goes for about $20 on eBay, and similar modules are likely available OEM at much lower prices. Which is to say, we can build CMOS chips at volume which produce data rates in the right ballpark.
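A quick sanity check of those numbers; the samples-per-base figure is my own assumption (real systems may need more or fewer):

```python
GENOME_BP = 3e9        # human genome, basepairs
RUN_SECONDS = 5 * 60   # the 5 minute target
BASE_RATE_BPS = 100    # bases per second per sensor, from above
SAMPLES_PER_BASE = 10  # my assumption, to resolve each level cleanly

bases_per_second = GENOME_BP / RUN_SECONDS         # ~1e7 for one genome per run
sensors = bases_per_second / BASE_RATE_BPS         # ~100,000 sensing elements
sample_rate = bases_per_second * SAMPLES_PER_BASE  # ~1e8 samples/s

camera_rate = 2e6 * 120  # ~2 MP at 120 fps: ~2.4e8 pixels/s
print(f"sensing elements: {sensors:,.0f}")
print(f"samples/s: {sample_rate:.1e} vs camera pixels/s: {camera_rate:.1e}")
```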

While $20 is cheap, it’s still not cheap enough to be a throwaway component in a $1 sequencing system. And while semiconductors are crazy cheap (you certainly can buy cheap CMOS sensors for around $1), it’s hard for me to imagine shipping a complete, semiconductor-derived consumable for this price.

So what does the technology behind a $1 run sequencer look like?

The great hope for the future of sequencing has always been nanopores. In this technique the DNA passes through a small aperture. As it moves through the hole the DNA sequence is read off using one technique or another.

Nanopore arrays are based around semiconductor fabrication, and their cost profile is likely to be similar. It’s hard for me to see how a nanopore sensor could be produced for less than $1 and delivered to a user for less than $10. It’s also hard for me to imagine that the array is reusable: any system where the DNA is in contact with the sensor will result in contamination of that sensor, and its rapid degradation.

So what might a $1 sequencer look like? Somewhat controversially, I think it might decouple the sensing technology from the substrate in much the same way current massively parallel sequencing platforms do today. An optical system, with cheap reagents, and a cheaply manufactured substrate (which costs <$1) would seem logical (though a reusable FET sensor might work too). The instrument itself might use more expensive CMOS cameras and cost a few hundred dollars, but because the sensing and sample are decoupled they could be used repeatedly.

Disclosure: While I always try to remain unbiased, I own stock in sequencing companies, I’ve worked for DNA sequencing companies, and I continue to work in the industry. Exercise your own judgement.