The Centrillion VirusHunter

DISCLAIMER: I’m currently working on a seed stage sequencing related project. So, keep that in mind when evaluating my comments. And get in touch ([email protected]) if this might be of interest.

There’s a new press release out from Centrillion suggesting their sequencing platform should be available now/soon. I’ve previously covered Centrillion, and recommend referring to that post for background on the company.

The press release doesn’t give much information, but references a recent publication, which provides more details. The paper describes a tiling array covering the entire SARS-CoV-2 genome. This is essentially a traditional microarray which covers every possible SNP in the known SARS-CoV-2 genome. They describe this as follows:

“Here we describe a full genome tiling array with more than 240 000 features that provide 2x coverage of the entire SARS-CoV-2 genome and the use of such a genome tiling array to sequence the genome from eight clinical samples”

“Each base has two corresponding probe sets: one for the sense strand and one for the antisense strand.”…”one for each base”

So, Centrillion use a total of 8 probes per site (one for each of the 4 bases, on each of the 2 strands). SARS-CoV-2 has an ~30Kb genome, which gives us the 240K features mentioned above. They use 25bp probes in their array. Tiling arrays have a number of limitations as compared to sequencing based variant detection. Firstly, it’s not clear what issues might be caused by multiple mutations falling within the same 25mer. Such mutations would likely reduce hybridization efficiency, and could result in a variant miscall or “nocall”. The approach is also not able to detect deletions or insertions. These limitations make tiling arrays less interesting as a general purpose tool.
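To make the feature count concrete, here’s a minimal sketch of how a probe set for one site might be built, and how the totals add up. The layout is my assumption (the queried base in the center of a 25mer, with the antisense probes as reverse complements), not necessarily the design used in the paper. The function names and the round 30,000 base genome length are illustrative too.

```python
# Assumed probe layout: queried base at the center of a 25mer.
# This is an illustration, not the published array design.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probes_for_site(reference: str, pos: int, probe_len: int = 25) -> list[str]:
    """Build 8 probes for one site: 4 sense (one per base) plus 4 antisense."""
    half = probe_len // 2
    if pos < half or pos + half >= len(reference):
        raise ValueError("site too close to the end of the reference")
    left = reference[pos - half:pos]
    right = reference[pos + 1:pos + half + 1]
    # One sense probe per possible base at the queried (center) position.
    sense = [left + base + right for base in "ACGT"]
    # Antisense probes are the reverse complements of the sense probes.
    antisense = [reverse_complement(p) for p in sense]
    return sense + antisense


GENOME_LENGTH = 30_000       # approximate, as quoted in the post
PROBES_PER_SITE = 4 * 2      # 4 bases x 2 strands
total_features = GENOME_LENGTH * PROBES_PER_SITE

toy_reference = "ACGT" * 10  # toy sequence, not real viral sequence
example_probes = probes_for_site(toy_reference, 20)
print(len(example_probes), total_features)
```

A second mutation anywhere else in the 25mer would mismatch all 8 probes for that site, which is the hybridization problem described above.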

I personally would not characterize tiling arrays as “sequencing” or “resequencing”. In a tiling array you get a single read out for each position. Sequencing approaches generally produce a continuous readout of bases, without prior knowledge of the sequence context.

The paper presents a comparison of their approach against Illumina sequencing, summarized in the table below:

From the table above, the nocall rate, in the region of 1%, seems to be one of the more concerning issues. I assume this means that both sets of probes essentially failed to provide a useful signal at these positions. The accuracy for the remaining positions seems reasonable (though less than I’d expect, given a known reference for the exact variant).

Summary

This is a fairly traditional tiling array for SARS-CoV-2 variant detection.

From the press release, it seems this is pitched as a lower cost alternative to sequencing. I don’t have good current pricing for microarrays, but I imagine the array they’re suggesting would cost >>$30, based on microarray prices I’ve seen. This is significantly more expensive than qPCR based testing (probably 10x), which makes it too expensive for routine use in SARS-CoV-2 testing.

A detailed cost comparison against sequencing would be interesting. But I suspect we’re not talking about an order of magnitude cost difference. For me, this wouldn’t be enough to make an array based approach attractive, given that sequencing provides a richer dataset and is more accurate.

Hopefully Centrillion will continue to develop some of their sequencing based ideas in the future.

Armonica update

There’s a new Genomeweb article on Armonica Technologies. So I figured it was probably time to revisit the approach, which I’d previously covered. The Genomeweb article is pretty vague, featuring one very blurry plot:

But it turns out they have a couple of posters on their website from AGBT that add more context. The following shows dNTPs (nucleotides free in solution). The idea is that the Raman spectrum gives a characteristic signal for each base type.

The Genomeweb plot expands on this to show different base modifications. The individual dNTP traces show characteristic differences, but given that these are single traces (perhaps averaged over the surface, but I suspect not over experiments) it’s difficult to understand how much experimental variation there is. The poster does show traces under two different conditions (in solution versus dry monolayer). We can combine these plots and see how consistent the traces are between conditions:

In the above I’ve placed the traces side-by-side. For the most part the traces match pretty well; certainly the A and T traces look similar. However, the C and G traces don’t look as good. I’ve marked locations where peaks differ (red arrows). Without more to go on it’s difficult to come to a strong conclusion. But ultimately the consistency of these traces may limit accuracy.

From what I can tell this is the best data they have presented; there’s nothing from nucleotides on strands. Slowing the translocation of strands would be critical for this. In their other poster they make the following statement: “Molecule penetration over the 5 μm wide barrier takes around 2.5 s for as-fabricated roof. Speed ~2 μm/s (>160 ms per base).” 160 milliseconds would probably be enough time to get a single molecule optical readout. But the numbers don’t quite make sense to me: at ~2 μm/s, with adjacent bases spaced a fraction of a nanometer apart, a strand would pass several thousand bases per second, which suggests that this should be 160 microseconds. That would be faster than any single molecule optical imaging system that I’m aware of.
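Here’s a rough sketch of the arithmetic behind that suspicion. The barrier width and transit time come from the poster quote; the 0.34 nm base spacing is my assumption (the rise per base of B-form double-stranded DNA), since the poster doesn’t say what spacing it used.

```python
# Figures quoted on the poster.
barrier_width_um = 5.0
transit_time_s = 2.5

# Assumption: ~0.34 nm between adjacent bases (B-form dsDNA rise).
# The poster does not state the spacing it used.
BASE_SPACING_NM = 0.34

speed_um_per_s = barrier_width_um / transit_time_s        # ~2 um/s, as quoted
bases_per_second = speed_um_per_s * 1000.0 / BASE_SPACING_NM
dwell_time_us = 1e6 / bases_per_second                    # microseconds per base

print(f"speed: {speed_um_per_s:.1f} um/s")
print(f"bases per second: {bases_per_second:.0f}")
print(f"time per base: {dwell_time_us:.0f} us")
```

Under this assumption the time per base comes out at roughly 170 microseconds, which is close to the poster’s “160” figure if the unit is microseconds rather than milliseconds.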

Beyond this we can make some guesses about what a platform built around this approach would look like. Taking the full spectra of each pore would be problematic. Hamamatsu do some really neat spectrometer modules, but I suspect having a spectrometer for each pore would be unrealistic.

I’d therefore imagine that you’d pick a number of peak locations which uniquely identify base types (as shown by the vertical bars in the Raman spectra above). Traditionally for these kinds of single molecule experiments you’d use sCMOS cameras. For the locations shown in the plot above you’d need 6 cameras, simultaneously monitoring pores during translocation. Typically these cameras cost >$10000 (for the cheapest models). Cameras will need to be fixed, monitoring a single region in real time. This will limit throughput.

As we’ve seen with PacBio however, it’s possible to integrate much of this onto a chip based platform.

Conclusion

Oxford Nanopore showed base identification in solution back in 2009. The first external data from that platform was publicly shown in 2014. So I’m skeptical of the statement that “If someone gave us $20 million tomorrow, we’d have a working product in two years.” as suggested in the Genomeweb article.

The platform being developed seems somewhat similar to PacBio’s, with an optical readout and realtime monitoring of strands. So as a first pass, looking at PacBio numbers, I’d expect something like a ~$1000 run cost and a ~$500K instrument cost. I can imagine there’s potential for an order of magnitude cost reduction, but it’s difficult to see it being cost competitive with Illumina.

Being able to detect base modifications is certainly interesting, particularly as there appears to be a consensus that base modifications are important for early stage cancer detection. The market for long reads is currently less clear. In the GenomeWeb article they propose that “the company will try to provide long reads with epigenetic information to guide therapy selection”. This is certainly an important application, but not as large a market as NIPT/cancer screening.

DNA Sequencing and Japan

A Short History of Sequencing in Japan

Japan played an important role in the human genome project, being the top contributor outside of the UK/US.

They also developed instrumentation and technology used in the human genome project, for example multi-capillary sequencing: “The multi-capillary DNA sequencer equipped with Kambara’s device was commercialized in 1998 in alliance with Hitachi and Applied Biosystems”.

But after the human genome project, Riken and other sequencing centers in Japan didn’t really transition into “genome centers”. In short, no “production scale” sequencing core facilities were maintained in Japan.

Japan was slow to take on next-gen sequencing, and made some significant mis-steps, for example acquiring a number of the commercially failed Helicos instruments. This was likely a significant investment (probably enough to buy 8 Illumina instruments). While some interesting work may have been done on the Helicos instruments, they weren’t really suitable for the kind of large scale projects (1000 Genomes project etc.) that other centers were engaging in.

So, large sequencing projects fell by the wayside in Japan… and both research and instrumentation seemed to stagnate for many years.

Japanese Components Used in DNA Sequencing

At the same time, Japanese companies have continued to play an important role in DNA sequencing instrumentation. ABI contracted Hitachi to develop their (failed) next-gen sequencing instruments:

But more importantly Illumina instruments used (and likely still use) a number of critical components developed in Japan.

The Illumina GA2 used at least the following components from Japanese companies:

  • Nikon x20 Objectives (and likely other optical components)
  • An ASI stage, incorporating IKO linear guides (and possibly other components)

The MiSeq uses two Sony DSLR grade CMOS image sensors (IMX038?), and likely other components of Japanese origin. The HiSeq instruments use Hamamatsu TDI linear CCD imaging sensors, which it seems are likely still used on the NovaSeq and other instruments.

While I have no direct evidence, many other critical components (laser diodes, filters) are commonly made in Japan and may be used in Illumina instruments. So while much of DNA sequencing instrumentation is built upon Japanese technology, Japan has failed to capitalize on the added-value produced by DNA sequencing instrumentation.

Japanese DNA Sequencing Startups

Japan’s failure to capitalize on this is in part because there have been few sequencing startups in Japan. Elsewhere I’ve listed ~40 startups engaged in the development of new approaches to DNA sequencing. ~30 of these are based in the US, 5 in the UK, and 1 in Japan (of which I was previously the CTO).

This lack of sequencing startups has contributed to Japan’s lack of capability in this area…

Some Random Filters from eBay

I picked up some ASI FW1000 filter wheels from eBay that were going cheap. One of them was fitted with unspecified filters. Given that a number of the FW1000s on eBay are from Genome Analyzers I suspected these would be the same as those I already have. But they were completely different.

So I stuck them up against my Xenon light source and looked at the spectra… the plots are shown at the bottom of this post. Two of the filters have spectra that seem easy to interpret. 1000736 69799 (A) seems to be a band pass between ~550nm and 570nm. And 1000735 69790 (B) seems to be a band pass between ~576nm and 655nm.
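For what it’s worth, reading band edges off a plot can be roughed out in code along the following lines. This is only a sketch: the function name, the 50% threshold, and the spectrum values are all made up for illustration (they are not my measured data). It assumes a single pass band and a source-only spectrum to normalize against.

```python
def passband_edges(wavelengths_nm, source, filtered, threshold=0.5):
    """Estimate pass band edges as the span where transmission is at
    least `threshold` times the peak transmission.

    Assumes a single pass band; multiple bands would be merged.
    """
    # Normalize by the unfiltered source so its spectral shape cancels out.
    transmission = [f / s if s > 0 else 0.0 for f, s in zip(filtered, source)]
    peak = max(transmission)
    if peak <= 0:
        return None
    passing = [w for w, t in zip(wavelengths_nm, transmission)
               if t >= threshold * peak]
    return min(passing), max(passing)


# Synthetic, illustrative spectrum (NOT measured data): a flat source and
# an idealized filter passing 550-570nm with a little leakage elsewhere.
wavelengths = list(range(500, 701, 10))
source = [100.0] * len(wavelengths)
filtered = [90.0 if 550 <= w <= 570 else 2.0 for w in wavelengths]

edges = passband_edges(wavelengths, source, filtered)
print(edges)
```

Because the threshold is relative to the peak transmission, a filter that attenuates strongly but passes the same range would give the same edges, so the peak value needs to be looked at separately.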

The other two, I don’t understand. 1000732 69780 is a bandpass over roughly the same range as (A) above. But seems to strongly attenuate the signal. 1000733 69768 has some kind of peak around 660nm? But it’s much less clear.

Googling around I found the following reference to filters with similar serial numbers:

These three filters match the recorded spectra well, but what exactly 1000732 is doing is less clear to me. Shame the documentation doesn’t list the manufacturer, which might yield more details…