Singular Genomic Systems S-1 Review

I’ve previously written about Singular Genomics. It now looks like they’re preparing for an IPO and have filed an S-1. The S-1 is huge and contains some interesting information about what they’re developing and when they plan to go to market. It’s fairly readable and I recommend taking a look if that kind of thing interests you.

In this post I’m going to try to quickly review the technological aspects of the Singular Genomics approach based on the S-1. The document doesn’t really go into specifics (and I’ll refer to other posts where relevant), but we can get a sense of where they lie technologically with respect to Illumina. So I’m going to dive into the technology and briefly review some of the business aspects at the end of the post. There are two aspects to the Singular Genomics play. The first is the basic sequencing instrument. The second is the various sample prep and other analytical approaches they have in development, some of which complement sequencing.

Sequencing

In summary, they look to be developing a four-lane (four-color?) MiSeq-like instrument. There’s a figure buried in the middle of the document that describes the basic chemistry:

As in Illumina style sequencing, they perform on-surface amplification of a single template to generate clusters, and then sequencing-by-synthesis to read out the sequence. The S-1 doesn’t state exactly how they are generating clusters. A quick patent search doesn’t yield anything either. Illumina use a bridge amplification approach; the initial IP was created by Manteia (acquired by Solexa, and then Illumina). I’ve discussed that IP elsewhere, and it appears to have expired. So my guess would be that they’re using this expired IP for cluster amplification.

They don’t appear to be using patterned flowcells, or ExAmp. This makes the approach more like that used on the MiSeq (and original Genome Analyzer) than that used on the NovaSeq (and NextSeq 2000). It will limit the density of reads on the flowcell somewhat. From the image above it also looks like a four-color chemistry, which complicates the optical system compared to Illumina’s current generation instruments.

The S-1 lends some weight to the idea that they are targeting “MiSeq-like” throughput: “We purposely designed our G4 Integrated Solution to target specific applications and to be capable of competing with other instruments across a range of throughput levels, particularly in the medium throughput segment.” Flowcells, reagent cartridges, and instruments look fairly familiar:

The projected run times appear to be similar to a MiSeq, giving a “sequencing time of approximately 16 hours to complete a 2×150 base run.” Similarly, Illumina quote a 17h run time for their MiSeq nano kits at 2x150bp.

As I understand it the Illumina reversible terminator chemistry isn’t yet off patent. The S-1 states that they “anticipate initiating an early access program followed by a commercial launch of the G4 Integrated Solution by the end of 2021, with intentions for units to ship in the first half of 2022”. This suggests that they won’t be using Illumina-style reversible terminators unless they believe that IP won’t hold up anyway (which seems risky, as Illumina are currently using it to block MGI from selling instruments in the US).

The S-1 also states that “We in-licensed certain patents and other intellectual property rights from The Trustees of Columbia University”, so it’s likely that they’re using Jingyue Ju’s nucleotides, as previously discussed.

They show some results on sequencing; overall data quality looks like it’s in the same ballpark as current Illumina systems. I’d suggest that, much like MGI, it’s a reasonable drop-in replacement for Illumina:

“This figure displays the current sequencing performance of our core Sequencing Engine with a demonstrated accuracy of 99.7% on 150 base reads (Q30 on greater than 70% of base calls) with a throughput of 153M reads per flow cell. We are targeting sequencing performance of Q30 on greater than 80% of base calls for 150 base reads and 330M reads per flow cell.”
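As a quick aside, the Phred quality (Q) scale used in the quote maps directly onto per-base error probabilities, so the figures above are easy to relate to each other. A minimal sketch of the conversion:

```python
import math

def phred_q(p_error):
    """Phred quality score for a given per-base error probability."""
    return -10 * math.log10(p_error)

def error_prob(q):
    """Per-base error probability for a given Phred quality score."""
    return 10 ** (-q / 10)

# Q30 corresponds to a 1-in-1,000 per-base error rate (99.9% accuracy)
print(error_prob(30))              # 0.001
# The quoted 99.7% demonstrated accuracy corresponds to roughly Q25 per base
print(round(phred_q(1 - 0.997)))  # 25
```

So “99.7% accuracy” and “Q30 on greater than 70% of base calls” are two different summaries of the same quality distribution: the average base is around Q25, while the best 70% clear Q30.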

Increased data quality through sample prep

Another aspect of the Singular approach appears to be what they call “HD-Seq”. HD-Seq is designed to enable lower error rate reads that can “achieve accuracy levels of Q50” (error rates of 1 in 100,000). It appears to read out both strands of a double stranded fragment and combine the basecalls from both. They pitch this as being particularly important for cancer diagnostics through the sequencing of cfDNA. And show some results:

They don’t state how this works exactly, but we can make some guesses. One way of doing this that’s been suggested is to tag the forward and reverse strand with the same index. You can then combine this information, giving you two observations of the same base, reducing the error rate. There’s a patent on this which I guess they may have licensed (though it’s not mentioned in the S-1). Other approaches, such as the 2D sequencing approach used by Oxford Nanopore, introduce a hairpin, allowing you to read through the forward, and then reverse strand.
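Under the (optimistic) assumption that the two strands err independently, and uniformly over the three wrong bases, a back-of-the-envelope calculation shows why a duplex consensus can in principle reach well past Q50:

```python
import math

def duplex_error_rate(p_single, n_alt=3):
    """Chance that both strands independently miscall to the *same* wrong
    base, assuming independent errors uniform over the 3 alternatives."""
    return (p_single ** 2) / n_alt

p_strand = 1e-3                       # Q30 per strand
p_duplex = duplex_error_rate(p_strand)
q_duplex = -10 * math.log10(p_duplex)
# p_duplex is ~3.3e-7, i.e. about Q65 in this idealized model.
```

Real duplex methods report much smaller gains (e.g. the Q50 quoted here), presumably because errors are correlated: DNA damage, PCR errors copied into both strands, and alignment artifacts all hit the two strands together.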

You can also use a read pooling/indexing approach (such as 10X use to generate synthetic long reads) to make it easier to pair forward and reverse strands.

Beyond this, I could imagine a neat approach that takes advantage of cluster generation technology. Essentially, flow in double stranded DNA and weakly immobilize it on the flowcell. Then melt it such that the strands separate and attach to nearby probes/primers on the flowcell. These two templates are then physically close on the flowcell. During image processing/basecalling you can then see that these two templates have a similar sequence (in reverse complement) and likely come from the same source double stranded DNA.

Singular state that their method can “provide higher accuracy than standard single-strand NGS sequencing methods (including ours)” so it’s likely agnostic of sequencing approach. Which suggests to me it’s likely an indexing or hairpin based technology.

Their motivation seems to be “oncology where there is an increasing need for higher sensitivity technology such as rare variant detection in liquid biopsy”. I’m not entirely convinced by this; beyond sensitivity, base modifications seem to be becoming increasingly important for early stage cancer detection. Is the difference between a single base accuracy of Q30 and Q50 critical? Or to put it another way, would you swap two Q30 reads for one Q50? It would be interesting to see more of a justification (or a reference) for the requirement for high quality reads.

Beyond this, they mask out potentially erroneous bases: “the base call was only made if there was agreement in the base calls on the complementary strands”. So, while the overall error rate might be lower, they limit their ability to detect individual SNPs… using this approach do you still retain the benefits of a lower overall error rate?

If a “read it twice” approach really is of critical importance to cancer diagnostics, there are a number of other techniques that would also work on Illumina’s platform (as discussed above). So this doesn’t feel like a key advantage of the Singular platform.

Other Stuff

Singular seem to be suggesting that they’ve developed a number of other analytical approaches and techniques, and have a general purpose “multiomics” platform. The exact methods are not described, but they seem to be pushing further downstream into various applications, many of which seem to be in 10X’s general territory:

Of particular note is their work on Single Cell and Spatial applications. They also mention Protein expression…

This is worth noting, but there’s not much in the S-1 on the approaches used, and a quick search didn’t pull up any interesting patents here.

Business Stuff

That wraps it up for the technology. The S-1 lists current investors, which include many of the usual suspects:

Entities affiliated with Deerfield Private Design Fund IV, L.P., Axon Ventures X, LLC, entities affiliated with Section 32 Fund 2, LP, LC Healthcare Fund I, L.P., Revelation Alpine, LLC, Domain Partners IX, and ARCH Venture Fund IX, L.P.

And that they’ve applied for the NASDAQ symbol “OMIC” which is a pretty cool symbol! They also state that they have 138 full-time employees (106 in R&D).

They state that: “Single cell, spatial analysis and proteomics markets: We are building our PX Integrated Solution to address the single cell and spatial analysis markets, which we estimate to be approximately $17 billion in 2021 based on available market data”. Given that 10X’s 2020 revenue was $298.8M, there’s probably something I don’t get here… where’s the rest of the single cell market?

——

So… those are my initial thoughts on the Singular S-1. The basic sequencing approach looks very “Illumina-like” to me. There are a number of other plays like this around (e.g. MGI), and I suspect they will continue to put pressure on Illumina to reduce their consumable prices (which appear to be sold at ~10x cost of goods). But otherwise, I don’t see the approach as opening up new markets or giving us the ability to solve novel research problems.

DISCLAIMER: I own shares in various sequencing companies based on my previous employment. I consult in sequencing, and sequencing applications. And I’m working on a sequencing related project currently.

The Centrillion VirusHunter

DISCLAIMER: I’m currently working on a seed stage sequencing related project. So, keep that in mind when evaluating my comments. And get in touch ([email protected]) if this might be of interest.

There’s a new press release out from Centrillion suggesting their sequencing platform should be available now/soon. I’ve previously covered Centrillion, and recommend referring to that post for background on the company.

The press release doesn’t give much information, but references a recent publication, which provides more details. The paper describes a tiling array covering the entire SARS-CoV-2 genome. This is essentially a traditional microarray which covers every possible SNP in the known SARS-CoV-2 genome. They describe this as follows:

“Here we describe a full genome tiling array with more than 240 000 features that provide 2x coverage of the entire SARS-CoV-2 genome and the use of such a genome tiling array to sequence the genome from eight clinical samples”

“Each base has two corresponding probe sets: one for the sense strand and one for the antisense strand.”…”one for each base”

So, Centrillion use a total of 8 probes per site. SARS-CoV-2 has an ~30Kb genome, which gives us the 240K features mentioned above. They use 25bp probes in their array. Tiling arrays have a number of limitations compared to sequencing based variant detection. Firstly, it’s not clear what issues might be caused by multiple mutations falling under a single 25mer. Such mutations would likely reduce hybridization efficiency, and could result in a variant miscall or “nocall”. The approach is also unable to detect insertions or deletions. These limitations make tiling arrays less interesting as a general purpose tool.
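The probe accounting is easy to check. A quick sketch, assuming (as I read the paper) four probes, one per base, on each of the two strands at every genome position:

```python
# Feature count for the SARS-CoV-2 tiling array described in the paper.
genome_length = 30_000        # SARS-CoV-2 genome is ~30 kb
probes_per_position = 4 * 2   # 4 bases (A/C/G/T) x 2 strands (sense/antisense)

features = genome_length * probes_per_position
print(features)  # -> 240000, matching the ">240 000 features" in the paper
```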

I personally would not characterize tiling arrays as “sequencing” or “resequencing”. In a tiling array you get a single read out for each position. Sequencing approaches generally produce a continuous readout of bases, without prior knowledge of the sequence context.

The paper presents a comparison of their approach against Illumina sequencing, summarized in the table below:

From the table above, the no-call rate of around 1% seems to be one of the more concerning issues. I assume this means that both sets of probes essentially failed to provide a useful signal at these positions. The accuracy for the remaining positions seems reasonable (though less than I’d expect, given a known reference for the exact variant).

Summary

This is a fairly traditional tiling array for SARS-CoV-2 variant detection.

From the press release, it seems this is pitched as a lower cost alternative to sequencing. I don’t have good current pricing for microarrays, but I imagine the array they’re suggesting would cost >>$30, based on microarray prices I’ve seen. This is significantly more expensive than qPCR based testing (probably ~10x), making it too expensive for routine use in SARS-CoV-2 testing.

A detailed cost comparison against sequencing would be interesting, but I suspect we’re not talking about an order of magnitude cost difference. For me, that wouldn’t be enough to make an array based approach attractive, given that sequencing provides a richer and more accurate dataset.

Hopefully Centrillion will continue to develop some of their sequencing based ideas in the future.

Armonica update

DISCLAIMER: I’m currently working on a seed stage sequencing related project. So, keep that in mind when evaluating my comments. And get in touch ([email protected]) if this might be of interest.

There’s a new Genomeweb article on Armonica Technologies. So I figured it was probably time to revisit the approach, which I’d previously covered. The Genomeweb article is pretty vague, featuring one very blurry plot:

But it turns out they have a couple of posters on their website from AGBT that add more context. The following shows dNTPs (nucleotides free in solution). The idea is that the Raman spectrum gives a characteristic signal for each base type.

The Genomeweb plot expands on this to show different base modifications. The individual dNTP traces show characteristic differences, but given that these are single traces (perhaps averaged over the surface, but I suspect not over experiments) it’s difficult to understand how much experimental variation there is. The poster does show traces under two different conditions (in solution versus dry monolayer). We can combine these plots and see how consistent the traces are between conditions:

In the above I’ve placed the traces side-by-side. For the most part the traces match pretty well; certainly the A and T traces look similar. However, the C and G traces don’t look as good. I’ve marked locations where peaks differ (red arrows). Without more to go on it’s difficult to come to a strong conclusion, but ultimately the consistency of these traces may limit accuracy.

From what I can tell this is the best data they have presented; there’s nothing from nucleotides on strands. Slowing the translocation of strands would be critical for this. In their other poster they make the following statement: “Molecule penetration over the 5 μm wide barrier takes around 2.5 s for as-fabricated roof. Speed ~2 μm/s (>160 ms per base).” 160 milliseconds would probably be enough time to get a single molecule optical readout. But the numbers don’t quite make sense to me… and suggest that this should be 160 microseconds. That would be faster than any single molecule optical imaging system that I’m aware of.
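We can sanity check the quoted numbers, taking the canonical 0.34 nm per-base rise of B-form dsDNA as a rough spacing (an assumption; single strands under tension differ somewhat):

```python
# Sanity check on the translocation numbers quoted in the poster.
barrier_width = 5e-6    # m, the "5 um wide barrier"
transit_time = 2.5      # s, "around 2.5 s" to cross it

speed = barrier_width / transit_time  # 2e-6 m/s, matching the quoted ~2 um/s

base_spacing = 0.34e-9  # m per base (B-form dsDNA rise; an assumption here)
time_per_base = base_spacing / speed  # seconds per base at this speed
print(time_per_base)    # ~1.7e-4 s, i.e. ~170 microseconds, not 160 ms
```

At 2 μm/s and sub-nanometer base spacing, the dwell time per base comes out in the hundreds of microseconds, which is why the “>160 ms per base” figure looks like a units slip.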

Beyond this we can make some guesses about what a platform built around this approach would look like. Taking the full spectra of each pore would be problematic. Hamamatsu do some really neat spectrometer modules, but I suspect having a spectrometer for each pore would be unrealistic.

I’d therefore imagine that you’d pick a number of peak locations which uniquely identify base types (as shown by the vertical bars in the Raman spectra above). Traditionally for these kinds of single molecule experiments you’d use sCMOS cameras. For the locations shown in the plot above you’d need 6 cameras, simultaneously monitoring pores during translocation. Typically these cameras cost >$10000 (for the cheapest models). Cameras will need to be fixed, monitoring a single region in real time. This will limit throughput.

As we’ve seen with PacBio however, it’s possible to integrate much of this onto a chip based platform.

Conclusion

Oxford Nanopore showed base identification in solution back in 2009. The first external data from that platform was publicly shown in 2014. So I’m skeptical of the statement that “If someone gave us $20 million tomorrow, we’d have a working product in two years.” as suggested in the Genomeweb article.

The platform being developed seems somewhat similar to PacBio’s, with an optical readout and realtime monitoring of strands. So at a first pass, looking at PacBio numbers, a ~$1000 run cost and ~$500K instrument cost seem plausible. I can imagine there’s an order of magnitude of potential cost reduction, but it’s difficult to see it being cost competitive with Illumina.

Being able to detect base modifications is certainly interesting, particularly as there appears to be a consensus that base modifications are important for early stage cancer detection. The market for long reads is currently less clear. In the GenomeWeb article they propose that “the company will try to provide long reads with epigenetic information to guide therapy selection”. Which is certainly an important application, but not as large a market as NIPT/cancer screening.

DNA Sequencing and Japan

A Short History of Sequencing in Japan

Japan played an important role in the human genome project, being the top contributor outside of the UK/US.

They also developed instrumentation and technology used in the human genome project, for example multi-capillary sequencing: “The multi-capillary DNA sequencer equipped with Kambara’s device was commercialized in 1998 in alliance with Hitachi and Applied Biosystems”.

But after the human genome project, RIKEN and other sequencing centers in Japan didn’t really transition into “genome centers”. In short, no “production scale” sequencing core facilities were maintained in Japan.

Japan was slow to take on next-gen sequencing, and made some significant mis-steps. For example, acquiring a number of the commercially failed Helicos instruments. This was likely a significant investment (probably enough to buy 8 Illumina instruments). While some interesting work may have been done on the Helicos instruments, they weren’t really suitable for the kind of large scale projects (the 1000 Genomes Project etc.) that other centers were engaging in.

So, large sequencing projects fell by the wayside in Japan… and both research and instrumentation seemed to stagnate for many years.

Japanese Components Used in DNA Sequencing

At the same time, Japanese companies have continued to play an important role in DNA sequencing instrumentation, ABI contracted Hitachi to develop their (failed) next-gen sequencing instruments:

But more importantly Illumina instruments used (and likely still use) a number of critical components developed in Japan.

The Illumina GA2 used at least the following components from Japanese companies:

  • Nikon x20 Objectives (and likely other optical components)
  • An ASI stage, incorporating IKO linear guides (and possibly other components)

The MiSeq uses two Sony DSLR grade CMOS image sensors (IMX038?), and likely other components of Japanese origin. The HiSeq instruments use Hamamatsu TDI linear CCD imaging sensors, which it seems are likely still used on the NovaSeq and other instruments.

While I have no direct evidence, many other critical components (laser diodes, filters) are commonly made in Japan and may be used in Illumina instruments. So while much of DNA sequencing instrumentation is built upon Japanese technology, Japan has failed to capitalize on the added-value produced by DNA sequencing instrumentation.

Japanese DNA Sequencing Startups

This is in part because there have been few sequencing startups in Japan. Elsewhere I’ve listed ~40 startups engaged in the development of new approaches to DNA sequencing. ~30 of these are based in the US, 5 in the UK, and 1 in Japan (of which I was previously the CTO).

This lack of sequencing startups has contributed to Japan’s lack of capability in this area…