
Nooma Bio

This post previously appeared on my substack.

I was recently asked about Nooma Bio. This is a company and approach I’ve not looked at for some time. But it appears they published a couple of papers at the end of 2019. So here I’ll review the company and their 2019 publication.

I recently wrote about Solexa, and much like that company, Nooma’s journey has taken a somewhat meandering course. The company was originally founded as TwoPoreGuys by Daniel Heller and William Dunbar in 2011. Dan came from a background in tech, having previously founded email software company Z-code, which was acquired for ~$9.4M in cash and stock back in 1994.

Dan left TwoPoreGuys in 2018, and in 2019 the company turned into Ontera. Murielle Thinard-McLane took over as CEO. Nooma.bio was then founded in 2020, as a spinout of Ontera, and retains the same CEO and CTO.

Both companies are pursuing ionic solid-state nanopore detection platforms. But from what I can gather, it’s Nooma that’s taking the original two-pore approach forward. And this is what I’ll be discussing here.

Two Pores are better than One?

As originally presented, the TwoPoreGuys approach used two nanopores and a small difference between two larger bias voltages across these pores. The original website is unfortunately long gone, but the YouTube videos are still up:

As presented, this never made much sense to me. The above seems electrically the same as simply setting a 20mV bias voltage. The two pores in this original approach were also purely for motion control; the idea was that there would be other sensors (they give the example of tunneling current electrodes) in the gap between the pores.

Ontera’s (pre-Nooma) 2019 paper takes the basic approach in a slightly different direction. Here they use two adjacent pores on a planar substrate, in a three-electrode system:

A double stranded (negatively charged) DNA translocates toward the most positive electrode. By changing the bias voltage at V₁ we can make the most positive point either V₁ or V₂. All our ionic currents, on the other hand, flow between V₁ or V₂ and our ground point, which is in the chamber between the two pores.

Electrically the system is pretty simple. In the diagram below, R1 and R2 represent our two pores, and in reality are in the teraohm range:
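As a minimal sketch of that equivalent circuit (the resistances and voltages here are illustrative, not values from the paper), each pore’s ionic current just follows Ohm’s law between its bias electrode and the common ground in the middle chamber:

```python
# Illustrative equivalent circuit for the two-pore system.
# Each pore is modeled as a resistor between its bias electrode
# and the grounded middle chamber. Values are made up for illustration.

R1 = 2e12  # pore 1 resistance (ohms), teraohm range per the text
R2 = 2e12  # pore 2 resistance (ohms)

def pore_currents(V1, V2):
    """Return the ionic currents (amps) through each pore.

    Positive current flows from the bias electrode into the
    grounded middle chamber.
    """
    I1 = V1 / R1
    I2 = V2 / R2
    return I1, I2

# Forward configuration: DNA is pulled toward the most positive electrode.
print(pore_currents(V1=0.5, V2=0.4))   # ~(2.5e-13, 2.0e-13) amps

# Drop V1 below V2 to reverse the translocation direction.
print(pore_currents(V1=0.3, V2=0.4))
```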

This approach gives us three principal advantages over a single pore system:

  1. We can sense the DNA twice (once as it passes through each pore).
  2. We can flip the voltages around to reverse the DNA translocation.
  3. We might get better motion control by confining the DNA between two pores.

In particular the ability to “floss” a single strand backward and forward through the pores gives multiple observations of the same molecule. You can of course also do this with single pore systems, but the authors suggest that the two pore approach helps maintain the strand’s orientation.

To demonstrate this in their 2019 paper, they bind a couple of streptavidin protein tags to Lambda DNA. As these tags pass through the pore they produce sharp dips in current as they block the pore. Once two tags have been detected, the voltage is flipped and the DNA translocates back through the pores in the reverse direction.

They can floss the same strand hundreds of times. However, it seems like there’s a fixed probability that a tag will be misregistered and the strand ejected from the pore. This results in an exponential distribution of events per strand (37% of the events had fewer than 5 scans):

While the individual scan duration (which I think is a rough proxy for tag-to-tag dwell time) looks roughly Poisson:

The paper shows a number of different experiments, with up to 7 tags at nick sites on Lambda DNA. 
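As a sanity check on that exponential behaviour (a toy model of mine, not an analysis from the paper): if each scan independently ejects the strand with a fixed probability, the scan count per strand is geometric, the discrete analogue of an exponential. Back-solving from the quoted “37% under 5 scans” gives an ejection probability of roughly 0.11 per scan:

```python
import random

# Toy model: each "floss" scan has a fixed probability p that the tag
# is misregistered and the strand is ejected. The scan count per strand
# is then geometric (the discrete analogue of an exponential).
# p ~ 0.11 is my back-solve from "37% of events had fewer than 5 scans";
# it is an assumption, not a number from the paper.
p = 0.11

def scans_per_strand():
    n = 0
    while True:
        n += 1
        if random.random() < p:  # misregistration ejects the strand
            return n

trials = [scans_per_strand() for _ in range(100_000)]
frac_under_5 = sum(n < 5 for n in trials) / len(trials)
print(f"fraction of strands with <5 scans: {frac_under_5:.2f}")  # ~0.37
```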

Thoughts

The approach described might be usable for mapping applications. By averaging the multiple “flossed” observations you can get better estimates of the tag-to-tag distance. The resolution here seems to be on the order of a few hundred nucleotides.
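A quick sketch of why flossing helps mapping (all numbers here are invented for illustration): averaging N independent scans shrinks the standard error of a tag-to-tag distance estimate by roughly 1/√N:

```python
import random
import statistics

# Illustrative only: suppose a single scan estimates a tag-to-tag
# distance of 5000 nt with per-scan noise of ~300 nt (made-up numbers).
true_distance_nt = 5000
per_scan_sigma = 300

def averaged_estimate(n_scans):
    scans = [random.gauss(true_distance_nt, per_scan_sigma)
             for _ in range(n_scans)]
    return statistics.mean(scans)

# Standard error falls as 1/sqrt(N): ~300 nt for 1 scan, ~30 nt for 100.
for n in (1, 10, 100):
    estimates = [averaged_estimate(n) for _ in range(2000)]
    print(n, round(statistics.stdev(estimates)))
```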

So, you can imagine a platform where you nick and tag DNA and read it on a two-pore platform. The problem is that we already have pretty reasonable mapping tools, and the market (~$10M?) may not justify the development costs.

Unfortunately, to me the approach doesn’t seem to be compatible with DNA sequencing. The method doesn’t slow translocation sufficiently to be able to detect single bases. In the paper they use a 10 kHz bandwidth, and Lambda strands seem to translocate in ~10ms, which works out to far less than one data point per base.
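The back-of-envelope arithmetic, using the ~48.5kb length of Lambda DNA:

```python
# Rough sampling arithmetic for the Lambda translocation numbers above.
bandwidth_hz = 10_000       # 10 kHz recording bandwidth
translocation_s = 0.010     # ~10 ms per strand
lambda_length_bp = 48_502   # Lambda phage genome length

samples_per_strand = bandwidth_hz * translocation_s        # ~100 samples
samples_per_base = samples_per_strand / lambda_length_bp   # ~0.002
print(samples_per_strand, samples_per_base)
```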

And if DNA is problematic, using this as a nanopore protein sequencing platform is likely even more challenging.

Increasing the bandwidth much beyond 10 kHz isn’t very practical, and it’s not clear that you can slow the translocation (particularly of single-stranded DNA) much further using this approach.

In any case, solid state pores have not yet reached feature sizes where DNA sequencing becomes practical, and this two-pore approach doesn’t seem like it would be compatible with protein nanopores and enzymatic motion control.

I do like the fact that you can precisely stretch the strand between two points, and that you can obtain information on the strand from a first pore before it translocates through a second. There’s one patent from Nooma discussing material sorting applications, which seems like an interesting idea that could take advantage of this unique feature.

In any case, I’ll be keeping an eye on Nooma. It seems like they’ve developed an attractive technique in search of a compelling application.

Nautilus Prospectus Review

This post originally appeared on my substack.

My previous post on Nautilus covered their published IP and was originally released to paid subscribers. After it was released publicly, someone forwarded me Nautilus’ prospectus. While the information in the prospectus is largely in line with that presented previously, it provides more concrete information on their implementation.

In the prospectus they describe “scaffolds”, which are likely the SNAPs (nanoballs) from their patents. The scaffold prep process sounds non-trivial, as they state flowcell/sample prep takes ~2 hours.

They also make it clear that their current chips have 10 billion sites. This is interesting because it helps frame the value of their arraying approach. To put this in context, the HiSeq 2500 was able to reach 4B reads without patterned flowcells/arraying. So, while the arraying technology is interesting, it still doesn’t seem of fundamental importance to getting the platform working. The arraying approach does seem to work well though, with >95% of sites being active:

There are a couple of statements on cycle requirements in the document that appear slightly contradictory. The first is “A typical 300 cycle run will generate approximately 20 terabytes of data”; the second is “it takes roughly 15 cycles of multi-affinity probe binding events to uniquely identify a protein”. If you only need 15 cycles, why do you typically run to 300?

They present this plot showing classification of a protein using multi-affinity reagents:

My guess is that there are multiple sets of multi-affinity reagents. So it’s more like: out of this set of 300 reagents, we can find 15 that will give a unique signal for a particular protein. If this is true, then the 15-cycle statement isn’t very meaningful, and it sounds like you need to run 300 cycles in general. This implies a complex reagent cartridge and fluidics system. They state that “Nautilus intends to utilize over 300 complex reagents and various antibodies”, which backs this up.
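As a toy illustration of this reading (entirely my construction, nothing from the prospectus): with a database of random binding profiles over a 300-probe panel, each informative probe roughly halves the candidate set, so on the order of 10 to 15 probes are typically enough to isolate one protein out of 1,000:

```python
import random

random.seed(0)

# Toy database: 1000 proteins x 300 probes, each probe binds a given
# protein with 50% probability. Entirely synthetic.
N_PROTEINS, N_PROBES = 1000, 300
db = [[random.random() < 0.5 for _ in range(N_PROBES)]
      for _ in range(N_PROTEINS)]

def probes_to_identify(target_idx):
    """Greedily apply probes until only the target matches the observed
    binding pattern. Returns the number of probes (cycles) needed."""
    candidates = set(range(N_PROTEINS))
    for cycle, probe in enumerate(random.sample(range(N_PROBES), N_PROBES), 1):
        observed = db[target_idx][probe]
        candidates = {i for i in candidates if db[i][probe] == observed}
        if candidates == {target_idx}:
            return cycle
    return None

print(probes_to_identify(0))  # typically ~10-15 probes for 1000 proteins
```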

But it seems like proteins stay attached through these long runs (<1% loss):

Beyond this, they don’t say anything regarding affinity reagents other than that they can use a “wide variety of ‘off-the-shelf’ affinity reagents” and that “we have developed a proprietary process for high throughput generation and characterization of multi-affinity probes”. I’d guess the high throughput generation approach is likely the sequencing-based aptamer evolution approach described in their patents.

On the commercial side, they appear to be targeting a 2023 launch. So it will likely take some time for us to find out how well the platform works in practice. 

Pricing is less clear but they say it will be “in-line with mass spectrometry system budgets allocated for broad scale proteomics applications, and thus with a premium instrumentation average selling price”. However I suspect the consumables pricing will look pretty different to mass spec. That 300 reagent cartridge and patterned flowcell doesn’t sound like it’s going to be cheap.

QuantumSi Prospectus Review

This post originally appeared on my substack.

I previously looked at QuantumSi’s protein sequencing approach back in September. But recently someone forwarded me their prospectus. Having recently reviewed Nautilus, it seems like a good time to revisit QuantumSi. In this post I provide an update on my previous thoughts, but you may want to refer to that post for details from their patents.

Technology

The basic process QuantumSi use to sequence proteins can be briefly described as follows (a toy sketch in code follows the list):

  1. Fragment proteins into short peptides, and isolate in wells.
  2. Attach a label to the terminal amino acid, and detect the label.
  3. Remove a single terminal amino acid.
  4. Go to step 2 to identify the next amino acid.
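Here is that toy sketch (my construction; the recognizer groups are illustrative, and per the discussion below, real reagents recognize sets of amino acids rather than single ones):

```python
# A toy sketch of the cleave-and-detect loop (steps 2 to 4 above).
# The recognizer sets are illustrative; real reagents recognize
# groups of amino acids (e.g. Y/W/F) rather than single ones.

RECOGNIZER_GROUPS = {
    "probe_aromatic": {"Y", "W", "F"},
    "probe_basic":    {"K", "R", "H"},
    # ... one entry per affinity reagent in the real system
}

def sequence_peptide(peptide: str):
    """Repeatedly detect then cleave the terminal amino acid,
    returning the (ambiguous) signal trace for the peptide."""
    trace = []
    while peptide:
        terminal = peptide[0]
        # Step 2: record which recognizers (if any) bind the terminus.
        signals = [name for name, group in RECOGNIZER_GROUPS.items()
                   if terminal in group]
        trace.append(signals or ["no_signal"])
        # Step 3: remove the terminal amino acid; step 4: loop.
        peptide = peptide[1:]
    return trace

print(sequence_peptide("YKAF"))
# [['probe_aromatic'], ['probe_basic'], ['no_signal'], ['probe_aromatic']]
```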

At a high level this is not unlike single molecule sequencing-by-synthesis, in that monomers are detected sequentially. The difference here is that rather than monomers being incorporated, in this approach they are cleaved. QuantumSi appear to fragment the proteins prior to sequencing. I assume this is to avoid secondary structure issues. But it does mean they are getting fragmented sequences rather than an end-to-end sequence for the entire protein.

When I reviewed their patents, it was reasonably clear that you’d be unlikely to get an accurate protein sequence; it’s more likely to be a fingerprint. This means that rather than being able to call a “Y”, you’d likely only be able to say this amino acid is one of “Y, W or F”.

The prospectus suggests they can resolve these ambiguities by looking at transient binding characteristics. The “affinity reagents” they use don’t bind and stay attached; rather, they bind on and off. So you’ll see one attach, generate a signal, then detach, then another bind, and so on. Ideally a reagent that binds to “Y, W or F” might bind more strongly to one (e.g. Y) than another (e.g. W), and you can use that information to infer the amino acid type.

As mentioned in my previous post, they use fluorescence lifetime to determine which affinity reagent is bound. So for every detection event they have two pieces of information: the affinity reagent type (from fluorescence lifetime and intensity) and the binding kinetics (from the on/off rate). They call this 3-dimensional data (fluorescence lifetime, intensity, and kinetics).
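To make that concrete, here’s a hedged sketch (entirely my construction, with made-up parameters) of classifying binding events by nearest centroid in (lifetime, log-dwell) space:

```python
import math
import random
import statistics

random.seed(1)

# Made-up event model: each affinity reagent has a characteristic
# fluorescence lifetime (ns) and mean bound dwell time (ms). None of
# these numbers come from QuantumSi.
REAGENTS = {
    "reagent_A": {"lifetime_ns": 2.0, "dwell_ms": 50.0},
    "reagent_B": {"lifetime_ns": 4.0, "dwell_ms": 10.0},
}

def simulate_event(name):
    r = REAGENTS[name]
    lifetime = random.gauss(r["lifetime_ns"], 0.5)    # noisy lifetime estimate
    dwell = random.expovariate(1.0 / r["dwell_ms"])   # exponential dwell time
    return lifetime, dwell

def classify(lifetime, dwell):
    """Nearest centroid in (lifetime, log-dwell) space: crude, but shows
    how the two axes jointly separate the reagents."""
    def dist(r):
        return ((lifetime - r["lifetime_ns"]) ** 2 +
                (math.log(dwell) - math.log(r["dwell_ms"])) ** 2)
    return min(REAGENTS, key=lambda name: dist(REAGENTS[name]))

events = [(name, simulate_event(name))
          for name in REAGENTS for _ in range(1000)]
accuracy = statistics.mean(classify(*obs) == truth for truth, obs in events)
print(f"classification accuracy: {accuracy:.2f}")
```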

The nice thing about this is that while you still need various reagent types, you don’t need a complex fluidics system, and you can observe and classify binding events in real time.

However, I’ve not seen anything that suggests the classification works well enough to give the full sequence. And they state that this “will ultimately enable us to cover all 20 amino acids”, suggesting that they currently can’t.

Overall, the above approach is in line with my previous speculation based on their patents.

Chips

Like Ion Torrent, they make a big deal out of using semiconductor fabrication for their sensor: “similar to the camera in a mobile phone, our chip is produced in standard semiconductor foundries”. I generally take issue with this argument. Semiconductor fabrication is great. But if you can’t reuse the sensors it’s more like buying an expensive camera, taking one picture, then throwing the camera in the trash.

This isn’t to say that semiconductor sensing isn’t interesting… but there are other issues that need to be considered. They also talk about Moore’s law, suggesting that if “Moore’s Law remains accurate, we believe that single molecule proteomics…will allow our technology to run massively parallel measurements”. Aside from Moore’s law clearly being in trouble, this doesn’t make much sense, as there are other physical limits involved here.

From various public images, I’d guess the chip is ~15 to 20mm². PacBio chips (which use a very similar approach) have ~8M wells on a chip that appears to be roughly the same size. I didn’t see an explicit statement on read count, other than “parallel sequencing across millions of independent chambers”. But my best guess would be in the 10M range.

This puts them at the low end of throughput as compared with Nautilus and other next-gen proteomics approaches.

Product

The product has three components: a sample prep box (Carbon), the sequencing instrument (Platinum), and a cloud-based analysis service. Unlike Nautilus they suggest that primary data analysis happens on instrument. The instruments’ combined pricing is supposed to be in the $50,000 range, which is relatively cheap.

Commercial Stuff

QuantumSi say they have already initiated their early access program, but I’ve not heard of anyone else talking about this publicly. They are aiming for a commercial launch in 2022. And say that their addressable market is $21 billion. This breaks down as follows:

Of this, I think the true addressable market is closer to the $5B legacy proteomics segment. It doesn’t seem realistic to use the proposed approach for health care/diagnostics in its current form. Partly because the per-sample COGS is likely pretty high, and partly because for these applications you may want a higher throughput instrument.

They also suggest that in the future they will release a lower cost instrument for at-home testing:

Conclusion

QuantumSi’s approach is closer to sequencing than Nautilus’, but I suspect the platform will still not give a true amino acid sequence when initially released. For the reasons highlighted in my previous post, protein sequencing is just much, much harder than DNA sequencing. So, like Nautilus, what they’re developing may be more of a protein fingerprinting device, where traces are compared against a database of known proteins.

This begs the question: what’s the value in a relatively low throughput protein fingerprinting instrument? Where exactly the throughput spec needs to be set to be useful, particularly for diagnostic applications, isn’t clear to me. But 10 million reads would certainly seem to be on the low end. I’ll try and address this in a future post.

Nautilus Biotechnology

This post previously appeared on my substack.

Company Background

Nautilus (originally Ignite Biosciences) was founded in 2016 by Parag Mallick and Sujal Patel. Sujal was previously CEO of storage company Isilon. Isilon storage gained huge popularity in genomics for storing next-gen sequencing data. They had a deal with Illumina at one point, and were probably the easiest way of getting scalable storage up and running. More recently Isilon’s popularity in genomics seems to have waned, with users switching over to cloud-based solutions.

A YouTube interview with Sujal covers Nautilus’ background and how Sujal got involved in biotechnology coming from a tech background. Parag, like many others, was using Isilon’s platform for genomic applications. In his interviews Sujal draws parallels between Nautilus’ proteomics platform and the explosive growth of next-gen DNA sequencing.

However, Sujal also positions Nautilus as most appropriate for Pharma applications. This is very different than next-gen DNA sequencing where Pharma was not an early driver of growth (and still isn’t).

They raised a $76M series B in May 2020. And like seemingly everybody else, are doing a SPAC.

Technology

Nautilus Biotechnology is building a high throughput single molecule protein fingerprinting platform. There are a few other companies doing this (Encodia, QuantumSi, Dreampore, Erisyon (disclosure, I’ve worked with Erisyon in the past, but hold no equity)).

Looking over their patents, there seem to be 3 areas of innovation:

  1. A method for arraying single proteins on a surface.
  2. A method for identifying/fingerprinting proteins.
  3. A method for developing libraries of affinity reagents for use in fingerprinting.

I’ll cover each of these in turn below and then review the complete approach.

Arraying Single Proteins

If you randomly stick proteins (or anything else) to a surface there will be some probability that two or more proteins land right next to each other. If that happens you won’t be able to resolve the single proteins, and a mixed, unusable signal will be generated.

So, random attachment limits throughput. Many platforms run into this problem: in Solexa/Illumina sequencing on the Genome Analyzer, 40% of reads were from “mixed” clusters and were discarded. On Oxford Nanopore’s device you have bilayers/wells with multiple pores, which are not easily usable. In general such approaches are “Poisson limited”: in a well-based system this means at best ~37% of wells will have single occupancy.
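The 37% figure comes straight from the Poisson distribution: if molecules land in wells with mean occupancy λ, the single-occupancy fraction is λe^(-λ), which peaks at λ = 1:

```python
import math

# Poisson loading: if molecules land in wells at random with mean
# occupancy lam, the fraction of wells with exactly one molecule is
# lam * exp(-lam), which is maximized at lam = 1.
def single_occupancy(lam):
    return lam * math.exp(-lam)

for lam in (0.5, 1.0, 2.0):
    print(lam, round(single_occupancy(lam), 3))
# best case: lam = 1.0 -> 0.368, i.e. the ~37% quoted above
```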

Most platforms attempt to solve this at some point. Illumina introduced patterned flowcells and ExAmp. Genia have worked on a pore insertion approach to ensure single occupancy.

But in general, it’s not something that makes or breaks a platform. It only limits throughput; it doesn’t affect data quality. So it’s usually addressed in a second-generation product.

Nautilus however have been working on this issue for proteomics. Why they are focusing on this at an early stage isn’t clear. But one possibility is that mixed signals are particularly problematic in protein fingerprinting. That is to say, they can’t easily be classified as mixed. This could cause a significant fraction of proteins to be misclassified.

The Nautilus arraying approach works by creating a kind of adapter molecule which they call a SNAP (Structured Nucleic Acid Particle). This is a DNA nanoball (created using rolling circle amplification, similar to MGI’s approach in their sequencing platform). But it’s structured such that there’s a single site on the nanoball to which a protein can attach. The advantage here is that a SNAP can be relatively large and sit on a lithographically fabricated site on a surface, likely a few hundred nanometers across. The result is an array of easily separable sites on a surface, each of which presents a single protein attachment site.

Their patents suggest a number of methods for making SNAPs or similar structures. But the DNA nanoball approach seems like the most obvious and the only one which appears to have experimental support. It looks like they need to do size selection on the nanoballs, which complicates the process somewhat. But they seem to have some images showing single dyes attached to the nanoballs.

How well this might work in a complete platform, with a complex sample is unclear.

Identifying/Fingerprinting Proteins 

To me the patents relating to this part of the Nautilus approach felt the weakest. Essentially they say: use a number of different affinity reagents (aptamers, or antibodies) to generate binding signals, then use those binding signals to determine which protein is present.

So, you’d flow in one reagent, get a binding signal, flow in a second, get another signal, and so on. All this binding information is then compared to a database of known protein binding fingerprints. The patent seems to refer to between 50 and 1000 reagents. In a YouTube interview, Sujal suggested this generates 10 to 20TB of image data.
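A minimal sketch of the matching step (synthetic data, my construction, not Nautilus’ method): treat each protein’s fingerprint as a binary binding vector and identify a noisy observation by nearest Hamming distance against the database:

```python
import random

random.seed(2)

# Toy fingerprint database: each protein has a binary binding profile
# over a panel of affinity reagents (here 100, purely illustrative).
N_PROTEINS, N_REAGENTS = 500, 100
database = {f"protein_{i}": [random.random() < 0.5 for _ in range(N_REAGENTS)]
            for i in range(N_PROTEINS)}

def measure(profile, error_rate=0.05):
    """Simulate one pass over all reagents, flipping each binding
    call with some probability (measurement noise)."""
    return [b ^ (random.random() < error_rate) for b in profile]

def identify(observed):
    """Return the database protein whose fingerprint is closest in
    Hamming distance to the observed binding vector."""
    return min(database, key=lambda name:
               sum(a != b for a, b in zip(database[name], observed)))

truth = "protein_42"
print(identify(measure(database[truth])))  # 'protein_42' (almost always)
```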

They also suggest that you can use affinity reagents that specifically bind to trimers or some other short motif. This then begins to look similar to a sequencing-by-hybridisation approach, where you generate short reads and overlap them to recover the original sequence.
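A toy version of that overlap-based reconstruction (assuming every overlap is unique, which real sequences won’t guarantee):

```python
# A toy sequencing-by-hybridisation style reconstruction: given the set
# of overlapping trimers from a peptide, chain them back together by
# 2-mer overlap. Real data would be far noisier and more ambiguous.

def trimers(seq):
    return {seq[i:i+3] for i in range(len(seq) - 2)}

def reconstruct(kmers, start):
    """Greedily extend from a known starting trimer by 2-mer overlap.
    Assumes each overlap is unique, which real proteins won't guarantee."""
    seq = start
    remaining = set(kmers) - {start}
    while remaining:
        nxt = [k for k in remaining if k[:2] == seq[-2:]]
        if len(nxt) != 1:       # ambiguous or dead end: stop
            break
        seq += nxt[0][2]
        remaining.remove(nxt[0])
    return seq

peptide = "MKTAYIAK"
print(reconstruct(trimers(peptide), "MKT"))  # 'MKTAYIAK'
```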

The patent I’ve looked at is completely theoretical; all the examples are simulations. The idea itself seems relatively obvious. The claims and specification make a big deal out of the process being iterative. But this doesn’t seem hugely significant to me, and is somewhat obvious. The patent has a single claim, framed in terms of comparing binding measurements against a database. This suggests to me that they’re not seriously looking at sequencing-like applications.

Developing Libraries of Affinity Reagents

A third patent refers to methods of developing affinity reagents for use in the above process. Here they talk about methods for generating aptamers and other affinity reagents. The aptamer generation process seems to be a relatively standard aptamer evolution approach:

They seem to have performed a slightly smarter aptamer selection process than that described in the flowchart above. In this process they create candidate aptamers, then sequence them on an Illumina sequencer. This gives them the sequence and location on the flowcell of each aptamer. They then wash a fluorescently labelled protein over the flowcell and measure binding. This gives them a high throughput way of measuring aptamer-protein binding efficiency. I suspect they’re not the first to do this, however. The main complication is likely that Illumina have made it harder to modify the sequencing protocol on recent instruments.

They present some data from this approach, and show binding versus protein concentration for a few aptamers:
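Curves like these are typically fit with a simple Langmuir isotherm, where fraction bound = [P]/([P] + Kd). A sketch with synthetic data (none of these numbers are from Nautilus):

```python
import numpy as np
from scipy.optimize import curve_fit

# Fitting a simple Langmuir binding isotherm to binding-vs-concentration
# data, as you might for aptamer plots like those described above.
# The data points here are synthetic, generated for illustration only.
def langmuir(conc, kd, bmax):
    return bmax * conc / (kd + conc)

conc_nM = np.array([1, 3, 10, 30, 100, 300, 1000], dtype=float)
true_kd, true_bmax = 50.0, 1.0
bound = langmuir(conc_nM, true_kd, true_bmax)
bound += np.random.default_rng(0).normal(0, 0.02, size=bound.size)

(kd_fit, bmax_fit), _ = curve_fit(langmuir, conc_nM, bound, p0=[10, 1])
print(f"fitted Kd ~ {kd_fit:.0f} nM (true: {true_kd:.0f} nM)")
```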

However, the aptamers they’ve discovered don’t seem to be covered in the patent. Perhaps there’s an unpublished filing which covers specific aptamers or other affinity reagents in more detail.

Elsewhere in the patent they discuss generating affinity reagents (likely antibodies) that specifically bind to 5mers. They propose doing this by first creating 2mer and 3mer specific affinity reagents and combining them.

The aptamer work in this patent is the most convincing, and I suspect they’re working on an aptamer-based solution. However, aptamers have a somewhat troubled history, and it seems by no means easy to get an aptamer-based platform working well. While the patent is interesting because it discloses some details of their approach, the patent itself doesn’t seem very strong. It originally had 131 claims; 111 of these have been cancelled, leaving a single independent claim.

Conclusion

In summary, they seem to be building an optical single molecule protein fingerprinting platform. Proteins are arrayed on a surface and exposed to a number (perhaps hundreds) of fluorescently labeled affinity reagents, probably a combination of aptamers and antibodies.

By combining the binding information from all these different reagents, they can produce a unique fingerprint for a single protein. And by comparing this to a database of known proteins they can calculate the abundance of each protein in a sample. Because they’re using an optical approach this should be relatively high throughput. They also have IP on a chip based (QuantumSi/PacBio-like) platform, but to me this seems less scalable…

There are a number of applications for such a platform, but they mostly talk about Pharma (drug design, evaluation), where such an approach would provide a more sensitive method of evaluating the performance of a drug, and how it affects protein expression.

For me, the most developed part of the approach is the arraying technology (using SNAPs). But this isn’t really required to get the platform up and running. It also doesn’t create any kind of IP barrier. It helps push throughput, but it isn’t clear to me that it’s of fundamental importance in building a protein fingerprinting platform.

The other parts of the approach (from the published patents) seem less well developed. I’d also note that while Sujal draws parallels with DNA sequencing, this approach is closer to qPCR or DNA microarrays, where you’re comparing detection events against a known database.

I’ll be watching with interest, but at the moment I’m more excited about approaches that provide “de novo” information that’s a little closer to sequencing than fingerprinting.