Dreampore – Nanopore Protein Sequencing

This post was originally published on substack.

Company

Dreampore is a French Nanopore Protein sequencing company based in Paris.

There’s not much information available on Dreampore; most of it comes from a Genomeweb review from December 2019. That article states that Dreampore has raised €600,000 and has four employees. As far as I can tell from their company registration, they were founded in 2018 and currently have 3 to 5 employees. According to LinkedIn, the CEO (Luc Lenglet) is also leading two other companies. I could only see one current employee on LinkedIn who appeared to be fully dedicated to the company.

Technology

Surprisingly I wasn’t able to find a patent covering the work presented in their Nature paper on protein sequencing. So this review is based on the publication only.

The work uses a protein nanopore, and detects molecules as they pass through and block a bias current. This is much like other forms of nanopore DNA sequencing. 

The publication builds on a previous work where they detect translocations of >7mer arginine (R) homopeptides. I’ll be covering this in a future post, because it’s kind of interesting in its own right. But essentially these RRRRRRR peptides block the aerolysin pore for a detectable duration. In the sequencing paper they use xRRRRRRR peptides where the x position varies. The poly-arginine region helps the peptide stick around in the pore long enough to be detected. But the idea is that the blockage current varies enough based on the single differing position.

And histograms suggest that in most cases it does:

When you look at the full set of amino acids, current blockages are less well separated:

The plots above use Ib/I0. This appears to be the signal normalized against the baseline current. It’s not super common to do this, and I wonder why they normalize against the baseline, rather than just measuring the offset against the baseline in pA. Possibly their measurements vary significantly with buffer concentration…

The raw ABF files (which suggests measurements were taken on an Axopatch) are available. So it’s possible to confirm this. But the scaling makes the plots a little harder to interpret. From example traces it looks like blockages are probably between 60 and 70 pA (they all appear to be between 0.3 and 0.4 in scaled units, and a typical baseline current appears to be ~100 pA). So, you’re cramming 20 states into ~10 pA. The best you’re likely to do in terms of noise is ~1 pA RMS at 10 kHz.

They have a plot in the supplementary information which shows that in practice, they get about 10 pA of peak-to-peak noise on blockages.

From the supplementary information the average dwell time seems to be ~5 ms (which, remember, is for 8 amino acids). So, let’s say 1 ms per AA. So if we average down to 1 kHz, we can probably get this to ~1 pA of noise.
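As a rough sanity check on that averaging step, here is a minimal sketch. It assumes the noise is white (uncorrelated between samples), that the 10 pA peak-to-peak figure was recorded at roughly 10 kHz, and that peak-to-peak is about 6 standard deviations. None of these assumptions come from the paper, and real nanopore noise is not white, so treat the result as optimistic.

```python
import math

def rms_from_peak_to_peak(pp_pa, sigmas=6.0):
    """Convert peak-to-peak noise to RMS, assuming Gaussian noise
    where peak-to-peak spans roughly `sigmas` standard deviations
    (an assumed rule of thumb, not a figure from the paper)."""
    return pp_pa / sigmas

def rms_after_averaging(rms_pa, n_samples):
    """For white noise, averaging n samples reduces RMS by sqrt(n)."""
    return rms_pa / math.sqrt(n_samples)

# Assumed starting point: 10 pA peak-to-peak at ~10 kHz.
rms_10khz = rms_from_peak_to_peak(10.0)
# Averaging 10 samples takes us from ~10 kHz down to ~1 kHz.
rms_1khz = rms_after_averaging(rms_10khz, 10)

print(f"RMS at 10 kHz: {rms_10khz:.2f} pA")
print(f"RMS at 1 kHz:  {rms_1khz:.2f} pA")
```

Under these assumptions the result is somewhat under 1 pA, so the same order of magnitude as the estimate above. Averaging only buys a factor of about 3 for a 10x drop in bandwidth.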

It seems likely that if they attempted sequencing, multiple positions would contribute to the signal. Let’s be conservative and say 3 positions. For 20 AAs that means 20^3 = 8000 possible combinations, all squeezed into the same ~10 pA window. So I’d speculate this comes down to:

~0.001 pA difference between adjacent states and ~1 pA of noise

Which seems like a very hard problem to solve. Certainly one or two orders of magnitude harder than nanopore DNA sequencing.
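The back-of-envelope numbers above can be reproduced with a few lines. This is only a sketch: it assumes the states are evenly spaced across the ~10 pA window, which is a best-case simplification (real levels would cluster and overlap), and the function name is made up for illustration.

```python
def state_spacing_pa(window_pa, n_amino_acids=20, positions=3):
    """Return (number of states, spacing between adjacent states in pA),
    assuming states are evenly spread over the current window."""
    n_states = n_amino_acids ** positions
    return n_states, window_pa / n_states

window_pa = 10.0   # approximate spread of blockage currents
noise_pa = 1.0     # approximate noise after averaging

n_states, spacing = state_spacing_pa(window_pa)
print(f"{n_states} states, {spacing:.5f} pA between adjacent states")
print(f"noise is ~{noise_pa / spacing:.0f}x the state spacing")
```

With one contributing position instead of three (positions=1) the spacing is 0.5 pA, which is why the single-residue histograms look far more tractable than sequencing would be.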

Conclusion

The positive side of this paper is that they’ve clearly shown differences between most amino acids. In practice, I don’t think these differences are good enough to clearly differentiate between all 20 AAs. But it does indicate that if you had a way of sufficiently slowing the translocation of a protein you might be able to show some kind of characteristic signal.

The remaining problems are however twofold:

  • How do you slow the translocation of proteins sufficiently?
  • How do you deal with contributions from adjacent residues?

Both these problems are pretty tough. On the plus side, we likely only need to generate a characteristic fingerprint for a protein to be able to address useful applications. But even to get to that point, the above problems likely need to be addressed.

This paper suggests that with further work, it might just be possible. I’ll be keeping an eye on this and other nanopore protein sequencing approaches, as any kind of usable data from such a platform would be pretty exciting.

SBIR – America’s Training Program?

I’ve been thinking about the US SBIR (Small Business Innovation Research) grant program. The SBIR program gives grants, generally to what would be called “deep tech” companies. The grants are supposed to fund research and product development. I’m mostly familiar with the genome sequencing technology grants (which come via the NHGRI). These are grants on the order of ~$200K to a few million.

The SBIR program is also pretty unique. Elsewhere, government grant funding usually takes the form of matched funding. To a seed stage startup, matched funding is largely useless… after all, if you’ve got no money, what are you going to “match” it with? So, US SBIRs are unique in this regard.

SBIRs always look appealing to someone looking at the US funding ecosystem from the outside. Illumina received an SBIR in 1999 when it was only a year old. But at the same time, while probably $10M+ has been given out to DNA sequencing companies via SBIRs, the dominating technology was developed in the UK and acquired by Illumina. So if the purpose of these grants was to promote technological innovation… it doesn’t really seem to have worked.

Recently I’ve been thinking about SBIRs in a different light. Perhaps it’s better to think about the SBIR program as America’s technology training program. Illumina could have built out research and development pretty much anywhere. After the Solexa acquisition they could have built out operations in the UK. And to an extent they did. But the bulk of Illumina’s growth has occurred in the US.

Some of this is because they’re a US company and that’s what US companies do… but I’d also suggest that it’s just easier to hire scientific staff in the US. It’s easier in part because there’s a community of (let’s face it, often failing) SBIR funded research companies, which have trained up staff in skillsets useful to companies like Illumina.

I decided to test this hypothesis with a quick LinkedIn search. I looked at 20 employees at Illumina who work in scientific roles in the US. Of these, 50% had previously worked at SBIR funded companies, often quite early in their careers.

It was likely this previous scientific experience that made them attractive to Illumina. And is no doubt one of the reasons that Illumina would find it easier to build out research and operations in the US.

So… while the SBIR program might in some respects look like a failure, it probably does a lot to help the US maintain technological dominance in certain industries.

Centrillion Update

After the last Centrillion blog post, Centrillion contacted me with some observations. The full text of these is included at the end of this post for your reference. So I thought I’d do an update post addressing some of their comments, and some thoughts after looking over their website.

There are a few points made in these emails/tweets:

  • Their service is $99.
  • Illumina based sequencing services they say typically cost $200-$300.
  • That labs could run the Centrillion service for <$30.
  • That Illumina costs $100-$150 per sample in terms of reagents.

I think the Illumina costs above for SARS-CoV-2 sequencing are probably out by an order of magnitude. If you look at Illumina’s own guideline pricing, they have costs per sample for multiplexed runs of $18 and $24. Different protocols, but for multiplexed runs I think you’re looking at the $10 to $20 range for Illumina sequencing of SARS-CoV-2. There are examples which put the total cost at ~$100. And RNASeq has been available from service providers for $100 to $200 per sample for a while. So in summary my estimate on Illumina SARS-CoV-2 sequencing would be $15 per sample, sold at ~$100 including sample prep/labor.

They don’t give accurate pricing, but at $99 I don’t think the Centrillion approach is competitive with Illumina. If it’s much cheaper in volume, <$10 perhaps? But I can’t see this as being a more versatile replacement for qPCR, for example…

Someone mentioned their website. I hadn’t actually looked over the site before, so was surprised by the following:

“The variant output file is automatically generated using VirusHunter™ software. It enables rapid sequence analysis and strain or clade determination using published variant data. Whole genome FASTA and FASTQ files are output at the same time to enable deeper sequencing analysis using custom pipelines.”

Generating fasta/fastq files from microarray data feels a bit off. Most of the data in those fastqs will be derived from the probes/known sequence. I don’t think you can really use this to accurately assign quality scores. And I personally feel like it somewhat misrepresents the data.

Keith Robison also asked on the twitter thread if they had data for closely spaced changes/deletions. I think that’s an important question. They don’t seem to have replied, but perhaps there’s some further data they’ve released.

Overall, I don’t really feel that microarray data and sequence data are comparable. And find characterizing the Centrillion array as “sequencing” inaccurate… though it possibly has its uses.

Twitter Messages:

Centrillion: Hi! Saw your blog on our chip: https://41j.com/blog/2021/04/the-centrillion-virushunter/… Wanted to say that the cost per sample is around 5-10x lower than the cost of sequencing with Illumina (including the sample prep and actual sequencing costs). We’re actually offering sequencing services starting at $99 per sample which goes down with increased sample number. Illumina based sequencing services typically cost $200-300 per sample for SARS-CoV-2. Prices for services are higher than raw reagent costs, of course. I can’t disclose the actual chip cost but we are marketing them currently so you can always send a request into our website for a quote if you want to find out yourself. Your estimated price of >$30 is too high. 🙂

new299: Thanks! The costs you suggest for SARS-CoV-2 sequencing are higher than I would expect. Do you have a good reference for this or cost breakdown? Do you mean you would sell your service for <$30 in volume? Are you ok for me to update the blog with this information?

Centrillion: Labs could run this for <$30 per sample without high volume required. To offer services, you have to account for overhead and the costs of employing people, which means labs can’t offer the services for the reagent cost

Centrillion: I have a cost breakdown but I don’t know if I’m allowed to share it. We’re putting the protocol up on http://protocols.io so it should be easy to calculate once that’s up. We’re also planning to kit reagents which should make it super easy to breakdown. We can also make our chips 1/4 of the size with just the first core 0. I think pricing is somewhere under $10/sample at that point. We just haven’t been doing that yet because we haven’t had the volume to run 384 samples at a time. Please feel free to update your blog with any information here. My boss said he was emailing you as well with more details that I wouldn’t know if I could share.

I think the cost we calculated for illumina was $100-150 per sample in terms of reagent costs meaning labs providing illumina sequencing would charge $200-300 to cover their overhead

Emails

Centrillion:

Hi, 

thank you for reading our press release and the paper and your analysis about the chip.  I agree that the press release is brief and I would like to provide a bit more information in case you are interested. The product description and product sheets are now at our website www.centrilliontech.com.

The chip set has four cores (we called it QuadCore). The published paper uses the first Core of the set and is a bit outdated, because it was submitted a while ago. The chip set has been in field testing since last June. The other cores contain probes for variant validation and for sequencing other respiratory viruses including all important coronaviruses. Manuscripts about other cores are under review.

We developed new chemistry for spatial genomics (long probe chemistry requires higher efficiency). We used the same chemistry to create these chips. As a result, the chips produce excellent signal, and our fast protocol uses just one hour hybridization vs traditional arrays (16-24 hours). The entire artic work-flow from sample to result can be done in a single working day. A related diagnostic version (regional sequencing for detection instead of whole genome) can be done with a 15 min hyb or about 2 hours from sample to result. Wafer scale manufacturing makes the chips very affordable (much lower than $30) so the overall sequencing cost is similar to or just a bit higher than most RT-PCR based tests. We currently offer services at $99 per sample, but pricing is reduced significantly when ordering in bulk.

There is the option to use only core 0 on our chips, which enables 384-well packaging at an even lower cost to enable larger scale applications. 

There are occasional drop out regions because of ARTIC primer issues and sample prep sensitivity. The paper touched on this a little bit; it is an issue in NGS seq as well for samples prepared using ARTIC primers. It is not due to the chip or algorithm. Our updated sample prep method produces better coverage and accuracy than the earlier version used in the paper. We see coverage and accuracy of up to >99.9% and 99.99%. 

Overall, the workflow is simple and faster than NGS. It has actually been used in some interesting situations where results generated using the chip had already been acted upon, before NGS based sequencing result was made available. Our scientists know a lot more about it and they love using it. If you are interested, I can connect you with the team about the details or address your questions.

We were frankly a bit surprised at and amazed at the performance of these chips for sequencing viral and human genes. These are much better than earlier generation of resequencing arrays and we are considering other applications with similar chips and would love to collaborate with the genomics community.

Centrillion:

I saw that my response to your twitter msg was bounced.  I am not sure whether you received our earlier response.  Basically, the core 0 of the chip set used to sequence SARs-CoV-2 genome costs few dollars.  Most of the cost is in RT-PCR reagents, plastics, and labor.  Our preference is to provide chips, since service capacity is always limited in a single facility, but at large volume, <$30 per sample is certainly possible through savings in labor and reagents.  
However, if one only has only few samples, the chip method actually costs much less than NGS methods, because NGS sequencing cost is very much dependent upon how many samples can be multiplexed in a flow cell.   
the chip set can also be used with other sample prep methods such as random priming amplification.  The cost analysis above is for the more standard Artic method.
If you have any further questions, please do not hesitate to contact us.

My reply:

I saw the earlier response, sorry I’ve been busy.
I plan to write an update blog post. Are you ok with me incorporating what you wrote below?

Centrillion:

Sure.  Thank you for writing about it!  

MiSeq Cost Analysis

I’m curious to understand how cheap Illumina’s run cost (COGS) can get (as opposed to the cost per base). Illumina broadly have 4 classes of instrument: the iSeq, MiSeq, NextSeq and NovaSeq. The NextSeq 550 seems like a MiSeq++ and the NextSeq 1000/2000 a NovaSeq-- (as the former lacks patterned flowcells, and the latter doesn’t have the throughput of the NovaSeq).

The iSeq flowcells embed a CMOS image sensor. It’s likely difficult to get to a really low COGS here. The NovaSeq uses patterned flowcells which they sell for >$10000 and likely have associated costs. So that leaves the MiSeq.

The MiSeq uses what is likely a relatively inexpensive glass flowcell, plus a reagent cartridge. So I decided to take a look at that platform. The overall summary is that I’d guess at a lower bound a MiSeq could be made for $5000, and the reagents likely cost at least $50 per run.

Instrument

I purchased an old MiSeq camera from eBay. I can’t find my notes but I remember it used a Sony image sensor from a DSLR camera, likely a monochrome version of the IMX038. This is built into a custom camera module with a Xilinx FPGA and associated memory. The IMX038 is the same sensor used in the Nikon D90, which retailed for $900. I think it’s unlikely that Illumina could get this sensor at their relatively low volume and integrate it with an FPGA and memory for less than this.

FCC reports provide a block diagram of the instrument:

You can get a rough idea of what’s in the platform from this. There’s a PI Z stage. A Y-stage. Illumination uses LEDs rather than lasers. A bunch of TECs for temperature control. And some embedded compute (I’d guess x86/AMD64). Using this information my best guesses at the BOM cost for the most expensive components would be:

  • Cameras 2x Sony IMX038: Estimate $1800
  • Z-Stage (PI): $1000
  • Y-Stage: $100?
  • HDD: $50
  • Compute: $200
  • LEDs/photodiodes: $50
  • TECs: $50
  • Nikon x20 Objective: $400
  • Fluidics System: $300

Which gives us a total of $3950, for an instrument that costs in the region of $50K to $100K (if you have exact pricing let me know). $3950 is a lower bound in my view; putting the instrument together in low volume likely adds significant overhead. Still, I don’t see any reason why you couldn’t put together a MiSeq-like instrument for around $5000 in high volume.
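For anyone who wants to plug in their own guesses, the estimate is easy to rerun. The sketch below just sums the line items from the list above; the figures are my rough guesses rather than quoted prices, and the dictionary keys are only labels.

```python
# Rough BOM guesses in USD, copied from the list above.
bom_usd = {
    "Cameras (2x Sony IMX038)": 1800,
    "Z-Stage (PI)": 1000,
    "Y-Stage": 100,
    "HDD": 50,
    "Compute": 200,
    "LEDs/photodiodes": 50,
    "TECs": 50,
    "Nikon x20 Objective": 400,
    "Fluidics System": 300,
}

total_usd = sum(bom_usd.values())

# Print items from most to least expensive with their share of the total.
for item, cost in sorted(bom_usd.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:28s} ${cost:5d}  {100 * cost / total_usd:4.1f}%")
print(f"{'Total':28s} ${total_usd:5d}")
```

With the estimates as listed, the items sum to $3950, and the cameras and Z-stage together account for roughly 70% of it. Those two guesses are the ones worth pinning down first.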

Flowcell and Reagent Cartridge

The MiSeq flowcell seems to be a simple glass substrate, with covalently bound oligonucleotides for capture. I’d be curious about accurate costings here, but I suspect it’s pretty cheap. Probably <$10. There are various companies you can buy flowcells from in relatively small quantities for not much more than this.

Which leaves us with the reagent cartridge:

Digging through various bits of documentation it seems like we have 14 independent reagents in this cartridge:

#    Code   Contents
1    IMT    Incorporation Mix
2    USM    Scan Mix
3    CMS    Cleavage Mix
4    AMS1   Amplification Mix, Read 1
5    AMS2   Amplification Mix, Read 2
6    LPM    Linearization Premix
7    LDR    Formamide
8    LMX1   Linearization Mix, Read 1
9    LMX2   Linearization Mix, Read 2
10   RMF    Resynthesis Mix
11   HP10   Primer Mix, Read 1
12   HP12   Index Primer Mix
13   HP11   Primer Mix, Read 2
14   PW1    Water

Most of these are either oligos (primers), polymerases, or nucleotides. None of which I imagine are particularly expensive. But it’s substantially more complex than the flowcell, and I imagine that QCing and putting this cartridge together along with the logistics around shipping it to customers is the dominating cost.

It’s hard to get accurate volume pricing on these reagents, but I suspect that the cartridge costs >$40. Together with the flowcell, that puts consumables at around $50 per run, which would put the cheapest MiSeq runs (~$500) in line with Illumina’s overall 90% margins on consumables.
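The margin arithmetic is simple enough to check directly. This sketch treats the "<$10" flowcell and ">$40" cartridge guesses as point values of $10 and $40, which is an assumption made purely for illustration; the real numbers could sit on either side.

```python
def gross_margin(price_usd, cogs_usd):
    """Gross margin as a fraction of the selling price."""
    return (price_usd - cogs_usd) / price_usd

flowcell_usd = 10    # assumed point value for the "<$10" guess
cartridge_usd = 40   # assumed point value for the ">$40" guess
run_price_usd = 500  # approximate price of the cheapest MiSeq run

cogs_usd = flowcell_usd + cartridge_usd
margin = gross_margin(run_price_usd, cogs_usd)
print(f"COGS ${cogs_usd}, margin {margin:.0%}")
```

At a $500 run price every extra $5 of consumable cost takes a percentage point off the margin, so the guess is not very sensitive to being a few dollars out.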