Apton Biosystems Update

I’ve previously written about Apton Biosystems. When I wrote that post there wasn’t much to go on. However, a patent [2] has recently been published which reveals a bit more information.

The motivation stated in the patent is that “to reach a $10 30× genome”…”the amount of data per unit area needs to increase by 100 fold”. Elsewhere in the patent they mention that the prior art is a pitch of 1 micron. A 100-fold increase in density per unit area means shrinking the pitch by a factor of 10, so they want to decrease the spacing to ~100nm. For comparison, HiSeq wells were ~500nm.
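The arithmetic behind that target is easy to check. This is only a sketch: it assumes spots sit on a regular square grid, so that density scales as 1/pitch², and the function name is made up for illustration rather than taken from the patent.

```python
import math

def pitch_for_density_gain(pitch_nm: float, gain: float) -> float:
    """Pitch needed to pack `gain` times more spots per unit area.

    Assumes a regular square grid, where density is proportional to 1 / pitch**2.
    """
    return pitch_nm / math.sqrt(gain)

# Prior-art pitch of 1 micron, 100-fold more data per unit area.
print(pitch_for_density_gain(1000.0, 100.0))  # 100.0
```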

This premise is slightly shaky, as Illumina flowcells and reagents are sold at significant profit. I imagine a large part of Illumina’s costs are related to logistical issues, rather than consumables themselves.

In any case, the patent proposes a massive cost reduction by more densely packing DNA on the flowcell. The patent mostly refers to ordered arrays, and many examples refer to a single molecule approach. The basic chemistry, however, seems to be pretty standard Illumina-style sequencing-by-synthesis.

The figure below shows a simulation of DNA attached to a surface, at varying pitch (spacing). The right-hand images are de-convoluted versions of the left. It’s clear that as the pitch gets smaller, the image gets more crowded, and it’s harder to identify individual spots.

When imaging using a standard optical microscope, you would expect your density to be diffraction limited. Essentially, you can’t clearly resolve features smaller than roughly half the wavelength of the light used (~200nm)… normally.

However, a number of recent techniques have broken the diffraction limit. These have allowed optical microscopes to resolve features down to 10s of nanometers. In this patent, Apton apply some “super-resolution”-like approaches… but in a limited scope (we’ll revisit what Illumina might be doing here later).

A basic super-resolution approach is shown below (not from Apton):

From [1]. The images above show the signal detected from individual fluorophores. Each pixel is 13um; at 150x magnification this covers ~86nm on the surface. To generate super-resolved locations they fit a Gaussian to the intensity registered from a single fluorophore.

Each “peak” in part A of the figure above is the signal from a single fluorophore. Because the peaks are well separated we can extract each one and look at its distribution. In part B we see a single distribution. This is a 2D Gaussian. If we just took the pixel of highest intensity as the location of the fluorophore our resolution would be diffraction limited to ~200nm. However, by performing a Gaussian fit over the distribution we can determine the location to sub-pixel resolution. In this case, they could identify fluorophore locations at a final resolution of 1.5nm.

The above approach only works because the fluorophores are well separated. If the Gaussians overlapped, the fit wouldn’t work. In the image above you can see the FWHM of the Gaussian is about 3 pixels, which represents ~250nm on the surface. I’d imagine if fluorophores were any closer than this you’d have issues.
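To make the sub-pixel idea concrete, here is a minimal sketch of localizing a single, well-separated spot. It is not the method from [1] or from the patent: instead of a full 2D least-squares Gaussian fit, it uses a simpler stand-in, fitting a parabola to the log-intensities of the brightest pixel and its two neighbours along each axis (the log of a Gaussian is a parabola, so this is exact for noise-free data). The spot position, width and image size are made-up test values.

```python
import math

PIXEL_NM = 13_000 / 150  # 13 um camera pixel at 150x magnification, ~86.7 nm

def make_spot(size, cx, cy, sigma, amplitude=1000.0):
    """Noise-free 2D Gaussian spot sampled at pixel centres (image[y][x])."""
    return [[amplitude * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def subpixel_offset(left, centre, right):
    """Peak offset from the centre sample, from a parabola through log-intensities."""
    l, c, r = math.log(left), math.log(centre), math.log(right)
    return 0.5 * (l - r) / (l - 2 * c + r)

def localize(image):
    """Return the (x, y) spot position in pixels, to sub-pixel precision."""
    size_y, size_x = len(image), len(image[0])
    # Start from the brightest pixel (assumed not to lie on the image border).
    py, px = max(((y, x) for y in range(size_y) for x in range(size_x)),
                 key=lambda p: image[p[0]][p[1]])
    dx = subpixel_offset(image[py][px - 1], image[py][px], image[py][px + 1])
    dy = subpixel_offset(image[py - 1][px], image[py][px], image[py + 1][px])
    return px + dx, py + dy

# Hypothetical spot: sigma of 1.3 px gives a FWHM of about 3 px.
spot = make_spot(size=11, cx=4.3, cy=5.7, sigma=1.3)
x, y = localize(spot)
print(f"{x:.3f} px, {y:.3f} px -> {x * PIXEL_NM:.1f} nm, {y * PIXEL_NM:.1f} nm")
```

With real, noisy images you would fit all the pixels in the spot rather than three per axis, and the achievable precision depends on how many photons you collect.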

In their patent, Apton appear to use essentially the above approach to identify the positions of single DNA strands on the surface to a sub-diffraction-limited resolution of “10 nm RMS or less”. However, they have a problem: they want to pack the molecules as closely as they can to improve density. This means the molecules are not well separated like those in the figure above.

To get round this Apton seem to use a couple of approaches. The first is that they use signals from multiple cycles to identify molecule positions. If the oligos attached to the surface have a fairly random distribution of bases (like the human genome) this should help a lot. For each molecule, you can select a cycle where it is illuminated, but none of its neighbors are. This means there is no crosstalk at this position and you should be able to get a good estimate of its position.
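The cycle-selection idea can be sketched in a few lines. This is an illustration of the reasoning above, not Apton’s algorithm: it assumes four-colour chemistry where each base has its own channel, and that you already know which channel each molecule lit up in for each cycle. The molecule names, sequences and neighbour lists are hypothetical.

```python
def isolated_cycles(calls, neighbours):
    """For each molecule, list cycles where no neighbour shares its colour channel.

    calls:      dict of molecule id -> string of bases, one per cycle
    neighbours: dict of molecule id -> list of nearby molecule ids
    """
    result = {}
    for mol, bases in calls.items():
        result[mol] = [
            cycle for cycle, base in enumerate(bases)
            if all(calls[n][cycle] != base for n in neighbours.get(mol, []))
        ]
    return result

# Three hypothetical molecules in a row; "b" sits between "a" and "c".
calls = {"a": "ACGT", "b": "AGGC", "c": "TGCA"}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(isolated_cycles(calls, neighbours))
# {'a': [1, 3], 'b': [3], 'c': [0, 2, 3]}
```

Each molecule ends up with at least one cycle in which it can be localized without a neighbour contributing signal in the same channel. With more cycles and random sequence, the chance of finding such a cycle goes up quickly.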

The figure below shows what appear to be images from a real single molecule experiment:

“right panel, shows each peak from each cycle overlaid. The left panel is the smoothed version of the right panel. Each bright spot represents a molecule. The molecule locations are resolvable with molecule-to-molecule distances under 200 nm.”

The right image shows positions identified from different cycles piled up (super-resolved positions I assume). In the left image they’ve used these to create another Gaussian. I would guess they then take the peak of this second Gaussian to give the final location of the molecule. This way, they can incorporate information from multiple cycles to give themselves the best estimate of the molecule’s location.

Using the above process I’d imagine they can get pretty good spot locations. They also mention the use of a crosstalk correction algorithm during the location identification step, but I’d imagine just filtering out bad-looking Gaussians would work reasonably well.

While they may now have good spot/molecule locations, this doesn’t mean they can pack DNA at ~10nm on the surface. This is because in any given cycle, there will be adjacent molecules which are fluorescing. The resulting Gaussian PSFs as imaged will overlap, meaning that spots can’t be resolved. This is essentially crosstalk between adjacent spots.

Apton appear to be trying to use their super-accurate spot locations as the input to their crosstalk correction algorithm. The crosstalk correction process isn’t described in detail, but I can see that with very accurate spot locations, you can parameterize a model to which you can fit your observed signal.
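Since the patent doesn’t describe the correction, the following is only a toy version of what “fitting a model with known spot locations” could look like. It assumes a Gaussian PSF with a FWHM of 250nm (the figure estimated earlier), considers just two spots, and only looks at the signal at each spot centre. The function names and numbers are illustrative assumptions.

```python
import math

def psf_overlap(distance_nm, fwhm_nm=250.0):
    """Fraction of a neighbour's peak signal seen at a spot centre.

    Assumes a Gaussian PSF; sigma = FWHM / (2 * sqrt(2 * ln 2)).
    """
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-distance_nm ** 2 / (2.0 * sigma ** 2))

def unmix_pair(observed_1, observed_2, pitch_nm, fwhm_nm=250.0):
    """Recover the true amplitudes of two spots from their mixed signals.

    Model: observed_1 = a1 + c * a2 and observed_2 = c * a1 + a2,
    where c is the PSF overlap at the known spot separation.
    """
    c = psf_overlap(pitch_nm, fwhm_nm)
    det = 1.0 - c * c
    a1 = (observed_1 - c * observed_2) / det
    a2 = (observed_2 - c * observed_1) / det
    return a1, a2

# How much of a neighbour's peak leaks into a spot centre at each pitch.
for pitch in (400, 300, 200, 100):
    print(pitch, round(psf_overlap(pitch), 3))
```

Under these assumptions the leakage is about 17% at a 200nm pitch and about 64% at 100nm. As the overlap approaches 1 the system becomes badly conditioned, so any noise in the observed signal is amplified by the correction.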

This sounds great, but crosstalk seems to increase exponentially as the pitch shrinks:

The examples say “molecule locations are resolvable with molecule-to-molecule distances under 200 nm” and elsewhere they say “acceptable crosstalk levels” … “occurs for pitches at or above 210 nm”.

So it seems based on this, a pitch of ~200nm is viable, but it’s not clear that you can go lower than this. This seems unfortunate, as it’s only about a quarter of the size of Illumina’s wells.

What About Illumina?

As mentioned above, super-resolution has been around for a while. In fact, the Genome Analyzer 2 used super-resolution-like techniques. Cluster locations could be identified to sub-pixel resolution. Rather than just picking the “brightest” pixel in a cluster, adjacent pixel intensities could be fitted to a PSF to give a more accurate cluster location.

Illumina appear to have now filed a bunch of patents on various approaches to increasing density. One patent uses a DNA-PAINT [3] approach, which they suggest can increase the packing density to “may be less than about 20 nm”. Another describes a STED approach [4] (200nm).

There was however one patent that I found quite fascinating. This approach appears to be for an iSeq-like [6] platform where clusters sit over a CMOS image sensor. The device incorporates additional electrodes which allow an electric field to be created under the cluster.

This field can then be used to electrically deactivate a fluorophore (in one example, by pulling a quencher down on to it):

This seems very neat. Essentially, you could image one cluster while quenching all its neighbors. This would remove any crosstalk, giving you good separation of signals while retaining density. Theoretically you could push clusters very close together.

While the Illumina patents describe some interesting approaches, I didn’t see anything that looked like a real experimental setup or real datasets. So, maybe much of this is theory at the moment. I guess we’ll have to wait and see!

References and Notes

[1] Myosin V Walks Hand-Over-Hand: Single Fluorophore Imaging with 1.5-nm Localization.

[2] http://www.freepatentsonline.com/10378053.pdf

[3] http://www.freepatentsonline.com/y2019/0276886.html

[4] http://www.freepatentsonline.com/y2019/0219835.html

[5] http://www.freepatentsonline.com/9193998.html

[6] I don’t see any reason why a similar setup might not be used with a normal (patterned or otherwise) flowcell with embedded electrodes. But the patent seems to focus on an iSeq-like approach.

Scripts to download SARS-CoV-2 replacements

I wanted to download a set of mutations in SARS-CoV-2. CoV-GLUE seems to be a reasonable database of mutations in SARS-CoV-2. However the web interface doesn’t seem to have an option to download a dataset. And there isn’t a published API. So I threw together some ugly bash/awk to get what I wanted. I don’t imagine this will work for long, as the website appears to be under active development. But here are my notes anyway.

The website works off an (undocumented?) JSON API. I used the following JSON template to get replacements (non-synonymous substitutions) which occur in 2 or more sequences:

{"multi-render":{"tableName":"cov_replacement","allObjects":false,"whereClause":"(true) and  (((num_seqs >= 2)))","rendererModuleName":"covListReplacementsRenderer","pageSize":500,"fetchLimit":500,"fetchOffset":FETCHOFFSET,"sortProperties":"-num_seqs,+variation.featureLoc.feature.name,+codon_label_int,+replacement_aa"}}

The above goes in a file called templ. I then just modify “FETCHOFFSET” using sed and download the first 4500 mutations (at the time of writing there are 4000 odd mutations). You’d want to stick this all in a loop… but I didn’t bother:

rm all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/0/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/500/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/1000/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/1500/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/2000/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/2500/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/3000/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/3500/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
rm index.html;cp templ c;sed -i 's/FETCHOFFSET/4000/g' c;wget http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/ --post-file=./c --header "Cookie: _cle=accepted" --header "Content-Type: application/json";cat index.html >> all
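For what it’s worth, the loop could look something like the Python sketch below. It mirrors the commands above (same URL, headers, template file and page size) but is untested against the live server, and since the API is undocumented it may well stop working. The function names are just illustrative.

```python
import json
import urllib.request

URL = "http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/"
HEADERS = {"Cookie": "_cle=accepted", "Content-Type": "application/json"}
PAGE_SIZE = 500

def page_offsets(total, page_size=PAGE_SIZE):
    """Offsets 0, 500, 1000, ... covering `total` rows."""
    return list(range(0, total, page_size))

def build_payload(template, offset):
    """Fill the FETCHOFFSET placeholder, as the sed call does."""
    body = template.replace("FETCHOFFSET", str(offset))
    json.loads(body)  # raises ValueError if the filled template is not valid JSON
    return body.encode("utf-8")

def download_all(template_path="templ", out_path="all", total=4500):
    """POST one request per page and write the responses, in order, to `out_path`."""
    with open(template_path) as handle:
        template = handle.read()
    with open(out_path, "wb") as out:
        for offset in page_offsets(total):
            request = urllib.request.Request(
                URL, data=build_payload(template, offset), headers=HEADERS)
            with urllib.request.urlopen(request) as response:
                out.write(response.read())

# download_all()  # uncomment to run against the live server
```

The output file plays the same role as the “all” file built up by the cat commands, so the later extraction step should work on it unchanged.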


Then we extract all the mutation IDs:

awk  'BEGIN{RS="id\":\"";FS="\""}{print $1}' all > allmuts
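If you’d rather not rely on awk’s multi-character record separator, the same extraction can be sketched with a regular expression in Python. The sample record below is made up to show the shape of the match; the real responses contain many more fields, and the IDs shown are hypothetical.

```python
import re

# Capture whatever sits between the quotes after an "id": key.
ID_PATTERN = re.compile(r'"id":"([^"]*)"')

def extract_ids(text):
    """Return every value that follows an "id": key, in order of appearance."""
    return ID_PATTERN.findall(text)

# Hypothetical response fragment, only to illustrate the pattern.
sample = '{"rows":[{"id":"S:D:614:G","num_seqs":10},{"id":"N:R:203:K","num_seqs":7}]}'
print(extract_ids(sample))  # ['S:D:614:G', 'N:R:203:K']
```

One small difference from the awk one-liner is that this only returns the matched IDs, whereas awk also prints a line for whatever text comes before the first match.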

And then fetch them from the server. The awk command below writes one wget call per mutation ID to allmuts.get (each call posts a request body from info.json); running that file will create 4000 odd files, which we can then parse further:

awk '{print "wget --header \"Cookie: _cle=accepted\" --header \"Content-Type: application/json\" --post-file=./info.json http://cov-glue.cvr.gla.ac.uk/gluetools-ws/project/cov/custom-table-row/cov_replacement/" $1}' allmuts  > allmuts.get

The CoV-GLUE database seems like a great resource. I hope they add a feature to download sequences/results soon. I’ve seen database results presented in a few preprints. It would be nice if those papers could also include the raw data, otherwise they’re unfortunately going to end up being difficult to replicate…

Singular Genomics Systems

Company

Singular Genomics Systems was founded in 2016. Pitchbook lists them as having raised 45.5MUSD and as being at Series B. They list Coatue Management, Domain Associates, F-Prime Capital, Revelation Partners and Arch Ventures [1] as investors [3]. They are a member of JLabs [2]. There appear to be ~60 employees listed on LinkedIn.

Technology

It appears that Singular Genomics have licensed technology from Jingyue Ju’s lab [4]. Jingyue Ju’s lab has generated a huge amount of IP around various approaches to DNA sequencing, so this doesn’t really narrow things down very much.

Singular’s patents also describe two different optical sequencing approaches. One is a single molecule “real time” sequencing approach (the closest similar commercial platform would be PacBio). The other is an ensemble approach (with an example showing amplified DNA on beads). I’ll briefly review these two patents below. But the main takeaway is that they appear to be working on an optical approach. I suspect it’s slightly more likely that they are working on an ensemble approach (as these are more common, and easier to get working).

The ensemble approach also shows the closest to what could be real data. So let’s look at this first:

Ensemble Approach

From 20200102609 – Represents the first 10 cycles of four color SBS data for a fragment of the PhiX 174 DNA immobilized on beads in a flow cell. The graph shows fluorescence emission intensity obtained by using a mixture of 4 labeled, blocked dNTPs: dCTP-Bodipy, dTTP-R6G, dATP-AF568, dGTP-AF647. The fluorescence images were taken during the chase step, as dark, blocked dNTPs were being incorporated into any remaining previously unextended complementary DNA strands.

The above figure from [6] shows one innovation they describe on the basic sequencing-by-synthesis approach. Essentially what they’re suggesting is that after flowing in your standard labelled reversible terminators you flow in a “chasing mix”. This chasing mix is a set of all 4 nucleotides, with reversible terminators, but no labels. What this means is that you give all the strands a second chance to incorporate a nucleotide. This “second chance” doesn’t give you any more signal. But it does hopefully mean that unextended strands are kept “in step”.

So called “phasing” (strands failing to incorporate a base, and getting out of step) is a major source of error in sequencing-by-synthesis. I guess the idea here is that an unlabelled nucleotide might incorporate with better efficiency than a labelled one.
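A back-of-the-envelope model shows why a second chance could matter. This is a deliberately simplified sketch with made-up rates: it assumes each strand misses the labelled incorporation independently with a fixed probability per cycle, that the chase step rescues a fixed fraction of those misses, and that a strand which falls behind never catches up. None of these numbers come from the patent.

```python
def in_phase_fraction(miss_rate, cycles):
    """Fraction of strands that have never missed an incorporation."""
    return (1.0 - miss_rate) ** cycles

def miss_rate_with_chase(labelled_miss, chase_miss):
    """A strand falls behind only if both the labelled and the chase step fail."""
    return labelled_miss * chase_miss

# Hypothetical rates: 1% of strands miss the labelled step,
# and the chase step rescues 90% of those.
without_chase = in_phase_fraction(0.01, 100)
with_chase = in_phase_fraction(miss_rate_with_chase(0.01, 0.10), 100)
print(f"{without_chase:.3f} {with_chase:.3f}")  # 0.366 0.905
```

With these illustrative rates, roughly 37% of strands are still in step after 100 cycles without a chase, against roughly 90% with one. The rescued strands carry no label for that cycle, so the trade-off is a slightly weaker signal in exchange for less phasing.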

Beyond this, the patent discusses methods of speeding up imaging, potentially by taking images during the “chasing step”. This is interesting in the sense that imaging time which would otherwise be wasted can be used to help extend unextended strands, without otherwise altering the signal.

The graph above shows the first 10 cycles of a fragment of PhiX. To properly understand this data it would need normalization, but there doesn’t seem to be much in the way of phasing. This work appears to have been performed on beads. This seems to suggest that they have something up and running. Unfortunately it doesn’t tell us much about their proposed amplification/cluster/polony generation approach. I’d guess they are using a bead based platform to evaluate the chemistry and have other ideas around amplification. But it’s also possible that they are designing a bead based platform (like Ion Torrent/454).

Single Molecule Approach

Another patent [5] discusses a single molecule approach. In this approach they’re watching a polymerase incorporate nucleotides in real time. Here they’re suggesting detection through FRET. One option appears to be to have a couple of FRET acceptor/donor sites on the polymerase. As the polymerase incorporates a nucleotide, a conformational change occurs and you get a FRET signal. You can also use a label on the nucleotide to then observe incorporation using FRET.

The patent suggests observation via grating-style TIRF is desirable, where the grating could be incorporated into the flowcell. There are a few other bits and pieces of interest in the patent, such as attachment methods, but nothing that looked like an experimental setup or data to me.

Overall these patents don’t give a clear picture of where Singular Genomics is heading. It seems that the approach is likely optical, and I suspect not single molecule based on the lack of experimental data. Will be interesting to see how things develop!

Notes

[1] https://www.archventure.com/portfolio/

[2] https://jlabs.jnjinnovation.com/sites/jlabs/files/JLABSPortfolioSocial.pdf

[3] https://pitchbook.com/profiles/company/226128-25#investors

[4] https://www.cbinsights.com/company/singular-genomics

[5] Single molecule approach: https://patents.google.com/patent/US20180258472A1

[6] http://www.freepatentsonline.com/y2020/0102609.html

In order to decrease SBS cycle times, in embodiments of the present disclosure the identity of distinguishable, blocked dNTP analogue incorporated into the labeled, blocked extension product(s) generated in the sequencing reaction of such cycle is assessed while the sequencing reaction is running, i.e., before the chasing reaction is initiated. In other embodiments, such assessment is conducted during the chasing reaction. In embodiments, such assessment is conducted about less than 60 seconds before termination of the sequencing reaction, about less than 60 seconds before initiation of the chasing reaction, about less than 300 seconds after initiation of the chasing reaction, or about less than 60 to about less than 10 seconds before termination of the chasing reaction. In embodiments, such assessment is conducted substantially simultaneously with initiation of the chasing reaction. In embodiments, such assessment is conducted at the conclusion of or after the chasing reaction.

chasing conditions (i.e., conditions under which an unlabeled, blocked dNTP analogue species can be incorporated into a primed template DNA molecule that was not extended to include a distinguishable, blocked dNTP analogue species), thereby forming the unlabeled, blocked extension product(s).

DNA Sequencing with Simultaneous Imaging and Chase Steps

Provided here is an example of the embodiment of a sequencing-by-synthesis method where the DNA bases were identified during the chase step. In this example, identical DNA fragments derived from the PhiX 174 genome were immobilized on 1 micron beads. The beads were tethered to a glass coverslip which was part of a flow cell. All necessary reagents for SBS were sequentially delivered into the flow cell. At first, four distinguishable, blocked dNTP analogues were presented into the flow cell. Each dNTP was labeled with a different fluorophore as follows: dCTP-Bodipy, dTTP-R6G, dATP-AF568, dGTP-AF647. A sequencing polymerase was used to incorporate these dNTPs into the complementary strand. A small volume of buffer was then used to remove any excess dye-labeled, blocked dNTPs. As a second step, dark, blocked dNTPs were introduced into the flow cell. During this second step, a set of four images was taken, one for each of the colors corresponding to each dye-labeled dNTP, while the dark dNTPs continued to be incorporated into any unextended DNA templates on the bead. The images were obtained using a Nikon microscope, with a 20×0.75 NA objective, and standard filter sets corresponding to each of the dyes. Note that the images were taken simultaneously with the chasing step, at a temperature of 60° C., demonstrating the compatibility of the two processes in terms of reaction conditions. The excess dark, blocked dNTPs were then washed out and a deprotection reagent was brought in. This reagent cleaved the blocking group and the dye from the incorporated dNTPs. The cycle was then repeated as many times as desirable. The results from the first 10 cycles obtained using this method are shown in FIG. 1. Each bar represents the fluorescent signal from the corresponding base. Spectral cross-talk correction, which compensates for the spectral bleed through from one emission channel into another, was applied to these data. No additional corrections have been applied. 
We have shown sequencing read lengths of >75 bases in this manner.

http://www.freepatentsonline.com/y2019/0077726.html

http://www.freepatentsonline.com/y2020/0102609.html

http://www.freepatentsonline.com/y2019/0352508.html

https://www.indeed.com/cmp/Singular-Genomics-1/reviews

https://patents.google.com/patent/US20180258472A1/en?inventor=eli+glezer&oq=eli+glezer&sort=new&page=2

Single Technologies

Company

Single Technologies is based in Stockholm and was founded in 2012 [1]. Crunchbase lists them as having raised 4.7MUSD from JovB Holding, Sciety, and KTH Holdings [3]. There are ~10 employees on LinkedIn. The majority of the co-founders appear to come from an optical background.

Technology

Single have a few patents which largely refer to the optical system. The patents do not explicitly mention sequencing. What follows is based on one of their patents, and I’ll then try and frame this based on what they say on their website.

Single Technologies imaging setup from [2].

In the Single Technologies imaging system [2], the sample sits on a rotating sample holder. Essentially, it appears to be a drum that rotates under the objective lens. They suggest rotation means that the sample is only subjected to constant forces. In a sense, this is similar to the TDI imaging that Illumina does on their instruments: the sample moves at a constant speed under the optical system and you essentially “scan” it. I can see, however, that a curved sample surface is a big departure from a traditional flat flowcell moving on an XY/XYZ stage.

The rotation trajectory appears to need to be very well defined; they suggest that using air bearings could help, and talk about precision (between laps) of 100nm. The patent describes the use of confocal microscopy, so rather than using a line scan imager (as in TDI) they will likely be scanning point-by-point. The instrument is called the “Theta”, so it seems like Confocal Theta Microscopy may also be a possibility.

From the patent we get the sense that they are innovating around the imaging system. The website more explicitly says that they are looking at single molecule detection. From the patent, and explicit mention of confocal microscopy on the site, I would not expect this imaging configuration to be compatible with “real-time” observation of nucleotide incorporation (PacBio-style).

The website mentions a patterned flowcell, but I didn’t see a patent referring to this. It would be interesting to better understand how this works, particularly in a single molecule context. One of the big issues with Illumina chemistry has been avoiding “mixed clusters”. These are clusters which are formed from more than one template. Because the templates (DNA to be sequenced) randomly attach to the flowcell, there is some probability that two templates are very near each other and form a single “mixed” cluster. On the Genome Analyzer 2, these “mixed clusters” accounted for ~50% of data (from memory), significantly limiting throughput.

If you have a number of wells/sites and are trying to optimize for “single occupancy”, randomly flowing templates into wells limits you to ~36% [4] of wells containing a single template (with many containing none, or multiple templates).
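That limit falls out of Poisson statistics. As a quick sketch, assuming templates land in wells independently so that the number per well is Poisson distributed (the function name here is just for illustration):

```python
import math

def occupancy(mean_templates_per_well, k):
    """Poisson probability that a well holds exactly k templates."""
    lam = mean_templates_per_well
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Single occupancy peaks when the mean loading is one template per well.
for lam in (0.5, 1.0, 2.0):
    print(lam, round(occupancy(lam, 1), 3))
```

At a mean loading of one template per well the single-occupancy fraction is 1/e, or about 0.37, which is in line with the figure above. Loading more or fewer templates only makes it worse: you get more multiply-occupied wells or more empty ones.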

Illumina solved this issue with their “exclusion amplification” chemistry [5]. In this approach, as templates attach they are rapidly amplified and quickly fill the well. This means that there isn’t any space left for a second template to enter a well. My understanding is that with this approach, most of the “mixed clusters” disappear, and you get a dramatic increase in reads.

Getting back to Single Technologies, my question is: are you limited to only being able to use 36% of sites/wells? Or is there a single molecule approach to ensuring that each site only contains a single template? This would be quite interesting.

Outside of the optical system there doesn’t seem to be much to say about Single Technologies. They state that their approach “can be applied to almost any fluorescent based sequencing chemistry”, which strongly suggests to me that they don’t have any innovation around chemistry. Looking at their team, I also don’t see employees with the background required to develop a new sequencing chemistry.

So I suspect their plan is to innovate around the optics/flowcell only. Perhaps they can partner with someone else for the chemistry, or be acquired by an existing sequencing company [6], where they would provide a throughput advantage with improved optics/flowcells.

Notes

[1] https://www.singletechnologies.com/about-us

[2] http://www.freepatentsonline.com/y2019/0049382.html

[3] https://www.crunchbase.com/organization/single-technologies#section-funding-rounds

[4] https://books.google.co.jp/books?id=AJm7CwAAQBAJ&pg=PA43&lpg=PA43&dq=poisson+limit++single+occupancy+well&source=bl&ots=b33KVRG8fO&sig=ACfU3U2jGUbwzdGwzAqZ_LTHSiO5DZwJMg&hl=en&sa=X&ved=2ahUKEwj77dej68zpAhVVA4gKHcT0AJUQ6AEwCnoECAcQAQ#v=onepage&q=poisson%20limit%20%20single%20occupancy%20well&f=false

[5] https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/patterned-flow-cells.html

[6] I suspect that, of current players, only Illumina’s and BGI’s chemistry would be compatible. In both cases this would mean removing amplification from their workflow so that they became single molecule approaches. The question then is if the Single Technologies approach can sufficiently reduce issues around photo bleaching to make this worth it. Moving to single molecule might help increase read length in some cases (by removing phasing issues) but if it reduces average read length/throughput through photo bleaching, this might not be worth it.

Website Quotes

Left: DNA fragments located in SINGLE’s patterned flow cells. Middle: the Theta Sequencer. Right: sequencing data generated.

“The sequencer consists of the Worlds fastest scanner, a new type of fluidics adopted for large areas and sequencing chemistry.”

“Theta contains the world’s fastest single molecule-sensitive confocal imaging system”

“Single Technologies is pushing the limits of genomics by combining single molecule imaging, fast large area confocal scanning, grating techniques, fluidics, nanotechnology and a large portion of out of the box thinking. Our technology is being explored by leaders and industry in the genomics field who cares about Big Data generation.”

“Single Technologies was founded by Johan Strömqvist, Bengt Sahlgren, Annika Bolind Bågenholm and Raoul Stubbe in 2012/2013. The origin of the company is a unique combination of PhD research in single molecule imaging and biotechnology at the Royal Institute of Technology and R&D in the fiber optical grating industry by the founders of Proximion in Stockholm, Sweden.”

SCANNER

Theta contains the world’s fastest single molecule-sensitive confocal imaging system. The technology digitizes the samples simply and intuitively at diffraction limited resolution with negligible bleaching. And it’s ludicrously fast, a 15×15 mm area can be scanned in just a few seconds, and a total area of 125×65 mm could be scanned without any compromises.

FLUIDICS

Theta contains a new type of automated fluidics which avoids micro channels, allowing rapid exchange of liquids over large areas and using less reagents, setting new standards for optimized reactions. It is enabled by a combination of Single’s revolutionary scanning technology and unique approaches to effective flow and diffusion.

CHEMISTRY

Theta’s fast large area scanning and automated effective fluidics can be applied to almost any fluorescent based sequencing chemistry, supported by Single’s new patterned surface methods, giving the benefits of speed and capacity compared with other systems, backing the increased need of sequenced data.