Lingvitae AS

Image from [1].

With the Stratos acquisition I wanted to write up a few notes on a much less well known (and now I think inactive) startup called LingVitae. LingVitae was a Norwegian single molecule sequencing startup. In many ways, their original “mission” was similar to Stratos’. They were working on a way to replace a single nucleotide with a magnifying tag, or what is sometimes called a “Design DNA Polymer”.

The approach to generating these “expandomer-like” strands is rather similar to one approach suggested by Stratos in their patents. Essentially loop/hairpin like oligos are hybridized to the template and ligated:

FIG. 7 shows adjacently aligned adapters which carry magnifying tags and which hybridize to the target and self-hybridize;

I’d guess there are a number of issues getting this to work. In particular, hybridization of such short (8mer) oligos might not be very specific. Will they really hybridize adjacently, specifically? Doesn’t the loop/bend cause a bunch of issues?

I couldn’t find anything that looked like experimental data in the patents. The closest I got was a 2007 paper describing the technique from Amit Miller [2]. This paper really just describes the concept and says “for proof-of-concept we designed and synthesized DNA oligonucleotides that encode”…”up to 8 bits of information”. Seems like they’re just synthesizing oligos, based on what LingVitae might have been able to produce. But there doesn’t seem to have been a followup paper showing any experimental work.

In their patents, LingVitae proposed a number of different read-out methods. These include nanopore and optical approaches:

FIG. 25 shows examples of how signal chains may be used to obtain both sequence information (left) and positional information (right) in which
A) shows a DIRVISH based method using fluorescence labelled probes that bind the target molecules in a characteristic pattern,
B) shows an optical mapping based method in which the restriction pattern is used to give the position of the sequence,
C) shows a method in which a characteristic pattern of DNA binding proteins are registered as they pass through a micro/nano-pore and
D) shows a method using fluorescence labelled probes, proteins or the like which are registered as they pass a fluorescence detector.

LingVitae’s 2007 era website is shown below. This is from a time when they were patenting, and promoting work related to the design polymer idea. Their focus is on “high quality single molecule sequencing”.

This was before PacBio launched their instrument. So there were no “single molecule sequencing” platforms on the market at this time. LinkedIn shows about 30 former employees, so I imagine they had a reasonable team working on this.

Slowly LingVitae seem to have moved away from DNA sequencing. The website and PR shifted toward the development of a cheap cellular imaging platform [1]. The idea was to use a DVD drive as a microscope. This was always part of the sequencing play, but sequencing gets downplayed from 2012 onward:

Sometime in 2014 the website seems to have gone offline completely. Then, sometime in 2017, www.lingvitae.com started redirecting to discipher.co:

Which as of ~May 2019 also appears to be offline. LinkedIn doesn’t show any active employees so I assume this is the end of the LingVitae story.

It’s a shame really, it would have been interesting to see preliminary data from the design polymer approach. If anyone knows what happened to LingVitae please get in touch!

Notes/References

[1] https://www.theverge.com/2013/4/14/4223500/lab-on-a-dvd-blood-analysis-hiv-testing-fast-affordable

[2] Amit Miller paper: https://academic.oup.com/clinchem/article/53/11/1996/5627340

[3] Expandomer patent: https://patents.justia.com/patent/20090053699

From http://www.freepatentsonline.com/20070254280.pdf

Text from a version of their website, via the waybackmachine:

Physical Magnification
The units to discriminate in a biological DNA molecule are bases or base pairs with a size of 0,34 nm each. The units to discriminate in a Design Polymer are blocks of up to 25 bases or base pairs with a size of up to 10 nm each.

Maximalisation of Unit Differences
The difference between the units to discriminate in a biological DNA molecule are only represented by a few atoms on a purine or pyrimidine skeleton attached to an identical backbone structure. The difference between the units to discriminate in a Design Polymer can be very significant and will be tailor made to achieve maximum resolution power on the read-out platform in question.

Binary Code
There are 4 units to discriminate between in a biological DNA molecule and the read-out platform must thus be able to distinguish between 4 different levels or states. There are 2 units to discriminate between in a Design Polymer and the read-out platform must thus be able to distinguish between only 2 different levels or states or alternatively even easier use a very simple on-off approach

Removal of secondary structures
A biological DNA molecule can take all forms of sequences and shapes and has a natural tendency to form secondary structures which can influence on the read-out process. A Design Polymer can be designed to avoid secondary structures and to ensure a reproducible behavior during the read-out process

Labels
It is difficult to label every individual base in biological DNA molecule due to sterical hindrance. The repertoire of labels that can be used is thus limited. Incorporation errors, quenching of neighbor labels, etc. adds to the challenge. It is easy to label every individual unit in Design Polymer as the spacing between labels, unit sequence, and more can be designed. The repertoire of labels that can be used are thus almost endless. Incorporation errors, quenching of neighbor labels, and other challenges can easily be solved by smart design of unit sequences, spacing, and other Design Polymer parameters.”

The whole purpose of the Design Polymer Concept is to enable read out technologies to perform rapid DNA analysis with superior quality and resolution, and after years of extensive research LingVitae is now developing its first series of Design Polymer products -DNA EXPLORER SYSTEM.

DNA EXPLORER SYSTEM is going to be a lab-on-a-disk based kit designed to convert biological DNA into synthetic Design Polymers. The intention is to provide read out companies with a powerful tool that enables them to obtain advanced “DNA molecules” designed solely for the purpose of single molecule sequencing. 

The first generation of the kit will consist of a conversion jig and a set of four conversion disks. It is intended to be very user-friendly, so that anyone with basic lab skills should be able to perform the conversion of nucleic acid in less than 24 hours.  

DNA EXPLORER Fluidics Station 500

Fluidics Station 500 will be the 1st generation of fluidics stations for processing the Conversion Disks.

The Fluidics Station 500 will incorporate advanced design that provides improved ease-of-use and true walk away freedom to dramatically improve efficiency in the end-users genetic analysis. The system will run unattended until completion of a Conversion Disk, freeing the operator to attend to other responsibilities, thereby helping to improve the workflow and operation of the laboratory.

Total processing time per Conversion Disk will be 6 hours and the Fluidics Station 500 is designed to operate in environments running 2-4 daily runs per system. A total of 4 runs will be needed for whole genome conversion of a eukaryotic genome. No dedicated or special power requirements.”

DNA EXPLORER Conversion Kit

The Conversion Kit will contain a series of 4 Conversion Disks (Disk 1-4) for the Conversion of 100 Gigabases of 24mers from a target material as well as reagents needed for DNA purification and initial handling of the sample material.

The Conversion Kit reagents will formulate into the fewest number of individual components possible, reducing preparation and handling steps. All of the reagents will be ready-to-use solutions.”

Video linked from site: https://www.youtube.com/watch?v=zLb9ip2lOos

Picture from their site, describing the “lab on a dvd”:

Are there mutations in SARS-CoV-2 CDC qPCR Primer Sites?

I was curious to know if there were any documented mutations which cover CDC Primers/Probes [1]. There’s work that has shown that mismatches in qPCR assays can “completely abolish PCR amplification” [2]. For diagnostic applications, mutations could mean that a qPCR based test would fail to detect SARS-CoV-2 or result in reduced sensitivity.

So, I downloaded all replacements (amino acid substitutions) from CoV-GLUE [3] [4]. I then extracted the nucleotide location identified in each replacement [5] and removed any duplicate locations. This resulted in a total of 3527 locations.

I then extracted the CDC primer sequences [6]. I wrote a small tool to do the following:

  1. Load the reference sequence and create a new sequence indicating the mutation locations on the reference.
  2. Find the location of the primer sequences on the reference [7].
  3. For each primer, note where on the primer sequence there are mutations in the reference.
  4. Report mutation location on primer.
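
The steps above can be sketched roughly as follows. This is not the code from the tarball, just a minimal illustration of the same idea. The function names are my own, mutation sites are assumed to be 0-based reference coordinates (so 1-based coordinates would need 1 subtracted first), and loading the reference from a file is left out.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def find_exact(reference, primer):
    """Brute-force search: (start, strand) for every exact match on either strand."""
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        pos = reference.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = reference.find(query, pos + 1)
    return hits


def distances_from_3prime(reference, primer, mutation_sites):
    """Distance from the primer's 3' end (0 = terminal base) of each mutated site."""
    sites = set(mutation_sites)  # assumed 0-based reference coordinates
    n = len(primer)
    out = []
    for start, strand in find_exact(reference, primer):
        for ref_pos in range(start, start + n):
            if ref_pos in sites:
                # + strand: the 3' end is the right-most reference base of the hit.
                # - strand: the 3' end is the left-most reference base of the hit.
                if strand == "+":
                    out.append(start + n - 1 - ref_pos)
                else:
                    out.append(ref_pos - start)
    return sorted(out)


# Toy reference built around two of the primers listed in [1]; not the real genome.
fwd = "GACCCCAAAATCAGCGAAAT"
rev = "TCTGGTTACTGCCAGTTGAATCTG"
toy_reference = "TTT" + fwd + "CCC" + reverse_complement(rev) + "AA"
print(distances_from_3prime(toy_reference, fwd, [20, 22]))
print(distances_from_3prime(toy_reference, rev, [26, 49]))
```

Exact matching on both strands is enough here because the primers are designed against the reference, so each should appear exactly once.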

Mismatches near the 3′ end appear to be more significant, so I’ve plotted the number of mutations based on their distance from the 3′ end of the primer. The plot below looks at primers only:

Three mutations are in locations which “may result in a 658-fold underestimation of initial copy number” [2] [8]. But there are no mutations on the 3′ terminal base, where a mismatch is likely to “abolish amplification”.

There does appear to be one mutation in the 3′ terminal base of one of the probes. However, I suspect terminal probe mutations are less significant than those in primers.

This analysis excludes synonymous changes, which are likely far more common. I would expect these to be an order of magnitude higher. I would imagine this data is available somewhere, but I couldn’t see it in CoV-GLUE. Most likely it can be extracted from GISAID, which seems to be the data source used for CoV-GLUE. If someone would like to work on an analysis of synonymous mutations, please get in touch.

Also, I’d warn against drawing any strong conclusions from the analysis presented here. This is very much a first look at the data and an attempt to feel out the issue. I think it would be interesting to replicate/build out this work however, and would love to hear any comments.

Tarball of the (bad) code used here: Analysis.tar.gz

References/Notes

[1] https://www.cdc.gov/coronavirus/2019-ncov/downloads/rt-pcr-panel-primer-probes.pdf specifically the primers are:

GACCCCAAAATCAGCGAAAT
TCTGGTTACTGCCAGTTGAATCTG
TTACAAACATTGGCCGCAAA
GCGCGACATTCCGAAGAA

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2797725/

[3] cov-glue.cvr.gla.ac.uk

[4] Some messy scripts were required: http://41j.com/blog/2020/06/scripts-to-download-sars-cov-2-replacements/

[5] I used some awful awk to do this: for i in *; do awk 'BEGIN{n=0;RS="referenceNtCoord\":\"";FS="\",\"";}{if(n==1) print $1;n++;}' $i;done > mutlocs

[6] These are stored in the file called “primers”, in the tarball at the end of this post.

[7] This does a brute force alignment, looking for exact matches only on the forward and reverse strand. SARS-CoV-2 is only ~30Kb so computationally this is no problem.

[8] Within 5 bases of the 3′ end.

Stratos’ Other Approach…

Reading through some of Stratos’ more recent patents I came across a non-expandomer sequencing approach which I found quite interesting. The patent shows that Stratos had been thinking about other approaches to nanopore based sequencing. This suggests that the Roche acquisition may not just have been for their expandomer IP…

The Approach

Schematic of my understanding of the approach presented in [1]. Also see their figure [6].

My understanding of the approach from [1] is summarized in the schematic above. Essentially you construct a polymerase [2] that has a “tether” attached to it. The tether is composed of a PEG repeat, which threads through a nanopore [3]. The PEG region has a short oligo on the end. Once it’s threaded through the pore, another oligo can be hybridized to it. This secures the tether in the pore.

With the polymerase-tether complex in the pore, the system should look something like the diagram above. The polymerase is secured on the top of the pore. Due to its charge, I guess the polymerase would be pulled toward the pore, but it’s too big to pass through.

As in a standard Ionic/Protein nanopore setup, there’s a bias voltage, and ionic current passing through the pore. The polymerase is now blocking the pore. This causes a significant reduction in current flow. One of the figures in the patent illustrates this (and looks like experimental data):

FIG. 10A shows a signature electrical trace of an open nanopore and the nanopore partially occluded by a molecular tether. FIG. 10B shows a signature electrical trace of an open nanopore and the nanopore occluded by a DNA polymerase conjugated to a molecular tether.

Now, a template to be sequenced is introduced. Single stranded DNA enters the polymerase, and synthesis of the complementary strand occurs. However, the template DNA doesn’t interact with the pore directly.

The idea is that as the polymerase incorporates bases it will undergo conformational changes. The patent suggests that these conformational changes can be up to about 1nm.

Different conformational changes will cause the polymerase to block the pore with varying efficiency. So, for example, when incorporating a base it might go through a series of conformational changes, which result in less current flow, then more current flow, then back to baseline.

Ideally, each base would induce a distinct conformational change, and current blockage. The current trace can then be used to infer the sequence of the DNA template.

However, if you’re only able to detect incorporation/non-incorporation then nucleotides could be sequentially introduced.
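
To make the sequential-introduction idea concrete, here is a toy simulation. It assumes an idealized sensor that reports one clean event per incorporated base and nothing else. The flow order, function names and the noise-free behaviour are all my assumptions, not anything specified in the patent.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(template, flow_order="ACGT", max_flows=100):
    """Flood one nucleotide at a time and count incorporation events per flow.

    The template is written in the order the polymerase traverses it.
    """
    pos = 0
    flows = []
    for i in range(max_flows):
        if pos >= len(template):
            break
        nt = flow_order[i % len(flow_order)]
        events = 0
        # The polymerase keeps incorporating while the flooded nucleotide
        # pairs with the next template base (this covers homopolymer runs).
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            events += 1
            pos += 1
        flows.append((nt, events))
    return flows


def decode_flows(flows):
    """Rebuild the synthesized strand, then infer the template from it."""
    synthesized = "".join(nt * n for nt, n in flows)
    template = "".join(COMPLEMENT[b] for b in synthesized)
    return synthesized, template


flows = simulate_flows("GATTC")
print(flows)
print(decode_flows(flows))
```

Note that homopolymers only decode correctly here because each incorporation is assumed to give its own countable event. If the signal only said “something was incorporated during this flow”, run lengths would be lost.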

The approach is somewhat reminiscent of that proposed by Roswell. Both seem to be trying to detect conformational changes in the polymerase. In comparison to the Roswell approach, this seems like a simpler setup. The polymerase is also closer to the “sensor”, which might make detecting conformational changes easier.

Contrasting this with other ionic nanopore approaches, it’s kind of nice that the strand doesn’t go through the pore. Ideally in the Stratos approach there would be at most 4 different signal types, one for each base, rather than the signal being some combination of all the nucleotides currently in the pore. Theoretically this could result in lower error rates.

Overall the idea seems at least plausible, and I’ll be curious to see how it plays out.

References and Notes

[1] http://www.freepatentsonline.com/20170159115.pdf

[2] A number of enzymes that process DNA could work (for example an exonuclease). And the patent states this, but a polymerase seems the most likely option.

[3] They talk about and show various protein nanopores. But I imagine a solid state nanopore may also be viable, particularly as the dimensions of the pore may be slightly less critical in this approach.

[4] “The tethers were constructed of three domains (i.e., “segments”): 1) a polyethylene glycol (PEG) repeat region, located proximal to the polymerase and designed to span the nanopore channel; 2) a short oligonucleotide, designed to hybridize to a single-stranded oligonucleotide on the opposite side of the nanopore relative to the polymerase to anchor the assembly; and 3) a negatively charged phosphoramidite tail, located most distal to the polymerase and designed to facilitate threading of the tether through the nanopore. FIG. 9 is a SDS/PAGE gel that shows the size of the unmodified KF polymerase (lane 1), the KF-tether 1 conjugate (lane 2) and the KF-tether 2 conjugate (lane 3). As expected, the conjugates show an increase in mass compared to the unmodified polymerase.”

[5] “When this polymerase complexes with a nucleotide that is the complement to the template base in the next extension position the polymerase reconfigures into what is referred to in the art as a “closed” conformation. At a more detailed structural level, the transition from the open to closed conformation is characterized by relative movement within the polymerase resulting in the “thumb” domain and “fingers” domain being closer to each other. In the open conformation the thumb domain is further from the fingers domain, akin to the opening and closing of the palm of a hand. In various polymerases, the distance between the tip of the finger and the thumb can change up to 10 angstroms between the “open” and “closed” conformations. The distance between the tip of the finger and the rest of the protein domains can also change up to 10 Angstroms. It will be understood that this change will be exploited in a method set forth herein.”

[6] This figure in the patent is supposed to show the threading/polymerase. But I find it a bit confusing.

Other quotes….

“The present disclosure relates to methods and constructs for single molecule electronic sequencing of template nucleic acids. The constructs are molecular sensor complexes which comprise a processive nucleic acid processing enzyme localized to a nanopore. Conformational changes in the enzyme induced by single nucleic acid processing events are transduced into electric signals by the nanopore, which are used to identify individual nucleotides. The methods can include the steps of providing a membrane with the nanopore and the enzyme complexed with a template nucleic acid localized proximal to an opening in the pore, contacting the enzyme with an ion conductive reaction mixture including the reagents required for nucleic acid processing, providing a voltage drop across the pore that induces ion current through the pore that is modulated by conformational changes in the enzyme, measuring current through the pore over time to detect nucleotide-dependent conformational changes in the enzyme, and identifying the type of nucleotide processed by the enzyme using current modulation characteristics, thus determining sequencing information about the nucleic acid molecule.”

“The polymerase is secured to the pore by hybridizing a short oligonucleotide anchor to the tether construct on the distal side of the nanopore.”

“These results indicate that a tether and a tether-polymerase conjugate can be anchored to a nanopore and, moreover, that the resulting complex can generate reproducible electrical signals. Polymerase-nanopore complexes are thus capable of modulating current flow through the pore and show promise as useful sensors to transduct mechanical events into electrical signals.”

“One or more of the transitions that a polymerase undergoes when adding a nucleotide to a nucleic acid can be detected using a molecular sensor complex as described herein.”

“FIG. 4C depicts the polymerase in a second, “closed” configuration, which is induced, e.g., by binding of incoming nucleotide 605 to form a correct base pair with the template nucleic acid. In this second configuration, the degree to which the enzyme physically occludes the pore is reduced, and consequently the flow of current through the pore will increase. Such modulation of current flow generates an electronic signal specific for nucleotide species”

“In one embodiment, each of the four nucleotides induces a different polymerase conformation, as illustrated in FIG. 4C. The movement of the polymerase during the incorporation of a nucleotide will modulate the ion current through the pore in a characteristic and reproducible manner, generating a signature electric signal.”

“In another embodiment, the average amplitude of the current modulation doesn’t change, but rather the noise in the current modulation changes as a single nucleotide is bound and incorporated. In yet another embodiment, the current modulation system only indicates an incorporation event but does not discriminate the base type. In this embodiment, the sequence information about a nucleic acid is obtained by sequentially flooding the senor complex with one of four reaction mixtures containing one of the four nucleotides and detecting the presence or absence of an electric signal.”

“For proper function of the molecular sensor complexes of the present invention, it is necessary that the enzyme be stably localized to the pore in sufficiently close proximity to reliably influence, or modulate, current flow through the pore. Several alternative localization and/or attachment structures or compositions are contemplated by the present invention, some which are illustrated schematically in FIGS. 5-8. FIG. 5A depicts one embodiment in which enzyme 500 is localized to pore 220 by covalent attachment to tethering structure 325, herein referred to simply as a “tether”. Tethers may be designed to thread through the lumen of the pore, from one side of membrane 100 to the other. Tethers may comprise one or more structural domains, or “segments”, designed to perform one or more functions.”

Apton Biosystems Update

I’ve previously written about Apton Biosystems. When I wrote that post there wasn’t much to go on. However, a patent [2] has recently been published which reveals a bit more information.

The motivation stated in the patent is that “to reach a $10 30× genome”…”the amount of data per unit area needs to increase by 100 fold”. Elsewhere in the patent they mention that the prior art is a pitch of 1 micron. HiSeq wells were ~500nm. So they want to decrease well size to ~100nm.
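
The arithmetic behind that last step is worth spelling out. Assuming spots sit on a regular square grid (my simplification, the patent may use other layouts), areal density goes as the inverse square of the pitch, so a 100-fold density gain needs a 10-fold smaller pitch. The helper names below are made up for illustration.

```python
import math


def spots_per_um2(pitch_nm):
    """Spots per square micron on a square grid with the given pitch."""
    return (1000.0 / pitch_nm) ** 2


def pitch_for_fold_gain(current_pitch_nm, fold_gain):
    """Pitch needed to raise areal density by fold_gain on a square grid."""
    return current_pitch_nm / math.sqrt(fold_gain)


for pitch in (1000, 500, 200, 100):
    print(pitch, "nm pitch ->", spots_per_um2(pitch), "spots per square micron")

print(pitch_for_fold_gain(1000, 100), "nm pitch for a 100-fold gain over 1 micron")
```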

This premise is slightly shaky as Illumina flowcells and reagents are sold at significant profit. I imagine a large part of Illumina’s costs are related to logistical issues, rather than consumables themselves.

In any case, the patent proposes a massive cost reduction by more densely packing DNA on the flowcell. The patent mostly refers to ordered arrays, and many examples refer to a single molecule approach. The basic chemistry however seems to be pretty standard Illumina style sequencing-by-synthesis.

The figure below shows a simulation of DNA attached to a surface, at varying pitch (spacing). The right-hand images are de-convoluted versions of the left. It’s clear that as the pitch gets smaller, the image gets more crowded, and it’s harder to identify individual spots.

When imaging using a standard optical microscope, you would expect your density to be diffraction limited. Essentially, you can’t clearly resolve features smaller than about half the wavelength of light (~200nm)… normally.

However, a number of recent techniques have broken the diffraction limit. These have allowed optical microscopes to resolve features down to 10s of nanometers. In this patent, Apton apply some “super-resolution”-like approaches… but in a limited scope (we’ll revisit what Illumina might be doing here later).

A basic super-resolution approach is shown below (not from Apton):

From [1]. The images above show the signal detected from individual fluorophores. Each pixel is 13um; at 150x magnification this covers ~86nm on the surface. To generate super-resolved locations they fit a Gaussian to the intensity registered from a single fluorophore.

Each “peak” in part A of the figure above is the signal from a single fluorophore. Because the peaks are well separated we can extract each one and look at its distribution. In part B we see a single distribution. This is a 2D Gaussian. If we just took the pixel of highest intensity as the location of the fluorophore our resolution would be diffraction limited to ~200nm. However, by performing a Gaussian fit over the distribution we can determine the location to sub-pixel resolution. In this case, they could identify fluorophore locations at a final resolution of 1.5nm.
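
Here is a small sketch of why a fit beats “pick the brightest pixel”. It is not the method from [1], which fits a full 2D Gaussian to noisy data. As a simpler stand-in I use a noise-free 1D spot and a three-point fit: the log of a Gaussian is a parabola, so the brightest pixel and its two neighbours are enough to recover the centre. The spot position of 7.3 pixels and the function names are made up for the example.

```python
import math

PIXEL_UM = 13.0        # camera pixel size quoted from [1]
MAGNIFICATION = 150.0  # magnification quoted from [1]
NM_PER_PIXEL = PIXEL_UM * 1000.0 / MAGNIFICATION  # about 86.7 nm on the surface


def gaussian_profile(center_px, sigma_px, n_pixels, amplitude=1000.0):
    """Noise-free 1D Gaussian spot sampled at integer pixel positions."""
    return [amplitude * math.exp(-((i - center_px) ** 2) / (2.0 * sigma_px ** 2))
            for i in range(n_pixels)]


def subpixel_peak(profile):
    """Centre estimate (in pixels) from the brightest pixel and its two neighbours."""
    k = max(range(1, len(profile) - 1), key=lambda i: profile[i])
    lm, l0, lp = (math.log(profile[j]) for j in (k - 1, k, k + 1))
    # Vertex of the parabola through the three log-intensities.
    return k + 0.5 * (lm - lp) / (lm - 2.0 * l0 + lp)


sigma_px = 3.0 / 2.355  # a FWHM of about 3 pixels converted to sigma
profile = gaussian_profile(7.3, sigma_px, 15)
brightest_px = max(range(len(profile)), key=lambda i: profile[i])
estimate_px = subpixel_peak(profile)
print("brightest pixel:", brightest_px * NM_PER_PIXEL, "nm")
print("three-point fit:", estimate_px * NM_PER_PIXEL, "nm")
```

With real data the achievable precision depends on photon count and background, which this sketch ignores entirely.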

The above approach only works because the fluorophores are well separated. If the Gaussians overlapped, the fit wouldn’t work. In the image above you can see the FWHM of the Gaussian is about 3 pixels, which represents ~250nm on the surface. I’d imagine if fluorophores were any closer than this you’d have issues.

In their patent, Apton use essentially the above approach to identify positions of single DNA strands on the surface to a sub-diffraction limited resolution of “10 nm RMS or less”. However, they have a problem: they want to pack the molecules as closely as they can to improve density. This means they are not well separated like those in the figure above.

To get round this Apton seem to use a couple of approaches. The first is that they use signals from multiple cycles to identify molecule positions. If the oligos attached to the surface have a fairly random distribution of bases (like the human genome) this should help a lot. For each molecule, you can select a cycle where it is illuminated, but none of its neighbors are. This means there is no crosstalk at this position and you should be able to get a good estimate of its position.
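
A toy version of that selection step might look like the following. It assumes you already have a rough idea which base (colour) each molecule shows in each cycle and which molecules are close enough to interfere. Both inputs, and the function name, are assumptions of mine rather than anything described in the patent.

```python
def clean_cycles(reads, neighbors):
    """For each molecule, list the cycles where no neighbour shows the same base.

    reads:     dict of molecule id -> base string (one base per cycle, equal lengths)
    neighbors: dict of molecule id -> list of nearby molecule ids
    """
    result = {}
    for mol, seq in reads.items():
        result[mol] = [
            cycle for cycle, base in enumerate(seq)
            if all(reads[other][cycle] != base for other in neighbors.get(mol, []))
        ]
    return result


reads = {"a": "ACGT", "b": "AGGA", "c": "TCGA"}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(clean_cycles(reads, neighbors))
```

If bases were uniformly random and independent, a molecule with k interfering neighbours would have a clean cycle with probability (3/4)^k, so even a few tens of cycles should give most molecules several usable images.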

The figure below shows what appear to be images from a real single molecule experiment:

“right panel, shows each peak from each cycle overlaid. The left panel is the smoothed version of the right panel. Each bright spot represents a molecule. The molecule locations are resolvable with molecule-to-molecule distances under 200 nm.”

The right image shows positions identified from different cycles piled up (super-resolved positions I assume). In the left image they’ve used these to create another Gaussian. I would guess they then take the peak of this second Gaussian to give the final location of the molecule. This way, they can incorporate information from multiple cycles to give themselves the best estimate of the molecule’s location.
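
As a crude stand-in for that “pile up and take the peak” step, you could simply average the per-cycle positions and look at their spread. This is my simplification, not Apton’s procedure, and the coordinates below are invented.

```python
import math


def combine_localizations(points):
    """Mean position and RMS scatter of per-cycle (x, y) localizations, in nm."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    rms = math.sqrt(sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points) / n)
    return (mean_x, mean_y), rms


# Hypothetical localizations of one molecule from four clean cycles.
per_cycle = [(100.0, 200.0), (104.0, 198.0), (98.0, 203.0), (102.0, 199.0)]
print(combine_localizations(per_cycle))
```

For independent, unbiased estimates the error on the mean should shrink roughly as one over the square root of the number of cycles used, which is presumably why pooling cycles helps.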

Using the above process I’d imagine they can get pretty good spot locations. They also mention the use of a crosstalk correction algorithm during the location identification step. But I’d imagine just filtering out bad looking Gaussians would work reasonably well.

While they may now have good spot/molecule locations this doesn’t mean they can pack DNA at ~10nm on the surface. This is because in any given cycle, there will be adjacent molecules which are fluorescing. The resulting Gaussian PSFs as imaged will overlap meaning that spots can’t be resolved. This is essentially crosstalk between adjacent spots.

Apton appear to be trying to use their super-accurate spot locations as the input to their crosstalk correction algorithm. The crosstalk correction process isn’t described in detail. But I can see that with very accurate spot locations, you can parameterize a model to which you can fit your observed signal.
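
Since the patent doesn’t spell the algorithm out, here is only a guess at the flavour of such a model: a 1D, two-spot, noise-free toy where the spot positions and PSF width are known and the only unknowns are the two brightnesses. With positions fixed, the model is linear in the amplitudes and can be solved by ordinary least squares. All names and numbers are my own.

```python
import math


def psf(x, center, sigma):
    """Unit-height Gaussian point spread function."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def unmix_two_spots(pixels, signal, pos_a, pos_b, sigma):
    """Least-squares amplitudes of two overlapping spots at known positions."""
    ga = [psf(x, pos_a, sigma) for x in pixels]
    gb = [psf(x, pos_b, sigma) for x in pixels]
    saa = sum(u * u for u in ga)
    sbb = sum(v * v for v in gb)
    sab = sum(u * v for u, v in zip(ga, gb))
    sya = sum(y * u for y, u in zip(signal, ga))
    syb = sum(y * v for y, v in zip(signal, gb))
    det = saa * sbb - sab * sab  # approaches zero as the two spots converge
    amp_a = (sya * sbb - syb * sab) / det
    amp_b = (syb * saa - sya * sab) / det
    return amp_a, amp_b


pixels = list(range(20))
sigma = 3.0 / 2.355          # FWHM of about 3 pixels
pos_a, pos_b = 8.0, 10.0     # two spots 2 pixels apart (under 200nm at ~87nm/pixel)
signal = [100.0 * psf(x, pos_a, sigma) + 0.0 * psf(x, pos_b, sigma) for x in pixels]
print(unmix_two_spots(pixels, signal, pos_a, pos_b, sigma))
```

The catch is in `det`: as the spots move closer the system becomes ill-conditioned, so any noise in the image is amplified in the recovered amplitudes.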

This sounds great, but crosstalk seems to increase exponentially as the pitch shrinks:

The examples say “molecule locations are resolvable with molecule-to-molecule distances under 200 nm” and elsewhere they say “acceptable crosstalk levels” … “occurs for pitches at or above 210 nm”.

So it seems based on this, a pitch of ~200nm is viable, but it’s not clear that you can go lower than this. This seems unfortunate, as it’s only about a quarter of the size of Illumina’s wells.

What About Illumina?

As mentioned above, super-resolution has been around for a while. In fact, the Genome Analyzer II used super-resolution-like techniques. Cluster locations could be identified to sub-pixel resolution. Rather than just picking the “brightest” pixel in a cluster, adjacent pixel intensities could be fitted to a PSF to give a more accurate cluster location.

Illumina appear to have now filed a bunch of patents on various approaches to increasing density. One patent uses a DNA-PAINT [3] approach, which they suggest allows spacing that “may be less than about 20 nm”. Another describes a STED approach [4] (200nm).

There was however one patent that I found quite fascinating. This approach appears to be for an iSeq-like [6] platform where clusters sit over a CMOS image sensor. The device incorporates additional electrodes which allow an electric field to be created under the cluster.

This field can then be used to electrically deactivate a fluorophore (in one example, by pulling a quencher down on to it):

This seems very neat. Essentially, you could image one cluster while quenching all its neighbors. This would remove any crosstalk, giving you good separation of signals while retaining density. Theoretically you could push clusters very close together.

While the Illumina patents describe some interesting approaches, I didn’t see anything that looked like a real experimental setup or real datasets. So, maybe much of this is theory at the moment. I guess we’ll have to wait and see!

References and Notes

[1] Myosin V Walks Hand-Over-Hand: Single Fluorophore Imaging with 1.5-nm Localization.

[2] http://www.freepatentsonline.com/10378053.pdf

[3] http://www.freepatentsonline.com/y2019/0276886.html

[4] http://www.freepatentsonline.com/y2019/0219835.html

[5] http://www.freepatentsonline.com/9193998.html

[6] I don’t see any reason why a similar setup might not be used with a normal (patterned or otherwise) flowcell with embedded electrodes. But the patent seems to focus on an iSeq-like approach.