Sequencing with Mixtures of Three Bases

A previous post discussed Cygnus’ approach to sequencing, using mixtures of bases and multiple reads of the same template. Centrillion also have a patent that appears to cover a related approach.

The Cygnus approach, as described in their paper, uses mixtures of 2 bases. I thought it might be interesting to work through corrections using mixtures of 3 bases. It’s possible this is covered somewhere in their supplementary info, or in their huge 200+ page patent. I’ve not checked, and this is just for fun.

There are 4 possible sets of 3 different base types: ATG, ATC, TGC and AGC. The difference between each of these sets is clearly a single base (3 bases out of ATGC in the set, and 1 left out).

To recap on the previous post: a template is exposed to alternating sets (mixtures) of bases, and we measure the incorporation intensity to learn how many bases incorporate (the same as for a normal single-channel unterminated sequencing chemistry). In order to process the entire strand, the sets we alternate between must together contain all four base types. For sets of 3 base types this is no problem: any pair of sets will contain all four base types, and the two sets differ by only a single base type.

There are 6 possible pairings:

a ATG,ATC

b ATG,TGC

c ATG,AGC

d ATC,TGC

e ATC,AGC

f TGC,AGC

We could vary the order of the pairs, but we don’t really need to. Working through all possible 2bp repeats [1], it’s clear that we can accurately resolve all sequences using 3 out of the 6 alternating pairs.

In all cases, one pairing supplies the base transition information. For example, for the repeat ATATAT this is pairing f above. This is the only pairing that blocks incorporation at transitions between A and T. Each pairing blocks on one of the six possible transition types (G<->C, A<->T, A<->G, A<->C, T<->G, T<->C). To accurately resolve all sequences, all pairings are therefore required. In the example 2bp repeats, one pairing provides the “transition” information and 2 other pairings are required to resolve the sequence to one of the four bases.

You therefore need to sequence each template six times. However, at any given base, information from only 3 of the “mixture sequences” is required to resolve the strand. The other 3 sequences provide redundant information for error correction. This information could be used in a number of ways (masking likely errored bases, taking a majority vote, or feeding it into a more complex error correction model).

How much sequencing does this require as compared to standard single base sequencing?

Well, there will always be degenerate sequences, both in this scheme and the Cygnus approach. These sequences will require very slightly more sequencing than using a normal single base incorporation system.

However, we can simulate the number of cycles required (a cycle being the incorporation of a single base type, or a single mixture type). I quickly threw some code together to do this [2]. Assuming this hastily thrown together code is correct, the single base incorporation scheme requires 1.481 cycles per base (or ~2.7 bases incorporated per set of 4 bases). The mix of 3 scheme described above requires 1.4905 cycles per base in total across all six pairings (roughly 0.25 cycles per base for each individual pairing).
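
The simulated numbers can be checked against a back-of-envelope expectation. This is only a sketch: it assumes a uniformly random template with independent bases and ignores end effects.

```latex
% Mixture scheme, one pairing whose two sets omit bases X and Y (X \neq Y).
% A cycle using the set that omits Y starts on an X (where the previous
% cycle stalled) and then runs until the next Y:
\mathbb{E}[\text{bases per cycle}] = 1 + \frac{1 - 1/4}{1/4} = 4
\quad\Rightarrow\quad
\text{cycles per base over six pairings} = 6 \times \tfrac{1}{4} = 1.5

% Single-base scheme, flow order ATGC. A homopolymer run has mean length
% 1/(1 - 1/4) = 4/3, and the base of the next run is 1, 2 or 3 flows away
% with equal probability, i.e. 2 flows per run on average:
\text{cycles per base} = \frac{2}{4/3} = 1.5
```

Both schemes come out at 1.5 cycles per base under these assumptions, which is in line with the simulated 1.481 and 1.4905.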

So, if you just go by this, there’s very little overhead.

One downside of the base mixture incorporations is that the sequencing system has to cope with longer homopolymers (or rather, runs made up of any of 3 different base types). Again, this is true of both the approach described here and the Cygnus system. What issues this causes will depend on the error profile of the underlying technology.

While I’ve discussed mixtures of 3 bases here, it might also be interesting to look at combinations of mixtures of 2 and 3 bases. For example you might have set pairs of ATG, and ATC. Then a set of CA and GT to resolve the ambiguity (this could be extended to create a complete sequencing system).

Maybe that’s another fun project for another time.

Notes

[1]

[2]

#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>

using namespace std;

// Multiple base incorporations
string s1 = "ATG";
string s2 = "ATC";
string s3 = "TGC";
string s4 = "AGC";

// Count the mixture cycles needed to read through temp when alternating
// between the two 3-base sets in pair.
int mix_incorp(const string& temp, const vector<string>& pair) {

  int p=0;       // index of the set used in the current cycle
  int cycles=0;
  for(size_t n=0;n<temp.size();) {

    // Incorporate for as long as the next template base is in the current set.
    while(n<temp.size() && pair[p].find(temp[n]) != string::npos) n++;

    cycles++;
    p = 1-p;     // switch to the other set
  }

  return cycles;

}

int main() {

  string temp;

  // generate random sequence
  for(int n=0;n<10000;n++) {
    int r = rand()%4;
    if(r == 0) temp += "A";
    if(r == 1) temp += "T";
    if(r == 2) temp += "G";
    if(r == 3) temp += "C";
  }

  cout << "Sequence: " << temp << endl;

  // Single base incorps
  string order="ATGC";
  int pos=0;
  int cycle_count=0;
  for(size_t n=0;n<temp.size();) {

    // Incorporate the whole homopolymer run if it matches the current base.
    while(n<temp.size() && temp[n] == order[pos]) n++;

    pos++;
    cycle_count++;
    if(pos == (int)order.size()) pos=0;
  }
  cout << "Average cycles per base, single base incorps: " << ((float)cycle_count)/((float)temp.size()) << endl;

 
  // Super ugly code, but functional...
  vector<vector<string> > pairs(6);
  pairs[0].push_back(s1); 
  pairs[0].push_back(s2); 
  pairs[1].push_back(s1); 
  pairs[1].push_back(s3); 
  pairs[2].push_back(s1); 
  pairs[2].push_back(s4); 
  pairs[3].push_back(s2); 
  pairs[3].push_back(s3); 
  pairs[4].push_back(s2); 
  pairs[4].push_back(s4); 
  pairs[5].push_back(s3); 
  pairs[5].push_back(s4); 
  
  int total=0;
  for(int n=0;n<6;n++) {

    int count = mix_incorp(temp,pairs[n]);
    total+=count;
  }
  cout << "Average cycles per base, mixture incorps: " << ((float)total)/((float)temp.size()) << endl;
}

Read/Write DNA Devices

Yesterday I wrote up my notes on Iridia. I think one of the things I find so fascinating about the concept is the potential to create a system that can both read and write DNA. This move to read/write DNA devices seems like a step change from discrete sequencers and synthesizers.

To briefly recap, the Iridia system uses a nanopore to selectively expose DNA strands to enzymes. The enzymes can’t make their way through the pore, but you can drive the charged DNA strand through the pore under a bias voltage.

The use of a nanopore also clearly lends itself to using this same aperture for sequencing the DNA. This could be through detecting the blockage of an ionic current, or through embedded electrodes (like some solid-state nanopore systems). Either solid-state or protein nanopores could be used. Apertures might even be constructed in other ways (like Armonica’s tortuous nanopores).

So potentially, you have a DNA synthesis platform, that can also QC the strand at every base incorporation as it passes through from one chamber to another.

One issue with at least some embodiments of the Iridia concept from their patent is that single nucleotides can make their way through the pore as well as the strand under synthesis. This means they need to remove the terminator only from the base already incorporated into the strand (I think they may have a way of doing this). I guess it can also increase the potential for misincorporation and might complicate the fluidic system.

However it feels like if the correct components can be assembled, you might be able to create a chip system that can read and write DNA and has no external fluidic components.

I thought it might be fun to play around with these ideas a bit…

The diagram below shows what this (Computer Scientist) imagines such a system might look like schematically:

In the above diagram I’ve shown a system with multiple chambers. Each chamber has two apertures. One is a nanopore/nanoscale aperture through which only DNA can pass. The other is a valve which, when closed, stops all flow into the chamber. The valves are all normally closed. Potentially the valve could be part of the pore (for example a voltage gated ion channel [2]) or might not be required at all. But the valve can be much bigger than the nanopore and should be easy to fabricate.

The strand under synthesis sits in the input DNA chamber. You then open the valve leading out of this chamber (1) and going to one of the nucleotide chambers, (2) for example. A bias voltage is applied between these chambers to drive the DNA into the correct nucleotide chamber. In this chamber there’s a template independent polymerase. The nucleotides in this chamber are terminated, such that only a single nucleotide is incorporated into the strand.

The bias voltage is then reversed, and the strand flows back into the “input” chamber. A number of single nucleotides come along for the ride. However, in the input chamber the strand is captured. This could be by hybridising to an immobilized complementary strand, though there might be other methods. With the strand captured you reverse the bias voltage again and the single nucleotides flow back into their original chamber. You can then close the valves to keep them contained. The strand is then released by heating, which melts it off the complementary strand.

Next you need to cleave the terminator. This will depend on the type of terminator used. One possibility might be to use photo-cleavable terminators such as those developed by LaserGen [1]. If this is the case, then you would just need to turn on a light source to remove the terminator. Another chamber could be used however, particularly if there is some other process (perhaps there’s an enzymatic process?) that is used to remove the terminator.

The process would then continue as above cyclically. Depending on the quality of the pores/apertures, you can also measure the ionic current as the strand passes through the pore. This may be sufficient to determine the sequence during synthesis.

In any case, once synthesis is complete you can open another valve (7) and set bias voltages between the chambers (input and holding for QC chambers) such that the DNA will flow through an exit pore. This pore could be specifically designed for sequencing and might have additional embedded electrodes to enable this.

You could also QC the strand (did synthesis complete correctly?) and sort the strand accordingly.

Of course, the same system could also be used as a read system only, just insert complete strands at the start of the process.

Potentially all the nucleotides and reagents could come preloaded, giving a simple, almost solid-state system. While initially it might be desirable to fabricate only certain parts of the system using nanofabrication, ultimately it might be possible to integrate all components onto a single chip.

Notes

[1] https://webcache.googleusercontent.com/search?q=cache:hkUhGFAhmmgJ:https://www.genomeweb.com/sequencing/lasergen-says-its-new-reversible-terminators-could-improve-several-sequencing-pl+&cd=15&hl=en&ct=clnk&gl=jp

[2] https://en.wikipedia.org/wiki/Voltage-gated_ion_channel

Iridia (was Dodo Omnidata)

While putting together my list of synthesis companies, one in particular stood out. Not least because of its original name, Dodo Omnidata (which is awesome) [3]. But also because the technology is significantly different from anything else on the list, being inherently single molecule. The company also seems to be relatively unknown.

For these reasons, I’m writing up some quick notes.

Business

Dodo Omnidata was founded in 2016. They seem to have raised ~400K in seed funding in 2017. An SEC filing [1] shows they raised ~2MUSD this June. Jay Flatley (ex-Illumina CEO) is on the board. The initial 400K came from Tech Coast Angels according to Crunchbase. It’s not clear where the most recent raise came from, but with Jay on the board, it seems possible there’s a connection to Illumina Ventures.

Technology

There’s not much on the website, but there is a 134 page patent [2]. I’ve barely skimmed it, but what’s clear is that they suggest using nanopores for DNA synthesis.

From my quick skim, it appears that what they suggest is driving a strand of DNA through a nanopore with a bias voltage. In this way they can move it between two chambers. In itself I don’t believe that is particularly novel. What’s neat is that because enzymes are too big to go through the nanopore, they can selectively expose the strand to different enzymes under electrical control.

They use this for synthesis by having one chamber containing a template independent polymerase (a polymerase that just adds any base you give it) and a base with a terminator on it (so only a single base is added). My guess is that you’d flow bases in cyclically. If you want to incorporate a base into the strand, you flip the voltage and pull the strand through the nanopore. Leave it for a while to incorporate the base, then pull it back out.

Back on the other side of the pore, another enzyme comes in and removes the block on the strand. As single nucleotides can also pass through the pore, it’s desirable to have an enzyme that only removes terminators on bases incorporated into the strand.

In practice I would imagine the whole system can be arrayed. And you’d be flowing bases onto one side of an array. How competitive this system is with other enzymatic approaches is something I don’t know. But it seems pretty neat!

Notes

[1] 2018 SEC Filing: https://www.sec.gov/Archives/edgar/data/1708118/000170811818000002/0001708118-18-000002-index.htm

[2] http://www.freepatentsonline.com/WO2017151680A2.pdf

[3] In case you’re curious about the binary encircling the old Dodo Omnidata logo it converts to Data Vida in ASCII. Vida is Spanish for life, and I assume is a reference to the tagline “Data for Life” also on their banner.

DNA Synthesis Companies (August 2018)

Below is a list of DNA Synthesis Companies, to complement my list of sequencing companies. It’s not quite as complete: I’ve left out some seemingly established players who didn’t seem particularly entertaining and/or only run service businesses.

There’s a great list here which includes some defunct companies, and other approaches.

| Name | Further Info | Blog post | Status | Method | Location |
|---|---|---|---|---|---|
| Ansa Biotechnologies | Company Website | | Pre-seed? | Enzymatic | Bay Area |
| CustomArray Inc. | Company Website | | Acquired | Electrochemical | Seattle |
| DNA Script | Company Website | | Series A | Enzymatic | Paris |
| Evonetix | Company Website | | Series A | Thermal | Cambridge, UK |
| Agilent | Company Website | | IPO | Printing | Int. |
| Iridia (was dodo omnidata) | Company Website | Blog | Series A | Nanopore | Carlsbad, California |
| Kilobaser | Company Website | | Seed/Series A? | Fluidic | Austria |
| LabGenius | Company Website | | Seed/Series A? | Assembly? | London |
| Molecular Assemblies | Company Website | | Series A | Enzymatic | San Diego |
| Nuclera Nucleics | Company Website | | Seed | Enzymatic | Cambridge, UK |
| SGI DNA | Company Website | | Established | Fluidic | La Jolla |
| Synthomics | Company Website | | Seed | Fluidic | Bay Area |
| Twist Biosciences | Company Website | | Series E | Printing | Bay Area |