
QuantumSi’s Protein Sequencing Approach

I’ve previously written about QuantumSi’s DNA sequencing work. Recently, QuantumSi have been promoting their protein sequencing platform. This may be a device that incorporates both DNA and protein sequencing functionality [1]. In this post I’m going to take a look at one of their protein sequencing patents [0] and review the approach.

Expectation Management

Before we dig into the technical details, let’s review the approach at a high level, in the context of DNA sequencing.

The basic process used to sequence proteins can be briefly described as follows:

  1. Isolate single proteins
  2. Attach a label to the terminal amino acid, and detect the label.
  3. Remove a single terminal amino acid.
  4. Go to step 2 to identify the next amino acid.

At a high level, this is not unlike single molecule sequencing-by-synthesis, in that monomers are detected sequentially. The difference is that rather than incorporating monomers, this approach cleaves them.

While the basic process is similar to DNA sequencing, building the machinery to sequence proteins is far more complex. In DNA sequencing we have a bunch of tools (proteins) developed by nature which we can harness to develop sequencing approaches.

DNA’s complementary nature provides a simple approach to both amplifying polymers and introducing labels. We have a vast array of proteins that incorporate nucleotides, degrade nucleotides, and modify DNA sequences. For the most part, none of this machinery exists when working with proteins.

As we can’t amplify proteins, we are stuck with single molecule approaches. From DNA sequencing, we’ve seen that this alone limits our accuracy, and in general single molecule approaches have an error rate of >10% whereas amplified approaches (Illumina) have error rates significantly less than 1%.

Not only this, but the “alphabet” of proteins is considerably larger than that of DNA: ~20 standard amino acids versus 4 bases. A number of modified variants also exist, further complicating labeling and identification.

Our baseline expectation is therefore that initial data quality will be worse than DNA sequencing. This may not matter if the applications are compelling, as there’s not as much competition in the protein sequencing space.

Technical Approach

There are two technical approaches described in detail in the patent [2]. Both approaches use single molecule optical detection of a protein under sequencing attached to a surface (one example shows 18% occupancy [3]). The readout system appears to be similar to that mentioned in my previous post.

Sequencing Approach 1 – Label + Cleavage Enzyme

The first approach uses a (fluorescently) labeled recognition protein to detect the terminal amino acid. From what I can tell these proteins don’t bind strongly, so they transiently bind on and off.

At the same time, a cleavage enzyme is in the mix. The cleavage enzyme, at some appropriately low concentration, comes in and removes terminal amino acids.

Example 6 shows what appears to be experimental data for this process. The example uses ATTO 542-labeled ClpS2 and an aminopeptidase (VPr). The peptide they are attempting to sequence is YAAWAAFADDDWK.

ClpS2 binds to Y, W and F and apparently doesn’t bind to other terminal amino acids.

Two raw traces for this sequencing experiment are shown:

These two figures (20A and 20C) show two independent sequencing runs. Transient binding of ClpS2 to Y, W and F is shown. The experiment starts with Y exposed as the terminal amino acid. As such we see transient binding of ClpS2 to Y from time point zero. At some point the cleavage enzyme comes in and the “Y” gets chopped off. “A” is now exposed as the terminal amino acid. ClpS2 shows no binding to “A” and we therefore don’t see any binding.

“A” then gets cleaved, exposing yet another “A” (still no binding). This is cleaved, revealing W as the terminal amino acid, and we see transient binding again, and so on.

As far as it goes this is interesting, but it doesn’t really tell us anything about the sequence, which will naturally be runs of “F or W or Y” and “not F or W or Y”. It’s clear from the traces that the length of each transient binding period carries little information. For example, the time taken to cleave two “A”s differs by a factor of 5 between the two experiments.

So in order to determine which of F, W, and Y was detected we need to look more closely at the transient binding. Figures 20B and 20D show histograms of the pulse duration. The variation and average durations seem to differ significantly for each amino acid.
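To make the idea of calling an amino acid from pulse durations concrete, here is a minimal sketch. It assumes pulse durations are exponentially distributed, which is my assumption and not something the patent states, and it picks the residue whose mean duration gives the highest likelihood. The function name and the reference means are hypothetical placeholders; the numbers are invented for illustration and are not taken from the patent figures.

```python
import math

def classify_by_pulse_duration(durations, reference_means):
    """Return the residue whose assumed mean pulse duration best fits the data.

    Assumes exponentially distributed pulse durations (an assumption).
    reference_means maps residue -> mean pulse duration in seconds.
    """
    n = len(durations)
    total = sum(durations)
    best, best_ll = None, -math.inf
    for residue, mean in reference_means.items():
        # Log-likelihood of n exponential samples with this mean.
        ll = -n * math.log(mean) - total / mean
        if ll > best_ll:
            best, best_ll = residue, ll
    return best

# Placeholder means, invented for illustration only (not from the patent).
HYPOTHETICAL_MEANS = {"F": 0.2, "W": 1.0, "Y": 3.0}

pulses = [0.8, 1.3, 0.9, 1.1, 0.7]  # made-up pulse durations in seconds
print(classify_by_pulse_duration(pulses, HYPOTHETICAL_MEANS))
```

With only three well separated means this works trivially; the interesting question is how quickly it degrades as more residues with overlapping distributions are added.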

This provides a basic proof of concept for amino acid detection. But it’s quite limited. The issues are as follows:

  • There’s no mechanism for detecting runs of identical amino acids.
  • The demonstration shows detection of 3 (Y, W and F) out of >20 amino acids.
  • While the durations are somewhat distinct for these amino acids, it seems likely that 20 such distributions would show significant overlap.
  • The sequence is quite short (so we don’t know how long we can go without damaging the protein/other issues occurring).

Table 1 of the patent lists 33 amino acid recognition proteins, with 13 different binding patterns [4]. These appear to cover 16 of the >20 amino acids, albeit ambiguously. A combination of these recognition proteins (and ideally a few more) might bring you closer to a full sequencing approach.
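As a quick sanity check on those numbers, the tally below uses the binding patterns and counts exactly as listed in footnote [4]. The only assumption is that the modified-tyrosine entry is counted as its own target, separate from plain Y.

```python
# Binding patterns and protein counts as listed in footnote [4].
BINDING_PATTERNS = {
    "FWY": 4, "FWYL": 6, "FWYLVI": 1, "phosphorus-Y": 5, "FWYLI": 1,
    "KR": 1, "DE": 1, "KRH": 3, "P": 2, "KRHWFY": 5, "PMV": 1, "G": 2, "A": 1,
}

total_proteins = sum(BINDING_PATTERNS.values())
distinct_patterns = len(BINDING_PATTERNS)

covered = set()
for pattern in BINDING_PATTERNS:
    if pattern == "phosphorus-Y":
        # Assumption: the modified tyrosine counts as a separate target.
        covered.add("modified-Y")
    else:
        covered.update(pattern)  # one entry per single-letter amino acid code

print(total_proteins, distinct_patterns, len(covered))
```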

But overall, this work seems to provide a basic proof of concept of the detection process. With some additional work I could see this working as a protein fingerprinting technology, where an ambiguous protein sequence is compared against a database of known protein sequences. In the above example the fingerprint might be something like:

“One or more Y”, “One or more not F,W,Y”, “One or more W”, “One or more not F,W,Y”, “One or more F”
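A minimal sketch of how such a fingerprint could be derived from a known sequence, for example when building the comparison database. It assumes a single recognizer that binds F, W and Y, and that runs of the same class cannot be counted, so they collapse to one symbol. The function name and the lowercase "x" symbol for "not F,W,Y" are my own choices.

```python
from itertools import groupby

def fingerprint(peptide, recognized=frozenset("FWY")):
    """Collapse a peptide into runs of recognized residues and 'x' (anything else)."""
    classes = (aa if aa in recognized else "x" for aa in peptide)
    # groupby merges consecutive identical symbols, mirroring the fact that
    # run lengths are not observable in this scheme.
    return [label for label, _run in groupby(classes)]

print(fingerprint("YAAWAAFADDDWK"))
```

Fingerprints computed this way for every entry in a protein database could then be compared against the observed one.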

With a long enough sequence, this may unambiguously identify a particular protein/class of proteins. However single point mutations might be more challenging.

Using additional recognition proteins would improve this fingerprinting process. Progressing this toward a sequencing platform seems challenging, and might require development of the technical approach.

Sequencing Approach 2 – Labeled Amino acid specific cleavage enzyme

Some data is also shown for a second approach. Here the cleavage enzyme is specific to certain amino acids. The demonstration shows that they have a method for incorporating labels into these exopeptidases [5]. But there doesn’t seem to be any data demonstrating sequencing using this approach.

A list of amino acid specific exopeptidases is provided [6]. However, these seem to be far more limited than the recognition proteins in approach 1. Only three types of specific exopeptidases are listed: those specific to Glu/Asp, Met or Proline.

This approach therefore seems more challenging and less developed.

Summary

Overall this seems like an interesting approach to a problem that hasn’t received that much attention. I’d expect the initial platform to be nearer to a “protein fingerprinting technology” than a full protein sequencing instrument. This seems like an interesting tool in its own right. If the initial instrument is framed as a protein sequencing platform, I would expect the error rate to be far higher than we’re used to seeing for DNA sequencing (probably in the order of >20%).

However, all this speculation is based on a single patent. They may have developed beyond this patent and it will be interesting to see what is finally released.

Notes

That’s it, you can stop reading now…

Ok, well… this section contains a few other notes from the patent. It doesn’t really add much to the discussion above, but they are here in part for my own reference. The footnotes below also support some of the assertions in the text above and may be of interest.

As is often the case, this patent mentions a number of other approaches which could be used. The patent discusses nanopore readout briefly, indicating that the protein being sequenced could be immobilized on a nanopore. Recognition molecules (labels) are then detected through changes in conductance of the nanopore.

Other sections refer to “conductivity labels”.

Shielding elements. This is essentially a protein (or other element) that shields the recognition molecule from photo damage. Shield proteins are used in other single molecule sequencing approaches, and a number of methods are presented in the patent.

Other approaches to removing terminal amino acids are mentioned… Edman degradation, phenyl isothiocyanate.

Various other detection methods are briefly discussed, for example aptamers.

While I’ve not mentioned it in the text above there are some other nice plots of the pulse duration differences for ClpS2 in example 5:

Example 5: ClpS2 as recognition label. ClpS2 is labeled with dye. Single molecule intensity traces shown in figure 19B.

Similar plots are shown for ClpS, and ClpS1.

Footnotes

[0] https://www.freepatentsonline.com/y2020/0209257.html US20200209257A1

[1] “In some aspects, the application relates to the discovery of polypeptide sequencing techniques that allow both genomic and proteomic analyses to be performed using the same sequencing instrument.”

“Such strategies may require modification of an existing analytic instrument, such as a nucleic acid sequencing instrument, which may not be equipped with a flow cell or similar apparatus capable of reagent cycling. The inventors have recognized and appreciated that certain polypeptide sequencing techniques of the application do not require iterative reagent cycling, thereby permitting the use of existing instruments without significant modifications which might increase instrument size.”

[2] As always, a number of other approaches are also mentioned. But these are the approaches that have the best supporting data. I review some of the alternative approaches at the end of the post.

[3] One example shows proteins attached using a DNA linker, with 18% single protein occupancy. This seems lower than Poisson. Only singly occupied wells will be sequenceable, so this limits throughput. Example 2 (page 127).
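For reference, a small sketch of the Poisson comparison. It assumes random, independent loading of wells, in which case the fraction of wells holding exactly one protein is λ·exp(−λ) for a mean of λ proteins per well. The function names are hypothetical.

```python
import math

def single_occupancy_fraction(mean_per_well):
    """Poisson probability that a well holds exactly one molecule."""
    return mean_per_well * math.exp(-mean_per_well)

def loading_for_fraction(target, iterations=60):
    """Bisect on [0, 1], where the fraction rises monotonically, for the mean
    loading that would give the target single-occupancy fraction."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if single_occupancy_fraction(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

best = single_occupancy_fraction(1.0)  # the Poisson optimum, 1/e
print(f"best case: {best:.3f}")
print(f"mean loading giving 18%: {loading_for_fraction(0.18):.3f}")
```

The best case under Poisson loading is 1/e, roughly 37%, so 18% corresponds to wells loaded well below (or above) the optimum.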

[4] Amino acid recognition proteins. Table 1. Lists 33 amino acid recognition proteins and their preferred binding. Many of these appear to prefer the same amino acids. There are therefore 13 different types of “preferred binding” listed: FWY: 4, FWYL: 6, FWYLVI: 1, phosphorus-Y: 5, FWYLI: 1, KR: 1, DE: 1, KRH: 3, P: 2, KRHWFY: 5, PMV: 1, G: 2, A: 1.

[5] They reference the following paper, which uses non-natural amino acids. Chin J. W et al. J Am Chem Soc. 2002 Aug. 7 124(31):9026-9027

[6] Aminopeptidases. Table 3 lists aminopeptidases. These should selectively cleave terminal amino acids. There seem to be 3 classes here with limited coverage of the amino acid space (Glu/Asp: 1, Met: 2, Proline: 6). Table 4 provides a much longer list of non-specific aminopeptidases.

Lingvitae AS

Image from [1].

With the Stratos acquisition I wanted to write up a few notes on a much less well known (and now I think inactive) startup called LingVitae. LingVitae was a Norwegian single molecule sequencing startup. In many ways, their original “mission” was similar to Stratos’. They were working on a way to replace a single nucleotide with a magnifying tag, or what is sometimes called a “Design DNA Polymer”.

The approach to generating these “expandomer-like” strands is rather similar to one approach suggested by Stratos in their patents [3]. Essentially loop/hairpin like oligos are hybridized to the template and ligated:

FIG. 7 shows adjacently aligned adapters which carry magnifying tags and which hybridize to the target and self-hybridize;

I’d guess there are a number of issues getting this to work. In particular, hybridization of such short (8mer) oligos might not be very specific. Will they really hybridize adjacently, specifically? Doesn’t the loop/bend cause a bunch of issues?

I couldn’t find anything that looked like experimental data in the patents. The closest I got was a 2007 paper describing the technique from Amit Meller [2]. This paper really just describes the concept and says “for proof-of-concept we designed and synthesized DNA oligonucleotides that encode”…”up to 8 bits of information”. Seems like they’re just synthesizing oligos, based on what LingVitae might have been able to produce. But there doesn’t seem to have been a followup paper showing any experimental work.

In their patents, LingVitae proposed a number of different readout methods. These include nanopore and optical approaches:

FIG. 25 shows examples of how signal chains may be used to obtain both sequence information (left) and positional information (right) in which
A) shows a DIRVISH based method using fluorescence labelled probes that bind the target molecules in a characteristic pattern,
B) shows an optical mapping based method in which the restriction pattern is used to give the position of the sequence,
C) shows a method in which a characteristic pattern of DNA binding proteins are registered as they pass through a micro/nano-pore and
D) shows a method using fluorescence labelled probes, proteins or the like which are registered as they pass a fluorescence detector.

LingVitae’s 2007 era website is shown below. This is from a time when they were patenting, and promoting work related to the design polymer idea. Their focus is on “high quality single molecule sequencing”.

This was before PacBio launched their instrument. So there were no “single molecule sequencing” platforms on the market at this time. LinkedIn shows about 30 former employees, so I imagine they had a reasonable team working on this.

Slowly LingVitae seem to have moved away from DNA sequencing. The website, and PR shifted toward the development of a cheap cellular imaging platform [1]. The idea was to use a DVD drive as a microscope. This was always part of the sequencing play, but sequencing gets downplayed from 2012 onward:

The site itself seems to have gone offline completely sometime in 2014. Then, sometime in 2017, www.lingvitae.com started redirecting to discipher.co:

As of ~May 2019, discipher.co also appears to be offline. LinkedIn doesn’t show any active employees so I assume this is the end of the LingVitae story.

It’s a shame really, it would have been interesting to see preliminary data from the design polymer approach. If anyone knows what happened to LingVitae please get in touch!

Notes/References

[1] https://www.theverge.com/2013/4/14/4223500/lab-on-a-dvd-blood-analysis-hiv-testing-fast-affordable

[2] Amit Meller paper: https://academic.oup.com/clinchem/article/53/11/1996/5627340

[3] Expandomer patent: https://patents.justia.com/patent/20090053699

From http://www.freepatentsonline.com/20070254280.pdf

Text from a version of their website, via the Wayback Machine:

Physical Magnification
The units to discriminate in a biological DNA molecule are bases or base pairs with a size of 0,34 nm each. The units to discriminate in a Design Polymer are blocks of up to 25 bases or base pairs with a size of up to 10 nm each.

Maximalisation of Unit Differences
The difference between the units to discriminate in a biological DNA molecule are only represented by a few atoms on a purine or pyrimidine skeleton attached to an identical backbone structure. The difference between the units to discriminate in a Design Polymer can be very significant and will be tailor made to achieve maximum resolution power on the read-out platform in question.

Binary Code
There are 4 units to discriminate between in a biological DNA molecule and the read-out platform must thus be able to distinguish between 4 different levels or states. There are 2 units to discriminate between in a Design Polymer and the read-out platform must thus be able to distinguish between only 2 different levels or states or alternatively even easier use a very simple on-off approach

Removal of secondary structures
A biological DNA molecule can take all forms of sequences and shapes and has a natural tendency to form secondary structures which can influence on the read-out process. A Design Polymer can be designed to avoid secondary structures and to ensure a reproducible behavior during the read-out process

Labels
It is difficult to label every individual base in biological DNA molecule due to sterical hindrance. The repertoire of labels that can be used is thus limited. Incorporation errors, quenching of neighbor labels, etc. adds to the challenge. It is easy to label every individual unit in Design Polymer as the spacing between labels, unit sequence, and more can be designed. The repertoire of labels that can be used are thus almost endless. Incorporation errors, quenching of neighbor labels, and other challenges can easily be solved by smart design of unit sequences, spacing, and other Design Polymer parameters.”

The whole purpose of the Design Polymer Concept is to enable read out technologies to perform rapid DNA analysis with superior quality and resolution, and after years of extensive research LingVitae is now developing its first series of Design Polymer products -DNA EXPLORER SYSTEM.

DNA EXPLORER SYSTEM is going to be a lab-on-a-disk based kit designed to convert biological DNA into synthetic Design Polymers. The intention is to provide read out companies with a powerful tool that enables them to obtain advanced “DNA molecules” designed solely for the purpose of single molecule sequencing. 

The first generation of the kit will consist of a conversion jig and a set of four conversion disks. It is intended to be very user-friendly, so that anyone with basic lab skills should be able to perform the conversion of nucleic acid in less than 24 hours.  

DNA EXPLORER Fluidics Station 500

Fluidics Station 500 will be the 1st generation of fluidics stations for processing the Conversion Disks.

The Fluidics Station 500 will incorporate advanced design that provides improved ease-of-use and true walk away freedom to dramatically improve efficiency in the end-users genetic analysis. The system will run unattended until completion of a Conversion Disk, freeing the operator to attend to other responsibilities, thereby helping to improve the workflow and operation of the laboratory.

Total processing time per Conversion Disk will be 6 hours and the Fluidics Station 500 is designed to operate in environments running 2-4 daily runs per system. A total of 4 runs will be needed for whole genome conversion of a eukaryotic genome. No dedicated or special power requirements.”

DNA EXPLORER Conversion Kit

The Conversion Kit will contain a series of 4 Conversion Disks (Disk 1-4) for the Conversion of 100 Gigabases of 24mers from a target material as well as reagents needed for DNA purification and initial handling of the sample material.

The Conversion Kit reagents will formulate into the fewest number of individual components possible, reducing preparation and handling steps. All of the reagents will be ready-to-use solutions.”

Video linked from site: https://www.youtube.com/watch?v=zLb9ip2lOos

Picture from their site, describing the “lab on a dvd”:

Are there mutations in SARS-CoV-2 CDC qPCR Primer Sites?

I was curious to know if there were any documented mutations which cover CDC Primers/Probes [1]. There’s work that has shown that mismatches in qPCR assays can “completely abolish PCR amplification” [2]. For diagnostic applications, mutations could mean that a qPCR based test would fail to detect SARS-CoV-2 or result in reduced sensitivity.

So, I downloaded all replacements (amino acid substitutions) from CoV-GLUE [3] [4]. I then extracted the nucleotide location identified in each replacement [5] and removed any duplicate locations. This resulted in a total of 3527 locations.

I then extracted the CDC primer sequences [6]. I wrote a small tool to do the following:

  1. Load the reference sequence and create a second sequence that marks the mutation locations on the reference.
  2. Find the location of the primer sequences on the reference [7].
  3. For each primer, note where on the primer sequence there are mutations in the reference.
  4. Report mutation location on primer.
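The steps above can be sketched roughly as follows. This is not the code from the tarball, just a minimal illustration of the same idea: exact matching on both strands, then reporting each mutated position by its distance from the primer's 3′ end. Function names are hypothetical, and positions are 0-based here, so coordinates from another source may need converting.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_exact(reference, query):
    """Return 0-based start positions of every exact match of query."""
    hits, start = [], reference.find(query)
    while start != -1:
        hits.append(start)
        start = reference.find(query, start + 1)
    return hits

def mutations_from_3prime(reference, primer, mutated_positions):
    """Distances (0 = 3' terminal base) of mutated reference positions under a primer.

    mutated_positions is a set of 0-based reference coordinates.
    """
    n = len(primer)
    distances = []
    # Forward strand: the primer's 3' end is the rightmost matched base.
    for start in find_exact(reference, primer):
        for offset in range(n):
            if start + offset in mutated_positions:
                distances.append(n - 1 - offset)
    # Reverse strand: the reference holds the primer's reverse complement,
    # so the primer's 3' end sits over the leftmost matched base.
    for start in find_exact(reference, reverse_complement(primer)):
        for offset in range(n):
            if start + offset in mutated_positions:
                distances.append(offset)
    return sorted(distances)

# Toy reference: one of the CDC primers flanked by filler bases.
primer = "GACCCCAAAATCAGCGAAAT"
toy_reference = "TTT" + primer + "GGG"
print(mutations_from_3prime(toy_reference, primer, {22}))
```

Brute force `str.find` is plenty for a ~30 kb genome and a handful of primers.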

Mismatches near the 3′ end appear to be more significant. I’ve therefore plotted the number of mutations, based on their distance from the 3′ end of the primer. The plot below looks at primers only:

Three mutations are in locations which “may result in a 658-fold underestimation of initial copy number” [2] [8]. But there are no mutations on the 3′ terminal base, where a mismatch is likely to “abolish amplification”.

There does appear to be one mutation in the 3′ terminal base of one of the probes. However, I suspect terminal probe mutations are less significant than those in primers.

This analysis only covers amino acid replacements, so it excludes synonymous changes, which are likely far more common. I would expect these to be an order of magnitude higher. I would imagine this data is available somewhere, but I couldn’t see it in CoV-GLUE. Most likely it can be extracted from GISAID, which seems to be the data source used for CoV-GLUE. If someone would like to work on an analysis of synonymous mutations, please get in touch.

Also, I’d warn against drawing any strong conclusions from the analysis presented here. This is very much a first look at the data and an attempt to feel out the issue. I think it would be interesting to replicate/build out this work however, and would love to hear any comments.

Tarball of the (bad) code used here: Analysis.tar.gz

References/Notes

[1] https://www.cdc.gov/coronavirus/2019-ncov/downloads/rt-pcr-panel-primer-probes.pdf specifically the primers are:

GACCCCAAAATCAGCGAAAT
TCTGGTTACTGCCAGTTGAATCTG
TTACAAACATTGGCCGCAAA
GCGCGACATTCCGAAGAA

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2797725/

[3] cov-glue.cvr.gla.ac.uk

[4] Some messy scripts were required: http://41j.com/blog/2020/06/scripts-to-download-sars-cov-2-replacements/

[5] I used some awful awk to do this: for i in *; do awk 'BEGIN{n=0;RS="referenceNtCoord\":\"";FS="\",\"";}{if(n==1) print $1;n++;}' $i;done > mutlocs

[6] These are stored in the file called “primers”, in the tarball at the end of this post.

[7] This does a brute force alignment, looking for exact matches only on the forward and reverse strand. SARS-CoV-2 is only ~30Kb so computationally this is no problem.

[8] Within 5 bases of the 3′ end.

Stratos’ Other Approach…

Reading through some of Stratos’ more recent patents I came across a non-expandomer sequencing approach which I found quite interesting. The patent shows that Stratos had been thinking about other approaches to nanopore based sequencing. This suggests that the Roche acquisition may not just have been for their expandomer IP…

The Approach

Schematic of my understanding of the approach presented in [1]. Also see their figure [6].

My understanding of the approach from [1] is summarized in the schematic above. Essentially you construct a polymerase [2] that has a “tether” attached to it. The tether is composed of a PEG repeat, which threads through a nanopore [3]. The PEG region has a short oligo on the end. Once it’s threaded through the pore, another oligo can be hybridized to it. This secures the tether in the pore.

With the polymerase-tether complex in the pore, the system should look something like the diagram above. The polymerase is secured on the top of the pore. Due to its charge, I guess the polymerase would be pulled toward the pore, but it’s too big to pass through.

As in a standard Ionic/Protein nanopore setup, there’s a bias voltage, and ionic current passing through the pore. The polymerase is now blocking the pore. This causes a significant reduction in current flow. One of the figures in the patent illustrates this (and looks like experimental data):

FIG. 10A shows a signature electrical trace of an open nanopore and the nanopore partially occluded by a molecular tether. FIG. 10B shows a signature electrical trace of an open nanopore and the nanopore occluded by a DNA polymerase conjugated to a molecular tether.

Now, a template to be sequenced is introduced. Single stranded DNA enters the polymerase, and synthesis of the complementary strand occurs. However, the template DNA doesn’t interact with the pore directly.

The idea is that as the polymerase incorporates bases it will undergo conformational changes. The patent suggests that these conformational changes can be up to about 1 nm [5].

Different conformational changes will cause the polymerase to block the pore with varying efficiency. So, for example, when incorporating a base it might go through a series of conformational changes, which result in less current flow, then more current flow, then back to baseline.

Ideally, each base would induce a distinct conformational change, and current blockage. The current trace can then be used to infer the sequence of the DNA template.

However, if you’re only able to detect incorporation/non-incorporation then nucleotides could be sequentially introduced.
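A minimal sketch of how sequence could be recovered in that fallback mode. It assumes each incorporation event is counted individually, so a homopolymer run shows up as several events within one flow. That is my assumption; the patent only describes detecting the presence or absence of a signal. The function name and input format are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def decode_flows(flows):
    """Rebuild the synthesized strand and its template from nucleotide flows.

    flows is an iterable of (nucleotide, n_events) pairs in the order the
    nucleotides were introduced; n_events is the number of incorporation
    signals seen during that flow (0 if none).
    """
    synthesized = "".join(base * events for base, events in flows)
    # The template is the reverse complement of what the polymerase built.
    template = "".join(COMPLEMENT[b] for b in reversed(synthesized))
    return synthesized, template

print(decode_flows([("A", 1), ("C", 0), ("G", 2), ("T", 1)]))
```

If only presence or absence per flow can be detected, homopolymer lengths would be lost, much like the run-length ambiguity in other flow-based chemistries.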

The approach is somewhat reminiscent of that proposed by Roswell. Both seem to be trying to detect conformational changes in the polymerase. In comparison to the Roswell approach, this seems like a simpler setup. The polymerase is also closer to the “sensor”, which might make detecting conformational changes easier.

Contrasting this with other ionic nanopore approaches, it’s kind of nice that the strand doesn’t go through the pore. Ideally in the Stratos approach there would be at most 4 different signal types, one for each base. Rather than the signal being some combination of all the nucleotides currently in the pore. Theoretically this could result in lower error rates.

Overall the idea seems at least plausible, and I’ll be curious to see how it plays out.

References and Notes

[1] http://www.freepatentsonline.com/20170159115.pdf

[2] A number of enzymes that process DNA could work (for example an exonuclease). And the patent states this, but a polymerase seems the most likely option.

[3] They talk about and show various protein nanopores. But I imagine a solid state nanopore may also be viable, particularly as the dimensions of the pore may be slightly less critical in this approach.

[4] “The tethers were constructed of three domains (i.e., “segments”): 1) a polyethylene glycol (PEG) repeat region, located proximal to the polymerase and designed to span the nanopore channel; 2) a short oligonucleotide, designed to hybridize to a single-stranded oligonucleotide on the opposite side of the nanopore relative to the polymerase to anchor the assembly; and 3) a negatively charged phosphoramidite tail, located most distal to the polymerase and designed to facilitate threading of the tether through the nanopore. FIG. 9 is a SDS/PAGE gel that shows the size of the unmodified KF polymerase (lane 1), the KF-tether 1 conjugate (lane 2) and the KF-tether 2 conjugate (lane 3). As expected, the conjugates show an increase in mass compared to the unmodified polymerase.”

[5] “When this polymerase complexes with a nucleotide that is the complement to the template base in the next extension position the polymerase reconfigures into what is referred to in the art as a “closed” conformation. At a more detailed structural level, the transition from the open to closed conformation is characterized by relative movement within the polymerase resulting in the “thumb” domain and “fingers” domain being closer to each other. In the open conformation the thumb domain is further from the fingers domain, akin to the opening and closing of the palm of a hand. In various polymerases, the distance between the tip of the finger and the thumb can change up to 10 angstroms between the “open” and “closed” conformations. The distance between the tip of the finger and the rest of the protein domains can also change up to 10 Angstroms. It will be understood that this change will be exploited in a method set forth herein.”

[6] This figure in the patent is supposed to show the threading/polymerase. But I find it a bit confusing.

Other quotes….

“The present disclosure relates to methods and constructs for single molecule electronic sequencing of template nucleic acids. The constructs are molecular sensor complexes which comprise a processive nucleic acid processing enzyme localized to a nanopore. Conformational changes in the enzyme induced by single nucleic acid processing events are transduced into electric signals by the nanopore, which are used to identify individual nucleotides. The methods can include the steps of providing a membrane with the nanopore and the enzyme complexed with a template nucleic acid localized proximal to an opening in the pore, contacting the enzyme with an ion conductive reaction mixture including the reagents required for nucleic acid processing, providing a voltage drop across the pore that induces ion current through the pore that is modulated by conformational changes in the enzyme, measuring current through the pore over time to detect nucleotide-dependent conformational changes in the enzyme, and identifying the type of nucleotide processed by the enzyme using current modulation characteristics, thus determining sequencing information about the nucleic acid molecule.”

“The polymerase is secured to the pore by hybridizing a short oligonucleotide anchor to the tether construct on the distal side of the nanopore.”

“These results indicate that a tether and a tether-polymerase conjugate can be anchored to a nanopore and, moreover, that the resulting complex can generate reproducible electrical signals. Polymerase-nanopore complexes are thus capable of modulating current flow through the pore and show promise as useful sensors to transduct mechanical events into electrical signals.”

“One or more of the transitions that a polymerase undergoes when adding a nucleotide to a nucleic acid can be detected using a molecular sensor complex as described herein.”

“FIG. 4C depicts the polymerase in a second, “closed” configuration, which is induced, e.g., by binding of incoming nucleotide 605 to form a correct base pair with the template nucleic acid. In this second configuration, the degree to which the enzyme physically occludes the pore is reduced, and consequently the flow of current through the pore will increase. Such modulation of current flow generates an electronic signal specific for nucleotide species”

“In one embodiment, each of the four nucleotides induces a different polymerase conformation, as illustrated in FIG. 4C. The movement of the polymerase during the incorporation of a nucleotide will modulate the ion current through the pore in a characteristic and reproducible manner, generating a signature electric signal.”

“In another embodiment, the average amplitude of the current modulation doesn’t change, but rather the noise in the current modulation changes as a single nucleotide is bound and incorporated. In yet another embodiment, the current modulation system only indicates an incorporation event but does not discriminate the base type. In this embodiment, the sequence information about a nucleic acid is obtained by sequentially flooding the senor complex with one of four reaction mixtures containing one of the four nucleotides and detecting the presence or absence of an electric signal.”

“For proper function of the molecular sensor complexes of the present invention, it is necessary that the enzyme be stably localized to the pore in sufficiently close proximity to reliably influence, or modulate, current flow through the pore. Several alternative localization and/or attachment structures or compositions are contemplated by the present invention, some which are illustrated schematically in FIGS. 5-8. FIG. 5A depicts one embodiment in which enzyme 500 is localized to pore 220 by covalent attachment to tethering structure 325, herein referred to simply as a “tether”. Tethers may be designed to thread through the lumen of the pore, from one side of membrane 100 to the other. Tethers may comprise one or more structural domains, or “segments”, designed to perform one or more functions.”