Thoughts on a new approach to viral testing

One of the issues with testing for COVID19 and similar viral outbreaks is that kits cannot be deployed ahead of time. Early in the outbreak the sequence of the virus will not be known. Once it is determined, a test specific to this outbreak will need to be designed, and then deployed. This is a significant logistical challenge.

Specifically, one of the more popular and accurate approaches to testing for COVID19 is real time PCR. The testing protocol is relatively straightforward. However, before kits can be deployed, primers specific to the virus need to be designed and tested. They then need to be synthesized and deployed. QCing and shipping potentially millions of oligos is a significant logistical challenge.

A sequencing based approach would avoid these issues. Viral extraction and sample prep kits could be deployed ahead of time. A complete sample could be sequenced, and the reads aligned to detect the presence of a virus. We’d no doubt have contamination/background viral genomes. However, given the huge number of reads generated by current platforms this shouldn’t be an issue. Sequencing, however, is expensive compared with real time PCR. The cheapest complete runs are on the order of 500USD. It’s likely that we can multiplex samples. But this brings potential contamination issues, additional complexity, the requirement that testing is centralized, and assumes that we can easily batch samples.

Based on the CDC recommended kits, real time PCR of COVID19 appears to cost about 10 to 20USD in reagents per sample (as supplied by vendors).

Real time PCR is cheaper. Sequencing is more versatile.
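As a rough illustration of that trade-off, the sketch below works out how many samples would need to share one sequencing run before the per-sample share of the run cost drops to the PCR reagent cost. It uses only the figures quoted above (500USD per run, 10 to 20USD per PCR sample) and assumes, unrealistically, that multiplexing adds no extra library prep or barcoding cost. The function name is made up for this example.

```python
import math

SEQUENCING_RUN_USD = 500        # cheapest complete run, figure from the text
PCR_PER_SAMPLE_USD = (10, 20)   # reagent cost range per sample, from the text


def break_even_samples(run_cost, per_sample_cost):
    """Smallest number of multiplexed samples for which the per-sample
    share of the run cost is no higher than per_sample_cost.

    Assumption: the run cost is split evenly and multiplexing itself is free.
    """
    return math.ceil(run_cost / per_sample_cost)


for pcr_cost in PCR_PER_SAMPLE_USD:
    n = break_even_samples(SEQUENCING_RUN_USD, pcr_cost)
    print(f"PCR at {pcr_cost}USD/sample: multiplex at least {n} samples per run")
```

Under these assumptions a run would need somewhere between 25 and 50 samples to match PCR on reagent cost alone, which is where the batching and centralization concerns come in.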

Can we create a platform that is as simple and cheap as real time PCR, but doesn’t require primers specific to our viral sequence? One option would be to build a platform that synthesizes primers on device. In such a platform kits could be deployed ahead of time. When a viral outbreak occurs primers would be designed and then the sequences would be transmitted to instruments (over the Internet). Instruments could then synthesize the primers, and run tests.

This would require a new approach to synthesis that could be integrated into a small diagnostic device.

However, another approach might be to combine aspects of sequencing-by-synthesis with real time PCR to selectively amplify regions in the target genome. In this approach we’d first want to cut the viral genome. We could use a restriction or nicking enzyme to cut the genome into smaller fragments at specific sites [1].
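To make the cutting step concrete, here is a minimal sketch of fragmenting a sequence at a recognition site. It is only an illustration: it looks at one strand, treats the genome as a plain DNA string, and uses EcoRI (which recognizes GAATTC and cuts after the G) purely as an example enzyme. The function name and the toy sequence are invented for this example.

```python
def fragment_at_sites(sequence, site, cut_offset):
    """Split sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the cut falls
    (1 for EcoRI, which cuts G^AATTC). Only the given strand is considered.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Toy sequence with two EcoRI sites.
print(fragment_at_sites("AAGAATTCTTGAATTCGG", "GAATTC", 1))
```

Because the cut sites are fixed by the enzyme, the fragment ends are known in advance for a known target genome, which is what the next step relies on.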

A procedure like nick translation can then be used to selectively amplify sequence around the nick sites [5]. Normally in PCR we introduce all four bases at once. But in this approach we introduce labelled bases [3] one at a time, in the order that they appear in the target viral genome [2].

If the sample contains the target viral sequence, it will be fully extended, because the nucleotides are presented in exactly the order needed to extend this sequence.

If the sample does not contain the target viral sequence, it will be partially but not fully extended.

There will be a difference in fluorescence [3] intensity between the fully extended target viral sequences, and the non-target partially extended sequence.
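The difference between full and partial extension can be sketched with a toy simulation. This is a simplification under stated assumptions: at most one base is incorporated per nucleotide flow (as with the terminators mentioned in [2]), there is no misincorporation, strand orientation is ignored, and the number of incorporated bases stands in for fluorescence intensity. All names and sequences are invented for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flow_order_for(target_template):
    """Nucleotides to deliver, one per flow, to copy the target template."""
    return [COMPLEMENT[base] for base in target_template]


def extended_length(template, flows):
    """Count bases incorporated on a template when flows arrive one at a time.

    A flow is incorporated only if it pairs with the next template base;
    otherwise extension stalls until a matching nucleotide is delivered.
    """
    pos = 0
    for nucleotide in flows:
        if pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            pos += 1
    return pos


target = "ACGTTGCA"
non_target = "ACGAAGCA"   # differs from the target in the middle
flows = flow_order_for(target)

print("target extended by", extended_length(target, flows), "of", len(target))
print("non-target extended by", extended_length(non_target, flows), "of", len(non_target))
```

In this toy model the target template takes up a base on every flow, while the non-target stalls at the first mismatch and only occasionally catches a matching flow afterwards, so its signal lags behind.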

As in real time PCR, the process would likely occur cyclically through rounds of melting while the fluorescence intensity [3] is monitored to detect amplification.

The above approach would result in a programmable detection system which does not require primers. Essentially the instrument would be a modified real time PCR instrument that could deliver nucleotides to the sample (in a programmable order) [4].


[1] I guess if there’s enough material (or it’s pre-amplified) we could also randomly fragment, stick a polyA on the end, and put our primers there.

[2] Reversible or other terminators, or another approach could be used here to ensure only a single base is incorporated into homopolymer runs. This approach is similar to that previously described to selectively amplify sequences here:

[3] Detection methods other than fluorescence could be used. We might use labeled or unlabeled bases (for example, ISFET detection of the incorporation of natural bases).

[4] We might need to attach the sample to a solid support for this to work. However, potentially we could put the sample in a sieving buffer. This would confine the longer fragments of the viral genome, while allowing single nucleotides to enter the matrix (potentially driven via electrophoresis). This concept is described here in more detail:

[5] Or random sites, or sites as selected using a restriction enzyme etc.


CDC COVID19 Protocol:

2USD per sample:

5USD per sample:

5USD per sample:

6USD per sample:

Issues Replicating Bioinformatics Papers

Data Issues

  • The paper does not specify which database was used (for example to obtain a set of sequences).
  • The paper does not specify which release/on which date data was extracted from a database.
  • The database has been updated since publication. Old releases/versions of the database are not available.
  • The database is no longer available.
  • The database is not publicly available.
  • An old revision of the database is available, but the data format is not documented. Parsers are not available.
  • A copy of data extracted from a public database is not provided with the publication.
  • The publication does not provide enough information to verify data extracted from databases (number of sequences etc.)
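
Several of the data issues above could be reduced by publishing a small manifest alongside any extracted data. The sketch below is one possible shape for such a record, not an established standard: the function name and field names are made up, and the record count assumes a FASTA file where each record starts with a ">" line.

```python
import hashlib
from datetime import date


def describe_dataset(path, source, release):
    """Build a provenance record for a file extracted from a database.

    source and release are free-text values supplied by the author,
    e.g. the database name and the release label or download date.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    records = sum(1 for line in data.splitlines() if line.startswith(b">"))
    return {
        "file": path,
        "source": source,
        "release": release,
        "retrieved": date.today().isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "fasta_records": records,
    }
```

The checksum and record count give a later reader something to verify against, even if the database itself has since changed.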

Software Issues

  • The publication does not specify which software was used.
  • The publication does not specify which version of software was used.
  • The publication uses online tools, which are no longer available or have been updated.
  • The publication uses offline tools which are no longer available.
  • The publication uses offline tools which are available, but have been updated and the version used in the publication is no longer available.
  • The software used was never publicly available.
  • Binaries for the software used are available, but required libraries are no longer available (of the required version).
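
For the software side, a minimal sketch of recording the environment at analysis time is shown below. It only covers Python packages visible to importlib.metadata; external binaries and online tools would need to be recorded by hand. The function name is invented, and ete3 is used as the example package only because the script further down depends on it.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect the interpreter version, platform and package versions."""
    record = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            record["packages"][name] = None
    return record


print(json.dumps(environment_record(["ete3"]), indent=2))
```

Saving this output with the results at least answers the "which version was used" questions, even if it cannot guarantee that version stays available.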

Quick script to download Uniprot info for proteins from Pfam trees (Newick tree format)

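from ete3 import Tree
import sys
import os
import urllib.request

# Usage: script.py <tree.nwk> <base_url>
# base_url is the URL prefix that "<accession>.txt" is appended to.
# Taking it from the command line is an assumption made here.
t = Tree(sys.argv[1])
base_url = sys.argv[2]

# Iterating over an ete3 Tree visits its leaves.
for leaf in t:
    # Pfam leaf names look like "ACCESSION/start-end"; keep the accession.
    acc = leaf.name.split("/")[0]
    print(f"Accession: {acc}")
    # Skip entries that were already downloaded.
    if not os.path.exists(f"{acc}.txt"):
        page = urllib.request.urlopen(f"{base_url}{acc}.txt")

        with open(f"{acc}.txt", "wb") as out_file:
            out_file.write(page.read())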

Kodak ES 4.0 teardown pics

This post contains tear-down reference images for a Kodak ES 4.0 camera. From memory, the camera uses a KAI4000 image sensor. This is a 90s/00s era image sensor. The sensor effectively operates as 4 independent image sensors. For the fastest/most consistent readout you probably want to acquire images from all 4 quadrants at the same time. As such this camera duplicates the analog and acquisition circuits four times. An AD9225 ADC is used, and four of these are present under shields.

A bunch of Xilinx FPGAs are used to acquire data from the ADCs/send it to a host computer. I do not have the interface card required to operate this camera, so can’t test this.

The CCD is cooled using a TEC. Again from memory, the feedback does not appear to be an RTD. If I remember correctly a thermistor was embedded in the TEC block.

While this camera is branded by Kodak, I think I’ve seen similar units from Princeton/Roper. Possibly these were re-branded. However it seems like there may have been a number of acquisitions in the 2000s and perhaps someone acquired Kodak’s (scientific/industrial) camera unit…