Using an SBS-like approach to selectively amplify

Today I was pondering the fact that there are DNA synthesis approaches that may result in high error rates.

One significant class of errors is insertions, in particular homopolymer errors. One of the issues with enzymatic DNA synthesis is that, so far, it’s been difficult to incorporate bases with reversible terminators.

One approach could be to limit the number of bases incorporated purely through the concentration present. However, this is likely to result in a highly errored product. Even if your per-base error rate is only 5%, after incorporating 100 bases less than 1% of your product will be fully correct (0.95^100 is roughly 0.6%, assuming errors are independent).
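As a rough check on that figure, here is a minimal sketch of the yield calculation. It assumes every base fails independently with the same probability, which is a simplification, and the function name is just illustrative.

```python
def fraction_fully_correct(per_base_error: float, length: int) -> float:
    """Fraction of strands with no error at any position.

    Assumes errors are independent and equally likely at every base.
    """
    return (1.0 - per_base_error) ** length


for length in (20, 50, 100):
    print(length, f"{fraction_fully_correct(0.05, length):.4f}")
```

With a 5% per-base error rate this gives roughly 36% fully correct strands at 20 bases but only about 0.6% at 100 bases.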

If we could selectively amplify only the correct strands, this might give us more utility out of an inefficient/errored synthesis platform.

Let’s say we get some reasonable fraction of fully correct strands at 20 bases [1]. Size selection might be problematic [2] as many errors will be either the same length, or nearly the same length. We assume that insertion errors dominate, and it’s these errors that we’re mostly interested in removing.

One approach might be to fully amplify only those strands which don’t contain insertions. You can do this by step-wise synthesis of a complementary strand, exposing the strand to reversibly terminated [3] bases in the correct order only. The scheme is somewhat similar to sequencing-by-synthesis, but here it is used for selective amplification.

To take an example, say we have attempted to synthesize the sequence CGTCCCTAGTCGACTGACGT. We would expose the synthesized strands to complementary bases in the correct order [4] during stepwise synthesis. This stepwise process would be similar to sequencing-by-synthesis (incorporate, wash/remove, cleave terminators etc.).

A fully correct strand, or one containing only deletions, will incorporate a base at every position of the template. A complete complementary strand will therefore be created.

A strand with an insertion, however, will become out of sync with the correct/desired bases. It will therefore not incorporate a base at every position.

In the example below, we can see how a single insertion error will result in a complementary strand roughly half the size of the original. Small insertion errors are therefore converted into much larger differences in fragment size (producing significantly smaller fragments in many cases).

In this example bases are flowed into the pool in the order G,C,A,G etc. and incorporated from the 3′ end of the template.

In the errored strand, bases incorporate correctly until the 6th position. At this point, the synthesis process gets out of sync. An A, T, C, and A fail to incorporate before another G is encountered. The final synthesised strand is 11 bases long, ~50% smaller than the 20-base strand produced from the fully correct template.

True sequence
   01234567890123456789
3' CGTCCCTAGTCGACTGACGT 5'
5' GCAGGGATCAGCTGACTGCA 3'

Insertion
   012345678901234567890
3' CGTCCCCTAGTCGACTGACGT 5'
5' GCAGGGGATCA           3'
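
The flow scheme in the example above can be sketched as a small simulation. This is only an idealised model: it assumes perfect reversible terminators (at most one base added per flow), no misincorporation, and a primer already in place at the 3′ end of the template. The function names and the deletion example are my own illustration.

```python
# Illustrative simulation only; all names here are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flow_order_for(desired_template: str) -> str:
    """Bases to flow, one per cycle: the complement of the desired template (read 3'->5')."""
    return "".join(COMPLEMENT[base] for base in desired_template)


def stepwise_extend(template: str, flows: str) -> str:
    """Return the strand (5'->3') built on `template` (written 3'->5').

    Assumes ideal reversible terminators: at most one base is added per flow,
    and only if it pairs with the next unpaired template base.
    """
    product = []
    position = 0  # next template base waiting to be paired
    for base in flows:
        if position == len(template):
            break  # template fully copied
        if COMPLEMENT[template[position]] == base:
            product.append(base)
            position += 1
        # otherwise the flowed base is washed away and the strand waits
    return "".join(product)


desired = "CGTCCCTAGTCGACTGACGT"
flows = flow_order_for(desired)

correct = stepwise_extend(desired, flows)
insertion = stepwise_extend("CGTCCCCTAGTCGACTGACGT", flows)  # extra C at position 6
deletion = stepwise_extend("CGTCCCTGTCGACTGACGT", flows)     # A at position 7 removed

print(correct, len(correct))
print(insertion, len(insertion))
print(deletion, len(deletion))
```

Under these assumptions the correct template gives the full 20-base complement, the insertion template stops at the 11-base GCAGGGGATCA shown above, and the deletion-only template still yields a complete 19-base complement. That is why this scheme targets insertions rather than deletions.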

The process described would most likely need to be performed cyclically (between rounds of melting), to amplify the pool of strands sufficiently. After this selective amplification process, size selection [6] could take place to select the correct strands (or a “more correct” subset). This subset might be used for downstream applications, or as a substrate for further synthesis [5].

This amplification process might remove the most problematic errored strands from the synthesis process [6], as well as potentially allowing us to gain more utility from an errored synthesis process.
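
To illustrate the case described in note [6], here is a minimal sketch using the same idealised flow model (one base per flow at most, no misincorporation). The two errored templates are hypothetical examples I made up: each carries one insertion and one deletion, so each is the same length as the desired 20-base sequence and would pass a naive size selection.

```python
# Hypothetical example strands; idealised model with perfect terminators.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def stepwise_extend(template: str, flows: str) -> str:
    """Copy `template` (3'->5') using one flowed base per cycle; return the product (5'->3')."""
    product = []
    position = 0
    for base in flows:
        if position < len(template) and COMPLEMENT[template[position]] == base:
            product.append(base)
            position += 1
    return "".join(product)


desired = "CGTCCCTAGTCGACTGACGT"
flows = "".join(COMPLEMENT[base] for base in desired)

# Insertion (extra C at position 6) followed by deletion of the final T.
insertion_then_deletion = "CGTCCCCTAGTCGACTGACG"
# Deletion of the T at position 2, followed by an inserted A further along.
deletion_then_insertion = "CGCCCTAGTCGAACTGACGT"

for template in (desired, insertion_then_deletion, deletion_then_insertion):
    product = stepwise_extend(template, flows)
    print(len(template), len(product), product)
```

All three templates are 20 bases long, but only the correct one yields a 20-base product. In this model the other two yield 11 and 15 bases respectively, so the length difference that size selection needs only appears after the stepwise copy.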

Notes

[1] I’m selecting 20 bases to keep the examples simple.

[2] Again, size selection of short fragments is problematic anyway, but this is just an example.

[3] Or maybe without terminators, if you don’t care so much about homopolymer errors and are only interested in removing other insertions.

[4] Appropriately primed, with a normal polymerase suitable for incorporating the bases we are using.

[5] Effectively you might try and “reset” the synthesis process periodically, by removing errors from the pool.

[6] The most problematic errored strands might be those that are the same size as the fully correct template. These strands would need to be the result of at least one insertion and one deletion. The above scheme will not completely amplify these strands, and could therefore help mitigate this issue.