A Note On The Ion Torrent Flow Order

The Ion Torrent DNA sequencers don’t just flow A, T, G and C in a repeating 4-base cycle; they use a more complex 32-flow order. I was trying to figure out exactly what was going on and found a comment on the ioncommunity forum that partly explains it. Here’s the flow order they use (32 bases long!):

TACGTACGTCTGAGCATCGATCGATGTACAGC

The 32-base sequence is composed of two 16bp subsequences. These subsequences have the same structure, but the bases are substituted for one another. Each 16bp sequence contains one flow of every base, followed by 12 flows that step through the 2bp combinations of different bases [1]. In the general case, a cyclic sequence in which every possible string of length k occurs exactly once as a substring is called a de Bruijn sequence [2].

Why they do this is less clear. It might be that flowing the bases in this order helps with phasing or other characteristic errors. It’s also possible that other aspects of the flow order improve the sequencing process.

[1] The following is a verification of the statement above:

Original Sequence: TACGTACGTCTGAGCATCGATCGATGTACAGC

Here are the two 16bp sequences. Below each one I’ve labelled every base by the order in which it first occurs in that sequence (the first distinct base is 1, the second is 2, and so on). The patterns are identical. The conclusion is that you can change the first sequence into the second by replacing A with C, C with G, and G with A, leaving T unchanged:

TACGTACGTCTGAGCA
1234123413142432

TCGATCGATGTACAGC
1234123413142432
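
The labelling above can be reproduced with a few lines of Python. This is only a sketch to check the claim, not anything taken from the Ion Torrent software; the helper name first_occurrence_labels is made up for illustration.

```python
FLOW_ORDER = "TACGTACGTCTGAGCATCGATCGATGTACAGC"


def first_occurrence_labels(seq):
    """Label each base by the order in which it first appears (1, 2, 3, ...)."""
    labels = {}
    out = []
    for base in seq:
        if base not in labels:
            labels[base] = str(len(labels) + 1)
        out.append(labels[base])
    return "".join(out)


first_half, second_half = FLOW_ORDER[:16], FLOW_ORDER[16:]

pattern_1 = first_occurrence_labels(first_half)
pattern_2 = first_occurrence_labels(second_half)

# Work out which base in the first half corresponds to which base in the second.
substitution = {}
for a, b in zip(first_half, second_half):
    # setdefault stores the first mapping seen; the assert checks later ones agree.
    assert substitution.setdefault(a, b) == b, f"{a} maps to more than one base"

print(pattern_1)                          # 1234123413142432
print(pattern_2)                          # 1234123413142432
print(dict(sorted(substitution.items())))  # {'A': 'C', 'C': 'G', 'G': 'A', 'T': 'T'}
```

The substitution comes out as a three-way rotation of A, C and G with T fixed, which is why the two label patterns match.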

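Taking the 12bp sequence (the last 12 flows of the first 16bp sequence), you can see that each 2bp combination it contains occurs exactly once. Read end to end, that gives 11 of the 12 possible combinations of two different bases; the remaining one, AT, only appears if you wrap around from the final A to the first T: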

TACGTCTGAGCA
new@navlaptop:~/biotech/dnae/floworder$ ./a.out | sort | uniq -c
1 AC
1 AG
1 CA
1 CG
1 CT
1 GA
1 GC
1 GT
1 TA
1 TC
1 TG
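
The source of ./a.out isn’t shown, so here is a Python sketch of an equivalent count rather than the original program. It lists the overlapping 2-base windows of the 12bp sequence, counts them, and reports which pairs of different bases are absent, both when the sequence is read end to end and when it is treated as circular. The variable names are my own.

```python
from collections import Counter
from itertools import permutations

SUBSEQ = "TACGTCTGAGCA"  # last 12 flows of the first 16bp sequence

# Overlapping 2-base windows, read linearly (no wrap-around).
pairs = [SUBSEQ[i:i + 2] for i in range(len(SUBSEQ) - 1)]
counts = Counter(pairs)

# All 12 ordered pairs of two different bases (AA, CC, GG, TT are excluded).
all_pairs = {"".join(p) for p in permutations("ACGT", 2)}

# Same layout as `sort | uniq -c`: count first, then the pair.
for pair in sorted(counts):
    print(counts[pair], pair)

missing_linear = sorted(all_pairs - set(counts))

# Treat the sequence as circular by also pairing the last base with the first.
wrap_pair = SUBSEQ[-1] + SUBSEQ[0]
missing_circular = sorted(all_pairs - set(counts) - {wrap_pair})

print("missing (linear):", missing_linear)      # ['AT']
print("missing (circular):", missing_circular)  # []
```

In the full 32-flow order the A that ends the first 16bp sequence is followed directly by the T that starts the second, so the AT step does occur at that boundary.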

[2] https://en.wikipedia.org/wiki/De_Bruijn_sequence