{"id":4950,"date":"2018-08-07T15:20:24","date_gmt":"2018-08-07T15:20:24","guid":{"rendered":"http:\/\/41j.com\/blog\/?p=4950"},"modified":"2018-08-08T01:18:57","modified_gmt":"2018-08-08T01:18:57","slug":"sequencing-with-mixtures-of-three-bases","status":"publish","type":"post","link":"https:\/\/41j.com\/blog\/2018\/08\/sequencing-with-mixtures-of-three-bases\/","title":{"rendered":"Sequencing with Mixtures of Three Bases"},"content":{"rendered":"<p><a href=\"http:\/\/41j.com\/blog\/2018\/08\/sequencing-with-mixtures-of-three-bases\/3base-2\/\" rel=\"attachment wp-att-5045\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/41j.com\/blog\/wp-content\/uploads\/2018\/08\/3base-1.png\" alt=\"\" width=\"400\" height=\"401\" class=\"aligncenter size-full wp-image-5045\" \/><\/a><\/p>\n<p>A previous post discussed <a href=\"http:\/\/41j.com\/blog\/2018\/07\/cygnus-biosciences\/\">Cygnus&#8217; approach to sequencing<\/a>, using mixtures of bases and multiple reads of the same template. Centrillion also have a patent that appears to cover a related approach.<\/p>\n<p>The Cygnus approach, as described in their paper, uses mixtures of 2 bases. I thought it might be interesting to work through corrections using mixtures of 3 bases. It&#8217;s possible this is covered somewhere in their supplementary info, or in their huge 200+ page patent. I&#8217;ve not checked, and this is just for fun.<\/p>\n<p>There are 4 possible sets of 3 different base types: ATG, ATC, TGC and AGC. Any two of these sets clearly differ by a single base (each set contains 3 of the 4 bases ATGC and leaves 1 out).<\/p>\n<p>To recap the previous post, a template is exposed to alternating sets (mixtures) of bases. We measure the incorporation intensity and from it learn how many bases incorporated (the same as in a normal single channel unterminated sequencing chemistry). In order to process the entire strand, the sets we alternate between must together contain all base types. 
For the sets of 3 base types this is no problem: any pair of sets will contain all four base types and differ by only a single base type.<\/p>\n<p>There are 6 possible pairings:<\/p>\n<p>a ATG,ATC<\/p>\n<p>b ATG,TGC<\/p>\n<p>c ATG,AGC<\/p>\n<p>d ATC,TGC<\/p>\n<p>e ATC,AGC<\/p>\n<p>f TGC,AGC<\/p>\n<p>We could vary the order of the pairs. But we don&#8217;t really need to. Working through all possible 2bp repeats [1], it&#8217;s clear that we can accurately resolve any given sequence using 3 out of the 6 alternating pairs.<\/p>\n<p>In all cases, one pairing supplies the base transition information. For example, for the repeat ATATAT this is pairing f above. This is the only pairing that blocks incorporation at transitions between A and T, because each of its sets leaves out one of those two bases. Each pairing blocks on one of the six possible transition types (G&lt;-&gt;C A&lt;-&gt;T A&lt;-&gt;G A&lt;-&gt;C T&lt;-&gt;G T&lt;-&gt;C). To accurately resolve all sequences, all pairings are therefore required. In the example 2bp repeats, one pairing provides the &#8220;transition&#8221; information and 2 other pairings are required to resolve the sequence to one of the four bases.<\/p>\n<p>You therefore need to sequence each template six times. However, at any given base, information from only 3 of the &#8220;mixture sequences&#8221; is required to resolve the strand. The other 3 sequences provide redundant information for error correction. This information could be used in a number of ways (masking likely errored bases, taking a majority vote, or feeding a more complex error correction model).<\/p>\n<p>How much sequencing does this require as compared to standard single base sequencing?<\/p>\n<p>Well, there will always be degenerate sequences, both in this scheme and the Cygnus approach. 
These sequences will require very slightly more sequencing than using a normal single base incorporation system.<\/p>\n<p>However, we can simulate the number of cycles required (a cycle being the incorporation of a single base type, or a single mixture type). I quickly threw some code together to do this [2]. Assuming this hastily thrown together code is correct, the single base incorporation scheme requires 1.481 cycles per base (or ~2.7 bases incorporated per round of 4 single-base cycles). The mix of 3 scheme described above requires 1.4905 cycles per base, summed over all six pairings.<\/p>\n<p>So, if you just go by this, there&#8217;s very little overhead.<\/p>\n<p>One downside of the base mixture incorporations is that the sequencing system has to cope with longer homopolymers (or rather, runs in which each base is 1 of 3 different base types). Again, this is true of both the approach described here and the Cygnus system. What issues this causes will depend on the error profile of the underlying technology.<\/p>\n<p>While I&#8217;ve discussed mixtures of 3 bases here, it might also be interesting to look at combinations of mixtures of 2 and 3 bases. For example, you might have a set pair of ATG and ATC. 
Then a set of CA and GT would resolve the ambiguity (this could be extended to create a complete sequencing system).<\/p>\n<p>Maybe that&#8217;s another fun project for another time.<\/p>\n<h2>Notes<\/h2>\n<p>[1]<\/p>\n<p><a href=\"http:\/\/41j.com\/blog\/2018\/08\/sequencing-with-mixtures-of-three-bases\/workthrough\/\" rel=\"attachment wp-att-5038\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-5038\" src=\"http:\/\/41j.com\/blog\/wp-content\/uploads\/2018\/07\/workthrough.png\" alt=\"\" width=\"404\" height=\"547\" \/><\/a><\/p>\n<p>[2]<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;iostream&gt;\r\n#include &lt;string&gt;\r\n#include &lt;vector&gt;\r\n#include &lt;stdlib.h&gt;\r\n\r\nusing namespace std;\r\n\r\n\/\/ The four possible mixtures of 3 base types\r\nstring s1 = &quot;ATG&quot;;\r\nstring s2 = &quot;ATC&quot;;\r\nstring s3 = &quot;TGC&quot;;\r\nstring s4 = &quot;AGC&quot;;\r\n\r\n\/\/ Count the cycles needed to read temp when alternating between the two mixtures in pair\r\nint mix_incorp(string temp,vector&lt;string&gt; pair) {\r\n\r\n  int p=0;\r\n  int cycles=0;\r\n  for(int n=0;n&lt;temp.size();) {\r\n\r\n    \/\/ Incorporate for as long as the next template base is in the current mixture\r\n    while(n&lt;temp.size()) {\r\n      if(pair&#x5B;p].find(temp&#x5B;n]) == string::npos) break;\r\n      n++;\r\n    }\r\n\r\n    cycles++;\r\n    p = 1-p; \/\/ switch to the other mixture of the pair\r\n  }\r\n\r\n  return cycles;\r\n\r\n}\r\n\r\nint main() {\r\n\r\n  string temp;\r\n\r\n  \/\/ generate random sequence\r\n  for(int n=0;n&lt;10000;n++) {\r\n    int r = rand()%4;\r\n    if(r == 0) temp += &quot;A&quot;;\r\n    if(r == 1) temp += &quot;T&quot;;\r\n    if(r == 2) temp += &quot;G&quot;;\r\n    if(r == 3) temp += &quot;C&quot;;\r\n  }\r\n\r\n  cout &lt;&lt; &quot;Sequence: &quot; &lt;&lt; temp &lt;&lt; endl;\r\n\r\n  \/\/ Single base incorps\r\n  string order=&quot;ATGC&quot;;\r\n  int pos=0;\r\n  int cycle_count=0;\r\n  for(int n=0;n&lt;temp.size();) 
{\r\n\r\n    \/\/ Incorporate the run of bases matching the current base type, without reading past the end\r\n    while(n&lt;temp.size()) {\r\n      if(temp&#x5B;n] != order&#x5B;pos]) break;\r\n      n++;\r\n    }\r\n\r\n    pos++;\r\n    cycle_count++;\r\n    if(pos == order.size()) pos=0;\r\n  }\r\n  cout &lt;&lt; &quot;Average cycles per base, single base incorps: &quot; &lt;&lt; ((float)cycle_count)\/((float)temp.size()) &lt;&lt; endl;\r\n\r\n\r\n  \/\/ Super ugly code, but functional...\r\n  \/\/ The six pairings a to f of the four mixtures\r\n  vector&lt;vector&lt;string&gt; &gt; pairs(6);\r\n  pairs&#x5B;0].push_back(s1);\r\n  pairs&#x5B;0].push_back(s2);\r\n  pairs&#x5B;1].push_back(s1);\r\n  pairs&#x5B;1].push_back(s3);\r\n  pairs&#x5B;2].push_back(s1);\r\n  pairs&#x5B;2].push_back(s4);\r\n  pairs&#x5B;3].push_back(s2);\r\n  pairs&#x5B;3].push_back(s3);\r\n  pairs&#x5B;4].push_back(s2);\r\n  pairs&#x5B;4].push_back(s4);\r\n  pairs&#x5B;5].push_back(s3);\r\n  pairs&#x5B;5].push_back(s4);\r\n\r\n  \/\/ Sum the cycles over all six pairings (the template is read once per pairing)\r\n  int total=0;\r\n  for(int n=0;n&lt;6;n++) {\r\n\r\n    int count = mix_incorp(temp,pairs&#x5B;n]);\r\n    total+=count;\r\n  }\r\n  cout &lt;&lt; &quot;Average cycles per base, mixture incorps: &quot; &lt;&lt; ((float)total)\/((float)temp.size()) &lt;&lt; endl;\r\n}\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A previous post discussed Cygnus&#8217; approach to sequencing, using mixtures of bases and multiple reads of the same template. Centrillion also have a patent that appears to cover a related approach. The Cygnus approach, as described in their paper, uses mixtures of 2 bases. 
I thought it might be interesting to work through corrections using [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-4950","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1RRoU-1hQ","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/posts\/4950","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/comments?post=4950"}],"version-history":[{"count":6,"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/posts\/4950\/revisions"}],"predecessor-version":[{"id":5046,"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/posts\/4950\/revisions\/5046"}],"wp:attachment":[{"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/media?parent=4950"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/41j.com\/blog\/wp-json\/wp\/v2\/categories?post=4950"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\
/41j.com\/blog\/wp-json\/wp\/v2\/tags?post=4950"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}