BGI – Complete Genomics

Following on from my previous article on the BGI and my list of sequencing companies, this post contains my notes on Complete Genomics (a BGI acquisition). If you have any further insights on Complete, I’d love to hear them. Please email me at new at sgenomics dot org.

Business

Complete Genomics was founded in 2006 to develop a DNA sequencing platform based on a sequencing-by-hybridisation (SBH) approach. One of the founders (Dr Drmanac) has a long history of academic work in SBH going back to the 1980s. The commercial history of Drmanac’s SBH work also starts before Complete, with a company called Callida Genomics, which was founded in 2001 as a subsidiary of Hyseq. There are Callida Genomics patents referring to methods currently used by Complete [6], which are now assigned to Complete. Drmanac also cofounded Hyseq, and it seems likely that some SBH work went on there. The commercial history behind the Complete Genomics approach therefore extends back 10 to 15 years.

Complete Genomics IPO’d in 2010. They were then acquired in 2013 by the BGI. The BGI seems to have transferred most technology development work to their Shenzhen site, cancelling new projects at the Complete Genomics Mountain View office [1] and making substantial staff cuts.

The BGI have, however, continued to develop the platform, releasing sequencers for use in China under the BGISEQ brand. Competitively the Complete Genomics approach does not appear to have fared well against Illumina, but it remains an interesting technological approach.

Technology

Technology overview from [3].

 

The Complete Genomics chemistry, as described in their 2010 paper, starts with the formation of DNA nanoballs (DNBs) [8]. They appear to have a neat chemistry which allows them to form nanoballs in solution which are not entangled with neighboring DNBs [7]. This is in contrast to other platforms, which require amplification to be performed on beads and/or in droplets (emulsion PCR), or on a surface (Illumina clusters, polonies).

The DNBs are flowed onto a substrate (flowcell). This flowcell is patterned with an array of aminosilane features. The DNBs only bind to these features, resulting in a regular array. From what I can tell, only a single DNB can bind to each site. Without this they could have overlapping nanoballs (which would be unusable). In general, systems that load sites at random are limited to about a third of sites containing a single read, with multiple-occupancy sites being unusable. Potentially this gives them a density advantage over other DNA sequencing platforms.
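The "about a third" limit can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes templates land on sites independently and at random (Poisson loading), with `lam` being the mean number of templates per site. Both the model and the function name are my own assumptions for illustration, not anything taken from the Complete Genomics paper. Under that model the fraction of sites holding exactly one template is lam * exp(-lam), which peaks at 1/e (roughly 37%) when lam is 1.

```python
import math


def single_occupancy_fraction(lam: float) -> float:
    """Fraction of sites holding exactly one template under Poisson loading.

    lam is the assumed mean number of templates per site.
    """
    return lam * math.exp(-lam)


# Under- and over-loading both reduce the usable fraction.
for lam in (0.5, 1.0, 2.0):
    print(f"mean load {lam}: {single_occupancy_fraction(lam):.3f} singly occupied")
```

A site that physically excludes a second DNB sidesteps this limit, which is where the potential density advantage comes from.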

With the DNBs arrayed on the chip sequencing can begin.

[9] Chemistry overview from Revolocity document.

 

 

The image above gives an overview of the cPAL sequencing chemistry used in Complete Genomics instruments. This hybridisation/ligation sequencing process is probably my least favorite part of the Complete Genomics system. The process uses fluorescently labelled degenerate 9mers with a single known position [11]. So for example, to interrogate the first position the probes NNNNNNNNA NNNNNNNNT NNNNNNNNG and NNNNNNNNC might be used, each labeled with a different dye. After these are flowed in, ligated and imaged they are then removed and the next set of probes comes in. These would then interrogate the next position NNNNNNNAN etc.
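To make the probe sets concrete, here is a minimal sketch that writes out the four degenerate probes for a given interrogated position. The function name and the 0-based, left-to-right position numbering are my own conventions for illustration. Real probe pools are synthesised mixtures, not strings, and the dye assignment is left out.

```python
BASES = "ACGT"
PROBE_LENGTH = 9  # degenerate 9mers, as described above


def probe_set(position: int, length: int = PROBE_LENGTH) -> list[str]:
    """Return the four probes with a known base at `position` (0 = leftmost).

    Every other position is written as N (fully degenerate).
    """
    if not 0 <= position < length:
        raise ValueError("position must fall inside the probe")
    return [
        "N" * position + base + "N" * (length - position - 1)
        for base in BASES
    ]


print(probe_set(8))  # known base in the last position
print(probe_set(7))  # next set: known base one position further in
```

Each set would be flowed in, ligated, imaged and removed before the next set is used.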

I’d guess 9mers are used because the stability of shorter oligos isn’t good enough. My understanding is that they only label the first 5 positions. They then have a process for creating extended anchor probes by ligating two anchor probes, which “allows decoding of positions 6–10 adjacent to the adaptor” [10]. This results in 10mer reads. They perform a number of 10mer reads at different adaptor sites and merge everything together. From memories of early datasets, reads have the potential for gaps because of this.

Un-captioned image from [3] Supplementary info. I assume the image on the left is the combined output of 4 images of a single cycle. Image on the right possibly represents crosstalk between dyes.

The process is complex and potentially error prone. However, there is one possible advantage over the Illumina SBS approach: each sequencing cycle in the cPAL system resets the template by removing the probes. In Illumina sequencing there is the potential for accumulated error (phasing error) as templates get out of sync, which ultimately limits read length. That accumulation does not occur here.
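A toy model shows why accumulated phasing error limits read length. The sketch below assumes each strand in a cluster independently falls out of step with a fixed probability per cycle and never recovers. The model, the function name and the 0.5% slip rate are illustrative assumptions, not measured Illumina figures.

```python
def in_phase_fraction(cycles: int, slip_rate: float) -> float:
    """Fraction of strands still in sync after `cycles` cycles.

    Assumes an independent, unrecoverable slip with probability
    `slip_rate` at every cycle (a deliberately crude model).
    """
    return (1.0 - slip_rate) ** cycles


# The in-phase signal decays geometrically with read length.
for cycles in (50, 100, 250):
    print(cycles, round(in_phase_fraction(cycles, 0.005), 3))
```

In a scheme where the template is reset every cycle, a failed cycle affects only that position, so there is no equivalent geometric decay.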

I’ve not looked at Complete Genomics raw data (I don’t believe any has been released?). But I’d guess there is a potentially high raw read error rate (due to non-specific hybridisation, among other things). It’s unlikely standard short read aligners would work well with Complete Genomics reads (being short and possibly containing gaps). For this and other reasons, Complete Genomics only ran a service business for many years. As I recall, they would only process human genomes, and delivered called SNPs to the customer rather than the read data itself.

While the Complete Genomics systems were only used in house for a long time, since the acquisition of Complete by the BGI a line of commercial sequencers has been released under the BGISEQ brand. The BGISEQ-500 spec sheet suggests they are now generating 50bp reads [11] (the approach seems to be similar). It looks like fastq files may now be available. But not much data seems to have made its way into the various public archives yet.

The approach is technologically interesting, but it’s difficult to see how it can compete directly with Illumina. However, it seems likely the BGISEQ instruments have a significant cost advantage in China.

Notes

[1] https://www.genomeweb.com/sequencing-technology/bgi-halts-revolocity-launch-cuts-complete-genomics-staff-part-strategic-shift

[2] Callida Genomics, Inc: http://www.evaluategroup.com/Universal/View.aspx?type=Story&id=13022

[3] http://science.sciencemag.org/content/327/5961/78

[4] Revolocity Video https://www.youtube.com/watch?v=WuS_RY8Zy38

[5] https://www.youtube.com/watch?v=DfaMOTcwcjs

[6] Callida Genomics Patent, referring to nanoball approach: https://patents.google.com/patent/US8440397

[7] “Short palindromes in the adaptors promote coiling of ssDNA concatamers via reversible intra‐molecular hybridization into compact ~300 nm DNBs, thereby avoiding entanglement with neighboring replicons” Science 2010 [3], supplementary info. Documentation for the now cancelled Revolocity states that “>95% occupancy of flow cell spots occupied by a single DNB”. http://www.completegenomics.com/documents/revolocity-tech-overview.pdf

[8] Using a controlled, synchronized synthesis, we obtained hundreds of tandem copies of the sequencing substrate in palindrome-promoted coils of single-stranded DNA, referred to as DNA nanoballs (DNBs).

[9] http://www.completegenomics.com/documents/revolocity-tech-overview.pdf

[10] The process is described well here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472021/

[11] https://www.bgi.com/us/wp-content/uploads/sites/2/2017/04/BGISEQ-500-ChIP-Service-Overview_linear.pdf

Minispin teardown pics

I picked up a couple of minispin lab centrifuges on eBay. One was working, one faulty. I pulled the faulty one apart and took some pictures for reference (below). Unfortunately it looks pretty dead. In particular the IR2136J 3-phase driver is completely blown. I’ve ordered a replacement and we’ll see how that goes.

However in case they’re of use to anyone else, reference pics below:

           

Corning PC-420D Hotplate/Stirrer repair

I picked up a Corning 420D on eBay. The device was listed as not powering up and was relatively cheap, so I figured it was worth a risk. It turned out to be a very simple repair, but I figured I’d write it up anyway… perhaps the reference images will be of use to someone.

The device indeed showed no power LEDs, but the hotplate was heating up… Taking a look inside you can see that the design is pretty simple. It’s all through-hole, and there’s what looks like an 8-bit MCU of some description. The stirrer has an optical interrupter sensor on it, which is attached directly to the PCB.

There were no obvious issues on the top side of the PCBs so I removed the main PCB and discovered the ugly mess below… I guess there had been some lab spills…

Cleaned things up with some IPA and touched up some of the joints, but the device still didn’t work (unsurprising). So I started probing around. There’s a 74HCT7541 on the front which was a convenient location to measure digital supply voltage levels… VCC was less than 2V and fluctuating…

I probed the transformer, which was putting out ~10 V AC, and this was being rectified… But there was nothing coming out of the regulator (probably a 7805, but I didn’t even get round to checking). It turns out the trace was broken just before the regulator, and when I bridged this everything started working… Sorry, not a very exciting repair!
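As a sanity check on those readings, here is a back-of-envelope calculation of what the regulator input should have looked like. I’m assuming a full-wave bridge (two diodes in the current path), roughly 0.7 V per silicon diode, and a 7805-style regulator needing about 2 V of headroom over its 5 V output. None of this was verified on the board, so treat the numbers as illustrative.

```python
import math


def rectified_peak(v_rms: float, diode_drop: float = 0.7,
                   diodes_in_path: int = 2) -> float:
    """Approximate peak DC voltage after rectifying an AC supply.

    Assumes a bridge rectifier and ignores ripple and transformer
    regulation under load.
    """
    return v_rms * math.sqrt(2) - diode_drop * diodes_in_path


REGULATOR_OUT = 5.0  # assumed 7805-style 5 V regulator
HEADROOM = 2.0       # assumed minimum input-to-output headroom

peak = rectified_peak(10.0)
print(f"expected regulator input: about {peak:.1f} V")
print("enough headroom:", peak >= REGULATOR_OUT + HEADROOM)
```

So ~10 V AC from the transformer should leave plenty of margin, which fits with the fault being a broken trace and not the supply itself.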

You can see the bodge in the picture below (I used a small piece of wirewrap wire). I could do with cleaning the PCB up a bit more, but I’ll leave things as they are for the moment.

 

 

 

BGI Part 1 – Business

I never quite know where to place the BGI (originally Beijing Genomics Institute). Are they a sequencing service? Are they an instrument vendor? Are they a business at all? Or are they a research institute? At times they seem like all these things, and none of them. My clearest personal memory is from one of the Cold Spring Harbor conferences. The moderator announced that before our next talk a representative from the BGI would like to make a brief statement. The BGI representative stood up and read a pre-prepared statement announcing that the BGI would sequence 1000 plant and animal genomes… it was an ambitious project but clearly one that the BGI were capable of accomplishing (using Illumina machines at the time). But the delivery seemed weird. Usually these large projects are coordinated through an international consortium of researchers. The BGI just stood up and stated that they were going to do all this themselves… it kind of sounded like a declaration of war, and indicated to me that the BGI is something other than a traditional research institution.

At present they have a massive fleet of Illumina machines, their own commercial instruments employing two different sequencing technologies, a sequencing as a service business, a clinical NIPT test, 1000s of researchers, and offices in 4 countries. So what exactly is the BGI anyway?

bgi.com in 1998

I figured I’d start by trying to go back to the beginning. The BGI was founded in 1997, and appears to have developed out of China’s desire to take part in the human genome project. bgi.com wasn’t owned by the BGI in 1998. genomics.cn most likely was, but the earliest capture in the Wayback Machine is from 2008. It’s quite sparse and talks of them having Illumina Genome Analyzers, ABI SOLiDs and 454 FLXs (it’s notable that only the Illumina range of instruments still exists). But it’s otherwise not very instructive.

The BGI history page tells us a little more, stating that “On July 14, 1999, BGI was founded with the mission of 1% of the human genome for the International Human Genome Project.” In China, research funding comes from the Ministry of Science and Technology. I would imagine funding came from the “973” program, which was China’s basic research program until 2017, when it was supposed to be replaced by something else.

Early versions of the Chinese Wikipedia page on the BGI state that after the completion of the human genome project the BGI relocated to Hangzhou in exchange for local government funding. In 2007 they announced that they were to relocate to Shenzhen to establish China’s first private non-profit research institution, and in 2008 BGI Shenzhen was approved by the Shenzhen Municipal Government to become a public institution. So from what I can tell they started off firmly as a non-profit research institution and remained so until at least 2008. What’s less clear to me is what kind of entity this was (non-profit? state owned? private company?).

2010 seems to have been the turning point for the BGI. They received a 1.58B USD line of credit from the China Development Bank [1], and funds from Shenzhen Capital Group [2]. It also looks like Taikang Life Insurance [2] and Sequoia [4] may be investors. Then in 2013, after a long and seemingly slightly painful process, they acquired US DNA sequencing startup Complete Genomics [3]. Finally, on the 14th July 2017 they were listed on the Shenzhen stock exchange (company 300676), completing their transition into a somewhat surprisingly highly commercial research institution.

The Chinese Wikipedia page now lists 6 business units: BGI Research Institute, BGI Technology Service Co., Ltd., BGI Health, BGI Agriculture, BGI Genetics Institute and BGI Cloud Computing.

Glassdoor reviews for BGI Shenzhen seem pretty reasonable (mostly complaining about the pay). Glassdoor reviews for Complete Genomics post acquisition seem to indicate that much development work was transitioned to China, and tantalizingly that the “omega sequencer project” was killed.

That about wraps it up for the business side of things. I will continue to add information as I come across it (not sure what information Chinese companies need to make available, but if anyone knows I’d be most interested in revenue projections etc.). I’ve covered the technology they acquired from Complete Genomics and plan to cover other aspects of their business too.

Notes

[1] https://www.technologyreview.com/s/511051/inside-chinas-genome-factory/

[2] https://www.crunchbase.com/organization/bgi-2/funding_rounds/funding_rounds_list

[3] http://www.genomics.cn/en/news/show_news?nid=99460

[4] https://www.sequoiacap.com/china/en/companies/