
800 shredded phone books, and shrinking next-gen sequencing data analysis times from 24 hours to 18 minutes


A San Diego company is cutting next-gen gene sequencing data analysis times from 24 hours to just 18 minutes, saying it can reduce cost and time without compromising on quality.

Edico Genome is developing what it calls the Dragen Bio-IT processor, a bioinformatics chip that is “solving the genomics data analysis bottleneck.” The Dragen chip will be particularly effective in areas like noninvasive prenatal testing (NIPT), cancer diagnostics and drug treatment selection – in fact, it just added NIPT player Sequenom to its growing list of otherwise unnamed customers.

“A lot of data is being generated in sequencing today, but it’s being processed by a large number of servers,” President and CEO Pieter van Rooyen said. “What we do is we get rid of those servers, and then we put all that processing power required to do sequencing onto our bioinformatics chip.”

This could reduce a company’s need for heavy IT infrastructure, as most of the work would be done not by a network of servers or an infotech team, but by a single chip.

The company’s rising fast: It raised $10 million in July, with funding coming from Qualcomm Ventures, Axon Ventures and former Life Technologies president and CEO Greg Lucier. The 2-year-old company has an all-star cast of advisers that includes Eric Topol over at The Scripps Research Institute and Nils Homer at the Broad Institute.

The company’s value proposition is pretty solid:

“For example, just one person’s genome is a 200-gigabyte file,” van Rooyen said. “To do a large population’s whole genome sequence, and quite a few institutions are doing that now, it becomes completely unfeasible to upload this data.”


The company has found that a single Dragen processor can analyze the 18,000 genomes’ worth of data that Illumina’s HiSeq X Ten system produces in one year. With standard technology, crunching the same data requires more than 50 high-end servers, quite a costly endeavor, van Rooyen said. Edico’s chip could save a company some $6 million per year when processing a data pool of that size.
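A quick back-of-the-envelope check shows how these figures fit together. The short Python sketch below uses only the numbers quoted in this story plus one assumption of ours, a 365-day year; it is illustrative arithmetic, not Edico’s own benchmark.

```python
# Back-of-the-envelope arithmetic using figures quoted in the article.
# Assumption (ours, not Edico's): throughput is averaged over a 365-day year.
GENOMES_PER_YEAR = 18_000   # HiSeq X Ten annual output cited in the article
SERVERS_REPLACED = 50       # "more than 50 high-end servers"
DAYS_PER_YEAR = 365

OLD_MINUTES = 24 * 60       # conventional 24-hour analysis time
NEW_MINUTES = 18            # Dragen's quoted 18 minutes


def genomes_per_day(genomes_per_year: int, days: int = DAYS_PER_YEAR) -> float:
    """Average daily throughput needed to keep up with a year of sequencer output."""
    return genomes_per_year / days


def per_server_load(genomes_per_year: int, servers: int) -> float:
    """Genomes each conventional server must finish per day to keep pace."""
    return genomes_per_day(genomes_per_year) / servers


daily = genomes_per_day(GENOMES_PER_YEAR)
per_server = per_server_load(GENOMES_PER_YEAR, SERVERS_REPLACED)
speedup = OLD_MINUTES / NEW_MINUTES

print(f"Genomes per day: {daily:.1f}")
print(f"Genomes per server per day: {per_server:.2f}")
print(f"Speedup: {speedup:.0f}x")
```

Roughly 49 genomes a day is in line with the company’s claim, in its technical description, of analyzing over 50 whole genomes in less than a day, and about one genome per server per day matches the 24-hour conventional analysis time. Going from 24 hours to 18 minutes works out to an 80-fold speedup.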

Edico essentially shrinks the data file, reducing 240 gigabytes to 240 megabytes. This is how van Rooyen described it:
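In plain numbers, that is roughly a thousandfold reduction. The minimal sketch below assumes decimal units (1 GB = 1,000 MB). In binary units the factor would be 1,024. Applying the same factor to the 200-gigabyte single genome van Rooyen mentioned is our own hypothetical extrapolation, not a figure from Edico.

```python
# Size reduction quoted in the article, assuming decimal (SI) units.
GB = 10**9
MB = 10**6

raw_bytes = 240 * GB
reduced_bytes = 240 * MB
ratio = raw_bytes / reduced_bytes

# Hypothetical: apply the same factor to the 200 GB single-genome file quoted earlier.
single_genome_mb = 200 * GB / ratio / MB

print(f"Reduction factor: {ratio:.0f}x")
print(f"200 GB genome at the same ratio: {single_genome_mb:.0f} MB")
```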

“If we look at the data file that comes out of the sequencer, the data’s completely jumbled,” van Rooyen said. “It’s like taking 800 phone books and putting them in a paper shredder – then trying to reconstruct them all perfectly.”
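Reassembling the shredded phone books is, in bioinformatics terms, read mapping: finding where each short fragment (a “read”) belongs in a known reference genome. The Python below is a toy seed-and-extend mapper built on a k-mer index, a standard textbook approach. It is not Edico’s algorithm, and the function names, reference string and reads are hypothetical, made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical toy reference; a real human reference is about 3 billion bases.
REFERENCE = "GATTACACCGTAGGCTTAACGGATCCAT"
K = 4  # seed (k-mer) length; real mappers use much longer seeds


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def map_read(read: str, reference: str, index: Dict[str, List[int]],
             k: int, max_mismatches: int = 1) -> Optional[int]:
    """Return the reference start position that best fits the read, or None.

    Seed: look up non-overlapping k-mers from the read in the index.
    Extend: score each candidate start by counting mismatches across the read.
    """
    candidates: Set[int] = set()
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    best_pos, best_mm = None, max_mismatches + 1
    for start in sorted(candidates):  # sorted so ties resolve to the leftmost hit
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mm:
            best_pos, best_mm = start, mismatches
    return best_pos


index = build_kmer_index(REFERENCE, K)
for read in ["CCGTAGGC", "TTAACGCA", "GATTACAC", "TTTTTTTT"]:
    print(read, "->", map_read(read, REFERENCE, index, K))
```

Using several seeds per read means a single sequencing error or true variant inside one seed does not stop the read from being placed; the second read above carries one mismatch and still maps.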

Edico reassembles the phone books and compares the individual’s genomic data against a reference genome to pinpoint any variants, a capability that will help position it for the clinical genomics space, van Rooyen said. Clinical genomics, more so than research and academia, will prove to be the driving force in market uptake, as the field is very much on the rise.
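Once reads are placed, spotting variants comes down to comparing the reads stacked up at each position (the “pileup”) with the reference base there. The deliberately simple majority-vote caller below uses hypothetical thresholds and toy data. Production variant callers model base quality, ploidy and sequencing error statistically, and nothing here reflects how Dragen does it.

```python
from collections import Counter
from typing import List, Sequence, Tuple

REFERENCE = "GATTACACCGTAGGCTTAACGGATCCAT"

# Hypothetical reads already placed on the reference as (start, sequence).
ALIGNED_READS = [
    (13, "GCTTAACG"),
    (15, "TTAACGCA"),
    (17, "AACGCATC"),
]


def call_snvs(reference: str,
              aligned_reads: Sequence[Tuple[int, str]],
              min_depth: int = 2,
              min_fraction: float = 0.5) -> List[Tuple[int, str, str]]:
    """Naive single-nucleotide variant caller using a majority vote per position.

    Returns (position, reference_base, alternate_base) tuples.
    """
    # Build the pileup: count which bases the reads report at each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence to make a call here
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            variants.append((pos, reference[pos], base))
    return variants


print(call_snvs(REFERENCE, ALIGNED_READS))
```

Two independent reads both report C where the reference has G at position 21, so the sketch reports one variant there; raising `min_depth` to 3 removes the call, which illustrates why sequencing depth matters for clinical confidence.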

As van Rooyen said, the company’s bioinformatics chip improves upon current systems that rely solely on servers or cloud computing. Here’s the company’s own nitty-gritty description of the technology:

The Dragen Bio-IT Processor is integrated on a PCIe form factor card and is provided with accompanying software as a Platform-as-a-Service (PaaS) that can be integrated directly into sequencing instruments and NGS bioinformatics servers. The processor is loaded with highly optimized algorithms for mapping, aligning, sorting, deduplication and variant calling. Dragen can accurately analyze over 50 whole human genomes (from FASTQ to VCF) in less than a day, and lowers costs by reducing large servers, storage space and IT infrastructure.
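Two of the steps named in that description, sorting and deduplication, are easy to picture. PCR amplification can copy the same DNA fragment several times, and those copies all land at the same spot, so pipelines sort alignments by position and collapse the duplicates. A minimal sketch follows. The `Alignment` record is a hypothetical stand-in, and real tools (Picard’s MarkDuplicates, for example) usually flag duplicates rather than drop them, choose the best copy by base quality and also use mate-pair positions.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, stripped-down alignment record."""
    read_id: str
    start: int     # leftmost reference position
    strand: str    # "+" or "-"
    sequence: str


def sort_and_dedup(alignments: List[Alignment]) -> List[Alignment]:
    """Sort by position, then keep one alignment per (start, strand) pair.

    Simplification: the first read_id in sort order is kept. Real duplicate
    markers choose by base quality and consider mate positions as well.
    """
    ordered = sorted(alignments, key=lambda a: (a.start, a.strand, a.read_id))
    seen = set()
    kept = []
    for aln in ordered:
        key = (aln.start, aln.strand)
        if key in seen:
            continue  # same start and strand as an earlier read: treat as a PCR duplicate
        seen.add(key)
        kept.append(aln)
    return kept


reads = [
    Alignment("r1", 15, "+", "TTAACGGA"),
    Alignment("r2", 7, "+", "CCGTAGGC"),
    Alignment("r3", 15, "+", "TTAACGGA"),
    Alignment("r4", 15, "-", "TTAACGGA"),
]
print([a.read_id for a in sort_and_dedup(reads)])
```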

Dragen is also able to compress and decompress large genomic datasets rapidly with standard CRAM-based compression, requiring only about five or six minutes for the process.
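CRAM saves space largely through reference-based compression: instead of storing every base of an aligned read, it stores where the read lands and only the bases that differ from the reference. The toy encoder below captures that core idea and nothing more. The real CRAM format adds much more (quality-score handling, block compression codecs and so on), and these function names and the record layout are our own.

```python
from typing import List, Tuple

REFERENCE = "GATTACACCGTAGGCTTAACGGATCCAT"

# Toy record layout (ours, not CRAM's): (start, length, [(offset_in_read, base), ...])
Record = Tuple[int, int, List[Tuple[int, str]]]


def encode_read(reference: str, start: int, seq: str) -> Record:
    """Store only the read's position, length and the bases that differ from the reference."""
    diffs = [(i, b) for i, b in enumerate(seq) if reference[start + i] != b]
    return (start, len(seq), diffs)


def decode_read(reference: str, record: Record) -> str:
    """Rebuild the read by copying the reference span and re-applying the differences."""
    start, length, diffs = record
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


record = encode_read(REFERENCE, 15, "TTAACGCA")
print(record)
print(decode_read(REFERENCE, record))
```

Because most reads match the reference almost exactly, the stored difference list is usually tiny, which is where the space savings come from; the trade-off is that decompression needs the same reference sequence on hand.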