Genomics: Our most data-hungry industry?

Genomics will emerge as the leading user of big data, a new PLOS paper projects.

It’s estimated, after all, that between 100 million and 2 billion human genomes will be sequenced by 2025 – to say nothing of the genomic sequencing used on plants, animals and microbes.

Big data is inextricable from genomics, but it also plays an important role in other fields, such as social media and astronomy. So what’s the projected growth rate of big data needs in genomics, and how does it compare with other data-hungry industries?

A new paper in PLOS Biology examines exactly this, comparing genomics with three other major generators of big data: astronomy, YouTube and Twitter. It looks at how each one uses its data.

The paper concludes that “now is the time for concerted, community-wide planning for the ‘genomical’ challenges of the next decade,” and it includes a figure projecting the genomic data growth rate through 2025.

Sequencing capacity will increase rapidly over the next decade, though growth rates are difficult to project, the PLOS piece says. The increase spans plant, animal and microbial genome analysis, on top of human genomics for disease research. The paper continues:

We therefore estimate between 100 million and as many as 2 billion human genomes could be sequenced by 2025, representing four to five orders of magnitude growth in ten years and far exceeding the growth for the three other Big Data domains.
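
To put “four to five orders of magnitude growth in ten years” in everyday terms, here is a minimal sketch that converts it into a yearly multiplier and a doubling time. It assumes growth compounds at a constant rate over the decade, which is a simplification of ours and not a claim from the paper; the function names are hypothetical.

```python
import math


def annual_growth_factor(orders_of_magnitude: float, years: float) -> float:
    """Yearly multiplier that compounds to 10**orders_of_magnitude over `years`.

    Assumes a constant growth rate (our simplification, not the paper's).
    """
    return 10 ** (orders_of_magnitude / years)


def doubling_time_years(orders_of_magnitude: float, years: float) -> float:
    """Time for the total to double under the same constant-rate assumption."""
    return years * math.log(2) / (orders_of_magnitude * math.log(10))


if __name__ == "__main__":
    # The quoted range: four to five orders of magnitude in ten years.
    for orders in (4, 5):
        factor = annual_growth_factor(orders, 10)
        months = doubling_time_years(orders, 10) * 12
        print(f"{orders} orders in 10 years: x{factor:.2f} per year, "
              f"doubling about every {months:.1f} months")
```

Under that assumption, the quoted range works out to roughly 2.5 to 3.2 times more genomes each year, or a doubling about every seven to nine months.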

The paper also says genomics poses some of the most “severe computational challenges facing us in the next decade.” It calls the field a “four-headed beast,” given its high-level demands in acquisition, storage, distribution and analysis. The authors urge the scientific community to commit to sharing data so as to promote more efficient big data use in genomics:

In human health, the major needs are driven by the realization that for precision medicine and similar efforts to be most effective, genomes and related ‘omics data need to be shared and compared in huge numbers. If we do not commit as a scientific community to sharing now, we run the risk of establishing thousands of isolated, private data collections, each too underpowered to allow subtle signals to be extracted. More than anything else, connecting these resources requires trust among institutions, scientists, and the public to ensure the collections will be used for medical purposes and not to discriminate or penalize individuals because of their genetic makeup.

Read the entire PLOS article here.