
Tumor mutation or DNA damage? New study calls genomic workflows into question

A new study to be published Friday in Science suggests a large share of genomic data may be affected by oxidative damage introduced during DNA extraction and library preparation. This could confound scientists’ ability to identify low-frequency variants.


There are well-known limitations when it comes to sequencing old or previously frozen DNA samples. Damage inevitably occurs with age, handling and storage, decreasing the accuracy and sensitivity of the sequencing data.

Fresh, high-quality samples, on the other hand, were believed to be relatively intact. But a new study to be published Friday in Science suggests that they may be routinely damaged through the standard DNA extraction and library prep protocol.

These steps add noise to the data and decrease scientists’ ability to identify rare genetic mutations within a diverse sample.

The findings have broad implications for the genomics field, the study authors from New England Biolabs said in a phone interview, particularly when researchers are looking for low-frequency variants that may not be present throughout the sample.

“We’ve shown that when you have variants within this 1-5 percent frequency range, those variants are confounded by the signal that comes from damage,” said Laurence Ettwiller, a research scientist at New England Biolabs and coauthor on the paper.

This is often the case with tumors, which can have many subpopulations of cells. Some of the cells may contain important mutations, such as those that confer resistance to certain drugs.


“What we rely on for calling those rare variants, especially in the case of cancer, is accurate sequencing,” said Ettwiller. “What we have discovered is that the sequencing has a lot of noise or errors that are due to damage. This interferes with the identification of those variants that are not common to all of our cells.”

The “artifacts” generated by the oxidative stress are indistinguishable from true mutations that occur at a low frequency, the authors said. While rare, identifying such variants is critical for understanding the genetic diversity of a given cancer and the likelihood that it will recur.

To perform the study, the authors first created and validated an algorithm that could identify the damage present in data from fresh samples. Dubbed the Global Imbalance Value (GIV), the model registers the imbalance between variants detected in read 1 and read 2 in paired-end sequencing — a sign of mutagenic damage.
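The intuition behind the score lends itself to a simple illustration. The sketch below assumes a GIV-style value is simply the ratio of a variant type’s frequency on read 1 to its frequency on read 2 of paired-end data; the function name, inputs and numbers are hypothetical and not taken from the published method, which may differ in its exact definition.

```python
# Minimal sketch of a GIV-style imbalance score, under the assumption that the
# score compares how often a variant type (e.g. G-to-T) is called on read 1
# versus read 2 of paired-end sequencing. A true mutation should appear at
# similar frequencies on both reads; damage tends to skew toward one read.

def giv_score(r1_variant, r1_total, r2_variant, r2_total):
    """Return the ratio of variant frequency on read 1 to read 2.

    r1_variant, r2_variant: counts of the variant call on each read
    r1_total, r2_total: total positions examined on each read
    A value near 1 suggests balance; a strong excess on one read
    hints at damage rather than a genuine mutation.
    """
    freq_r1 = r1_variant / r1_total
    freq_r2 = r2_variant / r2_total
    if freq_r2 == 0:
        raise ValueError("no read-2 variants observed; score undefined")
    return freq_r1 / freq_r2

# Hypothetical example: G-to-T calls heavily skewed toward read 1
print(giv_score(r1_variant=900, r1_total=1_000_000,
                r2_variant=300, r2_total=1_000_000))  # ~3.0, consistent with damage
```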

The algorithm was then applied to several public genomic databases to determine if the oxidation and damage they observed were widespread.

When the authors applied the GIV model to the 1000 Genomes Project and a subset of The Cancer Genome Atlas (TCGA) dataset, they found a lot of the same DNA damage.

Ettwiller estimates some 41 percent of the data sets in the 1000 Genomes Project and 73 percent of the sequencing runs in the TCGA subset show signs of DNA damage. As a result, one-third or more of G-to-T mutations may be false positives. Other variant calls are affected too, but less often, she said.

While the authors believe this damage arises through the DNA preparation, they couldn’t directly test this without having access to the sequencing preparation protocol used for the major studies. However, they said the damage is consistent with the oxidative stress that occurs through “acoustic shearing,” a standard method for DNA fragmentation.

If that’s the case, why has such a systematic issue not been flagged until now?

“In a normal sequencing run, it doesn’t just jump out. You have to look closely, it’s a rare event,” said Tom Evans, second author and scientific director for New England Biolabs’ DNA Enzymes Division. “Even for us, we were surprised to see the prevalence of the problem in the databases. I think for most people who don’t consider it, they wouldn’t even think to look.”

The authors also point to a 2013 paper in Nucleic Acids Research that raised some similar questions.

“We focused on cancer in the databases because we found it surprising,” Evans said. “But as you extrapolate this forward because the damage seems to be caused by the way the DNA is handled, you would expect damage to occur, say, in microbiome sequencing as well.”

As a result, the number one message the study authors have is simply to be aware. Ettwiller has established an open-source algorithm that allows researchers to estimate what damage may have occurred during library preparation.

The limitations of old, degraded or FFPE (formalin-fixed, paraffin-embedded) samples are well-recognized and controlled for, she noted. Scientists need to apply the same critical lens to high-quality samples.

Genomics tools are also moving forward in ways that could minimize this kind of DNA damage.

A spin-out from the Scripps Research Institute, iGenomX has developed a library preparation workflow that bypasses many of the suspected DNA-damaging steps. CEO Keith Brown said the company’s kits require no chemical or physical fragmentation, no A-tail, no blunt end repair and no ligation.

“Any processing step that affects the integrity of the molecule being sequenced should be avoided if possible,” Brown said via email. “Fragmentation, in general, whether acoustic or enzymatic, will cause a loss of information, particularly when looking at the low frequency (needle in a haystack) information of interest.”

