
Case, UCLA, Harvard look to data provenance to reconcile multiple sources of big data

The hope is that this NIH-funded data provenance project will help standardize and reconcile inconsistencies so clinicians have better data to bring to the bedside and run through clinical decision support systems.

In June, the National Institutes of Health awarded nearly $7 million in grants for its Big Data to Knowledge (BD2K) initiative, funding three projects on data provenance, led by Case Western Reserve University, the University of Pennsylvania and Duke University. The smallest grant, slightly more than $900,000, went to Case Western Reserve University School of Medicine in Cleveland to build and test an engine for managing data provenance.

Healthcare data are often inconsistent, which makes it difficult to integrate information from disparate sources. “Many research projects are multidisciplinary and multi-institutional,” noted principal investigator Satya Sahoo, an informatics specialist on the faculty of the Department of Epidemiology and Biostatistics at Case. By standardizing the data and reconciling those inconsistencies, the provenance project should give clinicians better data to bring to the bedside and run through clinical decision support systems, he explained.

Sahoo, who holds a Ph.D. rather than a medical degree, will lead a team that includes researchers from Harvard University and UCLA in building a provenance engine based on PROV, a World Wide Web Consortium (W3C) standard for describing the provenance of data. The researchers will then test the engine on de-identified data, drawn from multiple sources, of real patients with sleep disorders, epilepsy and/or lung cancer.

“We want to make sure the quality data is reliable and consistent,” Sahoo said. Metadata, which records where each piece of data came from and how it was processed, should help the researchers trace data back to its origin.
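To illustrate what “tracing data back to its origin” can mean in practice, here is a minimal Python sketch of a toy provenance store. It is loosely modeled on the PROV relation `prov:wasDerivedFrom` and is not the project's engine. The class name `ProvGraph`, its methods, and the record names in the usage lines are hypothetical, and, unlike full PROV, each record here has at most one recorded source.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Toy provenance store (hypothetical): maps each record to the record it came from.

    Loosely modeled on the W3C PROV relation prov:wasDerivedFrom; simplified so that
    every record has at most one source.
    """
    derived_from: dict[str, str] = field(default_factory=dict)

    def record_derivation(self, derived: str, source: str) -> None:
        # Store the assertion "derived prov:wasDerivedFrom source".
        self.derived_from[derived] = source

    def trace_to_origin(self, entity: str) -> list[str]:
        """Follow derivation links until reaching a record with no recorded source."""
        chain = [entity]
        seen = {entity}
        while chain[-1] in self.derived_from:
            parent = self.derived_from[chain[-1]]
            if parent in seen:
                # A cycle means the provenance metadata is inconsistent.
                raise ValueError(f"cycle in provenance at {parent!r}")
            chain.append(parent)
            seen.add(parent)
        return chain


# Usage with hypothetical record names: a cleaned EEG summary derived from raw site data.
graph = ProvGraph()
graph.record_derivation("eeg_summary_v2", "eeg_summary_v1")
graph.record_derivation("eeg_summary_v1", "raw_eeg_site_A")
print(graph.trace_to_origin("eeg_summary_v2"))
# ['eeg_summary_v2', 'eeg_summary_v1', 'raw_eeg_site_A']
```

A real PROV-based engine would also record the activities that produced each record and the agents responsible for them. Those are what allow two sources describing the same patient to be compared and reconciled.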

At Case, research will focus on one of Sahoo’s deepest interests, sudden unexpected death in epilepsy (SUDEP). “I’ll be collaborating with a clinician who’s an expert in SUDEP,” he said.

That clinician, Dr. Samden Lhatoo, director of the Epilepsy Center at University Hospitals Case Medical Center, will work with Sahoo and with clinicians at Harvard and UCLA to integrate data from medical images, electroencephalograms, patient discharge summaries and sleep studies. UCLA’s William Hsu, a medical imaging informaticist specializing in lung cancer, will provide imaging reports to the provenance engine.

The technology is intended eventually to feed standardized data back to the point of care. NIH funders “want to see the quality of data in real time,” Sahoo said.

“We will be using cloud computing infrastructure to scale it,” Sahoo added. He believes the project’s scope and its use of the cloud make this research unique in medical informatics.

