Addressing Covid-19 is a data problem – here’s how the data community is harnessing healthcare data to solve it

Data can allow us to improve our public health response to the pandemic, but only if we enable data scientists with the right tools to harness those datasets.

With a global health crisis such as the Covid-19 pandemic comes an enormous amount of rapidly changing, important healthcare data – from the number of confirmed cases by region to hospital ventilator and PPE inventory. Leveraging data-driven insights in real time is imperative for leaders making critical decisions on the front lines of this pandemic, but many healthcare organizations are struggling to rapidly and efficiently harness the overwhelming streams of data to meet the demands placed on the healthcare industry.

In response to this challenge, data teams worldwide are mobilizing to help solve the most pressing problems of the pandemic. Modern data platforms are providing powerful processing tools that enable researchers, clinicians and administrators at hospitals, government and pharmaceutical organizations to aggregate and analyze diverse datasets to provide actionable insights for decision-makers.

Hospital systems
Digital transformation has been a slow burn for most hospital systems, but Covid-19 has ignited an accelerated effort, especially in the move toward analytics of consolidated health records. Hospitals use multiple Electronic Health Record (EHR) systems with complex data storage and analytics architectures. Because these systems can't easily interact, it is difficult to collect all the data needed for a complete picture of a patient – a prerequisite for building accurate machine learning models. Ultimately, data scientists working with legacy EHR architectures spend more time freeing data from their EHR and less time building innovative models that can improve patient outcomes.

Data scientists are helping push healthcare systems toward open, interoperable systems that enable seamless analytics across hospitals. Experts advocate for hospital data teams to load EHR data through the open HL7 FHIR APIs into open-source technologies built for analytics, such as Apache Spark and Delta Lake. As data from EHRs flows in closer to real time, hospitals can build applications to streamline and even automate processes based on up-to-the-minute information. For example, the data team at a hospital in South Carolina has harnessed the power of AI to build an app that helps caregivers predict a patient's risk for sepsis and treat them accordingly.
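The first step in that pipeline is flattening nested FHIR resources into tabular rows that an analytics engine can work with. As a minimal sketch – using a hypothetical, trimmed-down FHIR Patient bundle and plain Python rather than a real FHIR server or Spark session – the transformation looks like this:

```python
import json

# A hypothetical FHIR Patient bundle, as a FHIR search API might
# return it (fields trimmed for illustration).
bundle = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "gender": "female", "birthDate": "1984-03-01"}},
    {"resource": {"resourceType": "Patient", "id": "p2",
                  "gender": "male", "birthDate": "1990-07-15"}}
  ]
}
""")

def flatten_patients(bundle):
    """Extract flat rows from a FHIR Bundle, ready to load into an
    analytics table (e.g. a Spark DataFrame backed by Delta Lake)."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        if resource.get("resourceType") == "Patient":
            rows.append({
                "patient_id": resource["id"],
                "gender": resource.get("gender"),
                "birth_date": resource.get("birthDate"),
            })
    return rows

rows = flatten_patients(bundle)  # two flat patient records
```

In a production pipeline the same flattening would typically be expressed as a Spark job writing to a Delta table, so downstream models see a single consolidated schema instead of raw EHR exports.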

This approach generalizes to a number of use cases: other hospitals use streaming data from EHRs to predict patient surges, ER overcrowding, ventilator inventory and other important operational considerations. By using a unified data platform to blend EHR data with staffing data, a major multi-state hospital system produces overcrowding statistics by department less than five minutes after patient intake, which has been crucial in handling patient surges caused by Covid-19.
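The core of such an overcrowding metric is a simple blend of two feeds: recent intake events from the EHR and current staffing levels. A minimal sketch – with hypothetical department names and numbers, and in-memory lists standing in for the streaming platform – might be:

```python
from collections import defaultdict

# Hypothetical recent patient-intake events (department, count) and
# current staffing data (department -> staffed beds).
intake_events = [("ER", 1), ("ER", 1), ("ICU", 1), ("ER", 1)]
staffed_beds = {"ER": 4, "ICU": 10}

def occupancy_by_department(events, beds):
    """Blend intake events with staffing data to produce a simple
    occupancy ratio per department (1.0 means at capacity)."""
    counts = defaultdict(int)
    for dept, n in events:
        counts[dept] += n
    return {dept: counts[dept] / beds[dept] for dept in beds}

ratios = occupancy_by_department(intake_events, staffed_beds)
# ER: 3 intakes / 4 staffed beds = 0.75; ICU: 1 / 10 = 0.1
```

On a real platform the same aggregation would run continuously over a streaming window, which is what makes sub-five-minute reporting after intake achievable.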

As the novel coronavirus first began to surge in the U.S., one of our partners noticed that most hospitals didn't know whether a patient surge was likely, or whether they had enough supplies to care for a sudden influx of patients. Data engineers created a specialized live-streaming AI app, now in use at several hospitals, that takes EHR data and builds predictive dashboards showing how many patients a hospital is likely to receive, its ventilator capacity and peak ventilator usage. As healthcare system needs evolve throughout the pandemic, so will the volume of data being generated and how it can be utilized to meet challenges head-on.
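The forecasting behind such a dashboard can start very simply. As an illustrative sketch – not the partner's actual model – the following uses a hypothetical daily-admissions series, a moving-average baseline forecast, and an assumed ventilation rate to project ventilator demand:

```python
import math

def forecast_next(admissions, window=3):
    """Forecast tomorrow's admissions as the mean of the last
    `window` days -- a deliberately simple baseline model."""
    recent = admissions[-window:]
    return sum(recent) / len(recent)

def ventilators_needed(expected_admissions, vent_rate=0.15):
    """Project ventilator demand from expected admissions.
    vent_rate is a hypothetical fraction of admitted patients
    requiring mechanical ventilation."""
    return math.ceil(expected_admissions * vent_rate)

daily_admissions = [12, 15, 14, 20, 22, 27]   # hypothetical data
expected = forecast_next(daily_admissions)    # (20 + 22 + 27) / 3 = 23.0
demand = ventilators_needed(expected)         # ceil(23.0 * 0.15) = 4
```

A production model would replace the moving average with something richer (seasonality, regional case counts), but the dashboard plumbing – streaming EHR data in, projected demand out – stays the same.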

Government agencies
Data analytics technology is also helping government agencies at the national level rapidly generate up-to-date data sets and run predictive models. This allows them to optimally allocate resources, provide data to public health research efforts and, ultimately, curb the spread of Covid-19. Because much of the data important to government response is being generated in hospitals, the data must be easily communicated between healthcare systems and governments for policy development.

On a truly unified platform, government data teams can take internal data sets and prepare them for consumption by approved researchers and the public. Internationally, we’ve been working with a national public health service to build an environment for researchers to access. All relevant health data within this country lands in this environment where government-approved data researchers can work on the data directly without having to worry about setting up the environment themselves or ensuring the system is properly secured. These tools are enabling public health leaders to make data-driven policy decisions as their best and brightest data scientists and researchers work together on pressing public health issues.

Pharmaceutical and diagnostics companies
While hospitals meet immediate healthcare needs and governments focus on controlling the spread, many pharmaceutical and diagnostics companies are focusing their energies on developing diagnostic tests, treatments, and vaccines for Covid-19. These organizations need help analyzing both clinical and real-world observational health data to determine how a drug is being used both in clinical trial settings and compassionate use to treat patients impacted by the virus.

Many biopharma and diagnostics companies have been using data about the spread of coronavirus to rapidly adjust their clinical trials and manufacturing programs. With this data, they can make sure that medicines and tests are still getting to patients in need, and they can look at how the novel coronavirus is impacting patients in the disease areas they focus on.

The challenge they face, however, is ensuring the right teams have access to the right data in a timely manner. In the life sciences, data scientists may be skilled in statistics and coding but aren't typically experts in cloud computing. It's logistically hard to share one's findings with other teams across an organization, which may mean that data scientists are working on data that is delayed by days or even weeks.

Moving forward with data
The pandemic has demonstrated how critical technology modernization is to speeding the delivery and analysis of data. Cutting-edge computing architectures like cloud data lakes allow healthcare and life science organizations to run analytics on data in real time, while notebooks make it easy for data scientists to share information and analysis with their colleagues.

Throughout this crisis, the phrase "we're all in this together" has been repeated often. The data community across industries has embraced this motto and wants to help, and the only thing holding them back is the need for easier access to larger datasets. I firmly believe that using open-source software to broadly integrate data across multiple biomedical sources is critical to deepening our understanding of disease. Data can allow us to improve our public health response to the pandemic, but only if we enable data scientists with the right tools to harness those datasets.

Photo: metamorworks, Getty Images



Frank is the Technical Director for the Healthcare and Life Sciences vertical at Databricks. Prior to joining Databricks, Frank was a lead developer on the Big Data Genomics/ADAM and Toil projects at UC Berkeley, and worked at Broadcom Corporation on design automation techniques for industrial-scale wireless communication chips. Frank holds a PhD and Master of Science in Computer Science from UC Berkeley, and a Bachelor of Science with Honors in Electrical Engineering from Stanford University.

This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.