MedCity Influencers

Data-Driven Transformation: The Role of Data Quality in Accelerating Healthcare Performance


Data is one of the main driving forces in healthcare’s ongoing transformation. As the industry moves on from fax-based communications and printed patient forms, the data ecosystem informing advancements in care outcomes and healthcare economics is evolving with it.

Yet, as organizations dive further into their datasets and explore how artificial intelligence (AI) and machine learning (ML) can help reveal new insights, one thing must remain at the forefront: data quality. 

Understanding data quality and data hygiene


A fundamental aspect of processing large-scale data, particularly in healthcare settings, is data hygiene. Data hygiene refers to the processes that ensure data is clean: free from duplicates and held to standardized, consistent formats across different sources. 

Data hygiene is a critical aspect of the broader notion of data quality. Data quality management is the process of verifying the timeliness and accuracy of a dataset. It incorporates checks for gaps in the data, confirms relevancy, and ensures the data is complete. 

Data quality and hygiene go hand in hand, beginning the moment data is ingested with the steps needed to confirm that data is ready to continue through the data pipeline. After these checks, data can begin to inform a variety of analytics and algorithms, such as quality measurement and risk score analyses, clinical trial recruitment, or revenue cycle management. The applications of data at work in healthcare are endless, and each one depends on a strong, rigorous data quality and hygiene strategy.
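The kind of ingest-time checks described above can be sketched in a few lines. This is an illustrative simplification, not a production pipeline: the record fields, duplicate key, and rejection reasons are all hypothetical, and a real system would check far more.

```python
# Hypothetical member records; real pipelines carry many more fields.
RECORDS = [
    {"member_id": "A100", "dob": "1980-04-02", "zip": "21046"},
    {"member_id": "A100", "dob": "1980-04-02", "zip": "21046"},  # duplicate
    {"member_id": "B200", "dob": "", "zip": "9021"},             # gap + bad format
]

def hygiene_check(records):
    """Split records into clean and rejected, with a reason for each rejection."""
    seen, clean, rejected = set(), [], []
    for rec in records:
        key = (rec["member_id"], rec["dob"])  # assumed dedup key
        if key in seen:
            rejected.append((rec, "duplicate"))
        elif not rec["dob"]:
            rejected.append((rec, "missing date of birth"))
        elif not (rec["zip"].isdigit() and len(rec["zip"]) == 5):
            rejected.append((rec, "malformed ZIP code"))
        else:
            seen.add(key)
            clean.append(rec)
    return clean, rejected
```

Running this over the sample records would pass only the first record through, rejecting the duplicate and the incomplete record before they reach downstream analytics.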

The importance of data hygiene 


Gartner reports that poor data hygiene has the potential to cost organizations nearly $13 million a year, which not only impacts the bottom line but can create further inefficiencies and fragmentation in healthcare. As data travels along the pipeline, data management must progress from asking “is this data accurate?” to “will this data do what is intended?” 

For example, data may be considered clean and accurate if all the ZIP codes in the dataset are complete and represent real ZIP codes, with no missing digits or invalid codes in the set. That data can then become the basis for analytics pertaining to a given area. This allows health plans to identify potential disparities and focus analyses on areas of high socioeconomic risk, improving their population health efforts while increasing the speed and accuracy with which they address member needs. From a life sciences perspective, the ability to run sophisticated analytics on verified ZIP code data can support enhancements in clinical trial recruitment and diversity efforts. 
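The ZIP code example amounts to two checks: is the value complete, and does it actually exist? A minimal sketch, assuming a small in-memory reference set; a real system would validate against an authoritative source such as USPS ZIP code data.

```python
# Hypothetical reference set; production code would load authoritative data.
KNOWN_ZIPS = {"21046", "90210", "10001"}

def zip_is_clean(zip_code):
    """A ZIP is 'clean' if it is complete (five digits) and actually exists."""
    return (
        isinstance(zip_code, str)
        and len(zip_code) == 5
        and zip_code.isdigit()
        and zip_code in KNOWN_ZIPS
    )
```

Only values passing both the format check and the existence check would feed geographic analytics; a truncated entry like "2104" or a fabricated code like "99999" is filtered out before it can skew area-level results.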

In any use case, good data – and the models it informs – must be tested to ensure the desired outcome is produced with a certain level of reliability, which hinges on the initial checks for data quality, hygiene, and cleanliness. The quality of the outcome is ultimately a function of the quality of the data coming in. 

Additionally, models must be checked for bias. Even when data is clean and accurate, bias can be introduced by the individuals working on the data and building the models, or can arise when a solution is used outside its intended use case. Increasing our awareness of, and checking for, bias is critical to correcting models and combating the systemic inequities in healthcare.

Common challenges in maintaining data quality

Managing data quality never stops – it’s an ongoing process to mitigate the risk of bad data infecting a system’s outputs, especially considering the possibility that data can change as time goes on.

It goes without saying that the stakes are particularly high in healthcare. Bad data can lead to undesired care experiences or missed opportunities for better care. In risk adjustment, bad data may corrupt determinations of a population’s risk scores. Getting this wrong means members may not be treated for the right conditions, or may not even have the conditions being addressed, all because of actions informed by corrupt data. 

Unstructured data can also influence outputs, especially given the inconsistencies in how data is entered throughout the care continuum. It’s estimated that around 80% of medical data remains unstructured or unused after it is created. The volume of this data creates significant opportunities for leverage, but its unstructured nature makes it challenging to use.

Fortunately, the industry is shifting to more universal standards such as HL7® FHIR® and embracing TEFCA, so the organizations that are first to truly embrace data standardization will have much to gain. Standardized data starts its journey through the hygiene and cleanliness process already ahead of unstructured data.

Ultimately, constant monitoring and verification of data as it flows through the pipeline helps ensure data quality is maintained even as various touches occur and data travels between different systems. 

Applying AI to gain new insights

Once a strong data management strategy is in place, the natural next step is building AI-driven analytics to uncover more innovative insights, faster. Sophisticated data strategies may even use AI throughout the data management process, not just toward the end of the data pipeline, but to clean new data and enhance data hygiene. 

However, as the focus on AI-driven processes increases, the human touch must remain in healthcare. Organizations are challenged to find the right balance of adjusting for human error and establishing an “AI copilot.” 

On one hand, AI can help catch and correct data entry errors that occur throughout the data pipeline. If front-end eligibility data is used for back-end claims correction but has an incorrect address or last name, AI can help billers make the necessary corrections before the claim is submitted – improving their cash flow and decreasing friction in the claims process for health plans.
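The correction step described above can be approximated even without a full AI model. The sketch below uses simple fuzzy string matching as a stand-in for the ML-based matching an AI-assisted billing tool might apply; the roster, names, and similarity threshold are all hypothetical.

```python
import difflib

# Hypothetical roster of verified last names from front-end eligibility data.
ELIGIBILITY_NAMES = ["Johnson", "Smith", "Garcia", "Nguyen"]

def suggest_correction(entered_name, roster, cutoff=0.8):
    """Suggest the closest verified name for a likely data-entry error.

    difflib's similarity matching is a simple stand-in for the learned
    matching an AI copilot might use; the 0.8 cutoff is illustrative.
    """
    matches = difflib.get_close_matches(entered_name, roster, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A biller entering "Jonson" would be prompted with the verified "Johnson" before the claim goes out, while a name with no plausible match returns nothing rather than guessing, keeping a human in the loop for the final decision.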

Yet AI must inform healthcare decisions rather than have the final say. Especially in patient encounters, AI may surface treatment options or suggest a diagnosis, but it is the physician in the room whom the patient will likely trust more, and who typically has a rapport with the patient. For example, Deloitte’s 2024 Consumer Health Care Survey found that patients’ distrust of generative AI is growing, but they are much more comfortable with their physicians using AI as an assistive tool to convey information about treatments, interpret diagnostic results, and help diagnose conditions or illnesses. 

Looking ahead in healthcare’s transformation 

Clean data and large language models are charting the course for advancements in patient care, resource utilization, managing costs, and improving cash flow.

Data quality is the foundation for innovative discoveries. Paired with the power of AI and guided by human expertise, the opportunity for impact is exponentially greater as these three forces converge to advance healthcare further, faster.


Rajesh Viswanathan serves as the Chief Technology Officer for Inovalon. In this role, Mr. Viswanathan leads and is responsible for all aspects of the Company’s technology strategy, design, development, testing, production, infrastructure, operation, security, and maintenance.

Prior to Inovalon, Rajesh was Vice President for Cloud Native Services, Oracle Cloud Infrastructure (OCI) at Oracle Corporation, where he was responsible for engineering, product management, architecture, and solution engineering for OCI’s cloud-native platform. Before joining Oracle, Rajesh served as Product and Engineering Leader for Development and Management Services at Amazon Web Services (AWS) where he founded, built, operated, and scaled multiple new services to simplify AWS’ Cloud Management, operated and scaled AWS CloudFormation, AWS’ infrastructure-as-code offering, and owned and operated the data plane for EBS. Prior to AWS, Rajesh served as Principal Group Program Manager, Windows Live at Microsoft, and before Microsoft, he served as a Senior Software Engineer and Architect for Netscape/AOL.

This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.