Artificial intelligence has rapidly progressed from a theoretical concept to an operational reality. As of 2025, the healthcare sector captures approximately 43% of all AI investments, representing roughly $1.5 billion in spending.
According to data from Silicon Valley Bank, nearly half (46%) of all healthcare investment is now directed toward AI technology; concurrently, Deloitte reports that organizations are dedicating an average of 36% of their digital initiative budgets to AI-driven projects.
AI-enabled tools are now embedded across clinical and administrative workflows, including clinical decision support (CDS) and revenue cycle management (RCM). While these tools promise to transform patient care and financial performance, one foundational truth remains: AI is only as effective as the quality of the data upon which it is built.
To prevent scaling “bad data,” health IT and informatics leaders must understand the symbiotic relationship between data integrity and algorithmic output.
Four critical risks of poor data integrity in AI
A Q4 2025 survey of revenue cycle leaders revealed that 74% cite poor data quality as the primary barrier to successful AI adoption. Success is not driven by the complexity of the neural network or vendor promises, but by the integrity, completeness, and accuracy of the underlying data.
1. The scaling of embedded bias – Bias remains a top-tier risk in healthcare AI. Whether models are trained on massive aggregate datasets or narrow, specific ones, they carry the risk of being unrepresentative of the local patient population.
For instance, models trained on data from large urban academic medical centers often fail to translate to rural or community-based settings. When a model encounters unfamiliar clinical markers or documentation norms, it may ignore critical signals or display “algorithmic overconfidence” despite a lack of context.
Furthermore, the absence of a data point does not equate to the absence of a clinical issue — but an AI trained on poor data may interpret it exactly that way.
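To make this concrete, the sketch below compares the care-setting mix of a training cohort against a local patient population using a chi-square goodness-of-fit test. It is a minimal illustration only: the column name, cohort counts, population proportions, and 5% significance threshold are all hypothetical assumptions, not figures from any real deployment.

```python
# Minimal sketch: check whether a model's training data resembles the local
# patient population before trusting its outputs locally.
# Column names, counts, and the population mix are illustrative assumptions.
import pandas as pd
from scipy.stats import chisquare

# Hypothetical training cohort (one row per encounter), skewed urban-academic
train = pd.DataFrame({
    "care_setting": ["urban_academic"] * 850 + ["community"] * 120 + ["rural"] * 30
})

# Hypothetical local population mix (proportions) for the deploying health system
local_mix = {"urban_academic": 0.30, "community": 0.45, "rural": 0.25}

observed = train["care_setting"].value_counts().reindex(list(local_mix), fill_value=0)
expected = [p * len(train) for p in local_mix.values()]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
if p_value < 0.05:
    print("Training mix differs significantly from the local population; "
          "validate the model locally before relying on its outputs.")
```

A check this simple will not catch every form of bias, but it turns "is this model representative of our patients?" from an assumption into a measurable question.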
2. The persistence of invisible documentation gaps – AI is frequently marketed as a solution for closing clinical documentation gaps. However, AI cannot recognize what it hasn’t been trained to see. If specific patient populations or rare care pathways fall outside historical norms, the AI may fail to flag omissions. Over-reliance on automation can create a false sense of security, particularly in non-standard cases that require human clinical intuition.
3. The “industrialization” of errors – The primary value proposition of AI is scale. Unfortunately, scale also amplifies risk.
A documentation or coding error made by a human is typically localized to a single encounter. When that same error exists within a training dataset, the AI propagates that inaccuracy across thousands of encounters at machine speed. Without robust data governance, organizations risk standardizing inaccuracies into system-wide failures that are difficult and costly to remediate.
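A back-of-envelope calculation shows the stakes. Every figure below is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope sketch of how a learned error scales.
# All figures are illustrative assumptions, not industry benchmarks.
annual_encounters = 500_000    # encounters coded by the AI per year
training_error_rate = 0.02     # share of training charts with a systematic coding error
avg_claim_impact = 150.00      # assumed average dollar impact per affected claim

# A human coder's mistake touches one encounter;
# an error learned from the training data touches them all.
affected = int(annual_encounters * training_error_rate)
exposure = affected * avg_claim_impact
print(f"{affected:,} encounters affected, ~${exposure:,.0f} in annual claim exposure")
# -> 10,000 encounters affected, ~$1,500,000 in annual claim exposure
```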
4. Erosion of clinical and operational trust – Frontline experts lose confidence in new AI solutions when those tools generate hallucinations, false positives, or questionable clinical recommendations. Once clinicians and other end users lose trust in a tool’s output, skepticism spreads to all other digital transformation initiatives.
Healthcare has seen this before: early EHR implementations that disrupted workflows without delivering value created years of physician burnout and resistance. AI adoption is at a similar inflection point. Credibility is earned through consistent accuracy.
Case in point: Autonomous coding and the data gap
Autonomous coding systems are typically trained on historical charts originally coded by humans, which means a model can only be as accurate as the labels it learns from. If historical accuracy averages 90%, the AI cannot reach a 95% “clean claim” standard without significant intervention.
This creates a data quality gap that must be bridged before moving to direct-to-bill automation. Without auditing and correcting historical data, full automation is delayed, and organizations must implement costly validation steps and hybrid workflows that erode the AI’s original return on investment.
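The arithmetic behind that gap is straightforward. The sketch below restates the article’s 90% / 95% example and, under the simplifying assumption that a model’s output accuracy tracks the accuracy of its training labels and that remediation corrections are reliable, estimates how much of the historical error must be fixed:

```python
# Minimal sketch of the "data quality gap" for autonomous coding.
# Assumption: a model that mirrors its labels is bounded by label accuracy.
# The 90% / 95% figures restate the article's example; the remediation
# math is illustrative, not a vendor benchmark.
label_accuracy = 0.90    # historical human coding accuracy in the training set
target_accuracy = 0.95   # "clean claim" standard for direct-to-bill automation

gap = target_accuracy - label_accuracy
# Share of existing label errors that must be found and corrected to close the gap:
share_of_errors_to_fix = gap / (1 - label_accuracy)

print(f"Accuracy gap: {gap:.0%}")
print(f"At least {share_of_errors_to_fix:.0%} of historical coding errors "
      "must be remediated before direct-to-bill automation is plausible.")
# -> Accuracy gap: 5%; at least 50% of historical errors must be remediated.
```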
Autonomous coding is only one example of bad data leading to poor AI outcomes. I expect to see many more as AI solutions are scaled across the healthcare enterprise.
A strategic path to AI scalability
According to McKinsey’s 2025 State of AI report, nearly two-thirds of organizations have yet to scale their AI projects across the enterprise. As we move into 2026, a disciplined, data-first approach is recommended.
To prepare for enterprise-wide implementation, healthcare technology leaders should prioritize these four pillars of data stewardship:
- Audit historical records: Identify systemic inaccuracies in the data used for model training or tuning.
- Establish accuracy baselines: Define what success looks like in a manual environment before measuring AI performance (see the sketch after this list).
- Remediate known gaps: Address inconsistencies in documentation and coding standards prior to automation.
- Maintain human-in-the-loop (HITL) oversight: Ensure clinical and technical governance remains central to the deployment lifecycle.
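As an illustration of the second pillar, the sketch below computes a manual accuracy baseline from a random sample of audited charts, reporting a Wilson confidence interval so the uncertainty in the baseline is explicit. The audit figures are hypothetical assumptions:

```python
# Minimal sketch of the "accuracy baseline" pillar: audit a random sample of
# historical charts and report observed accuracy with a 95% Wilson interval.
# The sample counts below are illustrative assumptions.
import math

def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

audited_charts = 400     # randomly sampled historical encounters
correctly_coded = 362    # charts the audit confirmed as accurately coded

low, high = wilson_interval(correctly_coded, audited_charts)
print(f"Baseline accuracy: {correctly_coded / audited_charts:.1%} "
      f"(95% CI {low:.1%} to {high:.1%})")
# A vendor's claimed AI accuracy should be judged against this measured
# baseline, not against an assumed one.
```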
AI is a force multiplier — but it will only multiply the quality of the foundation it sits on. Success belongs to the leaders who treat data integrity as strategic infrastructure rather than an afterthought.
Nicholas Raup is Chief Strategy and Innovations Officer at e4health, a healthcare technology and consulting firm specializing in clinical, operational, and mid-revenue cycle solutions. He leads strategy and innovation efforts focused on streamlining health information data and workflows, optimizing coding, quality, and clinical documentation integrity, and addressing complex health IT operational challenges.
With more than 20 years of experience in healthcare technology, Raup has driven the delivery of software and services that reduce administrative burden for both providers and health plans while improving clinical and financial outcomes. Prior to joining e4health, he held leadership roles at Optum, Change Healthcare, and McKesson.
