
The phrase “garbage in, garbage out” dates back to at least 1957, but it has certainly come back into vogue with the rise of artificial intelligence (AI) and large language models (LLMs).
As with the early computers of the 1950s, AI can produce accurate and trustworthy outputs in a fraction of the time required by manual efforts, but only when the data used to train and feed its algorithms is equally accurate and trustworthy.
That means that for AI to truly help healthcare achieve its clinical quality, outcome, and efficiency goals, the industry must also solve a fundamental challenge that has existed since the days of paper charts: the pervasive problem of poor-quality clinical data. Without addressing the core issue of data integrity, AI cannot deliver on its promise to reduce clinician burnout, ensure compliance, or generate meaningful return on investment.
The data quality crisis: How we got here
Clinical data quality issues have existed since the inception of record-keeping. The digital shift beginning in the first decade of the 2000s, while intended to improve access and legibility, introduced new complications, particularly around how information is recorded, coded, and interpreted.
Likewise, ambient listening technologies and AI-generated documentation have made it faster and easier to create and record errors. Clinicians increasingly treat these tools as “set-it-and-forget-it” solutions, trusting AI to accurately capture and summarize clinical conversations. Too often, however, these tools generate incorrect, incomplete, or misleading data, commonly called “hallucinations.” When clinicians relinquish their oversight role, those errors can create a ripple effect throughout the entire healthcare ecosystem.
Consider the typical example of tobacco use documentation. There is a significant clinical difference between “never smoked” and “not currently smoking,” yet both may be lumped together or misrepresented in a structured EHR field. This kind of subtle data inaccuracy can have significant downstream implications, from skewed risk assessments to inappropriate treatment recommendations.
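To make the distinction concrete, here is a minimal sketch of how a lossy structured field collapses the two statuses while a concept-preserving mapping keeps them apart. The field names are hypothetical, and the SNOMED CT codes shown are assumptions for illustration; verify them against an actual terminology server.

```python
# Illustrative only: why "never smoked" and "not currently smoking" must not
# collapse into one structured value. Codes are assumed, not verified.

def naive_is_nonsmoker(status_text: str) -> bool:
    """A lossy boolean field: every non-current smoker becomes the same value."""
    return "current" not in status_text.lower()

# Hypothetical concept-preserving mapping (assumed SNOMED CT codes).
SMOKING_STATUS_CODES = {
    "never smoked": "266919005",         # Never smoked tobacco (assumed code)
    "not currently smoking": "8517006",  # Ex-smoker / former smoker (assumed code)
}

def concept_code(status_text: str) -> str | None:
    """Keep the two statuses distinct; return None so a human can review unknowns."""
    return SMOKING_STATUS_CODES.get(status_text.lower())

for status in ("never smoked", "not currently smoking"):
    print(status, "->", naive_is_nonsmoker(status), "|", concept_code(status))
# The boolean is True for both; the concept codes preserve the clinical difference.
```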

The financial and operational fallout
The consequences of flawed clinical data are both personal and systemic. At the individual level, patients may suffer misdiagnoses, treatment errors, or even denials of life insurance coverage because of inaccurate records. For instance, a patient’s discussion with their physician about their father’s liver cancer may inadvertently be recorded, permanently, as that patient’s own cancer diagnosis. That error can then follow the patient wherever they seek care, causing confusion among clinicians and influencing care decisions.
At the organizational level, inaccurate data directly undermines critical business operations. Medicare Advantage risk adjustment factor (RAF) scoring, population health analytics, and budgeting all rely on precise clinical documentation. When structured and unstructured data are incorrect, organizations face revenue shortfalls, increased audit risk, and diminished trust among executives and clinicians in the data driving strategic decisions.
Human involvement remains essential
To avoid these consequences, clinical data must be validated, cleaned, and optimized before it enters AI pipelines. That means ensuring correct terminology, mapping entries accurately to coding systems, and eliminating duplicative or contradictory entries. Organizations must also adopt an operational mindset that prioritizes continuous data-quality oversight, because even the most sophisticated AI systems cannot correct flawed inputs without human guidance.
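As one hedged illustration of what that validation step can look like, the sketch below runs three simple checks, unknown codes, duplicates, and family-history wording coded as a personal diagnosis, over hypothetical record dictionaries. The field names, reference code set, and heuristics are assumptions, not a production rule set.

```python
# A minimal pre-pipeline validation sketch, assuming records arrive as simple
# dicts with a patient ID, an assigned code, and a free-text description.
from collections import Counter

REFERENCE_CODES = {"E11.9", "C22.0", "Z80.0"}  # stand-in code set, not a real terminology load

def validate_records(records: list[dict]) -> list[str]:
    """Return human-readable issues for clinician or HIM review; never auto-correct."""
    issues = []
    counts = Counter((r["patient_id"], r["code"]) for r in records)
    for r in records:
        # 1. Terminology check: the assigned code must exist in the reference set.
        if r["code"] not in REFERENCE_CODES:
            issues.append(f"{r['patient_id']}: unrecognized code {r['code']}")
        # 2. Duplicate check: the same code recorded repeatedly for one patient.
        if counts[(r["patient_id"], r["code"])] > 1:
            issues.append(f"{r['patient_id']}: duplicate entries for {r['code']}")
        # 3. Crude contradiction heuristic: family-history wording coded as a
        #    personal diagnosis (family-history ICD-10 codes start with Z8).
        if "father" in r["description"].lower() and not r["code"].startswith("Z8"):
            issues.append(f"{r['patient_id']}: possible family history coded as personal diagnosis")
    return issues
```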
In a striking paradox, the very AI technologies introduced to streamline clinical workflows are now spawning new challenges, ones that require more sophisticated AI tools to remedy. LLMs, for instance, excel at pattern recognition and cross-referencing. They can be employed to flag discrepancies within medical records, such as mismatches between diagnoses and supporting documentation, or identify inconsistencies like changing genders within a single note.
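One way to put an LLM to work as described above is sketched here; the call_llm function is a hypothetical stand-in for whatever chat-completion client an organization actually uses, and the prompt and output contract are purely illustrative.

```python
# Hedged sketch: asking an LLM to flag diagnosis/documentation mismatches.
DISCREPANCY_PROMPT = """You are reviewing a clinical note for internal consistency.
Diagnoses on the problem list: {diagnoses}
Note text: {note}
List any diagnosis that lacks supporting documentation in the note, and any
internal inconsistencies (for example, conflicting sex or gender references).
Respond with NONE if the note is consistent."""

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder; wire this to your organization's LLM service."""
    raise NotImplementedError

def flag_discrepancies(diagnoses: list[str], note: str) -> str | None:
    """Return the model's findings for clinician review, or None if consistent."""
    reply = call_llm(DISCREPANCY_PROMPT.format(diagnoses=", ".join(diagnoses), note=note))
    return None if reply.strip().upper() == "NONE" else reply
```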
More sophisticated systems perform pre-processing, also known as “clinical data washing,” to assess the plausibility of clinical data before it is used for decision-making or analytics. These systems alert clinicians to potential errors, enabling human oversight before mistakes propagate throughout the EHR and interoperability networks.
Still, any such approach must maintain clinician involvement. While automation can assist in identifying issues, only a qualified provider can verify and correct the information. This “human-in-the-loop” model is essential to ensuring trust in AI-generated documentation.
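A hedged sketch of that human-in-the-loop pattern follows: automated plausibility checks may only raise flags, and only a named clinician can resolve them. The class names, statuses, and example finding are hypothetical.

```python
# Flags raised by automated checks stay pending until a qualified clinician acts.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Flag:
    record_id: str
    finding: str                        # e.g. "BP 400/20 outside plausible range" (hypothetical)
    status: str = "pending_review"      # automated code never sets this to anything else
    reviewed_by: str | None = None
    reviewed_at: datetime | None = None

def clinician_resolves(flag: Flag, clinician_id: str, accepted: bool) -> Flag:
    """Only a named clinician moves a flag out of pending; any correction is made in the EHR itself."""
    flag.status = "accepted" if accepted else "dismissed"
    flag.reviewed_by = clinician_id
    flag.reviewed_at = datetime.now(timezone.utc)
    return flag
```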
Sharing the responsibility
Responsibility for accurate clinical data does not rest solely with providers. In the modern health IT environment, patients are increasingly involved in the data validation loop. With open notes and patient portals now commonplace, individuals can and should review their records for errors. Concurrently, healthcare systems must also establish straightforward mechanisms for patients to identify and correct inaccuracies without encountering bureaucratic delays.
While altering or deleting historical entries is prohibited under federal record-keeping rules and ethically unacceptable, organizations can append clarifying comments to the record that note the inaccuracy, the correction, and the date it was made. This creates a transparent audit trail and ensures that downstream users, such as clinicians, payers, or emergency room providers, have accurate context for interpreting the data.
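The append-only pattern this describes can be sketched roughly as follows; the structure and field names are illustrative and are not drawn from any specific EHR’s amendment API.

```python
# Append-only correction sketch: the original entry is never rewritten;
# a dated amendment is added alongside it.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Amendment:
    original_entry_id: str
    noted_inaccuracy: str
    correction: str
    amended_on: date
    amended_by: str

audit_trail: list[Amendment] = []

def append_amendment(entry_id: str, inaccuracy: str, correction: str, author: str) -> None:
    """Record a correction without deleting or overwriting anything."""
    audit_trail.append(Amendment(entry_id, inaccuracy, correction, date.today(), author))

# Example: the father's liver cancer mistakenly recorded as the patient's own diagnosis.
append_amendment(
    "note-123",
    "Liver cancer recorded as the patient's own diagnosis",
    "Condition is family history (father), not a personal diagnosis",
    "reviewing-clinician-id",
)
```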
Regulatory guidance on the horizon
As AI becomes more integrated into healthcare delivery, governance will be critical. The Department of Health and Human Services (HHS) and other regulators have begun developing guidelines for the responsible use of AI, but these frameworks are still in the early stages of development. Healthcare organizations must proactively establish internal governance structures that define how AI is implemented, audited, and monitored, with data quality as a central pillar.
Ultimately, resolving the data-quality crisis is foundational to addressing all other issues. If healthcare leaders hope to demonstrate ROI on AI investments, reduce clinician burnout, and meet compliance requirements, they must first ensure the integrity of their clinical data.
Before any AI model is trained, any dashboard is built, or any predictive insight is generated, we must be certain the data is accurate, not full of garbage. Only then can healthcare unlock AI’s full potential.
Dr. Jay Anders is Chief Medical Officer of Medicomp Systems. Dr. Anders supports product development, serving as a representative and voice for the physician and healthcare community that Medicomp’s products serve. Prior to joining Medicomp, Dr. Anders served as Chief Medical Officer for McKesson Business Performance Services, where he was responsible for supporting development of clinical information systems for the organization. He was also instrumental in leading the first integration of Medicomp’s Quippe Physician Documentation into an EHR. Dr. Anders spearheads Medicomp’s clinical advisory board, working closely with doctors and nurses to ensure that all Medicomp products are developed based on user needs and preferences to enhance usability.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.