Why AI Fails in Healthcare Clinics (And What Actually Works)

A few months ago a clinic executive pulled me aside at a conference. The executive knew some of the clinics we partnered with and said: “We tried an AI Voice Agent. It didn’t bring the results we hoped for. What did we do wrong?”

It’s a question I hear often. Executives know AI should be solving operational problems – but too often it falls short. And the answer almost always surprises them: the problem is rarely the model. The reasons are more operational than technical.

The first question most clinics skip is also the most important: what do I want the AI to do and what context does it need?

Not all patient interactions are equal. Some are transactional – checking availability or confirming appointments – where the patient has a specific need and the right answer exists in a system somewhere. Others are relational – open-ended, clinically complex, and shaped by context no system fully holds.

AI Agents work well in the first category. I recently reviewed transcripts from a behavioral health clinic using an AI Voice Agent for their medication line. A patient called in, unsure of the name of their medication – they only remembered it was prescribed to help them sleep. The agent pulled up the patient’s chart, identified the medication consistent with the description, and confirmed it with the patient. No human needed. Call closed. That is AI doing exactly what it was built to do – because it had the context.

Clinical intake and follow-up sessions are different. An AI agent can read the last session note. It can surface the diagnosis, the medications, the treatment plan. The challenge is to observe what wasn’t written – the shift in a patient’s affect, the hesitation before answering, the thing the patient has never quite said directly but a therapist of six months would immediately notice. In behavioral health, that unwritten, unstructured signal is often the most clinically significant one.

The challenge isn’t giving AI access to a patient’s chart – it’s giving it the context that was never written down. Patient-clinician rapport is foundational in behavioral health. It shapes what a patient discloses, how they respond to questions, and how a clinician interprets what they’re hearing. Whether AI can meaningfully replicate that over time remains up for debate. As of today, AI deployed into clinical interactions has context on the notes – not the patient relationship. Now, what about when AI is deployed into the right interaction?

Even when AI is deployed into the right interaction with the right context, the wrong tool can still fail. In behavioral health, the gap between a generalist AI and a specialty-focused one shows up directly in revenue.

Take Ambient AI Scribes as an example. A generalist scribe can identify the session duration and may well map it to the right time-based CPT code. Where it breaks is the nuance of identifying the session split.

A classic example: when a psychiatrist provides therapy and medication management in the same appointment, the two services are billed separately – an evaluation and management (E/M) code for the medication management, with a psychotherapy add-on code alongside it. A generalist scribe misses that distinction. A behavioral health-specific scribe recognizes it and structures the note accordingly. That difference in documentation isn’t just compliance – it’s revenue. A generalist scribe undercodes not because the underlying model lacks intelligence, but because it was not designed to identify the nuance.
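To make the billing split concrete, here is a minimal Python sketch of the coding logic. It rests on assumptions that are not in the article: the function names are hypothetical, the time bands and code numbers are the commonly published CPT psychotherapy ranges and should be checked against the current CPT manual and payer rules, and the E/M level is taken as an input chosen by the clinician rather than derived.

```python
# Time bands as commonly published for CPT psychotherapy codes.
# Assumption: verify against the current CPT manual and payer policy.
# Each entry: (min_minutes, max_minutes, standalone_code, add_on_code_with_em)
PSYCHOTHERAPY_BANDS = [
    (16, 37, "90832", "90833"),
    (38, 52, "90834", "90836"),
    (53, None, "90837", "90838"),
]


def psychotherapy_code(therapy_minutes, with_em):
    """Return the psychotherapy code for the time spent, or None if under 16 minutes."""
    for low, high, standalone, add_on in PSYCHOTHERAPY_BANDS:
        if therapy_minutes >= low and (high is None or therapy_minutes <= high):
            # With an E/M service in the same visit, psychotherapy is an add-on code.
            return add_on if with_em else standalone
    return None


def codes_for_session(therapy_minutes, em_code=None):
    """List the codes for one appointment (hypothetical helper).

    em_code is the E/M code the clinician selected for medication
    management, or None for a therapy-only session.
    """
    codes = []
    if em_code is not None:
        codes.append(em_code)
    psych = psychotherapy_code(therapy_minutes, with_em=em_code is not None)
    if psych is not None:
        codes.append(psych)
    return codes


therapy_only = codes_for_session(45)
combined_visit = codes_for_session(20, em_code="99214")
```

A scribe that records only the total appointment length cannot fill in `therapy_minutes` for the combined visit, which is where the undercoding described above comes from.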

The same principle extends beyond scribes. The distinction between a therapist’s workflow and a psychiatrist’s is significant – different documentation patterns, different billing logic, different clinical rhythms. The AI a clinic chooses should have direct experience with every specialty it offers, not just behavioral health broadly.

But even specialty-specific AI Agents can fail. If they can’t operate inside your systems, they can’t do the job.

There is a meaningful difference between an AI Voice Agent that answers a call, collects information, and hands it to a human to act on – and one that reads your clinic’s calendar, verifies insurance, creates the patient record, and books the appointment end-to-end. The conversation looks the same. The outcome is entirely different. The first is a smart voicemail that creates a task for your team. The second is an AI worker that does the job.
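The contrast can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `Clinic` is an in-memory substitute for a real scheduling, eligibility, and records system, and the two agent functions are illustrations, not any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Clinic:
    """In-memory stand-in for a clinic's calendar, payer list, and records (hypothetical)."""
    open_slots: list
    accepted_payers: set
    patients: dict = field(default_factory=dict)
    bookings: dict = field(default_factory=dict)
    staff_tasks: list = field(default_factory=list)


def handoff_agent(clinic, caller, payer):
    """Collects information and leaves the actual work to staff."""
    clinic.staff_tasks.append(f"Call back {caller} ({payer}) to schedule")
    return "task_created"


def end_to_end_agent(clinic, caller, payer):
    """Checks insurance and the calendar, then creates the record and books."""
    if payer not in clinic.accepted_payers or not clinic.open_slots:
        # Only the exceptions reach a human.
        clinic.staff_tasks.append(f"Review {caller}: insurance or availability issue")
        return "escalated"
    slot = clinic.open_slots.pop(0)
    clinic.patients[caller] = {"payer": payer}
    clinic.bookings[caller] = slot
    return "booked"


clinic = Clinic(open_slots=["Mon 09:00"], accepted_payers={"Acme Health"})
first_result = handoff_agent(clinic, "Pat", "Acme Health")
second_result = end_to_end_agent(clinic, "Pat", "Acme Health")
```

Both calls start from the same conversation. The first leaves an entry in `staff_tasks`, while the second changes `bookings` and `patients` directly and falls back to a staff task only when a check fails.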

This distinction matters. When clinics select an AI partner – especially for AI Voice Agents – the expectation is that the tool gets the job done. Without integration into your systems and workflows, that rarely happens. Clinics end up with an incomplete solution that never adds the expected value and fails at the exact use cases AI should be best at.

Before deploying any AI Agent, clinics should ask: how does this tool integrate into our workflows, and can it complete the job in our systems – or does a human still need to pick it up at the end? If the answer is the latter, the AI might handle a step in the process, but it won’t do the job.

That executive at the conference wasn’t alone. What I wish I’d said in the moment was simpler than a framework: before selecting any AI tool, ask what interaction you’re actually deploying it into, whether it was built for your environment and specialties, and whether it can complete the workflow end-to-end – or whether someone on your team will still be picking up where it left off.

The clinics getting AI right aren’t the ones that built the smartest model. They’re the ones that asked the right questions.

Jost-Vincent Steiskal

Jost-Vincent Steiskal leads Product at mdhub (YC-backed), where he designs and deploys AI Agents across mental health clinics.

This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers.