Over the past two years, much of the conversation around AI in digital behavioral health has centered on operational copilots. We’ve seen real progress in areas like documentation, administrative workflows, and intake. These tools make a meaningful difference for providers by reducing burden and improving efficiency. For many clinicians, the charting that used to bleed into the evening is finally starting to ease.
But in behavioral healthcare, some of the most important decisions don’t live in documentation or workflows. They live in judgment: how clinicians interpret information, apply knowledge, and make calls in situations that are often ambiguous. Two experienced clinicians can look at the same case and reasonably come to different conclusions. Sometimes that’s appropriate, but often that variation reflects unclear criteria, inconsistent application, or differences in training and experience.
What’s starting to emerge is a shift away from thinking about AI only as an operational tool that helps complete tasks, toward thinking about it as part of a system that helps make decisions more consistent over time. The goal is not to replace clinicians, but for clinicians and agents to work in tandem to make clinical reasoning more explicit and consistent so that every patient receives the benefit of their clinician’s best thinking.
Moving from tools to systems
Most people still think about AI as a one-to-one relationship: one clinician and one AI assistant. But in practice, decisions, especially in areas like intake and clinical fit, are shaped by multiple inputs, different interpretations of criteria, and sometimes disagreement. A more useful way to think about AI in this context is as one layer within a broader decision-making system. This could look like a clinician making an initial judgment, an AI layer that structures and pressure-tests that reasoning using standardized clinical criteria and historical patterns, and a clear escalation path with human oversight that remains accountable for the final call.
Clinical fit as a real-world example
Clinical fit determination is a good illustration of how this can work in practice. In many digital behavioral health settings, clinical evaluators conduct intake assessments and make independent determinations about whether a patient is an appropriate fit for care. A more sophisticated approach introduces AI-supported layers that generate structured outputs based on the information collected, such as recommendations, confidence levels, clearly defined criteria, and prompts to clarify missing or ambiguous details.
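To make "structured output" concrete, here is a minimal sketch of what such a record might look like. Everything in it is an assumption for illustration: the `FitAssessment` class, its field names, and the example criteria are hypothetical and are not drawn from any real clinical protocol or product.

```python
from dataclasses import dataclass, field


@dataclass
class FitAssessment:
    """Hypothetical structured output from an AI fit-assessment layer."""
    recommendation: str            # e.g. "fit", "not_fit", "needs_review"
    confidence: float              # 0.0 to 1.0, on whatever scale the team defines
    criteria_met: dict[str, bool]  # criterion name -> whether intake data supports it
    clarifying_prompts: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when no follow-up questions remain open."""
        return not self.clarifying_prompts


# Illustrative values only; the criteria names are invented for this sketch.
assessment = FitAssessment(
    recommendation="needs_review",
    confidence=0.62,
    criteria_met={
        "age_in_range": True,
        "primary_concern_in_scope": True,
        "risk_screen_documented": False,
    },
    clarifying_prompts=[
        "The risk screen is missing from the intake notes. Has it been completed?",
    ],
)
```

Keeping the recommendation, the confidence level, the criteria, and the open questions in one record is what gives the evaluator and the AI a shared reference point to compare against.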
In the majority of cases, the evaluator and the AI agree. That alone is useful: it reinforces consistency and gives the team a shared reference point. But the more useful cases are the ones where there is disagreement.
Disagreement enhances the system
When the evaluator and the AI disagree, the disagreement can trigger a structured escalation: the case goes to a second AI layer, or supervisor agent, that provides another structured perspective, alongside a human supervisor who makes the final decision. What results is a layered decision in which the evaluator’s original judgment, the initial AI output, a second AI perspective at the supervisory level, and the supervisor’s own review all feed into one final call.
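The escalation path described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not a description of any production system: `resolve_fit`, `Decision`, and the two callables are hypothetical names, determinations are reduced to plain strings, and the sketch assumes that when evaluator and AI agree, the evaluator’s call stands without escalation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    final: str              # the determination that stands
    decided_by: str         # "evaluator" or "human_supervisor"
    escalated: bool
    inputs: dict[str, str]  # every perspective that fed the decision


def resolve_fit(
    evaluator_call: str,
    ai_call: str,
    supervisor_agent: Callable[[str, str], str],
    human_supervisor: Callable[[dict[str, str]], str],
) -> Decision:
    """Route one case: agreement stands, disagreement escalates to a human."""
    inputs = {"evaluator": evaluator_call, "ai": ai_call}
    if evaluator_call == ai_call:
        # Assumption: on agreement, the evaluator's determination is final.
        return Decision(evaluator_call, "evaluator", False, inputs)

    # Disagreement: gather a second AI perspective, then a human decides.
    inputs["supervisor_agent"] = supervisor_agent(evaluator_call, ai_call)
    final = human_supervisor(inputs)
    inputs["human_supervisor"] = final
    return Decision(final, "human_supervisor", True, inputs)


# Stub callables stand in for the real supervisor agent and human reviewer.
agreed = resolve_fit("fit", "fit", lambda e, a: a, lambda seen: seen["evaluator"])
disputed = resolve_fit("fit", "not_fit", lambda e, a: a, lambda seen: seen["evaluator"])
```

Note that the AI layers only ever add entries to `inputs`. The final value always comes from a human, either the evaluator or the supervisor, which keeps accountability where the article places it.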
Over time, these disagreements reveal patterns in how criteria are applied, how the AI interprets cases, and where the underlying rules need to be sharpened. Each disagreement therefore improves the system by helping to calibrate clinicians, refine the AI, and clarify the rules.
Being deliberate about orchestration and where humans sit
The “human-in-the-loop” approach is critical to responsible patient care in digital behavioral health, and the term itself has become a standard way to signal safety. The key question is where the human sits in the loop, because not every part of a workflow requires the same level of human involvement. For example, data organization and question generation can be heavily AI-supported, and these capabilities are increasingly informed by domain-specific clinical patterns and structured data. But for decisions that affect access to care or treatment planning, especially in ambiguous or higher-risk situations, humans remain the ultimate decision-makers while actively partnering with AI to challenge assumptions, surface blind spots, and sharpen their reasoning in real time.
The design choices behind this orchestration matter as much as the technology itself. Which decisions belong to AI, which belong to humans, and which require both working in sequence need to be defined upfront, with clinical and technical leadership working closely together. This is particularly important when the system is learning from real-world clinical inputs; the goal is not just to use the data, but to continuously refine how it is applied in real decisions. And these choices need to be revisited regularly as the system matures, because a strong orchestration model evolves as you learn.
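One way to define these choices upfront is to write them down as an explicit, reviewable mapping rather than leaving them implicit in the workflow. The sketch below is hypothetical: the `Oversight` modes, the step names, and the `intake_summary` entry are assumptions made for illustration, and unknown steps default to human-led as a conservative choice.

```python
from enum import Enum


class Oversight(Enum):
    AI_LED = "ai_led"                # AI does the work; humans spot-check
    AI_THEN_HUMAN = "ai_then_human"  # AI drafts, a human reviews before it counts
    HUMAN_LED = "human_led"          # a human decides; AI only advises


# Hypothetical assignment of workflow steps to oversight modes.
ORCHESTRATION = {
    "data_organization": Oversight.AI_LED,
    "question_generation": Oversight.AI_LED,
    "intake_summary": Oversight.AI_THEN_HUMAN,  # invented example of "in sequence"
    "clinical_fit_determination": Oversight.HUMAN_LED,
    "treatment_planning": Oversight.HUMAN_LED,
}


def oversight_for(step: str) -> Oversight:
    """Look up a step's mode; anything unlisted falls back to human-led."""
    return ORCHESTRATION.get(step, Oversight.HUMAN_LED)
```

Because the mapping is plain data, clinical and technical leaders can review it together and revise it as the system matures.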
Why governance matters
None of this works without strong guardrails. That means keeping patient data in secure, compliant environments, maintaining clear human accountability, monitoring performance over time, and creating feedback loops that allow the system to improve safely. Done right, this builds trust over time.
What this unlocks
When a system like this functions well, the effects show up across multiple dimensions. Clinically, decisions become more consistent, rationale becomes clearer, and edge cases get handled with more rigor. Operationally, escalations become fewer, more focused, and richer in detail, and higher rates of agreement between evaluators and AI mean clinicians can spend more of their time on the judgment calls that actually require them. And over time, the system keeps learning from the patterns in its disagreements.
The next phase of AI in healthcare is about building systems where human clinicians, supervisors, and AI can all contribute, challenge each other, and improve how decisions are made. The organizations that get this right will be the ones that are most deliberate about where AI fits, where humans lead, and how the two sharpen each other over time.
Photo: Irina_Strelnikova, Getty Images
Parker Phillips is dedicated to leveraging technology and AI to scale access to high-quality mental health care for young people. As CTO, he has developed the company’s technology vision and led the adoption of AI to power its innovative, insurance-based virtual care model for anxiety and OCD. The platform is designed to enhance therapeutic impact and drive operational scale, demonstrating that a value-based model can deliver both superior care and strong economics. Drawing on experience building teams and technology at Commure and Palantir, Phillips focuses on creating systems that address the critical need for accessible, evidence-based mental health treatment.
Dr. Kathryn (“Kat”) Boger is a board-certified child and adolescent psychologist dedicated to helping young people with anxiety and OCD through innovative, evidence-based care. She co-founded the McLean Anxiety Mastery Program (MAMP) at McLean Hospital, a nationally recognized intensive treatment program, and served as an assistant professor of psychology at Harvard Medical School. Dr. Boger has published peer-reviewed research, delivered national talks including a TEDx, and trained hospitals, schools, and communities. In 2024, she was named a Top 50 Digital Health Frontline Hero. She also co-founded InStride Health to expand access to timely, effective care for youth and young adults.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers.
