Artificial Intelligence, Hospitals, Providers

How Mass General Brigham Decides Which AI Tools Are Worth Scaling

Mass General Brigham evaluates AI tools by carefully monitoring real-world performance before scaling them system-wide. At HIMSS earlier this month, Rebecca Mishuris, the health system’s chief health information officer and vice president of digital, explained what this process looks like.

Deciding which AI tools to adopt isn’t about hype — it’s about evidence and real-world impact, according to Rebecca Mishuris, chief health information officer and vice president of digital at Mass General Brigham in Boston.

Productivity tools and secure access to large language models are among the most useful applications of AI at the health system, she said during an interview earlier this month at the HIMSS conference in Las Vegas.

For instance, Mishuris noted Mass General has seen strong uptake of Microsoft Copilot, which helps clinicians draft emails, summarize information and generate presentations. 

She also pointed out that the health system has built secure internal access to large language models, which allows clinicians and researchers to safely experiment with AI while using protected health information. That access has already enabled researchers to build an AI agent that can summarize a new patient’s decades of medical records for clinicians before a visit, Mishuris said.

In general, she said Mass General has “cautious optimism” when it comes to AI. 

“We do see the real transformative ability of a lot of the generative AI applications, but we also work very hard to ensure that we are deploying them in a safe way, one where we are safeguarding the care that we’re delivering, and also the privacy and security of the data that we’re using,” Mishuris remarked.

Any AI deployment must demonstrate a clear positive impact on the health system without compromising those standards, she added.

For an AI deployment to be successful, Mishuris noted, health systems need to align people, processes and technology. Technology alone isn’t enough: she said Mass General invests heavily in AI education for staff, helping employees understand what generative AI can and cannot do, how to use it safely, and how it fits into their workflows.

Once an AI solution is launched, it must be monitored at multiple layers, Mishuris stated. She described three types of monitoring at Mass General: real-time monitoring during patient care to catch potential hallucinations immediately; short-term retrospective monitoring, days or weeks later, to review model outputs at scale and identify potential issues; and ongoing performance monitoring to ensure tools continue delivering their intended outcomes.

But overall, Mishuris emphasized that measuring success depends on the problem the AI is intended to solve. 

There is no universal measure of AI success — for example, a tool aimed at reducing clinician burnout should be evaluated differently than one designed to improve revenue cycle efficiency, she explained.

She also pointed out that AI should be judged against real-world performance — not perfection.

When evaluating AI tools, the right comparison is how they perform relative to current workflows. In some cases, humans already make similar errors, so the key question is whether the AI performs as well as or better than the status quo.

“There was actually a study out of California that showed that humans hallucinate just as much as the computer does when doing a discharge summary for a patient in the hospital. And so if you get a result like that, if it’s the same, if the humans are hallucinating and the computers are hallucinating, then what is the risk of moving to the computer?” Mishuris remarked.

Ultimately, she said, the value of any AI tool comes down to whether it meaningfully improves workflows or patient care compared with the reality clinicians face today.

Photo: Malte Mueller, Getty Images