How Should Providers Begin to Regulate Their Staff’s Use of ChatGPT?

The use of LLMs in healthcare is still incredibly nascent, so many experts are calling for a safety framework to make this uncharted territory feel safer. There are still a lot of unknowns, but avoiding clinical use cases and not inputting patients’ health information seem to be the most important guardrails providers are putting around LLMs at the moment.

The healthcare sector has a reputation for being slow to adopt new technologies. However, the field doesn’t seem to be immune to the lure of artificial intelligence, including large language models (LLMs), the hottest new category of AI to enter healthcare. In fact, providers have embraced these AI models, such as ChatGPT and GPT-4, with much more enthusiasm than might have been expected from a sector that still uses fax machines.

As of October, the FDA has authorized over 500 AI-powered medical devices. While it’s reassuring to know that the regulatory body is paying attention to AI’s evolving impact on the healthcare industry, providers are still waiting on the agency to release guidelines about how they should responsibly approach the use of LLMs. Last month, FDA Commissioner Robert Califf declared that the country needs to be “nimble in the use and regulation of” LLMs so that it isn’t “swept up quickly by something that we hardly understand.” While health systems wait for the government to determine what that regulatory framework looks like, some are developing guardrails of their own.

The use of LLMs in healthcare is still incredibly nascent, but healthcare experts agree that use of these AI models will only grow in the coming months. Healthcare’s legacy firms are backing these new technologies; for example, Epic is integrating GPT-4 into its electronic health record. And healthcare organizations across the country are piloting LLMs to figure out the best use cases, from triaging symptoms to automating patient communication to helping medical students prepare for real-life patient visits.

Understanding how healthcare professionals are using LLMs

“Generative AI has already been let out of Pandora’s box,” Azizi Seixas declared in a recent interview.

Seixas is the interim chair of the department of information and health data science at the University of Miami’s Miller School of Medicine. He said his organization is exploring potential use cases for LLMs — from creating more personalized patient engagement to triaging patients’ concerns via telemedicine to giving doctors more detailed patient history reports.

While his organization is excited about LLMs and their potential to alleviate healthcare’s bottlenecks, it also recognizes the privacy and reliability risks at stake, Seixas said. 

The University of Miami Health System is in the process of rolling out seminars, webinars, workshops and written guidelines focused on the responsible use of new AI models like ChatGPT, Seixas explained. 

In his view, one of the most important guardrails that a health system can build is ensuring that staff have a deep understanding of data immortality as it pertains to LLMs. In other words, the data that trains AI models lives forever, so people need to know that using any form of proprietary or identifiable data to train LLMs is off-limits.

David Higginson, CIO of Phoenix Children’s Hospital, agreed with Seixas. He pointed out that the way most hospitals currently use LLMs is quite informal — for the most part, people are simply pulling up ChatGPT on their browser and experimenting.

Phoenix Children’s has had to step in and let its staff know how critical it is that they don’t input any health records or personal information into LLMs, Higginson declared. While some hospitals may consider blocking ChatGPT pages across their browsers, he said Phoenix Children’s did not choose to do this because many of its nonclinical staff members have been using these pages to make mundane processes more efficient.

For example, the hospital’s nurses have been using LLMs to refute payers’ denial letters.

“There are nurses who spend all day writing letters to the insurance company, and they have a bunch of information they have to go look up to put that letter together. But they played with ChatGPT with no PHI and found it was pretty good at writing that letter for them. And they went from being authors doing three letters a day to being editors who are able to crank out nine letters a day,” Higginson explained.

The hospital is also using LLMs to write educational material for its patients’ families, such as a guide explaining how to care for a child who has been diagnosed with type 1 diabetes. These types of guides used to take a long time for nurses to put together, but now they simply edit content instead of producing it from scratch.

Ensuring safe exploration

These nonclinical use cases are already alleviating nurses’ workloads. There are “so many opportunities” to eliminate inefficiencies in the areas of hospital operations that don’t involve direct patient care, so that’s where hospitals should focus their use of LLMs, Higginson argued. An important part of Phoenix Children’s safety framework for these AI models is staying away from use cases that involve diagnoses or clinical patient interactions, he added.

“Let’s start [in the nonclinical space] and learn as we go, because I would question anyone who claims to have come up with a great governance model already. We just haven’t had enough experience yet,” Higginson declared.

Beth Mosier, director of healthcare and life sciences at consulting firm West Monroe, said that all providers should take a page out of Phoenix Children’s and the University of Miami Health System’s book and begin educating their staff about which LLM use cases to avoid.

“I think there needs to be some sort of governance in place at health systems primarily — but also payers — to show the appropriate use cases and sources, as well as the appropriate types of data that you can share on those models,” she argued in a recent interview.

Mosier agreed with Higginson that providers should begin piloting LLMs in nonclinical settings. For example, a health system could implement an LLM to help answer patients’ questions about their bills or assist with appointment scheduling. These are examples of relatively safe use cases that don’t involve protected health information but can still improve providers’ workflows.

It’s important to remember that LLMs don’t replace people, but rather help them work more efficiently, Mosier pointed out. These AI models haven’t been designed specifically for healthcare yet (though Hippocratic AI is trying), so they will still involve a good amount of human interaction.

“Unless you’ve got a model that was trained specifically to do what you’re asking it to do, a level of human oversight and interaction is going to be required. If you’ve built a model for only oncology or only medical claims processing, then you’re going to have a higher rate of competence and precision in that model,” Mosier said.

LLMs purpose-built for healthcare will inevitably make their way into the market. When that time comes, Mosier is confident that the healthcare industry will be able to responsibly regulate these AI models. 

She drew attention to the fact that at the end of the day, LLMs are just another form of AI — and AI has been safely used by health systems for more than a decade.

Balancing innovation and risk

Another healthcare expert — Michael Abrams, co-founder and managing partner of healthcare consulting firm Numerof & Associates — agreed with Mosier, pointing out that federal regulatory bodies like the FDA already have ample experience when it comes to reining in AI.

Abrams acknowledged that LLMs are trickier to regulate than other forms of AI because they are so widely available to the public. He thinks the FDA’s most challenging task will be figuring out a way to regulate clinicians’ solo experimentations with ChatGPT — which anyone can access via their phone or computer. 

In fact, the University of Miami Health System is currently on a fact-finding mission regarding LLMs. It is sending out surveys asking staff how they use LLMs, Seixas said.

“What we don’t want is for people to be living in the shadows and using it because that’s when you have data breaches,” he declared. “We want to bring it within the field and provide a very private, secure environment that is localized within the cloud. That way, it is impervious to any potential breaches.”

Seixas and his organization are attempting to take the necessary steps to regulate the new AI models that are bursting onto the scene, but he said there will always be some small degree of risk involved when a new technology is introduced.

“Innovation and risk go hand-in-hand,” he explained. “If you relegate [LLMs] to just administrative tasks, then I don’t know if you’re really testing the full scope. I believe that in order to be truly innovative and transformative as a healthcare system, you must actually take some risk.”

Healthcare has tried to make its operations run more smoothly through the use of AI for about a decade, and the field is still rife with inefficiencies, Seixas pointed out. He thinks it might be time to be a little less cautious.

This attitude is shared by John Ayers, a public health researcher at the University of California San Diego. He led a ChatGPT study published last month in JAMA Internal Medicine.

The study compared two sets of written responses to real-world patient questions. One set was written by physicians, the other by ChatGPT. Both sets of answers were evaluated by a panel of licensed healthcare professionals, and the panel ended up preferring ChatGPT’s responses 79% of the time. The ChatGPT answers were deemed to have more detail and empathy — two things most doctors are too busy to provide.

Ayers pointed out that the healthcare sector is in dire need of technology to alleviate its burnout crisis and workforce shortage. In his view, the field should be setting clearer goalposts for new AI models instead of erecting guardrails.

“Set a clear goal that people can achieve, such as making messaging in the EHR easier. Then you will see studies. If it works, you’ll see it implemented. If it doesn’t, you won’t see it implemented,” he explained.

Healthcare experts agree that LLMs need to be tested and watched closely, but they also share an understanding that these AI models’ benefits outweigh their risks. There are still a lot of unknowns, but avoiding clinical use cases and not inputting patients’ health information seem to be the most important guardrails providers are putting around LLMs at the moment.

Photo: hirun, Getty Images