Maneesh Goyal, chief operating officer of Mayo Clinic Platform, is a big believer in patient privacy, but not in the way it is commonly handled in the healthcare world: data de-identified according to the HIPAA Safe Harbor method.
“A lot of organizations will take patient data and de-identify it, and once it is de-identified, it is no longer considered HIPAA data,” Goyal said in a recent interview. “We think that is interesting, but not sufficient to protect the patient data because, especially as you have more and more compute, you can actually figure it out.”
He explained how Mayo Clinic Platform protects privacy in the broader context of its Orchestrate platform. It’s a data platform through which biopharma and medtech companies can draw on Mayo Clinic Platform data and combine it with high-quality research and core lab expertise, thereby accelerating their own drug discovery and powering clinical development programs. On Feb. 11, the Rochester, Minnesota-based health system announced that the Orchestrate platform will now enable researchers to access standardized, real-world cancer data from Mayo Clinic and participating Mayo Clinic Platform Connect partners.
So how does Mayo Clinic ensure patient privacy, especially given that this data is being made available to external users like pharma and medtech companies? And why is it important to do it this way?
“The way we’ve approached de-identification is not just to scrub away all of the things that would be identifiers, it’s actually to change it,” he said, giving an example of his own medical record. “So our tools go in there and replace it with a fictitious person, but leave the clinical notes in there. And then we do a date shift, a randomized date shift of the entire clinical record. So God forbid I was in a car accident on a date, and it’s public information, you now move it away from that date. And so I am no longer identifiable.”
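The two steps Goyal describes, swapping in a fictitious identity and shifting every date in a record by one random offset, can be illustrated with a minimal Python sketch. This is not Mayo Clinic's tooling: the `deidentify` function, the record layout, the placeholder name and the one-year shift window are all assumptions made for illustration, and the sketch only replaces a structured name field rather than names that appear inside the notes.

```python
import random
from datetime import date, timedelta


def deidentify(record, fake_name, max_shift_days=365, rng=None):
    """Return a copy of a patient record with a fictitious name and shifted dates.

    Hypothetical sketch: the record layout and shift window are assumptions.
    """
    rng = rng or random.Random()
    # Draw one nonzero offset per patient, so every event moves by the same
    # amount and the time between events is preserved.
    offset = 0
    while offset == 0:
        offset = rng.randint(-max_shift_days, max_shift_days)
    delta = timedelta(days=offset)
    return {
        "name": fake_name,  # real identity replaced by a fictitious person
        "events": [
            {"date": e["date"] + delta, "note": e["note"]}  # note text is kept
            for e in record["events"]
        ],
    }


original = {
    "name": "Real Patient",
    "events": [
        {"date": date(2023, 3, 1), "note": "Admitted after car accident"},
        {"date": date(2023, 3, 15), "note": "Follow-up visit"},
    ],
}
shifted = deidentify(original, fake_name="Alex Sample", rng=random.Random(7))
```

Because the same offset is applied to the whole record, the 14 days between admission and follow-up survive the shift, while the admission no longer lines up with a publicly known accident date.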
Mayo Clinic has about 100 petabytes of structured and unstructured EHR data, and about 28 petabytes have been de-identified, Goyal said. The unstructured data from the clinical notes is important because it explains the provider’s rationale for, say, a diagnosis or other decision-making. All of that de-identified data is housed in a “cloud container.”
“And then we create a container and that data never leaves that container, and that has now stood the test of the U.S. regulatory system,” Goyal explained, adding that it has qualified in foreign regulatory environments as well. “So when we provide access, we provide it in a sandbox that is in our controlled environment. No individual patient record is visible. We check everything that goes out of the system. So no data ever leaves our control.”
This is known as the clean room environment, Goyal said. Another popular term for a data access process that preserves patient privacy is “federated learning.” At Mayo Clinic, that applies to health system partners that have joined the Mayo Clinic Platform, such as Hospital Israelita Albert Einstein in Brazil.
“Federated learning is basically when you send the question to all these different data sets and then you get the aggregated answer back. But each of those environments has to support this closed container, and no one has access to the central area where all the information is located,” Goyal said.
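The pattern Goyal outlines, where the question travels to each data set and only an aggregated answer comes back, can be sketched in a few lines of Python. Everything here is hypothetical: the `Site` class, the `federated_count` function and the sample records are invented for illustration, and the `min_cell` threshold that withholds small counts is an assumed safeguard, not a described feature of Mayo Clinic Platform.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One participating institution; its records never leave this object."""
    name: str
    records: list

    def count_matching(self, predicate):
        # The question runs locally and only a number is returned.
        return sum(1 for r in self.records if predicate(r))


def federated_count(sites, predicate, min_cell=5):
    """Send the same question to every site and return only the combined count.

    min_cell is an assumed safeguard: totals below it are withheld (None).
    """
    total = sum(site.count_matching(predicate) for site in sites)
    return total if total >= min_cell else None


# Invented sample data for two hypothetical partner sites.
site_a = Site("site_a", [{"diabetic": True}] * 4 + [{"diabetic": False}] * 2)
site_b = Site("site_b", [{"diabetic": True}] * 3 + [{"diabetic": False}])

diabetic_total = federated_count([site_a, site_b], lambda r: r["diabetic"])
```

The caller sees a single combined count for the two sites and never a row from either one, which is the property the closed-container setup is meant to guarantee.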
This allows pharma companies to run compute jobs or train AI models or simply query for greater understanding of the target disease. For instance, pharma companies can ask questions like, “find the course of disease X from birth to death of all these patients who match Y criteria. What additional comorbidities were seen in this population?” Or separately, ask questions like “how did this drug perform in diabetics versus non-diabetics?”
Other uses are possible too, and one goes to the heart of wasted dollars in clinical development. Clinical trials need to be repeatable, and in the past, one would actually have to run the trials to know if they were. In many cases, they either failed or were not repeatable for reasons such as the wrong sample size or a flawed trial design. Pharma companies would learn this only later, after time, effort and money had already been spent.
Now with Mayo Clinic Orchestrate, pharma companies can create synthetic versions of clinical trials to see if the results are repeatable in, say, a much larger patient population.
“So one way that our pharma partners are using this is to validate their trial hypothesis,” he said. “Our approach is let’s use the real data from real people and get as much more of the data into a single repository so you can run a synthetic trial on real data. You can actually say, ‘is this going to work? Do we have enough patients in a large non-patient population to run the trial the way I have imagined?’”
But it’s not simply querying the data, training an AI model or validating a hypothesis. Goyal explained that Orchestrate is all about bringing a fragmented R&D process into a single comprehensive platform. For instance, if a pharma company wants to run a trial on inflammatory bowel disease and comes to Mayo Clinic to recruit patients, then the process with Orchestrate would unfold something like this.
“So they identify a set of patients. We can do this in our de-identified data. We attach an IBD specialist from Mayo Clinic, we develop a cohort of patients, then we do an IRB, and rapidly recruit them to do additional tissue sample collection,” Goyal said. “The power of this is now take that tissue sample within our own infrastructure, do all the profiling, so genetic, proteomic, epigenetic, pathology profiling of that against the longitudinal patient data to put it back in the clinical record in a de-identified way, and then to hand it to our pharma partners and say, it’s your playground to now invent and identify the targets that are going to matter for your condition.”
Access to the Orchestrate program is subscription-based, he said.
Photo: ClaudioVentrella, Getty Images