Duke digs for ‘geo data’

Duke University, and by extension its medical and nursing schools and hospitals, has teamed up with a former NASA worker on a big data system that will apparently allow researchers to analyze, through “geospatial visualization,” essentially any group of patients with any health issue in real time.

Clinicians and researchers would be able to select, visualize and “predictively study” an array of patients with the new system, called DEDUCE, which is described as a “home-grown business intelligence tool built on the .NET framework.” It’s further described as “an enterprise data warehouse” powered by Oracle and Greenplum databases that uses Esri’s ArcGIS Server platform and JavaScript.

The main thrust behind Duke’s dive for deeper, more accurate data is to avoid collecting data that is effectively out of date by the time it’s all gathered. This can occur particularly when combining existing data with new resources, such as geographic information systems.

Duke hired Sohayla Pruitt two years ago as its senior geospatial scientist. Pruitt has a master’s in GIS and joined the health system after working for NASA’s Goddard Space Flight Center and at a Department of Homeland Security-backed startup, Spadac. Pruitt told Information Management that, instead of going from one project to the next, her idea was to establish pre-built data and modeling that would allow researchers to analyze the data quickly.

“I thought, wow, if we could automate some of this, preselect some of the data, preprocess a lot and then sort of wait for an event to happen, we could pass it through our models, let them plow through thousands of geospatial variables and [let the system] tell us the actual statistical significance,” Pruitt says. “Then, once you know how geography is influencing events and what they have in common, you can project that to other places where you should be paying attention because they have similar probability.”

Given Duke’s size, with some 2 million patients, the health system sought to marry its EHR data with a GIS that could give its clinicians broader access.

From Information Management’s article about the new project:

“Duke collects mountains of data. It runs three hospitals and hundreds of clinics, looking after some 2 million patients, all with unique electronic health records. The health system thought that if it could marry its EHR data with a GIS it could give its clinicians the ability to pull information on certain conditions, match that to geographic locations, and predict – on demand – which people within a population are likely in the future to be diagnosed with a particular ailment.”

That sort of data, and the access to it, can be particularly revealing for researchers and doctors.

“When we visually map a population and a health issue, we want to give an understanding about why something is happening in a neighborhood,” says Pruitt. “Are there certain socioeconomic factors that are contributing? Do they not have access to certain things? Do they have too much access to certain things like fast food restaurants?”

But, according to Pruitt, care must be taken to ensure the data isn’t simply a mash-up of demographics and geography. Enter “big geo-data automation” – i.e., “layers of presorted geographical data, processing and standardization that ‘premodel’ and ‘precorrelate’ health care scenarios for researchers.”

“Duke starts with an automated geocoding system that amends and verifies every address entering its healthcare system against a USPS database for accurate spelling, abbreviations and nine-digit ZIP codes. The standardized addresses are next passed through a commercial mapping database and geocoded to a street- or rooftop-level of accuracy (any lesser accuracy is discarded).”

The latitude and longitude of each residence are then overlaid with the geographic boundaries of U.S. Census block groups, census tracts and metro statistical areas, so that each address picks up the corresponding IDs.
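To make the geocode-then-overlay step concrete, here is a minimal sketch in Python. It is not Duke's code: the class names, the precision labels, the addresses, the coordinates and the block group IDs are all invented for illustration, and real block groups are irregular polygons rather than the simple bounding boxes assumed here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical precision labels; a real geocoder reports its own codes.
ACCEPTED_PRECISION = {"rooftop", "street"}


@dataclass
class GeocodeResult:
    address: str
    lat: float
    lon: float
    precision: str  # e.g. "rooftop", "street", "zip_centroid"


@dataclass
class BlockGroup:
    geoid: str  # placeholder ID, not a real Census GEOID
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Bounding-box test; an assumption standing in for a polygon overlay.
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def keep_accurate(results):
    """Discard anything coarser than street- or rooftop-level accuracy."""
    return [r for r in results if r.precision in ACCEPTED_PRECISION]


def assign_block_group(result, block_groups) -> Optional[str]:
    """Return the ID of the first block group containing the point, if any."""
    for bg in block_groups:
        if bg.contains(result.lat, result.lon):
            return bg.geoid
    return None


# Made-up sample data.
results = [
    GeocodeResult("1 Example St", 36.005, -78.940, "rooftop"),
    GeocodeResult("2 Sample Ave", 36.015, -78.920, "street"),
    GeocodeResult("PO Box 3", 36.000, -78.900, "zip_centroid"),
]
block_groups = [
    BlockGroup("BG-A", 36.00, 36.01, -78.95, -78.93),
    BlockGroup("BG-B", 36.01, 36.02, -78.93, -78.91),
]

tagged = {r.address: assign_block_group(r, block_groups)
          for r in keep_accurate(results)}
print(tagged)  # {'1 Example St': 'BG-A', '2 Sample Ave': 'BG-B'}
```

The third address is dropped before the overlay because its precision is only a ZIP code centroid, which mirrors the rule that any lesser accuracy is discarded.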

“With a block group ID, Duke can then import all the U.S. census data (including median income, average commute time and transit options, the percentage of people with high school and college degrees and the percentage on public assistance) that is reported for every block group in the U.S.”

All that data is now available to Duke researchers, who can select a cohort (a grouping of patients, e.g., diabetics), filter it and analyze it in regression models that determine which socioeconomic and other variables are actually relevant to their group.

“It’s a way of taking the bias out of traditional analysis where a researcher says, ‘Let’s see if median income or public assistance plays a role,’ and you only have the benefit of a handful of variables,” Pruitt says. “Instead of that, you let your statistical models tell you what’s going on and where the true correlations exist.”
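The idea of letting the data rank the variables, rather than hand-picking one or two, can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it uses a plain Pearson correlation screen instead of the regression models described above, it does not test statistical significance, and the variable names and numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_variables(rows, outcome, candidates):
    """Score every candidate variable against the outcome, strongest first."""
    ys = [row[outcome] for row in rows]
    scores = {
        name: pearson([row[name] for row in rows], ys)
        for name in candidates
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up block-group rows: census-style variables plus a hypothetical
# outcome (condition rate per 100 patients).
rows = [
    {"median_income_k": 30, "pct_public_assistance": 25, "avg_commute_min": 20, "rate": 12},
    {"median_income_k": 40, "pct_public_assistance": 18, "avg_commute_min": 35, "rate": 10},
    {"median_income_k": 50, "pct_public_assistance": 16, "avg_commute_min": 25, "rate": 8},
    {"median_income_k": 60, "pct_public_assistance": 9, "avg_commute_min": 30, "rate": 6},
    {"median_income_k": 70, "pct_public_assistance": 5, "avg_commute_min": 22, "rate": 4},
]
candidates = ["median_income_k", "pct_public_assistance", "avg_commute_min"]

ranked = rank_variables(rows, "rate", candidates)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

In this toy data, income and public assistance track the outcome closely while commute time barely registers. A production system would do the same kind of sweep across thousands of variables, with proper regression and significance testing in place of the simple correlation used here.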
