
Published in partnership with the Health Data Consortium. The Health Data Consortium is a collaboration among government, non-profit, and private sector organizations working to foster the availability and use of health data to drive innovations that improve health and health care.

Society’s next big challenge: infinite data

April 7, 2013 8:54 am


Over the past 10 years or so, many organizations have recognized the conceptual value of data and have started recording and retaining more and more of it. But after doing this for a while, they’re asking, “What the heck do we do with it?”

Google, Netflix, and other big companies have taught us that data is valuable for the insights that can be drawn from it, so others have begun exploring their own data and want to do more with it. Some companies approach this with no fixed expectation of what can be learned.

Rather than starting with a question and looking for an answer, people started finding the questions data was already answering. The idea has been to find the hidden potential of data, and we’ve already seen benefits to doing this type of analysis.


Fixed data or fixed problem?

What if you are really committed to solving a specific prediction problem or have a specific question in mind? Project managers know that they need to decide which of scope, budget, and schedule are fixed and which are more flexible, and that this decision can fundamentally change the nature of a project.

We need to make that same kind of decision in data-analytics projects. The common opportunistic nature of "big data" treats the data as fixed and the question as flexible. Reverse this (fix the question and let the data be flexible) and you get what I call "infinite data."

Predicting suicide

A recent example of an infinite-data problem is the Defense Advanced Research Projects Agency’s (DARPA’s) interest in predicting suicide.

Consistent with DARPA's mission statement, this is an extremely ambitious endeavor, a "DARPA-hard" problem, as the agency would say. Also consistent with the way DARPA operates, there is no one prescribed way of approaching the problem and no defined set of data sources to be used. Hence, the playing field is wide open for approaches that rely solely on analyzing existing data on suicide in more depth, using more sophisticated machine-learning algorithms in the big-data mindset.

However, we can also take the reverse approach and focus on incrementally identifying and collecting the right data (e.g., better understanding the specific problems of at-risk personnel and how those problems manifest in measurable data). Once the object of interest is defined, the possibilities for what data to collect and analyze are endless, as long as privacy concerns are addressed. In the context of suicide, data can come, for instance, from clinic notes, interviews, social media behavior observed by friends, shopping behavior, and location data.

But it does not have to end there — there is always more data. Your smartphone already has access to tons of interesting data, and companies are already using it to target ads and provide buying recommendations. It seems more than appropriate to use this data to extract some value for users as well. After all, you are the ultimate owner of your behavioral data.

Infinite data: Why now?

The time is ripe for infinite data for two reasons:

  • Big data has laid the computational and economic foundation for dealing with vast amounts of a wide variety of data.
  • There are many possibilities for instrumenting the environments where predictions must be made, even physical ones, thanks to very cheap sensors that are already all around us.

Smartphones already capture such a slew of data about us that it now seems reasonable to try to use this data to make difficult predictions about medical conditions, such as the progression of Parkinson's disease.

New challenges

Given the different nature of infinite-data projects from big-data projects, how do we go about executing such a project, and what is required for it to be successful? There are three big challenges with infinite data:

  • The streams of data never end.
  • There are infinite ways of pre-processing data to carve out relevant features, including simple combinations of individual data points or more complicated ones such as change detection in the frequency domain of temporally recurring events.
  • There are always more kinds of data that can be obtained and additional models that can be applied to infer additional data.
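To make the second challenge concrete, here is a minimal sketch of one of the feature-extraction ideas mentioned above: detecting a change in the frequency domain of a temporally recurring event. The signal, sample rate, and window size are all illustrative, not anything from the article.

```python
import numpy as np

def dominant_frequency(window, sample_rate):
    """Return the strongest frequency component of one signal window."""
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

def frequency_shift(signal, sample_rate, window_size):
    """Track the dominant frequency across consecutive windows.

    A jump between windows is one simple feature a model could use.
    """
    return [dominant_frequency(signal[i:i + window_size], sample_rate)
            for i in range(0, len(signal) - window_size + 1, window_size)]

# Hypothetical example: a recurring event speeds up from 2 Hz to 5 Hz halfway through.
rate = 100  # samples per second
t = np.arange(0, 4, 1.0 / rate)
sig = np.where(t < 2, np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 5 * t))
print(frequency_shift(sig, rate, window_size=200))  # 2.0 Hz, then 5.0 Hz
```

The same windowing pattern applies to any stream that never ends: features are computed per window, so the analysis keeps up with the first challenge as well.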

In practice, even though data may be infinite, our available computational power and our budget for acquiring new data sources are not. So we need to identify the most relevant and significant features obtainable.
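One simple way to respect a finite acquisition budget is a greedy pick by estimated value per unit cost. The sketch below uses entirely hypothetical source names and numbers; in a real project the value estimates would come from pilot experiments or domain experts.

```python
def select_under_budget(candidates, budget):
    """Greedily choose data sources by value-per-cost until the budget runs out.

    `candidates` maps a source name to (estimated_value, acquisition_cost).
    """
    chosen, spent = [], 0.0
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    for name, (value, cost) in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

# Illustrative numbers only: (estimated predictive value, cost in arbitrary units).
sources = {
    "clinic_notes":  (0.30, 5.0),
    "social_media":  (0.20, 1.0),
    "location_data": (0.10, 2.0),
    "shopping_logs": (0.05, 4.0),
}
print(select_under_budget(sources, budget=4.0))  # ['social_media', 'location_data']
```

Greedy value-per-cost is only a heuristic for this knapsack-style problem, but it makes the trade-off explicit: the highest-value source (clinic notes) loses out here because it alone would exceed the budget.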

Succeeding with infinite data

The team that will succeed with infinite-data projects in practice needs to be multidisciplinary. It needs to include subject matter experts as well as a set of experts in a broad set of disciplines of computer science including feature extraction, signal processing, computer vision, natural language processing, design of experiments, automated diagnosis, spatial and temporal analysis, modeling, and statistical machine learning.

Such a team will develop new methodologies for going about the data-identification exercise in a principled fashion that makes the process repeatable and generalizable. Such a methodology might infer from data that is already available in order to guide further data-acquisition decisions. Or the team might deploy "cheap" experiments on sub-populations, or at low resolution, to estimate the value of the information that a higher-resolution, larger data-collection effort could provide.
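A "cheap" experiment of the kind described above can be as simple as measuring, on a small pilot sample, how much a candidate data source lifts a baseline predictor. Everything below is a synthetic simulation with made-up distributions, just to show the shape of such a value-of-information estimate.

```python
import random

random.seed(7)

def accuracy(samples, use_candidate):
    """Score a trivial threshold 'model' on (baseline, candidate, label) triples."""
    correct = 0
    for baseline, candidate, label in samples:
        score = baseline + (candidate if use_candidate else 0.0)
        correct += (score > 0.5) == label
    return correct / len(samples)

# Hypothetical pilot data: the candidate source carries most of the label signal.
pilot = []
for _ in range(500):
    label = random.random() < 0.5
    baseline = random.gauss(0.3 if label else 0.2, 0.3)    # weakly informative
    candidate = random.gauss(0.5 if label else -0.1, 0.2)  # strongly informative
    pilot.append((baseline, candidate, label))

lift = accuracy(pilot, use_candidate=True) - accuracy(pilot, use_candidate=False)
print(f"estimated accuracy lift from the candidate source: {lift:.0%} on the pilot")
```

If the measured lift on the pilot justifies the acquisition cost, the full data-collection effort goes ahead; if not, the next candidate source is tried, which is exactly the incremental loop the article describes.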

Whatever the optimal methodology turns out to be, two things are clear: challenging prediction problems will get solved, and vast amounts of new data will be collected, setting off a new wave of opportunities and closing the loop with big data as we know it.

Christian Fritz, Ph.D., is a PARC researcher working on real-world applications of artificial intelligence with particular interest in the combination of symbolic knowledge representation and machine learning; behavior recognition; planning; execution monitoring; modeling; and diagnosis.

Big data image via Toria/Shutterstock


This article originally appeared on VentureBeat

Copyright 2014 MedCity News. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.

By Christian Fritz


