Wikipedia seen as useful tool to predict flu outbreaks, researchers find

Predicting flu and tracking disease outbreaks has traditionally fallen largely to the Centers for Disease Control and Prevention. Healthcare entrepreneurs have sought to supplement that information by using big data generated from social media channels such as Twitter to make assessments earlier. A new study by two researchers at Boston Children’s Hospital identifies another website that could be used for this task: Wikipedia.

Up to 500,000 deaths worldwide are attributed to influenza each year, with up to 3,500 in the U.S. alone.

Researchers David J. McIver and John S. Brownstein said they were able to reduce the time lag of seven to 14 days associated with the CDC’s approach. They also sought to develop a more accurate tool than Google Flu Trends, which has estimated many more flu cases than actually occurred.

Here’s how they did it. They compiled a list of Wikipedia articles likely to be related to influenza or health in general, chosen on the basis of previous knowledge of the subject area, previously published materials, and expert opinion, according to the journal article. They then developed an algorithm that gathered the number of times each of those articles had been viewed.

Daily Wikipedia article view data was collected from December 10, 2007, through August 19, 2013, and then aggregated for each week.
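The aggregation step described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the article titles, the view counts, and the choice to start each week on Monday are all assumptions made for the example, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (Monday start is an assumption)."""
    return day - timedelta(days=day.weekday())


def weekly_views(daily_views: dict) -> dict:
    """Sum daily view counts across all tracked articles into one total per week.

    `daily_views` maps an article title to a {date: view count} mapping.
    """
    totals = defaultdict(int)
    for per_day in daily_views.values():
        for day, count in per_day.items():
            totals[week_start(day)] += count
    # Return the weeks in chronological order.
    return dict(sorted(totals.items()))


# Hypothetical sample data: titles and counts are made up for illustration.
sample = {
    "Influenza": {
        date(2007, 12, 10): 100,
        date(2007, 12, 11): 120,
        date(2007, 12, 17): 90,
    },
    "Fever": {
        date(2007, 12, 10): 40,
        date(2007, 12, 18): 60,
    },
}

for start, total in weekly_views(sample).items():
    print(start.isoformat(), total)
```

With the sample data, views from both articles are pooled into two weekly totals, one for the week of December 10, 2007, and one for the week of December 17, 2007. A real pipeline would also need a source for the daily counts and a model that relates the weekly totals to reported flu levels, neither of which is shown here.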

By analyzing traffic on 35 of the site’s flu-related pages, the team claimed its method was 17 percent more accurate than Google Flu Trends and was more likely to be right about the intensity of flu levels during any given week, according to a Business Week article.

The Business Week article points out that one weakness of both the Wikipedia and Google Flu Trends approaches is that “they can determine only correlation, not causation.” Still, the researchers want to work with smaller groups active in this area, such as Flu Near You, an online polling site where about 100,000 users have signed up to report whether they feel sick.

Healthcare startups that have taken on mapping flu outbreaks and other conditions using social media and publicly available data sets include Sickweather and SocialHealthInsights, which developed Mappy Health for the Department of Health and Human Services.