An analysis published in The Journal of Infectious Diseases by a team of scientists led by the National Institutes of Health found that using big data to track infectious diseases can make surveillance more effective.

Big data from electronic health records, social media, the Internet, and other sources can provide more detailed information on infectious diseases, making them easier to track than with traditional surveillance methods.

Traditional surveillance is typically based on laboratory tests and data collected by public health institutions, but it can suffer from time lags, is expensive, and lacks a fine-grained view. Big data could make infectious disease tracking available in real time and on a local level. Currently, scientists believe a hybrid approach that combines big data with traditional surveillance is best.

“The ultimate goal is to be able to forecast the size, peak, or trajectory of an outbreak weeks or months in advance in order to better respond to infectious disease threats. Integrating big data in surveillance is a first step toward this long-term goal,” says Cecile Viboud, senior scientist at NIH’s Fogarty International Center and a co-editor of the report. “Now that we have demonstrated proof of concept by comparing data sets in high-income countries, we can examine these models in low-resource settings where traditional surveillance is sparse.”

The report looked at the opportunities available with three types of data:

  • Medical encounter files: records from health care providers and insurance companies.
  • Volunteer data: crowdsourced and self-reported in almost real time.
  • Digital information: social media, the Internet, and mobile phones.

The scientists identified a few challenges with these nontraditional data sources. Key demographics, such as infants and the elderly, may be underrepresented, and social media may not be a stable source. Human behavior can also change rapidly during the course of an epidemic. Finally, using this information raises technical, practical, and privacy concerns.

The report outlined seven examples of using big data to monitor and model infectious diseases:

  • Using medical insurance claim data for flu-like illnesses to track influenza activity.
  • A European surveillance system, Influenzanet, uses standardized online surveys to gather information from volunteers who self-report their symptoms.
  • ResistanceOpen, an online platform, monitors regional antibiotic resistance from publicly available, online data from community health care institutions.
  • Looking at social media and Internet health forums for information on drug use and detecting adverse drug reactions.
  • Using call data records from mobile phones to determine how travel affects disease transmission.
  • Taking online news articles and health bulletins from public health agencies to map transmission patterns for disease outbreaks.
  • A publicly available epidemic simulation data management system, epiDMS, provides storage and indexing services for large data simulation sets, as well as search functionality and data analysis to aid decision-makers during health care emergencies.

Although big data holds much promise for disease surveillance, scientists agree that reliable information is scarce.

“To be able to produce accurate forecasts, we need better observational data that we just don’t have in infectious diseases,” says Shweta Bansal of Georgetown University, a co-editor of the report. “There’s a magnitude of difference between what we need and what we have, so our hope is that big data will help us fill this gap.”

The publication’s authors include scientists affiliated with Fogarty’s Research and Policy for Infectious Diseases program (RAPIDD), grantees from NIH’s National Institute of General Medical Sciences, and researchers from nearly 20 universities throughout North America and Europe. The supplement was produced with support from Georgia State University, the Fogarty International Center, Northeastern University, and Georgetown University.
