Evaluation of IBM Watson Natural Language Processing Service to predict influenza-like illness outbreaks from Twitter data
DOI:
https://doi.org/10.21533/pen.v10.i1.524
Abstract
Social media has opened the gates to collecting big data that can be used to monitor epidemic trends in real time. We evaluate whether the IBM Watson NLP service can reliably predict outbreaks of infectious diseases such as influenza-like illness (ILI) from Twitter data collected during the main influenza season. Watson's performance is evaluated by computing the Pearson correlation between the number of tweets classified by Watson as ILI and the number of ILI occurrences reported by the traditional epidemic surveillance system of the Centers for Disease Control and Prevention (CDC). The achieved correlation was 0.55. Furthermore, a 12-week discrepancy was found between the peak in ILI occurrences predicted by Watson and the peak in CDC-reported data. Additionally, we developed a scoring method for ILI prediction from Twitter posts that uses a simple formula and is able to predict ILI two weeks ahead of CDC-reported ILI data. The method uses Watson's sentiment and emotion scores together with identified ILI features to analyze influenza-related posts in real time. Because of the high computational cost of Watson's sentiment and emotion analysis, we tested whether a machine learning approach can predict influenza using only the identified ILI keywords as predictors. All three evaluated methods (Random Forest, Logistic Regression, K-NN) achieved overall accuracies of ~68.2% and 97.5% when Watson and the developed formula, respectively, were used as the medical expert. The obtained results suggest that data found within social media can be used to supplement the traditional surveillance of influenza outbreaks with the help of intelligent computations.
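The evaluation described in the abstract rests on a Pearson correlation between two weekly series, and on the time offset between them. The following is a minimal sketch of that kind of comparison, not the authors' implementation: the function names (`pearson`, `lagged_pearson`, `best_lag`) are hypothetical, the weekly counts are made-up toy values rather than study data, and the lag scan is one simple way to estimate how far a tweet-based signal leads a reference series.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


def lagged_pearson(tweets, reference, lag):
    """Correlate tweets[t] with reference[t + lag] for a non-negative lag.

    A high value at lag k suggests the tweet signal leads the reference
    series by k weeks.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = min(len(tweets), len(reference) - lag)
    return pearson(tweets[:n], reference[lag:lag + n])


def best_lag(tweets, reference, max_lag):
    """Return (lag, r) with the highest correlation over lags 0..max_lag."""
    scores = [(lag, lagged_pearson(tweets, reference, lag))
              for lag in range(max_lag + 1)]
    return max(scores, key=lambda pair: pair[1])


# Toy weekly counts (hypothetical, for illustration only).
reference_counts = [1, 2, 4, 8, 5, 3, 2, 1, 1, 1]
# The same curve shifted two weeks earlier, padded with baseline values.
tweet_counts = reference_counts[2:] + [1, 1]

lag, r = best_lag(tweet_counts, reference_counts, max_lag=4)
print(f"best lag: {lag} weeks, r = {r:.3f}")
```

On real data, both series would first need to be aggregated to the same weekly bins, and a lag chosen by scanning should be validated on a season that was not used to select it.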
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.




