Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Feb 15, 2021
Date Accepted: Apr 27, 2021

The final, peer-reviewed published version of this preprint can be found here:

Miller M, Romine W, Oroszi T

Public Discussion of Anthrax on Twitter: Using Machine Learning to Identify Relevant Topics and Events

JMIR Public Health Surveill 2021;7(6):e27976

DOI: 10.2196/27976

PMID: 34142975

PMCID: 8277308

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Anthrax on Twitter: Analysis of Public Discussion of Anthrax Over Twelve Months of Data Collection

  • Michele Miller
  • Will Romine
  • Terry Oroszi

ABSTRACT

Background:

A computational framework that uses machine learning was created to collect tweets discussing anthrax, categorize them as relevant by month of data collection, and detect anthrax-related events.

Objective:

The objective of this study was to detect anthrax-related events and to determine the relevance of the tweets and the topics of discussion over twelve months of data collection.

Methods:

Machine learning techniques were used to determine what people were tweeting about anthrax. Tweet counts were graphed over time to see whether an event was detected (defined as a three-fold spike in tweets). A machine learning classifier was created to categorize tweets as relevant. Relevant tweets were examined by month using a topic modeling approach to determine the topics of discussion over time and how events influenced that discussion.
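As a rough illustration of the three-fold spike criterion, the sketch below flags days whose tweet count is at least three times a baseline. The abstract does not say what the spike is measured against, so the trailing 7-day mean used here, the function name `detect_spikes`, and the sample counts are all assumptions made for the example, not the authors' procedure.

```python
from statistics import mean

def detect_spikes(daily_counts, window=7, fold=3.0):
    """Return indices of days whose count is at least `fold` times
    the mean of the preceding `window` days.

    Assumption: the baseline is a trailing mean; the source only
    states that an event is a three-fold spike in tweets.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = mean(daily_counts[i - window:i])
        # Skip days with an empty baseline to avoid flagging noise.
        if baseline > 0 and daily_counts[i] >= fold * baseline:
            spikes.append(i)
    return spikes

# Hypothetical daily tweet counts with one obvious burst on day 7.
counts = [100, 110, 95, 105, 90, 100, 100, 420, 130, 100]
print(detect_spikes(counts))
```

A trailing window keeps the burst itself out of its own baseline. A different baseline, such as the previous day or a monthly median, would flag a different set of days.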

Results:

Over the twelve months of data collection, 204,008 tweets were collected. Logistic regression performed best for relevancy classification (precision=0.81, recall=0.81, and F1-score=0.80). Twenty-six topics were found relating to anthrax events, highly retweeted tweets, natural outbreaks, and news stories.
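For readers less familiar with the reported metrics, the sketch below computes precision, recall, and F1-score for a single positive class from true and predicted labels. The labels are invented for illustration, and the abstract does not state how the reported scores were averaged across classes, so this is not a reproduction of the study's numbers.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels: 1 = relevant tweet, 0 = not relevant.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

When scores are averaged over several classes, the averaged F1 need not equal the harmonic mean of the averaged precision and recall. That may be why a reported F1 can differ slightly from its accompanying precision and recall.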

Conclusions:

This study demonstrated that tweets relating to anthrax can be collected and analyzed over time to determine what people are discussing and detect key anthrax-related events. Future studies can focus on opinion tweets only, use the methodology to study other terrorism events, or use the methodology to monitor for threats.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.