JMIR Preprints #49139: Use of large language models to assess likelihood of epidemics from content of Tweets: Infodemiology Study

Use of large language models to assess likelihood of epidemics from content of Tweets: Infodemiology Study

Michael Deiner;
Natalie A. Deiner;
Vagelis Hristidis;
Stephen D. McLeod;
Thuy Doan;
Thomas M. Lietman;
Travis C. Porco

ABSTRACT

Background:

Cost-effective automated surveillance systems that analyze social media content could potentially serve as early indicators of conjunctivitis and other systemic infectious diseases.

Objective:

We investigated whether large language models, specifically GPT-3.5 and GPT-4, can provide probabilistic assessments of whether social media posts about conjunctivitis indicate a possible outbreak.

Methods:

A total of 12,194 conjunctivitis-related Tweets were obtained using a targeted Boolean search in multiple languages across 9 countries. These Tweets were included in prompts to GPT-3.5 and GPT-4 to obtain probabilistic assessments, which were validated by 2 human raters. We then calculated Pearson correlations between weekly time series of these assessments and (1) Tweet volume and (2) the occurrence of known outbreaks in the 9 selected countries, using a time series bootstrap to compute confidence intervals.

Results:

Probabilistic assessments derived from GPT-3.5 showed correlations of 0.60 (95% CI: 0.47–0.70) and 0.53 (95% CI: 0.40–0.65) with the two human raters, and GPT-4 achieved higher correlations. Weekly averages of GPT-3.5 probabilities showed substantial correlations with weekly Tweet volume for some countries, ranging from 0.10 (95% CI: 0.0–0.29) to 0.53 (95% CI: 0.39–0.89); correlations were larger for GPT-4. Correlations with known epidemics were more modest and were substantial only in American Samoa (0.40, 95% CI: 0.16–0.81).

Conclusions:

These findings suggest that GPT prompting can efficiently assess the content of social media posts and the likelihood of possible outbreaks to a degree comparable to that of humans. Furthermore, automated content analysis of conjunctivitis-related Tweets was related to Tweet volume in some locations and to the occurrence of actual epidemics. Future work may improve the sensitivity and specificity of these methods for outbreak detection.

Citation

Please cite as:

Deiner M, Deiner NA, Hristidis V, McLeod SD, Doan T, Lietman TM, Porco TC. Use of Large Language Models to Assess the Likelihood of Epidemics From the Content of Tweets: Infodemiology Study. J Med Internet Res 2024;26:e49139.

DOI: 10.2196/49139

PMID: 38427404

PMCID: 10943433


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 19, 2023

Date Accepted: Jan 19, 2024
