JMIR Preprints #12783: Tweet Classification Toward Twitter-Based Disease Surveillance: Overview of the MedWeb Shared Task

Tweet Classification Toward Twitter-Based Disease Surveillance: Overview of the MedWeb Shared Task

Shoko Wakamiya;
Mizuki Morita;
Yoshinobu Kano;
Tomoko Ohkuma;
Eiji Aramaki

ABSTRACT

Background:

The amount of medical and clinical information on the Web is increasing. Among the various types of information on the Web, social media-based data obtained directly from people are particularly valuable and are garnering much attention. To encourage medical natural language processing research exploiting social media data, the NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering three languages (Japanese, English, and Chinese) and annotated with eight symptom labels (e.g., cold, fever, and flu). Participants then classify each tweet into one of two categories: those containing a patient's symptom and those that do not.

Objective:

We aim to present the results of the groups that participated in the Japanese, English, and Chinese subtasks, along with discussions, in order to clarify the issues that need to be resolved in the field of medical natural language processing.

Methods:

The performance of participant systems is assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss.
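
The three measures named above can be illustrated with a minimal sketch. It assumes each tweet's annotation is a row of eight 0/1 values (one per symptom label) and that the F-measure is micro-averaged over all labels; the averaging scheme, the function names, and the toy data are assumptions made for illustration and are not taken from the task definition.

```python
from typing import Sequence

# One row per tweet, one 0/1 entry per symptom label (hypothetical layout).
Labels = Sequence[Sequence[int]]


def exact_match_accuracy(gold: Labels, pred: Labels) -> float:
    """Fraction of tweets whose full label vector is predicted correctly."""
    matches = sum(1 for g, p in zip(gold, pred) if list(g) == list(p))
    return matches / len(gold)


def hamming_loss(gold: Labels, pred: Labels) -> float:
    """Fraction of individual tweet-label decisions that are wrong."""
    wrong = sum(gi != pi for g, p in zip(gold, pred) for gi, pi in zip(g, p))
    total = sum(len(g) for g in gold)
    return wrong / total


def micro_f1(gold: Labels, pred: Labels) -> float:
    """F-measure from precision and recall pooled over all labels (assumed micro-averaging)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for gi, pi in zip(g, p):
            if gi == 1 and pi == 1:
                tp += 1
            elif gi == 0 and pi == 1:
                fp += 1
            elif gi == 1 and pi == 0:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: three tweets, eight labels each (invented for this example).
gold = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
pred = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # all eight labels correct
    [0, 1, 0, 0, 0, 0, 0, 0],  # misses one positive label
    [0, 0, 0, 1, 0, 0, 0, 0],  # one false positive
]

print(exact_match_accuracy(gold, pred))
print(hamming_loss(gold, pred))
print(micro_f1(gold, pred))
```

Exact match accuracy is the strictest of the three, since a single wrong label makes the whole tweet count as an error, whereas Hamming loss penalizes each wrong label separately and is better when lower.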

Results:

In all, eight groups (19 systems) participated in the Japanese subtask, four groups (12 systems) participated in the English subtask, and two groups (six systems) participated in the Chinese subtask. The best system achieved .880 in exact match accuracy, .920 in F-measure, and .019 in Hamming loss.

Conclusions:

This paper presented and discussed the performance of the systems that participated in the NTCIR-13 MedWeb task. Because the MedWeb task settings can be formalized as the factualization of text, the achievements of this task could be applied directly to practical clinical applications.

Citation

Please cite as:

Wakamiya S, Morita M, Kano Y, Ohkuma T, Aramaki E

Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations

J Med Internet Res 2019;21(2):e12783

DOI: 10.2196/12783

PMID: 30785407

PMCID: 6401666

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 9, 2018

Open Peer Review Period: Nov 9, 2018 - Nov 28, 2018

Date Accepted: Dec 13, 2018


Per the author's request the PDF is not available.
