Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 9, 2018
Open Peer Review Period: Nov 9, 2018 - Nov 28, 2018
Date Accepted: Dec 13, 2018

The final, peer-reviewed published version of this preprint can be found here:

Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations

Wakamiya S, Morita M, Kano Y, Ohkuma T, Aramaki E

J Med Internet Res 2019;21(2):e12783

DOI: 10.2196/12783

PMID: 30785407

PMCID: 6401666

Tweet Classification Toward Twitter-Based Disease Surveillance: Overview of the MedWeb Shared Task

  • Shoko Wakamiya
  • Mizuki Morita
  • Yoshinobu Kano
  • Tomoko Ohkuma
  • Eiji Aramaki

ABSTRACT

Background:

The amount of medical and clinical information on the Web is increasing. Among the various types of information on the Web, social media data obtained directly from people are particularly valuable and are attracting much attention. To encourage medical natural language processing research that exploits social media data, the NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering three languages (Japanese, English, and Chinese) and annotated with eight symptom labels (e.g., cold, fever, and flu). For each symptom label, participants classify each tweet into one of two categories: tweets that contain a patient's symptom and tweets that do not.
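
The multi-label setup amounts to one independent yes/no decision per symptom label. The following minimal Python sketch assumes a hypothetical, reduced label list containing only the three labels named in the example (cold, fever, flu), whereas the real corpus uses eight; the helper name `to_label_vector` is also hypothetical.

```python
from typing import Iterable, List

# Hypothetical, reduced label list for illustration only; the MedWeb corpus
# is annotated with eight symptom labels, not three.
LABELS: List[str] = ["cold", "fever", "flu"]


def to_label_vector(symptoms: Iterable[str]) -> List[int]:
    """Map the symptoms annotated for one tweet to a 0/1 vector over LABELS.

    Each position is an independent binary decision: 1 if the tweet reports
    that symptom for a patient, 0 otherwise.
    """
    present = set(symptoms)
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]


# A tweet reporting fever and flu, and a tweet reporting no symptom at all.
print(to_label_vector(["fever", "flu"]))  # [0, 1, 1]
print(to_label_vector([]))                # [0, 0, 0]
```

Because every label is decided separately, a single tweet can be positive for several symptoms at once, or for none.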

Objective:

We present the results of the groups that participated in the Japanese, English, and Chinese subtasks, along with a discussion, to clarify the issues that must be resolved in the field of medical natural language processing.

Methods:

The performance of participant systems was assessed using exact match accuracy, F-measure (based on precision and recall), and Hamming loss.
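
The three measures can be sketched over 0/1 label vectors, with one entry per symptom label for each tweet. This is a minimal Python sketch, not the task's official scorer: the function names are hypothetical, and micro-averaging of the F-measure (pooling true positives, false positives, and false negatives over all tweets and labels) is an assumption, since the abstract does not state how precision and recall were averaged. Gold and predicted matrices are assumed to be aligned and non-empty.

```python
from typing import Sequence

# Each tweet is a sequence of 0/1 values, one per symptom label.
LabelMatrix = Sequence[Sequence[int]]


def exact_match_accuracy(gold: LabelMatrix, pred: LabelMatrix) -> float:
    """Fraction of tweets whose predicted vector equals the gold vector exactly."""
    matches = sum(1 for g, p in zip(gold, pred) if list(g) == list(p))
    return matches / len(gold)


def hamming_loss(gold: LabelMatrix, pred: LabelMatrix) -> float:
    """Fraction of individual (tweet, label) decisions that are wrong; lower is better."""
    wrong = sum(gi != pi for g, p in zip(gold, pred) for gi, pi in zip(g, p))
    return wrong / (len(gold) * len(gold[0]))


def micro_f_measure(gold: LabelMatrix, pred: LabelMatrix) -> float:
    """Harmonic mean of precision and recall pooled over all tweets and labels.

    Micro-averaging is an assumption made for this sketch.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for gi, pi in zip(g, p):
            if gi and pi:
                tp += 1  # symptom present and predicted
            elif pi:
                fp += 1  # predicted but not in gold
            elif gi:
                fn += 1  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: three tweets, three labels (illustrative, not from the task).
gold = [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(round(exact_match_accuracy(gold, pred), 3))  # 0.333
print(round(micro_f_measure(gold, pred), 3))       # 0.667
print(round(hamming_loss(gold, pred), 3))          # 0.222
```

Exact match is the strictest of the three: a single wrong label makes the whole tweet count as an error, whereas Hamming loss charges each wrong label separately, which is why a system can combine a modest exact match accuracy with a low Hamming loss.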

Results:

In all, eight groups (19 systems) participated in the Japanese subtask, four groups (12 systems) participated in the English subtask, and two groups (six systems) participated in the Chinese subtask. The best system achieved an exact match accuracy of .880, an F-measure of .920, and a Hamming loss of .019.

Conclusions:

This paper presents and discusses the performance of the systems that participated in the NTCIR-13 MedWeb task. Because the MedWeb task setting can be formalized as text factualization, the results of this task could be applied directly to practical clinical applications.

Per the authors' request, the PDF is not available.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.