Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 9, 2018
Open Peer Review Period: Nov 9, 2018 - Nov 28, 2018
Date Accepted: Dec 13, 2018
Tweet Classification Toward Twitter-Based Disease Surveillance: Overview of the MedWeb Shared Task
ABSTRACT
Background:
The amount of medical and clinical information on the Web is increasing. Among the various types of information on the Web, social media data obtained directly from people are particularly valuable and are attracting considerable attention. To encourage medical natural language processing research that exploits social media data, the NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task provides pseudo-Twitter messages in a cross-language, multi-label corpus that covers three languages (Japanese, English, and Chinese) and is annotated with eight symptom labels (e.g., cold, fever, and flu). For each symptom label, participants classify each tweet into one of two categories: those that contain a patient's symptom and those that do not.
Objective:
We aim to present the results of the groups that participated in the Japanese, English, and Chinese subtasks, along with a discussion, to clarify the issues that still need to be resolved in the field of medical natural language processing.
Methods:
The performance of the participating systems was assessed using exact match accuracy, F-measure (based on precision and recall), and Hamming loss.
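The three metrics can be illustrated with a minimal Python sketch. It assumes that each tweet's gold and predicted annotations are 0/1 vectors with one position per symptom label, and it computes a micro-averaged F-measure by pooling all tweet-label pairs; the official MedWeb scoring may average the F-measure differently. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence

# Assumption: one 0/1 flag per symptom label (eight positions in MedWeb).
LabelVector = Sequence[int]


def exact_match_accuracy(gold: List[LabelVector], pred: List[LabelVector]) -> float:
    """Fraction of tweets whose entire label vector is predicted correctly."""
    matches = sum(1 for g, p in zip(gold, pred) if list(g) == list(p))
    return matches / len(gold)


def micro_f_measure(gold: List[LabelVector], pred: List[LabelVector]) -> float:
    """Micro-averaged F1: pool TP/FP/FN counts over every tweet-label pair."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for gi, pi in zip(g, p):
            if gi == 1 and pi == 1:
                tp += 1
            elif gi == 0 and pi == 1:
                fp += 1
            elif gi == 1 and pi == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hamming_loss(gold: List[LabelVector], pred: List[LabelVector]) -> float:
    """Fraction of individual label decisions that are wrong (lower is better)."""
    wrong = sum(gi != pi for g, p in zip(gold, pred) for gi, pi in zip(g, p))
    total = sum(len(g) for g in gold)
    return wrong / total


# Hypothetical toy data: three tweets, eight symptom positions each.
gold = [[1, 0, 0, 0, 0, 1, 0, 0], [0] * 8, [0, 0, 1, 0, 0, 0, 0, 0]]
pred = [[1, 0, 0, 0, 0, 1, 0, 0], [0] * 8, [0, 0, 1, 0, 0, 0, 1, 0]]

print(f"{exact_match_accuracy(gold, pred):.3f}")  # 0.667
print(f"{micro_f_measure(gold, pred):.3f}")       # 0.857
print(f"{hamming_loss(gold, pred):.3f}")          # 0.042
```

In this toy case, a single spurious label on the third tweet costs a whole tweet under exact match accuracy but only one of 24 label decisions under Hamming loss, which is why the two metrics are reported together.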
Results:
In all, eight groups (19 systems) participated in the Japanese subtask, four groups (12 systems) in the English subtask, and two groups (six systems) in the Chinese subtask. The best system achieved an exact match accuracy of .880, an F-measure of .920, and a Hamming loss of .019.
Conclusions:
This paper presented and discussed the performance of the systems that participated in the NTCIR-13 MedWeb task. Because the MedWeb task settings can be formalized as the factualization of text, the outcomes of this task could be applied directly to practical clinical applications.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.