Accepted for/Published in: JMIR Formative Research

Date Submitted: Apr 13, 2022
Date Accepted: Oct 24, 2022

The final, peer-reviewed published version of this preprint can be found here:

Han F, Zhang Z, Zhang H, Nakaya J, Kudo K, Ogasawara K

Extraction and Quantification of Words Representing Degrees of Diseases: Combining the Fuzzy C-Means Method and Gaussian Membership

JMIR Form Res 2022;6(11):e38677

DOI: 10.2196/38677

PMID: 36399376

PMCID: 9719062

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Fuzzy c-means for extraction and quantification of words representing degrees of diseases

  • Feng Han; 
  • ZiHeng Zhang; 
  • Hongjian Zhang; 
  • Jun Nakaya; 
  • Kohsuke Kudo; 
  • Katsuhiko Ogasawara

ABSTRACT

Background:

Modern medicine generates unstructured data containing a large amount of information. Extracting useful knowledge from these data and making scientific decisions for diagnosing and treating diseases have become increasingly necessary. Unstructured data, such as those in the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, contain several ambiguous words that reflect the subjectivity of doctors. These data can be used to further improve the accuracy of medical support system assessments.

Objective:

We propose using the fuzzy c-means (FCM) method and Gaussian membership functions to quantify the subjective words in the clinical medical dataset MIMIC-III.
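As a rough illustration of the FCM method named above, the following is a minimal sketch of the standard fuzzy c-means updates on one-dimensional data. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the stopping rule, and the toy scores are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means on 1-D data (illustrative sketch).

    Returns (centers, memberships); memberships has shape
    (n_samples, n_clusters) and each row sums to 1.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)      # shape (n, 1)
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], n_clusters))           # random soft assignment
    u /= u.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        w = u ** m                                     # fuzzified weights
        centers = (w.T @ x) / w.sum(axis=0)[:, None]   # weighted means, (c, 1)
        dist = np.abs(x - centers.T) + 1e-12           # (n, c), avoid div by 0
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)   # membership update
        converged = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if converged:
            break
    return centers.ravel(), u

# Hypothetical toy scores, NOT taken from MIMIC-III.
toy_scores = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
toy_centers, toy_memberships = fuzzy_c_means(toy_scores, n_clusters=2)
```

Unlike hard clustering, each value keeps a graded membership in every cluster, which is what makes the method a natural fit for words whose meaning is a matter of degree.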

Methods:

Using 381,091 radiology reports collected from MIMIC-III, we extracted words expressing a subjective degree from the text and converted each word into a corresponding membership interval.
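To make the idea of a membership interval concrete, here is a minimal sketch that assumes the common Gaussian membership form exp(-(x - c)^2 / (2 * sigma^2)). The abstract does not state the exact parameterization, so the helper names `gaussian_membership` and `membership_interval`, the `cutoff` parameter, and the example numbers are hypothetical.

```python
import math

def gaussian_membership(x, center, sigma):
    """Gaussian membership degree of x for a fuzzy set centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def membership_interval(center, sigma, cutoff=0.5):
    """Range of x whose membership is at least `cutoff` (0 < cutoff <= 1).

    Solving exp(-(x - c)^2 / (2 sigma^2)) >= cutoff for x gives
    |x - c| <= sigma * sqrt(-2 ln(cutoff)).
    """
    half_width = sigma * math.sqrt(-2.0 * math.log(cutoff))
    return center - half_width, center + half_width

# Hypothetical center and spread for one degree word (illustrative only).
low, high = membership_interval(center=3.0, sigma=0.5, cutoff=0.5)
```

Under these assumptions, a degree word maps to the interval `[low, high]` rather than a single number, with membership equal to `cutoff` at both endpoints and 1 at the center.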

Results:

The words representing each degree of each disease were mapped to a corresponding range of values. Examples of membership medians were atelectasis (2.971), pneumonia (3.121), pneumothorax (2.899), pulmonary edema (3.051), and pulmonary embolus (2.435). These membership intervals can be used to determine the symptoms of each disease.

Conclusions:

In this study, we used the FCM method and Gaussian functions to extract words from MIMIC-III that represent a subjective degree and cannot be processed directly by a computer, and we performed fuzzy processing on them. We conclude that words representing degree in English-language interpretation reports can be extracted and quantified. The use of these words in medical support systems may improve diagnostic accuracy.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.