Accepted for/Published in: JMIR Formative Research
Date Submitted: Apr 13, 2022
Date Accepted: Oct 24, 2022
Extraction and quantification of words representing degrees of diseases: Combining the fuzzy c-means method and Gaussian membership
ABSTRACT
Background:
Modern medicine generates unstructured data containing a large amount of information. Extracting useful knowledge from these data and using it to make scientific decisions about diagnosing and treating diseases has become increasingly necessary. Unstructured data, such as the free-text reports in the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, contain many ambiguous words that reflect physicians' subjective judgments. These data can be used to further improve the accuracy of assessments made by medical support systems.
Objective:
We propose using the fuzzy c-means (FCM) method and Gaussian membership functions to quantify the subjective words in the clinical medical dataset MIMIC-III.
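For reference, the textbook forms of the two components named in the objective are given here: the FCM objective with its alternating update rules, and a Gaussian membership function with its α-cut interval. The abstract does not state the fuzzifier m, how each spread σ_i is estimated, or the cut level α, so these are assumptions of this illustration rather than the study's settings.

```latex
% Fuzzy c-means on points x_1..x_n with c cluster centers v_1..v_c and fuzzifier m > 1
J_m = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2},
\qquad \text{subject to } \sum_{i=1}^{c} u_{ik} = 1 \text{ for every } k

v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}},
\qquad
u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right)^{-1}

% Gaussian membership of cluster i and the interval where membership is at least alpha
\mu_i(x) = \exp\!\left( -\frac{(x - v_i)^2}{2\sigma_i^2} \right),
\qquad
\mu_i(x) \ge \alpha \iff x \in \left[ v_i - \sigma_i\sqrt{-2\ln\alpha},\; v_i + \sigma_i\sqrt{-2\ln\alpha} \right]
```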
Methods:
Using 381,091 radiology reports collected from MIMIC-III, we extracted words expressing a subjective degree from the text and converted each word into a corresponding membership interval.
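The conversion from degree words to membership intervals can be illustrated with a minimal sketch, assuming each occurrence of a degree word has already been mapped to a numeric score (the abstract does not say how this is done). The sketch clusters the scores with standard fuzzy c-means, fits a Gaussian membership function to each cluster using a membership-weighted spread, and reports the interval where membership is at least α. The helper names (`fuzzy_c_means`, `fuzzy_sigma`, `gaussian_membership`, `alpha_cut`), the quantile initialization, m = 2, α = 0.5, and the sample scores are all hypothetical; this is not the authors' implementation.

```python
import numpy as np


def fuzzy_c_means(x, n_clusters, m=2.0, max_iter=300, tol=1e-8):
    """Standard fuzzy c-means on 1-D data.

    Returns (centers, u), where u[i, k] is the membership of point k in
    cluster i, as computed in the final iteration.
    """
    x = np.asarray(x, dtype=float)
    # Deterministic start (an assumption of this sketch): interior quantiles of x.
    centers = np.quantile(x, np.linspace(0.0, 1.0, n_clusters + 2)[1:-1])
    for _ in range(max_iter):
        # Distances d_ik = |x_k - v_i|, floored to avoid division by zero.
        dist = np.fmax(np.abs(x[None, :] - centers[:, None]), 1e-12)
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
        # v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        um = u ** m
        new_centers = (um @ x) / um.sum(axis=1)
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return centers, u


def fuzzy_sigma(x, centers, u, m=2.0):
    """Membership-weighted spread of each cluster (hypothetical choice of sigma_i)."""
    x = np.asarray(x, dtype=float)
    um = u ** m
    sq = (x[None, :] - centers[:, None]) ** 2
    return np.sqrt((um * sq).sum(axis=1) / um.sum(axis=1))


def gaussian_membership(x, center, sigma):
    """mu(x) = exp(-(x - c)^2 / (2 sigma^2))."""
    return np.exp(-((np.asarray(x, dtype=float) - center) ** 2) / (2.0 * sigma ** 2))


def alpha_cut(center, sigma, alpha=0.5):
    """Interval of x on which the Gaussian membership is at least alpha."""
    half = sigma * np.sqrt(-2.0 * np.log(alpha))
    return center - half, center + half


# Hypothetical numeric scores for occurrences of degree words
# (e.g. "mild", "moderate", "severe"); these are not data from the study.
SCORES = np.array([0.9, 1.0, 1.1, 1.2, 2.9, 3.0, 3.1, 4.8, 5.0, 5.2])

if __name__ == "__main__":
    centers, u = fuzzy_c_means(SCORES, n_clusters=3)
    order = np.argsort(centers)
    centers, u = centers[order], u[order]
    sigmas = fuzzy_sigma(SCORES, centers, u)
    for c, s in zip(centers, sigmas):
        lo, hi = alpha_cut(c, s, alpha=0.5)
        print(f"center={c:.3f}  sigma={s:.3f}  0.5-cut=[{lo:.3f}, {hi:.3f}]")
```

Libraries such as scikit-fuzzy offer ready-made FCM and Gaussian membership routines; the hand-written version here simply keeps every update rule visible so it can be checked against the formulas.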
Results:
For each disease, the words representing each degree were mapped to a range of corresponding values. Examples of membership medians were 2.971 for atelectasis, 3.121 for pneumonia, 2.899 for pneumothorax, 3.051 for pulmonary edema, and 2.435 for pulmonary embolus. These membership intervals can be used to determine the symptoms of each disease.
Conclusions:
In this study, we used FCM and Gaussian membership functions to extract words from MIMIC-III that represent a subjective degree and cannot otherwise be processed by a computer, and we performed fuzzy processing on them. We conclude that words representing degree in English-language radiology interpretation reports can be extracted and quantified. Using these words in medical support systems may improve diagnostic accuracy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.