Accepted for/Published in: JMIR Formative Research
Date Submitted: Sep 22, 2020
Date Accepted: Nov 17, 2020
Automated Systemic Disease and Duration Categorization from Electronic Medical Record System Data using Finite State Machine Modelling
ABSTRACT
Background:
A major problem in the healthcare sector is that about 80% of the data remains unstructured and unused after it has been generated. Because unstructured data from electronic medical record (EMR) systems is difficult to handle, most hospitals and medical centers leave it out of their analyses. There is therefore a need to analyze unstructured big data in healthcare systems so that the data can be put to optimal use and the information it contains can be extracted.
Objective:
The aim of this study is to extract the diseases mentioned, other associated keywords, and the respective time durations from an indigenously developed electronic medical record (EMR) system (eyeSmart™) implemented across a large multi-tier ophthalmology network in India, and to describe the analytics made possible by the acquired datasets.
Methods:
We propose a novel finite state machine (FSM) to sequentially detect and cluster the diseases in a patient’s medical history. We defined three states for the FSM and a transition matrix whose entries depend on the identified keyword. We also defined a state change action matrix, which specifies the action associated with each transition. The dataset was obtained from an indigenously developed EMR system (eyeSmart™) and comprised past medical history records of 10,000 distinct patients.
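The following is a minimal sketch of how a three-state FSM driven by a transition matrix and a state change action matrix could be organized. The abstract does not name the states, keywords, or actions, so everything here is an illustrative assumption rather than the authors' implementation: the state names (`START`, `DISEASE`, `DURATION`), the vocabularies, the token classes, and the action labels are all hypothetical.

```python
from enum import Enum


class State(Enum):
    START = 0     # hypothetical: no open disease record
    DISEASE = 1   # hypothetical: a disease mention is open
    DURATION = 2  # hypothetical: a duration number was read


# Hypothetical vocabularies; the real keyword lists are not given in the abstract.
DISEASES = {"diabetes", "hypertension", "asthma"}
UNITS = {"day", "days", "month", "months", "year", "years"}


def classify(token):
    """Map a token to one of four assumed keyword classes."""
    if token in DISEASES:
        return "disease"
    if token.isdigit():
        return "number"
    if token in UNITS:
        return "unit"
    return "other"


# Transition matrix: TRANSITIONS[state][keyword class] -> next state.
TRANSITIONS = {
    State.START: {"disease": State.DISEASE, "number": State.START,
                  "unit": State.START, "other": State.START},
    State.DISEASE: {"disease": State.DISEASE, "number": State.DURATION,
                    "unit": State.DISEASE, "other": State.DISEASE},
    State.DURATION: {"disease": State.DISEASE, "number": State.DURATION,
                     "unit": State.START, "other": State.DURATION},
}

# State change action matrix: ACTIONS[state][keyword class] -> action label.
ACTIONS = {
    State.START: {"disease": "new", "number": "skip",
                  "unit": "skip", "other": "skip"},
    State.DISEASE: {"disease": "new", "number": "num",
                    "unit": "skip", "other": "skip"},
    State.DURATION: {"disease": "new", "number": "num",
                     "unit": "unit", "other": "skip"},
}


def extract(text):
    """Scan the text once and return a list of disease/duration records."""
    state, current, records = State.START, None, []
    for raw in text.lower().replace(",", " ").split():
        token = raw.strip(".;:")
        kind = classify(token)
        action = ACTIONS[state][kind]
        if action == "new":
            # Close any open record, then open one for the new disease.
            if current is not None:
                records.append(current)
            current = {"disease": token, "duration": None}
        elif action == "num":
            current["duration"] = token
        elif action == "unit":
            # A unit completes the duration and closes the record.
            current["duration"] = f"{current['duration']} {token}"
            records.append(current)
            current = None
        state = TRANSITIONS[state][kind]
    if current is not None:
        records.append(current)
    return records


history = "Diabetes since 5 years, hypertension for 2 months, asthma"
for record in extract(history):
    print(record)
```

Keeping transitions and actions in two separate lookup tables means the scanning loop stays the same when keyword classes or actions are added; only the tables change.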
Results:
Extraction of the disease name and associated keywords using the FSM had an accuracy of 95%, a sensitivity of 94.9%, and a positive predictive value of 100%. For extraction of the disease duration, the accuracy was 93%, the sensitivity was 92.9%, and the positive predictive value was 100%.
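For readers unfamiliar with these metrics, the sketch below shows how accuracy, sensitivity, and positive predictive value are computed from confusion-matrix counts. The counts in the example are invented for illustration only and are not the study's data; the function name `metrics` is likewise an assumption.

```python
def metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, positive predictive value)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # share of true items that were found
    ppv = tp / (tp + fp)           # share of extracted items that were correct
    return accuracy, sensitivity, ppv


# Invented counts, NOT taken from the study: no false positives, a few misses.
accuracy, sensitivity, ppv = metrics(tp=90, fp=0, tn=5, fn=5)
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} ppv={ppv:.3f}")
```

A positive predictive value of 100% corresponds to zero false positives: every item the extractor returned was correct, and the remaining errors were missed items.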
Conclusions:
In this study, we demonstrated that the FSM can accurately identify disease names, associated keywords, and time durations in a large cohort of patient records from an EMR system.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.