Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 25, 2024
Date Accepted: Aug 25, 2024

The final, peer-reviewed published version of this preprint can be found here:

Semiology Extraction and Machine Learning–Based Classification of Electronic Health Records for Patients With Epilepsy: Retrospective Analysis

Xia Y, He M, Basang S, Sha L, Huang Z, Jin L, Duan Y, Tang Y, Li H, Lai W, Chen L

JMIR Med Inform 2024;12:e57727

DOI: 10.2196/57727

PMID: 39621862

PMCID: 11501417

Semiology Extraction and Machine Learning-Based Classification of Electronic Health Records for Patients with Epilepsy: Retrospective Analysis

  • Yilin Xia
  • Mengqiao He
  • Sijia Basang
  • Leihao Sha
  • Zijie Huang
  • Ling Jin
  • Yifei Duan
  • Yusha Tang
  • Hua Li
  • Wanlin Lai
  • Lei Chen

ABSTRACT

Background:

Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, related informatics resources and decision-support tools remain inadequate.

Objective:

This study aimed to develop an ontology-based symptom extraction tool and to use machine learning for automated binary classification of epilepsy.

Methods:

Using present history data from electronic health records (EHRs) at the Southwest Epilepsy Centre in China, we constructed an epilepsy semiology ontology and a symptom-entity extraction tool that extracts seizure duration, seizure symptoms, and seizure frequency from unstructured text by combining manual annotation with natural language processing (NLP) techniques. Additionally, we used multiple machine learning methods to classify patients in the study cohort automatically, with high accuracy, based on the extracted seizure feature data.
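As an illustration of the general idea of ontology-based extraction, the following is a minimal sketch, not the authors' tool. The toy ontology entries, the regular expressions, the English phrasing of the example note, and the function name `extract_features` are all assumptions made for this example.

```python
import re

# Hypothetical mini-ontology mapping surface phrases to canonical concepts.
# These entries are illustrative only and are NOT taken from the ESO.
TOY_ONTOLOGY = {
    "loss of consciousness": "impaired awareness",
    "limb twitching": "clonic movement",
    "foaming at the mouth": "oral secretion",
}

# Assumed English-language patterns for duration and frequency mentions.
DURATION_RE = re.compile(
    r"(?:lasting|lasted|for)\s+(?:about\s+)?(\d+(?:\.\d+)?)\s*(second|minute|hour)s?",
    re.IGNORECASE,
)
FREQUENCY_RE = re.compile(
    r"(\d+)\s*(?:times?|episodes?)\s*(?:per|a|every)\s*(day|week|month|year)",
    re.IGNORECASE,
)


def extract_features(text):
    """Return symptoms, duration, and frequency found in a free-text note."""
    lowered = text.lower()
    # Dictionary lookup: keep every ontology concept whose phrase occurs.
    symptoms = sorted(
        {concept for term, concept in TOY_ONTOLOGY.items() if term in lowered}
    )
    duration = DURATION_RE.search(text)
    frequency = FREQUENCY_RE.search(text)
    return {
        "symptoms": symptoms,
        "duration": (float(duration.group(1)), duration.group(2).lower())
        if duration else None,
        "frequency": (int(frequency.group(1)), frequency.group(2).lower())
        if frequency else None,
    }


# Made-up example note, not study data.
note = ("Sudden loss of consciousness with limb twitching, "
        "lasting about 2 minutes, 3 times per month.")
features = extract_features(note)
print(features)
```

A lookup of this kind only finds phrases that already appear in the ontology, which is one reason manual annotation is needed to build the term list first.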

Results:

The data included present history records from 10,925 cases between 2010 and 2020. Six annotators labelled a total of 2,500 texts to obtain 5,844 semiology words and construct an epilepsy semiology ontology (ESO) with 702 terms. Based on the ontology, the extraction tool achieved an accuracy of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and Random Forest, which reached an F1 score of 75.03%. The Random Forest model had the highest area under the curve (AUC), at 0.984.
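For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how an F1 score and an AUC can be computed for a binary classifier. The labels and scores are made up for illustration and are not study data. The function names and the 0.5 decision threshold are assumptions of this example.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count pairwise wins; ties count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores (illustrative only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```

F1 depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, so the two values need not move together.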

Conclusions:

This work demonstrated the feasibility of NLP-assisted structured extraction from epilepsy medical record texts and of downstream tasks, and it provides open ontology resources for subsequent related work.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.