Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 29, 2021
Date Accepted: Aug 5, 2021

The final, peer-reviewed published version of this preprint can be found here:

Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers

Zanotto B, Beck da Silva Etges AP, dal Bosco A, Cortes E, Ruschell R, Souza AC, M. V Andrade C, Viegas F, Canuto S, Luiz W, Ouriques Martins S, Vieira R, Polanczyk CA, André Gonçalves M

JMIR Med Inform 2021;9(11):e29120

DOI: 10.2196/29120

PMID: 34723829

PMCID: 8593798

Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers

  • Bruna Zanotto
  • Ana Paula Beck da Silva Etges
  • Avner dal Bosco
  • Eduardo Cortes
  • Renata Ruschell
  • Ana Claudia Souza
  • Claudio M. V Andrade
  • Felipe Viegas
  • Sergio Canuto
  • Washington Luiz
  • Sheila Ouriques Martins
  • Renata Vieira
  • Carisi Anne Polanczyk
  • Marcos André Gonçalves

ABSTRACT

Background:

Stroke is the second leading cause of mortality and disability-adjusted life-years globally, and the outcomes of stroke can be highly varied. Timely assessment is essential for optimal management. With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered outcome measurement; however, working with EMR data is often labor-intensive and challenging because of the lack of standardization in data entry and the free-text nature of the data.

Objective:

The aim is to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs.

Methods:

Our study addressed the computational problem of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: Tier 1 (achieved healthcare status), Tier 2 (recovery process), care-related (clinical management and risk scores), and baseline characteristics. The analyzed dataset was retrospectively extracted from the EMRs of stroke patients from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop ten supervised computational machine learning (ML) methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated six times, along with subject-wise sampling. We used a heatmap to display a comparative analysis of the results according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. Feature importance analysis was conducted to provide insights regarding the results.
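The experimental protocol described above (5-fold cross-validation repeated six times with subject-wise sampling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the round-robin assignment of patients to folds, and the toy patient identifiers are all assumptions made for illustration. The point it shows is that every sentence from a given patient falls on the same side of each train/test split.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no subject spans both sides."""
    # Group sentence indices by the subject (patient) they belong to.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    # Deal the shuffled subjects round-robin into n_splits groups
    # (an illustrative assumption, not taken from the source).
    groups = [subjects[k::n_splits] for k in range(n_splits)]
    for held_out in groups:
        held = set(held_out)
        test_idx = [i for s in held_out for i in by_subject[s]]
        train_idx = [i for s in subjects if s not in held for i in by_subject[s]]
        yield train_idx, test_idx


def repeated_subject_wise_folds(subject_ids, n_splits=5, n_repeats=6):
    """Repeat the subject-wise split with a different shuffle per repetition."""
    for rep in range(n_repeats):
        yield from subject_wise_folds(subject_ids, n_splits, seed=rep)


# Hypothetical toy data: 10 patients with 3 sentences each.
subject_ids = [f"patient_{p}" for p in range(10) for _ in range(3)]
splits = list(repeated_subject_wise_folds(subject_ids))
print(len(splits))  # 5 folds x 6 repetitions = 30 splits
```

Splitting by patient rather than by sentence avoids leaking a patient's wording from the training set into the test set, which would otherwise tend to inflate the measured effectiveness.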

Results:

The top-performing models were support vector machines (SVMs) trained with lexical and semantic textual features. The SVM models produced statistically superior results in a total of 17 tasks (70%), with an F1 score >80 for care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the recovery process (ability to feed orally, ambulate, and communicate), achieved healthcare status (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more future tasks.
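For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a single binary task and expresses it on a 0 to 100 scale, which is how a threshold such as ">80" is read here. The abstract does not state which F1 variant (binary, micro-averaged, or macro-averaged) was reported, so the binary form, the function name, and the toy labels are assumptions.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score (0.0 to 1.0) of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for one task (1 = condition present in the sentence).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
f1_percent = 100 * f1_score_binary(y_true, y_pred)
print(f1_percent)  # 3 TP, 1 FP, 1 FN -> precision = recall = 0.75 -> 75.0
```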

Conclusions:

Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to stroke victims' clinical conditions, thus ultimately assessing the possibility of proactively using these machine-learning techniques in real-world situations.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.