Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 2, 2023
Open Peer Review Period: Nov 2, 2023 - Dec 28, 2023
Date Accepted: Jul 7, 2024

The final, peer-reviewed published version of this preprint can be found here:

A New Natural Language Processing–Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study

de Paiva B, Gonçalves MA, da Rocha LCD, Marcolino MS, Lana FCB, Souza-Silva MVR, Almeida JM, Pereira PD, de Andrade CMV, Gomes AGdR, Ferreira MAP, Bartolazzi F, Sacioto MF, Boscato AP, Guimarães-Júnior MH, dos Reis PP, Costa FR, Jorge AdO, Coelho LR, Carneiro M, Sales TLS, Araújo SF, Silveira DV, Ruschel KB, Santos FCV, Cenci EPdA, Menezes LSM, Anschau F, Bicalho MAC, Manenti ERF, Finger RG, Ponce D, de Aguiar FC, Marques LM, de Castro LC, Vietta GG, Godoy MFd, Vilaça MdN, Morais VC

JMIR Med Inform 2024;12:e54246

DOI: 10.2196/54246

PMID: 39467275

PMCID: 11555458

DIS: A New Natural Language Processing Inspired Methodology to Investigate Temporal Shifts (Drifts) in Healthcare Data

  • Bruno de Paiva; 
  • Marcos André Gonçalves; 
  • Leonardo Chaves Dutra da Rocha; 
  • Milena Soriano Marcolino; 
  • Fernanda Cristina Barbosa Lana; 
  • Maira Viana Rego Souza-Silva; 
  • Jussara M Almeida; 
  • Polianna Delfino Pereira; 
  • Claudio Moisés Valiense de Andrade; 
  • Angélica Gomides dos Reis Gomes; 
  • Maria Angélica Pires Ferreira; 
  • Frederico Bartolazzi; 
  • Manuela Furtado Sacioto; 
  • Ana Paula Boscato; 
  • Milton Henriques Guimarães-Júnior; 
  • Priscilla Pereira dos Reis; 
  • Felício Roberto Costa; 
  • Alzira de Oliveira Jorge; 
  • Laryssa Reis Coelho; 
  • Marcelo Carneiro; 
  • Thaís Lorenna Souza Sales; 
  • Silvia Ferreira Araújo; 
  • Daniel Vitório Silveira; 
  • Karen Brasil Ruschel; 
  • Fernanda Caldeira Veloso Santos; 
  • Evelin Paola de Almeida Cenci; 
  • Luanna Silva Monteiro Menezes; 
  • Fernando Anschau; 
  • Maria Aparecida Camargos Bicalho; 
  • Euler Roberto Fernandes Manenti; 
  • Renan Goulart Finger; 
  • Daniela Ponce; 
  • Filipe Carrilho de Aguiar; 
  • Luiza Margoto Marques; 
  • Luís César de Castro; 
  • Giovanna Grünewald Vietta; 
  • Mariana Frizzo de Godoy; 
  • Mariana do Nascimento Vilaça; 
  • Vivian Costa Morais

ABSTRACT

Background:

Healthcare data is a valuable resource for improving patient outcomes. If adequately treated and interpreted, it can enhance healthcare services and help us understand the impact of new technologies and treatments. One important aspect of healthcare data is that it is usually temporal: it is collected over time and is therefore susceptible to temporal shifts. For instance, COVID-19 vaccination dramatically changed the profile of hospitalizations and deaths, initially decreasing the mean age of at-risk patients and then creating a large shift in the characteristics of the patients who died. These temporal shifts may have a significant impact depending on the task one wishes to learn from the data, and they are particularly relevant for understanding which factors (e.g., new technologies or treatments) affect patient outcomes.

Objective:

We propose DIS (Detection, Initial Characterization, Semantic Characterization), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes in outcomes in large volumes of data. DIS can help identify patterns and trends, which are essential for decision-making and resource allocation, and can improve the effectiveness of machine learning predictive algorithms applied to healthcare data.

Methods:

The DIS methodology is based on a 3-step process that starts with drift detection, proceeds through an initial characterization that helps direct the focus of the analysis, and evolves into a semantic characterization. By combining the outcomes of these three steps, our results can hint at specific factors, such as interventions and changes in healthcare practice, that drive the changes in patient outcomes.
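To make the first step concrete, the following is a minimal sketch of one generic way to flag a drift in an outcome over time. It is not the detector used in DIS, which the abstract does not specify; it simply compares the mortality rate of consecutive time periods with a two-proportion z-test. The function names, the `(period, died)` record layout, and the `alpha` threshold are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def two_proportion_z(deaths_a, n_a, deaths_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled normal approximation.
    """
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both periods have identical, degenerate rates (all 0 or all 1).
        return 0.0, 1.0
    z = (deaths_a / n_a - deaths_b / n_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def detect_drifts(records, alpha=0.01):
    """Flag consecutive periods whose mortality rates differ significantly.

    records: iterable of (period, died) pairs, where `period` is a label
    that sorts chronologically (hypothetical layout, e.g. "2021-03") and
    `died` is a boolean outcome.
    Returns a list of (previous_period, current_period, z, p_value).
    """
    counts = defaultdict(lambda: [0, 0])  # period -> [deaths, patients]
    for period, died in records:
        counts[period][0] += int(died)
        counts[period][1] += 1

    periods = sorted(counts)
    drifts = []
    for prev, curr in zip(periods, periods[1:]):
        z, p_value = two_proportion_z(*counts[prev], *counts[curr])
        if p_value < alpha:
            drifts.append((prev, curr, z, p_value))
    return drifts
```

A detector of this kind only answers "when did the outcome change"; the two characterization steps described above are what would relate a flagged period to specific variables or interventions.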

Results:

We applied the DIS methodology to two distinct datasets: the Brazilian COVID-19 multicenter cohort and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) dataset. In doing so, we obtained insights hinting at the root causes of the drop in overall (all-cause) mortality in the two datasets, such as the role of vaccination in the COVID-19 pandemic and the decrease in iatrogenic events and cancer-related deaths in the MIMIC-IV dataset.

Conclusions:

We successfully applied machine learning methods to detect, characterize, and explain temporal shifts in healthcare data. Understanding these changes can improve patient outcomes as well as healthcare resource allocation.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.