JMIR Preprints #54246: DIS: A New Natural Language Processing Inspired Methodology to Investigate Temporal Shifts (Drifts) in Healthcare Data

DIS: A New Natural Language Processing Inspired Methodology to Investigate Temporal Shifts (Drifts) in Healthcare Data

Bruno de Paiva;
Marcos André Gonçalves;
Leonardo Chaves Dutra da Rocha;
Milena Soriano Marcolino;
Fernanda Cristina Barbosa Lana;
Maira Viana Rego Souza-Silva;
Jussara M Almeida;
Polianna Delfino Pereira;
Claudio Moisés Valiense de Andrade;
Angélica Gomides dos Reis Gomes;
Maria Angélica Pires Ferreira;
Frederico Bartolazzi;
Manuela Furtado Sacioto;
Ana Paula Boscato;
Milton Henriques Guimarães-Júnior;
Priscilla Pereira dos Reis;
Felício Roberto Costa;
Alzira de Oliveira Jorge;
Laryssa Reis Coelho;
Marcelo Carneiro;
Thaís Lorenna Souza Sales;
Silvia Ferreira Araújo;
Daniel Vitório Silveira;
Karen Brasil Ruschel;
Fernanda Caldeira Veloso Santos;
Evelin Paola de Almeida Cenci;
Luanna Silva Monteiro Menezes;
Fernando Anschau;
Maria Aparecida Camargos Bicalho;
Euler Roberto Fernandes Manenti;
Renan Goulart Finger;
Daniela Ponce;
Filipe Carrilho de Aguiar;
Luiza Margoto Marques;
Luís César de Castro;
Giovanna Grünewald Vietta;
Mariana Frizzo de Godoy;
Mariana do Nascimento Vilaça;
Vivian Costa Morais

ABSTRACT

Background:

Healthcare data is a valuable resource for improving patient outcomes. If adequately treated and interpreted, it can enhance healthcare services and help to understand the impacts of new technologies and treatments. One important aspect of healthcare data is that it is usually temporal: it is collected over time and is susceptible to temporal shifts. For instance, COVID-19 vaccination dramatically changed the profile of hospitalizations and deaths, initially decreasing the mean age of the at-risk patients and then creating a large shift in the characteristics of the patients who died. These temporal shifts may have significant impacts depending on the task one wishes to learn from the data, and they are particularly relevant for understanding which factors (e.g., new technologies/treatments) affect patient outcomes.

Objective:

We propose DIS (Detection, Initial Characterization, Semantic Characterization), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes in outcomes in large volumes of data. DIS can help identify patterns and trends, which are essential for decision-making and resource allocation, and can improve the effectiveness of machine learning predictive algorithms applied to healthcare data.

Methods:

The DIS methodology is based on a 3-step process that starts with drift detection, proceeds through an initial characterization that helps direct the focus of the analysis, and culminates in a semantic characterization. By combining the outcomes of these three steps, our results can hint at specific factors, such as interventions and healthcare practice modifications, that drive the changes in patient outcomes.
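To make the first step more concrete, the following is a minimal sketch of what detecting a drift in an outcome rate could look like. It is not the authors' detection method, which the abstract does not specify; it swaps in a plain two-proportion z-test between consecutive time windows. The function names, the threshold, and the toy counts are all hypothetical, and every window is assumed to contain at least one patient.

```python
import math
from typing import List, Tuple


def two_proportion_z(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """z statistic for the difference between two event proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # A zero standard error means both windows have identical, degenerate rates.
    return (p_a - p_b) / se if se > 0 else 0.0


def detect_outcome_drift(
    windows: List[Tuple[int, int]], z_threshold: float = 1.96
) -> List[int]:
    """Return the indices of windows whose outcome rate differs from the previous one.

    Each window is (number_of_deaths, number_of_patients); this layout is an
    illustrative assumption, not a format defined by the paper.
    """
    flagged = []
    for i in range(1, len(windows)):
        (e_prev, n_prev), (e_cur, n_cur) = windows[i - 1], windows[i]
        if abs(two_proportion_z(e_prev, n_prev, e_cur, n_cur)) > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical toy counts: mortality is stable, then drops in the third window.
windows = [(200, 1000), (195, 1000), (120, 1000)]
drift_points = detect_outcome_drift(windows)
```

In this toy setting only the third window is flagged. A flagged window would then be the natural place to focus the initial and semantic characterization steps.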

Results:

We applied the DIS methodology to two distinct datasets: the Brazilian COVID-19 multicenter cohort and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) dataset. In doing so, we obtained insights hinting at the root causes of the drop in overall (all-cause) mortality in the two datasets, such as the role of vaccination in the COVID-19 pandemic and the decrease in iatrogenic events and cancer-related deaths in the MIMIC-IV dataset.

Conclusions:

We successfully applied machine learning methods to detect, characterize and explain temporal shifts in healthcare data. Understanding these changes can improve patient outcomes, as well as healthcare resource allocation.

Citation

Please cite as:

de Paiva B, Gonçalves MA, da Rocha LCD, Marcolino MS, Lana FCB, Souza-Silva MVR, Almeida JM, Pereira PD, de Andrade CMV, Gomes AGdR, Ferreira MAP, Bartolazzi F, Sacioto MF, Boscato AP, Guimarães-Júnior MH, dos Reis PP, Costa FR, Jorge AdO, Coelho LR, Carneiro M, Sales TLS, Araújo SF, Silveira DV, Ruschel KB, Santos FCV, Cenci EPdA, Menezes LSM, Anschau F, Bicalho MAC, Manenti ERF, Finger RG, Ponce D, de Aguiar FC, Marques LM, de Castro LC, Vietta GG, Godoy MFd, Vilaça MdN, Morais VC

A New Natural Language Processing–Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study

JMIR Med Inform 2024;12:e54246

DOI: 10.2196/54246

PMID: 39467275

PMCID: 11555458

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 2, 2023

Open Peer Review Period: Nov 2, 2023 - Dec 28, 2023

Date Accepted: Jul 7, 2024
