Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 2, 2023
Open Peer Review Period: Nov 2, 2023 - Dec 28, 2023
Date Accepted: Jul 7, 2024
DIS: A New Natural Language Processing Inspired Methodology to Investigate Temporal Shifts (Drifts) in Healthcare Data
ABSTRACT
Background:
Healthcare data is a valuable resource for improving patient outcomes. If adequately processed and interpreted, it can enhance healthcare services and help us understand the impact of new technologies and treatments. An important characteristic of healthcare data is that it is usually temporal: it is collected over time and is therefore susceptible to temporal shifts. For instance, COVID-19 vaccination dramatically changed the profile of hospitalizations and deaths, initially decreasing the mean age of at-risk patients and later producing a large shift in the characteristics of patients who died. Depending on the learning task at hand, these temporal shifts can have significant effects, and they are particularly relevant for understanding which factors (e.g., new technologies or treatments) affect patient outcomes.
Objective:
We propose DIS (Detection, Initial Characterization, Semantic Characterization), a new methodology for analyzing changes in health outcomes and variables over time while uncovering changes in the context of those outcomes in large volumes of data. DIS can help identify patterns and trends that are essential for decision-making and resource allocation, and it can improve the effectiveness of machine learning predictive algorithms applied to healthcare data.
Methods:
The DIS methodology is a 3-step process: it starts with drift detection, proceeds to an initial characterization that directs the focus of the analysis, and concludes with a semantic characterization. By combining the outputs of these three steps, the results can point to specific factors, such as interventions and changes in healthcare practice, that drive the changes in patient outcomes.
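The abstract does not specify which detector DIS uses in its first step. As a minimal sketch of what drift detection on a single outcome can look like, assuming patient records given as (period, died) pairs and a two-proportion z-test as a stand-in detector (an illustrative choice, not necessarily the authors' method), the code below flags periods whose mortality rate differs significantly from a reference period. The function names `mortality_by_period`, `two_proportion_z`, and `detect_drift`, the significance level `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def mortality_by_period(records):
    """Group hypothetical (period, died) records into {period: (deaths, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for period, died in records:
        counts[period][0] += int(died)
        counts[period][1] += 1
    return {p: (d, n) for p, (d, n) in counts.items()}


def two_proportion_z(d1, n1, d2, n2):
    """Two-sided two-proportion z-test; returns (z, p_value).

    Stand-in drift detector chosen for illustration only.
    """
    p1, p2 = d1 / n1, d2 / n2
    pooled = (d1 + d2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def detect_drift(records, reference, alpha=0.05):
    """Flag periods whose mortality rate differs from the reference period.

    Returns {period: (mortality_rate, z, p_value)} for flagged periods only.
    """
    stats = mortality_by_period(records)
    d_ref, n_ref = stats[reference]
    flagged = {}
    for period in sorted(stats):
        if period == reference:
            continue
        d, n = stats[period]
        z, p = two_proportion_z(d_ref, n_ref, d, n)
        if p < alpha:
            flagged[period] = (d / n, z, p)
    return flagged


# Synthetic toy data (not from the paper): mortality of 30% in 2020,
# 15% in 2021, and 29% in 2022.
records = (
    [(2020, True)] * 300 + [(2020, False)] * 700
    + [(2021, True)] * 150 + [(2021, False)] * 850
    + [(2022, True)] * 290 + [(2022, False)] * 710
)
flagged = detect_drift(records, reference=2020)
print(flagged)  # only 2021 is flagged; 2022 stays close to the reference
```

A test like this only covers the detection step and a single binary outcome; the two characterization steps, which examine which variables changed and why, would have to build on the flagged periods and are not sketched here.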
Results:
We applied the DIS methodology to two distinct datasets: the Brazilian COVID-19 multicenter cohort and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) dataset. In doing so, we obtained insights into possible root causes of the drop in overall (all-cause) mortality in both datasets, such as the role of vaccination during the COVID-19 pandemic and the decrease in iatrogenic events and cancer-related deaths in the MIMIC-IV dataset.
Conclusions:
We successfully applied machine learning methods to detect, characterize, and explain temporal shifts in healthcare data. Understanding these changes can help improve patient outcomes and healthcare resource allocation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.