Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 8, 2022
Date Accepted: Aug 12, 2022
Mining Severe Drug Hypersensitivity Reaction Cases in Pediatrics Electronic Health Records: Methodology Development and Applications
ABSTRACT
Background:
Severe drug hypersensitivity reactions (DHRs) are drug-induced allergic reactions whose main symptoms are severe skin rash and internal damage. At present, the reporting of severe DHRs in hospitals relies solely on spontaneous reporting systems (SRSs), which are operated by the clinicians in charge. An automatic system that scans clinical notes and flags potential severe DHR cases would help reduce the number of missed positive cases and lower labor costs at the same time.
Objective:
This study aimed to design a method that automatically identifies positive DHR cases from clinical notes in the electronic health record (EHR) system while keeping both labor and computing costs low. We sought to verify the effectiveness of the proposed pipeline on the well-studied N2C2 2016 smoking task, which identifies the smoking status of discharged patients, and then to apply the verified pipeline to our own task: the automatic identification of severe DHRs in pediatric EHRs.
Methods:
Given the limited labor and computing resources, the proposed method did not rely on extensive preprocessing, feature engineering, or hyperparameter fine-tuning. The proposed pipeline consisted of three stages: (1) filter long clinical notes using a list of keywords; (2) transform the filtered texts into a high-dimensional feature space using statistical algorithms or contextualized neural language models, such as pretrained BERT models; and (3) train a machine learning classifier with stochastic gradient descent (SGD) in the high-dimensional feature space and classify each transformed clinical note document into a predefined category. The proposed method was first verified on the openly available N2C2 2016 smoking task. It was then applied to automatically identify severe DHRs, both on an annotated dataset and in nine years of pediatric EHRs.
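The three stages can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The keyword list, the toy notes and labels, and all function names are hypothetical. It uses the statistical (TF-IDF) variant of stage 2 rather than BERT embeddings, and a logistic-regression classifier fitted by plain SGD stands in for whatever classifier configuration the study used.

```python
import math
import random
import re
from collections import Counter

# Hypothetical keyword list; the study's actual keywords are not given here.
KEYWORDS = ["rash", "allergy", "hypersensitivity"]

def filter_note(note, keywords=KEYWORDS):
    """Stage 1: keep only the sentences that mention at least one keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", note)
    kept = [s for s in sentences if any(k in s.lower() for k in keywords)]
    return " ".join(kept)

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def fit_tfidf(docs):
    """Stage 2 (statistical variant): learn smoothed IDF weights per token."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def transform(doc, idf):
    """Map a document to an L2-normalized sparse TF-IDF vector (a dict)."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_sgd(vectors, labels, epochs=50, lr=0.5, seed=0):
    """Stage 3: logistic regression fitted by SGD, one example per update."""
    rng = random.Random(seed)
    weights, bias = Counter(), 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = bias + sum(weights[t] * v for t, v in vectors[i].items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - labels[i]  # gradient of the log loss with respect to z
            for t, v in vectors[i].items():
                weights[t] -= lr * g * v
            bias -= lr * g
    return weights, bias

def predict(vec, weights, bias):
    z = bias + sum(weights[t] * v for t, v in vec.items())
    return 1 if z > 0 else 0

# Toy, made-up notes: 1 = suspected severe DHR, 0 = not.
notes = [
    "Admitted with fever. Generalized rash appeared after amoxicillin. Liver enzymes elevated.",
    "Cough for three days. Severe rash and mucosal erosion after carbamazepine.",
    "Routine visit. No rash noted. No known drug allergy.",
    "Fracture of left arm. Denies allergy to any drug.",
]
labels = [1, 1, 0, 0]

filtered = [filter_note(n) for n in notes]
idf = fit_tfidf(filtered)
vectors = [transform(d, idf) for d in filtered]
weights, bias = train_sgd(vectors, labels)
train_predictions = [predict(v, weights, bias) for v in vectors]
```

In practice, stages 2 and 3 would more likely be built from an existing library, for example scikit-learn's `TfidfVectorizer` and `SGDClassifier`, with stage 2 swapped for embeddings from a pretrained BERT model when contextualized features are wanted.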
Results:
In the smoking task, the domain-specific pretrained language models ClinicalBERT (94.06%) and DischargeBERT (93.07%) outperformed the open-domain model Bert-base-uncased (91.09%) when using filtered texts. The effectiveness of the proposed pipeline was verified by coming close to the state-of-the-art record on this challenge (94.1% vs 94.2%). The proposed method was then applied to the DHR task with little transfer work. The domain-specific pretrained language model Medbert-kd-chinese (89.09%) outperformed the Bert-base-chinese model (88.18%) and the TF-IDF baseline (83.64%). The model was then applied to nine years of EHRs from Beijing Children’s Hospital, and a total of 1155 cases were flagged. After review by clinical experts, 357 cases of severe DHRs were finally identified.
Conclusions:
Various machine learning and deep learning models are worth considering for a specific phenotyping task. The method proposed in this work is worth exploring, particularly given its fast development process and low computing cost.