Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 10, 2021
Date Accepted: Nov 14, 2021
Development of a pipeline for Adverse Drug Reactions Identification in clinical Notes (ADRIN): Word embedding models and string matching
ABSTRACT
Background:
Knowledge about adverse drug reactions (ADRs) in the population is limited because of underreporting, which hampers surveillance and assessment of drug safety. Gathering accurate information about the incidence of ADRs is therefore of great relevance, and such information can be retrieved from clinical notes. However, manual labeling of these notes is time-consuming, and automation can improve the use of free-text clinical notes for the identification of ADRs. Furthermore, language processing tools for languages other than English are not widely available.
Objective:
To design and evaluate Adverse Drug Reactions Identification in clinical Notes (ADRIN), a method for the automatic extraction of medication and ADRs from clinical notes.
Methods:
Dutch free-text clinical notes (n=277,398) and medication registrations (n=499,435) from the Cardiology Centers of the Netherlands database were used. All clinical notes were used to develop word embedding models. Vector representations from the word embedding models and string matching against a medical dictionary (MedDRA) were used to identify ADRs and medication in a manually labeled test set of clinical notes. Several settings of the prototype, including search area and punctuation, could be adjusted to determine its best-performing version.
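The string-matching step with an adjustable search area and punctuation setting can be illustrated with a minimal sketch. It is not the ADRIN implementation: the term lists are hypothetical English placeholders (the study used Dutch notes and MedDRA), the function names and the `window` and `keep_punctuation` parameters are invented for illustration, and treating sentence punctuation as a boundary of the search area is an assumption about how the punctuation setting might work. The word embedding component is not shown.

```python
import re

# Hypothetical mini-lexicons for illustration only; not taken from MedDRA or the study data.
DRUG_TERMS = {"metoprolol", "simvastatin"}
ADR_TERMS = {"dizziness", "cough", "muscle pain"}
SENTENCE_BREAKS = {".", ";", "!", "?"}


def tokenize(text, keep_punctuation=True):
    """Lowercase the text and split it into word tokens, optionally keeping sentence punctuation."""
    pattern = r"\w+|[.;!?]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text.lower())


def clip_at_break(tokens):
    """Keep tokens up to, but not including, the first sentence-break token."""
    clipped = []
    for tok in tokens:
        if tok in SENTENCE_BREAKS:
            break
        clipped.append(tok)
    return clipped


def match_terms(tokens, terms, max_len=3):
    """Return dictionary terms found as exact n-grams (1..max_len tokens) in the token list."""
    found = []
    for n in range(1, max_len + 1):
        for start in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[start:start + n])
            if candidate in terms:
                found.append(candidate)
    return found


def find_adr_candidates(text, window=5, keep_punctuation=True):
    """Pair each drug mention with ADR terms found within `window` tokens on either side of it."""
    tokens = tokenize(text, keep_punctuation)
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in DRUG_TERMS:
            continue
        # Left context is scanned outward from the drug mention, then restored to reading order.
        left = clip_at_break(reversed(tokens[max(0, i - window):i]))[::-1]
        right = clip_at_break(tokens[i + 1:i + 1 + window])
        # Each side is matched separately so no n-gram spans across the drug mention.
        for side in (left, right):
            pairs.extend((tok, term) for term in match_terms(side, ADR_TERMS))
    return pairs


note = "Stopped metoprolol because of dizziness. Patient also reports cough."
print(find_adr_candidates(note, window=8, keep_punctuation=True))
print(find_adr_candidates(note, window=8, keep_punctuation=False))
```

In this toy example, keeping punctuation confines the search area to the sentence that mentions the drug, so only "dizziness" is paired with "metoprolol". Without punctuation, the same window extends into the next sentence and also picks up "cough".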
Results:
The ADRIN method was evaluated using a test set of 988 clinical notes written on the stop date of a drug. Multiple versions of the prototype were evaluated for various tasks. Binary classification of ADR presence achieved the highest accuracy, 0.84. A reduced search area and the inclusion of punctuation improved the performance of the pipeline.
Conclusions:
The ADRIN method and prototype are effective in recognizing ADRs in Dutch clinical notes from cardiac diagnostic screening centers. Surprisingly, incorporation of MedDRA did not result in improved identification on top of word embedding models. The implementation of the ADRIN tool may help to increase the identification of ADRs, resulting in better care and saving substantial health care costs.
Clinical Trial: N/A
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.