JMIR Preprints #95282: Duplicate detection in pharmacovigilance: a scalable predictive modelling approach

Current Preprint Settings

(as selected by the authors)

1. When the manuscript is submitted, allow peer review from:

(a) Anybody (open community peer review)
(b) Editor-selected reviewers (closed peer review)

2. When the manuscript is submitted, display the preprint PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

3. When the manuscript is accepted, display the accepted manuscript PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

Duplicate detection in pharmacovigilance: a scalable predictive modelling approach

Jim W. Barrett;
Nils Erlanson;
Joana Félix China;
G. Niklas Norén

ABSTRACT

Background:

Unlinked adverse event reports referring to the same case impede statistical analysis and may mislead clinical assessment. Pharmacovigilance relies on large databases of adverse event reports to discover potential new causal associations, and computational methods are required to identify duplicates at scale. Current state-of-the-art is statistical record linkage which outperforms rule-based approaches. In particular, vigiMatch is in routine use for VigiBase, the WHO global database of adverse event reports, and represents the first statistical duplicate detection approach in pharmacovigilance deployed at scale. Originally developed for both medicines and vaccines, its application to vaccines has been limited due to inconsistent performance across countries.

Objective:

To advance state-of-the-art for duplicate detection in large-scale pharmacovigilance databases and achieve more consistent performance across adverse event reports from different countries.

Methods:

This paper extends vigiMatch from probabilistic record linkage to predictive modelling, refining features for medicines, vaccines, and adverse events using country-specific reporting rates, extracting dates from free text, and training separate support vector machine classifiers for medicines and vaccines. Recall was evaluated using 5 independent labelled test sets. Precision was assessed by annotating random selections of report pairs classified as duplicates.

Results:

Precision for the new method was 92% for vaccines and 54% for medicines, compared with 41% for the comparator method. Recall ranged from 80-85% across test sets for vaccines and from 40–86% for medicines, compared with 24–53% for the comparator method.

Conclusions:

Predictive modeling, use of free text, and country-specific features advance state-of-the-art for duplicate detection in pharmacovigilance.

Citation

Please cite as:

Barrett JW, Erlanson N, Félix China J, Norén GN

Duplicate detection in pharmacovigilance: a scalable predictive modelling approach

JMIR Preprints. 13/03/2026:95282

DOI: 10.2196/preprints.95282

URL: https://preprints.jmir.org/preprint/95282

Download PDF

Request queued. Please wait while the file is being generated. It may take some time.

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR AI

Date Submitted: Mar 13, 2026

Open Peer Review Period: Mar 20, 2026 - May 15, 2026

(currently open for review)

Duplicate detection in pharmacovigilance: a scalable predictive modelling approach

ABSTRACT

Citation

Copyright