JMIR Preprints #44331: Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: A Pilot Study

Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: A Pilot Study

Walter Nelson;
Nityan Khanna;
Mohamed Ibrahim;
Justin Fyfe;
Maxwell Geiger;
Keith Edwards;
Jeremy Petch

ABSTRACT

Background:

To provide quality care, modern health care systems must match and link data about the same patient from multiple sources, a function often served by master patient index software. Record linkage in the master patient index is typically performed manually by health care providers, guided by automated matching algorithms. These matching algorithms must be configured in advance, such as by setting the weights of patient attributes, usually by someone with knowledge of both the matching algorithm and the patient population being served.

Objective:

We aimed to develop and evaluate a machine learning-based software tool that automatically configures a patient matching algorithm by learning from pairs of patient records, already present in the database, that humans have previously linked.

Methods:

We built a free and open-source software tool to optimize record linkage algorithm parameters based on historical record linkages. The tool uses Bayesian optimization to identify the set of configuration parameters that leads to optimal matching performance in a given patient population, by learning from prior record linkages made by humans. The tool assumes only that a minimal HTTP application programming interface exists, so it is agnostic to the choice of master patient index software, record linkage algorithm, and patient population. As a proof of concept, we integrated our tool with SanteMPI, an open-source master patient index. We validated the tool using several synthetic patient populations in SanteMPI by comparing the sensitivity and specificity of the optimized configuration with those of SanteMPI’s default matching configuration in held-out data.
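The optimization loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the attribute names, the weighted-sum matcher, the accuracy objective, and the toy labelled pairs are hypothetical and are not taken from the tool or from SanteMPI. To keep the example free of third-party dependencies, it uses plain random search as a stand-in for the Bayesian optimization that the tool itself uses.

```python
import random

# Hypothetical labelled pairs: per-attribute agreement (1 = agree, 0 = disagree)
# and whether a human previously linked the two records. Not data from the study.
PAIRS = [
    ({"name": 1, "dob": 1, "postal": 0}, True),
    ({"name": 1, "dob": 1, "postal": 1}, True),
    ({"name": 0, "dob": 1, "postal": 1}, True),
    ({"name": 1, "dob": 0, "postal": 0}, False),
    ({"name": 0, "dob": 0, "postal": 1}, False),
    ({"name": 0, "dob": 1, "postal": 0}, False),
]
ATTRIBUTES = ["name", "dob", "postal"]


def is_match(agreement, weights, threshold):
    """Weighted-sum matcher: declare a match when the score reaches the threshold."""
    score = sum(weights[a] * agreement[a] for a in ATTRIBUTES)
    return score >= threshold


def objective(weights, threshold, pairs):
    """Fraction of pairs on which the matcher agrees with the human decision."""
    correct = sum(is_match(agr, weights, threshold) == linked for agr, linked in pairs)
    return correct / len(pairs)


def random_search(pairs, n_trials=200, seed=0):
    """Stand-in optimizer: sample configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_score, best_config = -1.0, None
    for _ in range(n_trials):
        weights = {a: rng.uniform(0.0, 1.0) for a in ATTRIBUTES}
        threshold = rng.uniform(0.0, 3.0)
        score = objective(weights, threshold, pairs)
        if score > best_score:
            best_score, best_config = score, (weights, threshold)
    return best_score, best_config


best_score, (best_weights, best_threshold) = random_search(PAIRS)
print(f"best agreement with historical linkages: {best_score:.2f}")
```

A Bayesian optimizer would replace `random_search` by proposing each new configuration from a model of the scores observed so far, which matters when every evaluation requires a round trip to the master patient index.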

Results:

The machine learning-optimized configurations correctly detect over 90% of true record linkages as definite matches in all datasets, with 100% specificity and positive predictive value, whereas the baseline configuration detects none as definite matches. In the largest dataset examined, the baseline matching configuration detects possible record linkages with a sensitivity of 90.2% (95% CI: 88.4, 92.0) and specificity of 100%. By comparison, the machine learning-optimized matching configuration attains a sensitivity of 100%, with a decreased specificity of 95.9% (95% CI: 95.9, 96.0). We report significant gains in sensitivity in all datasets examined, at the cost of only marginally decreased specificity. The configuration optimization tool, data, and dataset generator have been made freely available.
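For readers unfamiliar with the metrics reported above, the following sketch shows how sensitivity and specificity with 95% confidence intervals can be computed from confusion-matrix counts. The abstract does not state which interval method was used; the normal-approximation (Wald) interval here is an assumption, and the counts are hypothetical values chosen only for illustration.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) using the normal-approximation (Wald) interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return proportion_with_ci(tp, tp + fn), proportion_with_ci(tn, tn + fp)


# Hypothetical confusion-matrix counts, not results from the study.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=950, fp=50)
print("sensitivity: {:.3f} ({:.3f}, {:.3f})".format(*sens))
print("specificity: {:.3f} ({:.3f}, {:.3f})".format(*spec))
```

Note that the Wald interval collapses to zero width when the estimate is exactly 0% or 100%, so a different method (for example, a Wilson interval) is usually preferred near those extremes.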

Conclusions:

Our machine learning software tool can be used to significantly improve the performance of existing record linkage algorithms, without knowledge of the algorithm being used or specific details of the patient population being served.

Citation

Please cite as:

Nelson W, Khanna N, Ibrahim M, Fyfe J, Geiger M, Edwards K, Petch J

Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation

JMIR Form Res 2023;7:e44331

DOI: 10.2196/44331

PMID: 37384382

PMCID: 10365597

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Formative Research

Date Submitted: Dec 14, 2022

Date Accepted: May 30, 2023
