
Accepted for/Published in: JMIR Formative Research

Date Submitted: Nov 11, 2024
Date Accepted: Jul 3, 2025

The final, peer-reviewed published version of this preprint can be found here:

Sercombe J, Bryant Z, Wilson J

Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study

JMIR Form Res 2025;9:e68666

DOI: 10.2196/68666

PMID: 40789147

PMCID: PMC12338963

Evaluating a customised version of ChatGPT for systematic review data extraction in health research: a pilot study

  • Jayden Sercombe; 
  • Zachary Bryant; 
  • Jack Wilson

ABSTRACT

Background:

Systematic reviews are essential for synthesising research in health sciences, yet they are resource-intensive and prone to human error. Recent advances in artificial intelligence (AI), specifically large language models (LLMs) such as ChatGPT, may streamline this process.

Objective:

This study aims to develop and evaluate a custom Generative Pre-Training Transformer (GPT), named Systematic Review Extractor Pro, for automating the data extraction phase of systematic reviews in health research.

Methods:

OpenAI's GPT Builder was used to create a GPT tailored to extract information from academic manuscripts. A sample of 20 studies across two distinct systematic reviews was used to evaluate the GPT's performance in extraction. Agreement rates between the GPT outputs and human reviewers were calculated for each study subsection.
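As a rough illustration of the agreement metric described above, the sketch below computes simple percent agreement between a GPT-extracted record and a human reviewer's record. The field values and the exact-match comparison are assumptions for illustration, not the authors' actual scoring protocol.

```python
def percent_agreement(gpt_items, human_items):
    """Share of extracted fields (in %) where the GPT output matches the human reviewer.

    Assumes paired lists of field values and a simple exact-match criterion;
    the study's real comparison procedure may differ.
    """
    if len(gpt_items) != len(human_items) or not gpt_items:
        raise ValueError("expected two non-empty lists of equal length")
    matches = sum(1 for g, h in zip(gpt_items, human_items) if g == h)
    return 100 * matches / len(gpt_items)


# Hypothetical extraction of four study-characteristic fields
gpt = ["RCT", "n=120", "Australia", "12 weeks"]
human = ["RCT", "n=120", "Australia", "8 weeks"]
print(percent_agreement(gpt, human))  # 75.0
```

In practice an agreement rate like this would be computed per study subsection (study characteristics, participant characteristics, and so on) and then averaged across the included studies.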

Results:

The GPT demonstrated high overall agreement rates with human reviewers, achieving 91.45% for review 1 and 89.31% for review 2. It was particularly accurate in extracting study characteristics (review 1: 95.25%; review 2: 90.83%) and participant characteristics (review 1: 95.03%; review 2: 90.00%), with lower performance observed in more complex areas such as methodological characteristics (87.07%) and statistical results (77.50%).

Conclusions:

As AI is rapidly evolving, the technology may significantly enhance systematic review practices by improving efficiency and reducing human error. The tool developed in the current study has been made open access.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.