JMIR Preprints #70107: Large Language Models (Chat GPT, Claude, and Gemini) have potential to significantly transform data analysis and medical education in Assisted Reproductive Technology: Comparison Study.


Large Language Models (Chat GPT, Claude, and Gemini) have potential to significantly transform data analysis and medical education in Assisted Reproductive Technology: Comparison Study.

Noriyuki Okuyama;
Mika Ishi;
Yuriko Fukuoka;
Hiromitsu Hattori;
Yuta Kasahara;
Tai Toshihiro;
Koki Yoshinaga;
Tomoko Hashimoto;
Koichi Kyono

ABSTRACT

Background:

Recent studies have demonstrated that large language models (LLMs) exhibit exceptional performance in medical examinations. However, there is a lack of reports assessing their capabilities in specific domains or their application in practical data analysis using code interpreters. Furthermore, comparative analyses across different LLMs have not been extensively conducted.

Objective:

The purpose of this study was to evaluate whether advanced AI models can analyze data from template-based input and demonstrate basic knowledge of reproductive medicine. Three AI models (ChatGPT, Claude, and Gemini) were evaluated for their data analysis capabilities through numerical calculations and graph rendering. Their knowledge of infertility treatment was assessed with ten examination questions curated by experts.

Methods:

First, we uploaded data to the AI models and provided instruction templates through the chat interface. We investigated whether the AI models could analyze pregnancy rates and render graphs by blastocyst grade according to the Gardner criteria. Second, we assessed the models' diagnostic capabilities based on specialized knowledge. This evaluation used ten questions derived from the Japanese Fertility Specialist Examination and the Embryologist Certification Exam, along with chromosome imaging. These materials were curated under the supervision of certified embryologists and fertility specialists. All procedures were repeated ten times per AI model.
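The pregnancy rate analysis by blastocyst grade described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' template or the code generated by any of the models: the function name `pregnancy_rate_by_grade`, the record layout (a Gardner grade string paired with a pregnancy outcome), and the toy records are all assumptions introduced here.

```python
from collections import defaultdict


def pregnancy_rate_by_grade(records):
    """Return {grade: (pregnancies, transfers, rate)} from (grade, pregnant) pairs.

    The (grade, pregnant) record layout is a hypothetical one chosen for this
    sketch; the study's actual data format is not given in the abstract.
    """
    counts = defaultdict(lambda: [0, 0])  # grade -> [pregnancies, transfers]
    for grade, pregnant in records:
        counts[grade][1] += 1
        if pregnant:
            counts[grade][0] += 1
    # Sort by grade string so the output order is stable.
    return {g: (p, n, p / n) for g, (p, n) in sorted(counts.items())}


# Toy records invented purely for illustration; NOT data from the study.
toy_records = [
    ("4AA", True), ("4AA", True), ("4AA", False),
    ("4BB", True), ("4BB", False),
    ("3BC", False), ("3BC", False),
]

for grade, (p, n, rate) in pregnancy_rate_by_grade(toy_records).items():
    print(f"{grade}: {p}/{n} transfers resulted in pregnancy ({rate:.0%})")
```

The per-grade tuples returned here could be passed to any plotting library to produce the kind of bar chart the graph rendering task refers to.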

Results:

GPT-4o and Gemini performed the analyses within 30 seconds, occasionally requiring minor corrections afterward. Claude, however, did not reach the data analysis stage. GPT-4o, Claude, and Gemini all achieved perfect scores on a set of nine knowledge-based questions derived from professional fertility specialist examinations. However, none of the AI models could accurately perform karyotype diagnostic tasks in reproductive medicine.

Conclusions:

The rapid processing by GPT-4o and Gemini demonstrates the potential of these AI models to significantly expedite data-intensive tasks in clinical settings. The perfect scores on the knowledge-based questions underscore their potential utility as educational tools or decision support systems in reproductive medicine. However, none of the models could accurately interpret medical images or make diagnoses from them.

Citation

Please cite as:

Okuyama N, Ishi M, Fukuoka Y, Hattori H, Kasahara Y, Toshihiro T, Yoshinaga K, Hashimoto T, Kyono K

Application of Large Language Models in Data Analysis and Medical Education for Assisted Reproductive Technology: Comparative Study

JMIR Form Res 2025;9:e70107

DOI: 10.2196/70107

PMID: 41032884

PMCID: 12488165


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: Dec 15, 2024

Open Peer Review Period: Dec 16, 2024 - Feb 10, 2025

Date Accepted: Aug 13, 2025

