Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 25, 2024
Date Accepted: May 4, 2024

The final, peer-reviewed published version of this preprint can be found here:

Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation

Xu J, Lu L, Yang S, Liang B, Peng X, Pang J, Ding J, Shi X, Yang L, Song H, Li K, Sun X, Zhang S

JMIR Med Inform 2024;12:e57674

DOI: 10.2196/57674

PMID: 38952020

PMCID: 11225096

Warning: This is an author submission that is not peer-reviewed or edited. Preprints (unless they show as "accepted") should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

At the author's request, this version is not available.