Accepted for/Published in: JMIR Medical Education

Date Submitted: Oct 26, 2025
Open Peer Review Period: Aug 23, 2025 - Oct 18, 2025
Date Accepted: Feb 5, 2026

The final, peer-reviewed published version of this preprint can be found here:

ChatGPT versus UpToDate in Preclinical Medical Education: Cross-Sectional Analysis Using Term Frequency–Inverse Document Frequency Cosine Similarity

Thiru SS, Aksu NE, Chiang M, Gallagher DO, Furlong M, Prevou ER, Khanna AJ

ChatGPT versus UpToDate in Preclinical Medical Education: Cross-Sectional Analysis Using Term Frequency–Inverse Document Frequency Cosine Similarity

JMIR Med Educ 2026;12:e82885

DOI: 10.2196/82885

PMID: 41861392

ChatGPT vs UpToDate: A Cross-Sectional Analysis of Alignment Across Preclinical Medical Topics Using TF-IDF Cosine Similarity

  • Shankar S Thiru
  • Nicholas E Aksu
  • Matthew Chiang
  • Daniel O Gallagher
  • Mary Furlong
  • Elizabeth R Prevou
  • Akhil Jay Khanna

ABSTRACT

Background:

ChatGPT is increasingly relied upon as a study tool among medical trainees during the preclinical curricular phase, raising concern about its accuracy and reliability.

Objective:

The aim of this study is to compare ChatGPT 4o mini with UpToDate and assess how similar their responses are.

Methods:

We posed a total of 150 preclinical-level questions: 30 each in biochemistry, immunology, microbiology, pharmacology, and pathology. ChatGPT was asked each question 5 times to account for stochasticity. Next, a text network analysis was performed using cosine similarity of term frequency–inverse document frequency (TF-IDF) vectors to gauge the similarity between ChatGPT and UpToDate responses per question for each subject. A statistical reference (p = 0.05) for interpreting TF-IDF values was generated using random text samples with the same length distribution as the UpToDate responses. The TF-IDF similarity of ChatGPT responses to the overall subject category was also calculated.
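The similarity measure described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the tokenizer, the smoothed IDF formula, the function names, and the example texts are all assumptions made for illustration, and the random-text reference is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document.

    The smoothed IDF, ln((1 + n) / (1 + df)) + 1, is an assumption; the
    abstract does not state which weighting variant was used.
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(reference, responses):
    """Average similarity of repeated responses to a single reference text."""
    vectors = tfidf_vectors([reference] + list(responses))
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    return sum(scores) / len(scores)


# Hypothetical texts standing in for one UpToDate passage and repeated
# ChatGPT answers to the same question (the study used 5 repeats).
reference_text = "Penicillin inhibits bacterial cell wall synthesis."
chatgpt_answers = [
    "Penicillin blocks synthesis of the bacterial cell wall.",
    "Penicillin inhibits transpeptidase enzymes in bacteria.",
]
print(round(mean_similarity(reference_text, chatgpt_answers), 3))
```

Averaging over the repeated answers mirrors the idea of asking each question several times, so that a single unusual response does not dominate the score for that question.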

Results:

ChatGPT responses were most similar to UpToDate for pharmacology questions (TF-IDF 0.338±0.134). ChatGPT's response similarity to UpToDate for the remaining subjects was 0.321±0.142 for pathology, 0.296±0.120 for biochemistry, 0.297±0.108 for microbiology, and 0.275±0.102 for immunology. Reference TF-IDF scores of randomly generated text were 0.262, 0.279, 0.243, 0.267, and 0.281 for biochemistry, immunology, microbiology, pharmacology, and pathology, respectively.
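As a reading aid, the short Python sketch below lines up each subject's reported similarity with its random-text reference score. The values are taken from the results above; the variable and function names are illustrative. It compares only the reported central values and does not reproduce the per-question comparison performed in the study.

```python
# Reported ChatGPT-to-UpToDate similarity per subject (central values only).
reported_similarity = {
    "biochemistry": 0.296,
    "immunology": 0.275,
    "microbiology": 0.297,
    "pharmacology": 0.338,
    "pathology": 0.321,
}

# Reported reference scores for randomly generated text.
reference_score = {
    "biochemistry": 0.262,
    "immunology": 0.279,
    "microbiology": 0.243,
    "pharmacology": 0.267,
    "pathology": 0.281,
}


def margins(similarity, reference):
    """Return similarity minus reference for each subject, rounded to 3 places."""
    return {s: round(similarity[s] - reference[s], 3) for s in similarity}


def subjects_below_reference(similarity, reference):
    """List subjects whose reported value does not exceed the reference."""
    return sorted(s for s in similarity if similarity[s] <= reference[s])


for subject, margin in margins(reported_similarity, reference_score).items():
    print(f"{subject:13s} {margin:+.3f}")
print("At or below reference:",
      subjects_below_reference(reported_similarity, reference_score))
```

On these figures, the pharmacology value sits furthest above its reference, while the immunology value falls slightly below its reference, which is consistent with the caveat that utility may vary by subject.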

Conclusions:

The majority of ChatGPT responses are similar to UpToDate responses for preclinical questions across the subjects of biochemistry, immunology, microbiology, pharmacology, and pathology. Thus, ChatGPT may have a role in medical training during the preclinical curricular phase with the caveat that its utility may vary based on subject.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.