
Accepted for/Published in: JMIR Medical Education

Date Submitted: May 19, 2023
Open Peer Review Period: May 19, 2023 - Jul 14, 2023
Date Accepted: Sep 10, 2023

The final, peer-reviewed published version of this preprint can be found here:

Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study

Liao W, Liu Z, Dai H, Xu S, Wu Z, Zhang Y, Huang X, Zhu D, Cai H, Li Q, Liu T, Li X

JMIR Med Educ 2023;9:e48904

DOI: 10.2196/48904

PMID: 38153785

PMCID: 10784984

Differentiate ChatGPT-generated and Human-written Medical Texts

  • Wenxiong Liao; 
  • Zhengliang Liu; 
  • Haixing Dai; 
  • Shaochen Xu; 
  • Zihao Wu; 
  • Yiyang Zhang; 
  • Xiaoke Huang; 
  • Dajiang Zhu; 
  • Hongmin Cai; 
  • Quanzheng Li; 
  • Tianming Liu; 
  • Xiang Li

ABSTRACT

Background:

Large language models such as ChatGPT are capable of generating grammatically perfect and human-like text, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could lead to disinformation that poses significant harm to health care and the general public.

Objective:

This research is among the first studies on responsible and ethical artificial intelligence-generated content (AIGC) in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT, and on designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT.

Methods:

We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. Next, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part of speech, dependency, sentiment, and perplexity. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT.
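To illustrate the kind of surface-level linguistic comparison described above, the sketch below computes two simple features (type-token ratio as a proxy for vocabulary diversity, and mean sentence length) using only the Python standard library. The sample notes, feature names, and tokenization are invented for illustration and are not the paper's actual feature pipeline.

```python
import re

def linguistic_profile(text):
    """Compute two simple surface features of a text: vocabulary
    diversity (type-token ratio) and mean sentence length in tokens."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    mean_len = len(tokens) / len(sentences) if sentences else 0.0
    return {"type_token_ratio": ttr, "mean_sentence_length": mean_len}

# Invented examples: a terse, abbreviation-heavy human-style note
# versus a more fluent, repetitive generated-style passage.
human_note = "Pt c/o chest pain x2 days. EKG normal. Troponin negative. Dx: costochondritis."
generated_note = ("The patient reports chest pain. The chest pain has lasted "
                  "for two days. Tests for the chest pain were normal.")

print(linguistic_profile(human_note))
print(linguistic_profile(generated_note))
```

On these toy examples the human-style note scores a higher type-token ratio, matching the paper's observation that human-written medical text is more concrete and lexically diverse.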

Results:

Medical texts written by humans are typically more concrete, more diverse, and contain more useful information, whereas medical texts generated by ChatGPT tend to prioritize fluency and logic and usually express general terminology rather than context-specific information. A BERT-based model can effectively detect medical texts generated by ChatGPT, with an F1 score exceeding 95%.
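The detection task is binary text classification: given a passage, label it human-written or ChatGPT-generated. The toy sketch below illustrates that framing with a bag-of-words Naive Bayes classifier in plain Python; it stands in for the paper's fine-tuned BERT model, and the corpus, labels, and class names are invented for illustration only.

```python
import math
import re
from collections import Counter

class NaiveBayesDetector:
    """Toy bag-of-words Naive Bayes classifier standing in for the
    paper's fine-tuned BERT detector (illustrative only)."""

    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}  # per-class token counts
        self.docs = {0: 0, 1: 0}                    # per-class document counts

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def fit(self, texts, labels):
        for text, y in zip(texts, labels):
            self.counts[y].update(self.tokenize(text))
            self.docs[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])

    def predict(self, text):
        scores = {}
        for y in (0, 1):
            total = sum(self.counts[y].values())
            score = math.log(self.docs[y] / sum(self.docs.values()))
            for tok in self.tokenize(text):
                # Laplace smoothing over the shared vocabulary
                score += math.log(
                    (self.counts[y][tok] + 1) / (total + len(self.vocab)))
            scores[y] = score
        return max(scores, key=scores.get)

# Invented tiny corpus: label 0 = human-written, 1 = machine-generated.
texts = [
    "pt c/o sob, wheezing on exam, started albuterol",
    "ekg shows st elevation, cath lab activated",
    "the patient presents with symptoms that may indicate a condition",
    "it is important to note that further evaluation is recommended",
]
labels = [0, 0, 1, 1]

clf = NaiveBayesDetector()
clf.fit(texts, labels)
print(clf.predict("it is important to consider further evaluation"))
```

A real detector would replace the bag-of-words features with contextual embeddings (for example, fine-tuning a BERT encoder with a classification head), which is what allows the reported F1 score above 95%.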

Conclusions:

Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of medical texts generated by ChatGPT differ from those written by human experts. Medical text generated by ChatGPT can be effectively detected by the proposed machine learning algorithms. This study provides a pathway towards trustworthy and accountable use of large language models in medicine.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.