Accepted for/Published in: JMIR Medical Education
Date Submitted: May 19, 2023
Open Peer Review Period: May 19, 2023 - Jul 14, 2023
Date Accepted: Sep 10, 2023
Differentiate ChatGPT-generated and Human-written Medical Texts
ABSTRACT
Background:
Large language models such as ChatGPT are capable of generating grammatically perfect, human-like text, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could lead to disinformation that poses significant harm to health care and the general public.
Objective:
This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT, and on designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT.
Methods:
We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. In the next step, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT.
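As an illustration of the kind of linguistic features named above, the following minimal sketch computes a vocabulary diversity measure (type-token ratio) and a perplexity score. It is not the authors' pipeline: the abstract does not say how perplexity was computed, so the add-one-smoothed unigram model, the crude tokenizer, and the function names `tokenize`, `type_token_ratio`, and `unigram_perplexity` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(text):
    """Distinct tokens divided by total tokens; higher suggests a more varied vocabulary."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_perplexity(text, reference_corpus):
    """Perplexity of `text` under an add-one-smoothed unigram model fit on `reference_corpus`.

    Assumption: a unigram model stands in for whatever language model a real study would use.
    """
    ref_counts = Counter(tokenize(reference_corpus))
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text has no tokens")
    # The vocabulary covers the reference corpus plus any unseen tokens in `text`,
    # so the smoothed probabilities sum to 1 over that vocabulary.
    vocab_size = len(set(ref_counts) | set(tokens))
    total = sum(ref_counts.values())
    log_prob = sum(
        math.log((ref_counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))
```

Under this sketch, text whose words are rare in the reference corpus receives a higher perplexity, and text that repeats a small set of general terms receives a lower type-token ratio.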
Results:
Medical texts written by humans are typically more concrete and more diverse, and they contain more useful information, while medical texts generated by ChatGPT tend to prioritize fluency and logic and generally use general terminology rather than context-specific information. A BERT-based model can effectively detect medical texts generated by ChatGPT, with an F1 score exceeding 95%.
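For readers unfamiliar with the reported metric, the F1 score is the harmonic mean of precision and recall for the positive class. The sketch below shows the arithmetic on toy labels; the label convention (1 for ChatGPT-generated, 0 for human-written) and the function name `f1_score` are assumptions for this example, and the data are invented purely to illustrate the calculation, not taken from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed convention: 1 = ChatGPT-generated, 0 = human-written).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)  # precision = recall = 2/3
```

Because F1 combines precision and recall, a score above 95% requires the detector to miss few generated texts and to flag few human-written ones.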
Conclusions:
Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of medical texts generated by ChatGPT differ from those written by human experts. Medical text generated by ChatGPT can be effectively detected by the proposed machine learning algorithms. This study provides a pathway towards trustworthy and accountable use of large language models in medicine.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.