Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Assessing the application of Natural Language Processing Models (NLPMs) in generating dermatologic patient education materials according to reading level
ABSTRACT
Background:
Limited health literacy presents a barrier to receiving outpatient dermatologic care. Yet dermatologic patient education materials (PEMs) are often written above the national average 7th- to 8th-grade reading level. Chat Generative Pre-Trained Transformer (ChatGPT), DermGPT and DocsGPT are natural language processing models (NLPMs) responsive to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels.
Objective:
To assess the ability of the NLPMs ChatGPT, DocsGPT and DermGPT to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels. Further, to evaluate preservation of meaning across such NLPM-generated PEMs, as rated by dermatology resident trainees.
Methods:
We evaluated the Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology (AAD) PEMs for four common (atopic dermatitis, acne vulgaris, psoriasis, herpes zoster) and four rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, lichen planus) dermatologic conditions. We prompted ChatGPT, DermGPT and DocsGPT with “Create a patient education handout about [condition] at a [FKRL]” to iteratively generate 10 PEMs per condition at unspecified, 5th-grade and 7th-grade FKRLs, and evaluated each PEM with Microsoft Word readability statistics. Preservation of meaning across NLPMs was assessed by two dermatology resident trainees.
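For readers unfamiliar with the metric, the Flesch-Kincaid grade level is computed from average sentence length and average syllables per word. The following is a minimal illustrative sketch only, not the procedure used in this study, which relied on Microsoft Word readability statistics. The function names and the vowel-group syllable heuristic are assumptions introduced here, so scores will likely differ somewhat from those Word reports.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a rough heuristic,
    assumed here for illustration; Word's own method may differ)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' (e.g. "make"), but not "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    simple = "Wash your skin. Use a mild soap."
    dense = ("Epidermolysis bullosa is a heterogeneous collection of inherited "
             "blistering conditions characterized by mechanical fragility.")
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Short sentences built from one-syllable words yield low (even negative) grade levels, whereas long sentences with polysyllabic medical terms push the score well above the 7th- to 8th-grade target.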
Results:
Current AAD PEMs had average FKRLs of 9.35 and 9.50 for common and rare diseases, respectively. For common diseases, ChatGPT-produced PEMs had average FKRLs of 11.21 (unspecified prompt), 5.02 (5th-grade prompt) and 6.56 (7th-grade prompt); DocsGPT-produced PEMs had average FKRLs of 10.18 (unspecified prompt), 5.01 (5th-grade prompt) and 5.98 (7th-grade prompt); and DermGPT-produced PEMs had average FKRLs of 11.14 (unspecified prompt), 7.43 (5th-grade prompt) and 7.28 (7th-grade prompt). For rare diseases, ChatGPT-generated PEMs had average FKRLs of 11.45 (unspecified prompt), 5.13 (5th-grade prompt) and 6.75 (7th-grade prompt); DocsGPT-produced PEMs had average FKRLs of 10.41 (unspecified prompt), 5.30 (5th-grade prompt) and 6.43 (7th-grade prompt); and DermGPT-generated PEMs had average FKRLs of 11.93 (unspecified prompt), 7.14 (5th-grade prompt) and 7.58 (7th-grade prompt). Compared to DermGPT, both DocsGPT (P=1.75E-06, P=7.26E-05) and ChatGPT (P=2.60E-09, P=.000172) were better able to generate PEMs at a 5th-grade reading level for common and rare conditions, respectively. Preservation of meaning analysis revealed that for common conditions, DermGPT ranked highest for overall ease of reading, patient understandability and accuracy (14.75/15), followed by DocsGPT (14.25/15) and ChatGPT (13.5/15). For rare conditions, handouts generated by ChatGPT ranked highest (13.5/15), followed by DermGPT (13/15) and DocsGPT (13/15).
Conclusions:
Our analysis suggests that NLPMs may reliably generate PEMs that meet 7th-grade FKRLs for select common and rare dermatologic conditions and that are easy to read, understandable for patients and mostly accurate. More specifically, DocsGPT and ChatGPT appear to outperform DermGPT at the 5th-grade FKRL, though both DermGPT and DocsGPT perform better at the 7th-grade FKRL, with few differences observed between common and rare conditions. As such, NLPMs may play a role in enhancing health literacy and disseminating accessible, understandable PEMs in dermatology.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.