Accepted for/Published in: JMIR Dermatology

Date Submitted: Dec 29, 2023
Date Accepted: Mar 6, 2024

The final, peer-reviewed published version of this preprint can be found here:

Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study

Lambert R, Choo ZY, Gradwohl K, Schroedl L, Ruiz De Luzuriaga A

JMIR Dermatol 2024;7:e55898

DOI: 10.2196/55898

PMID: 38754096

PMCID: 11140271

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Assessing the application of Natural Language Processing Models (NLPMs) in generating dermatologic patient education materials according to reading level

  • Raphaella Lambert
  • Zi-Yi Choo
  • Kelsey Gradwohl
  • Liesl Schroedl
  • Arlene Ruiz De Luzuriaga

ABSTRACT

Background:

Limited health literacy presents a barrier to receiving outpatient dermatologic care, yet dermatologic patient education materials (PEMs) are often written above the national average 7th- to 8th-grade reading level. Chat Generative Pre-Trained Transformer (ChatGPT), DermGPT, and DocsGPT are natural language processing models (NLPMs) that respond to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels.

Objective:

To assess the ability of the NLPMs ChatGPT, DocsGPT, and DermGPT to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels, and to assess the preservation of meaning across these NLPM-generated PEMs, as evaluated by dermatology resident trainees.

Methods:

We evaluated the Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology (AAD) PEMs for four common (atopic dermatitis, acne vulgaris, psoriasis, herpes zoster) and four rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, lichen planus) dermatologic conditions. We prompted ChatGPT, DermGPT, and DocsGPT with “Create a patient education handout about [condition] at a [FKRL]” to iteratively generate 10 PEMs per condition at unspecified, 5th-grade, and 7th-grade FKRLs, evaluated with Microsoft Word readability statistics. Two dermatology resident trainees assessed preservation of meaning across the NLPM-generated PEMs.
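
For illustration only, a minimal sketch of how this generate-and-score loop could be automated, using the published Flesch-Kincaid grade-level formula with a heuristic syllable counter (Microsoft Word's internal word, sentence, and syllable counts, as used in the study, may differ slightly); the `generate_pem` stub is a hypothetical placeholder for whichever model API (ChatGPT, DermGPT, or DocsGPT) is queried:

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic syllable count: contiguous vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

CONDITIONS = ["atopic dermatitis", "acne vulgaris", "psoriasis", "herpes zoster"]
LEVELS = [None, "5th grade", "7th grade"]  # None = unspecified prompt

def build_prompt(condition: str, level: str | None) -> str:
    # Prompt template as reported in the Methods section.
    if level is None:
        return f"Create a patient education handout about {condition}."
    return f"Create a patient education handout about {condition} at a {level} Flesch-Kincaid reading level."

# Hypothetical stub -- replace with a real call to the model being evaluated.
def generate_pem(prompt: str) -> str:
    raise NotImplementedError

if __name__ == "__main__":
    for condition in CONDITIONS:
        for level in LEVELS:
            pem = generate_pem(build_prompt(condition, level))
            print(condition, level, round(flesch_kincaid_grade(pem), 2))
```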

Results:

Current AAD PEMs had an average FKRL of 9.35 and 9.50 for common and rare diseases, respectively. For common diseases, ChatGPT-produced PEMs had average FKRLs of 11.21 (unspecified prompt), 5.02 (5th-grade prompt), and 6.56 (7th-grade prompt); DocsGPT-produced PEMs had average FKRLs of 10.18 (unspecified prompt), 5.01 (5th-grade prompt), and 5.98 (7th-grade prompt); and DermGPT-produced PEMs had average FKRLs of 11.14 (unspecified prompt), 7.43 (5th-grade prompt), and 7.28 (7th-grade prompt). For rare diseases, ChatGPT-generated materials had average FKRLs of 11.45 (unspecified prompt), 5.13 (5th-grade prompt), and 6.75 (7th-grade prompt); DocsGPT-produced PEMs had average FKRLs of 10.41 (unspecified prompt), 5.30 (5th-grade prompt), and 6.43 (7th-grade prompt); and DermGPT-generated PEMs had average FKRLs of 11.93 (unspecified prompt), 7.14 (5th-grade prompt), and 7.58 (7th-grade prompt). Compared with DermGPT, both DocsGPT (P=1.75E-06, P=7.26E-05) and ChatGPT (P=2.60E-09, P=1.72E-04) were better able to generate PEMs at a 5th-grade reading level for common and rare conditions, respectively. The preservation-of-meaning analysis revealed that for common conditions, DermGPT ranked highest for overall ease of reading, patient understandability, and accuracy (14.75/15), followed by DocsGPT (14.25/15) and ChatGPT (13.5/15). For rare conditions, handouts generated by ChatGPT ranked highest (13.5/15), followed by DermGPT and DocsGPT (13/15 each).
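
The abstract does not report which statistical test produced these P values. Purely as a hypothetical illustration of how per-handout FKRL scores from two models might be compared, the sketch below runs a Welch two-sample t-test; the arrays are invented placeholders, not the study's data:

```python
from scipy import stats

# Placeholder FKRL scores for 10 handouts per model at the 5th-grade prompt;
# the study's actual per-handout values are not reproduced here.
docsgpt_fkrl = [5.1, 4.8, 5.3, 4.9, 5.0, 5.2, 4.7, 5.1, 5.0, 4.9]
dermgpt_fkrl = [7.5, 7.2, 7.8, 7.1, 7.6, 7.3, 7.4, 7.2, 7.7, 7.5]

# Welch's t-test (no equal-variance assumption) comparing mean FKRLs.
t_stat, p_value = stats.ttest_ind(docsgpt_fkrl, dermgpt_fkrl, equal_var=False)
print(f"t = {t_stat:.2f}, P = {p_value:.2e}")
```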

Conclusions:

Our analysis suggests that NLPMs may reliably meet 7th-grade FKRLs for select common and rare dermatologic conditions and can produce PEMs that are easy to read, understandable for patients, and mostly accurate. More specifically, DocsGPT and ChatGPT appear to outperform DermGPT at the 5th-grade FKRL, though both DermGPT and DocsGPT perform better at the 7th-grade FKRL, with few differences observed between common and rare conditions. As such, NLPMs may play a role in enhancing health literacy and in disseminating accessible, understandable PEMs in dermatology.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.