Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 9, 2024
Date Accepted: Jul 31, 2025
Parallel Corpus Analysis of Text and Audio Comprehension: Evaluating Readability Formula Effectiveness
ABSTRACT
Background:
Health literacy, the ability to understand and act on health information, is critical for patient outcomes and healthcare system effectiveness. While plain language guidelines enhance text-based communication, audio-based health information remains underexplored, despite the growing use of virtual assistants and smart devices in healthcare. Traditional readability formulas, such as Flesch-Kincaid, provide limited insight into the complexity of health-related texts and fail to address challenges specific to audio formats. Syntactic and semantic features significantly influence comprehension and retention across modalities.
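To make concrete what a traditional readability formula measures, the following is a minimal sketch of the standard Flesch-Kincaid Grade Level and SMOG Index calculations. The syllable counter is a rough vowel-group heuristic and the sentence splitter is a simple regular expression; both are assumptions made for illustration and are not the tooling used in the study. The function names are hypothetical.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups and
    discount a trailing silent 'e'. Always returns at least 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str):
    """Return (sentence count, word count, per-word syllable counts)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return len(sentences), len(words), [count_syllables(w) for w in words]


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    n_sent, n_words, syllables = text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (sum(syllables) / n_words) - 15.59


def smog_index(text: str) -> float:
    """SMOG Index:
    1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    where a polysyllable is a word with three or more syllables."""
    n_sent, _, syllables = text_stats(text)
    polysyllables = sum(1 for s in syllables if s >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291
```

Both formulas depend only on surface counts (sentence length, syllables per word), which is one reason they may miss the syntactic and semantic features discussed here.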
Objective:
This study investigates features that affect comprehension of medical information delivered via text or audio formats. We also examine existing readability formulas and their correlation with perceived and actual difficulty of health information for both modalities.
Methods:
We developed a parallel corpus of health-related information that differed in delivery format: text or audio. We used text from BMJ Lay Summary (N=193), WebMD (N=40), Patient Instructions (N=40), Simple Wikipedia (N=243), and BMJ Journal (N=200). Participants (N=487) read or listened to a health text and then completed a questionnaire. Perceived difficulty was measured on a 5-point Likert scale; actual difficulty was measured with multiple-choice and true-false questions (comprehension) and with free recall of information (retention). Questions were generated by generative AI (ChatGPT 4.0). Underlying syntactic, semantic, and domain-specific features, as well as common readability formulas, were evaluated for their relation to information difficulty.
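As an illustration of how free recall might be scored by exact word matching, the sketch below computes the share of a source text's content words that reappear verbatim in a participant's recall. The abstract does not specify the scoring procedure, so the tokenizer, the small stopword list, and the function name `exact_match_recall` are all assumptions chosen for this example.

```python
import re

# Small illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "with",
             "is", "are", "was", "were", "it", "this", "that", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def exact_match_recall(source: str, recall: str) -> float:
    """Fraction of unique non-stopword source words that appear
    verbatim in the recall. Returns 0.0 if the source has no
    content words."""
    source_words = set(tokenize(source)) - STOPWORDS
    recall_words = set(tokenize(recall))
    if not source_words:
        return 0.0
    return len(source_words & recall_words) / len(source_words)
```

Exact matching is strict: a paraphrase such as "pill" for "tablet" earns no credit, so a semantic similarity measure would be needed to capture recall of meaning rather than wording.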
Results:
Text versions were perceived as easier than audio, with BMJ Lay Summary scoring 1.76 vs. 2.1 and BMJ Journal 2.59 vs. 2.83 (lower is easier). Comprehension accuracy was higher for text across all sources (e.g., BMJ Journal: 76% vs. 58%; Patient Instructions: 86% vs. 66%). Retention was better for text, with significant differences in exact word matching for Patient Instructions and BMJ Journal. Longer texts increased perceived difficulty in text but reduced free recall in both modalities (-0.23, -0.25 in audio). Higher content word frequency improved retention (0.23, 0.21) and lowered perceived difficulty (-0.20 in audio). Verb-heavy content eased comprehension (-0.29 in audio), while nouns and adjectives increased difficulty (0.20, 0.18). Readability formula scores were unrelated to comprehension or retention but correlated with perceived difficulty in text (e.g., SMOG Index: 0.334 correlation).
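The coefficients reported above relate a text feature or formula score to a difficulty measure. The abstract does not state which correlation statistic was used, so the following sketch shows two common options under that uncertainty: Pearson's r and Spearman's rank correlation (often preferred for ordinal Likert ratings). The function names are hypothetical, and no study data are reproduced here.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold one score per text (for example, a SMOG Index value) and `y` the matching mean perceived-difficulty rating; a value near 0.33 would indicate a weak-to-moderate positive association.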
Conclusions:
Text was more effective for conveying complex health information, but audio can be suitable for easier content. In addition, several textual features affect information comprehension and retention for both modalities. Finally, existing readability formulas did not explain actual difficulty. This study highlighted the importance of tailoring health information delivery to content complexity by using appropriate style and modality.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.