Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 31, 2019
Date Accepted: Feb 21, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
CHELCS: A Text Complexity Measurement Tool to Assess Consumer Health Language Differences
ABSTRACT
Background:
The language gap between health consumers and health professionals has long been recognized as a hindrance to effective comprehension of health information. Although providing health information in consumer health language is widely accepted as the solution to this problem, health consumers have been found to differ in their health language preferences and proficiencies. To adaptively simplify health documents for heterogeneous consumer groups, it is important to quantify how the complexity of consumer health language differs among consumer groups.
Objective:
This study proposes a measurement tool, CHELCS (Consumer Health Language Complexity Score), to quantify the complexity of consumer health language (CHL) using syntax-level, text-level, term-level, and semantic-level complexity measurements. Specifically, we used CHELCS to compare the posts of individual users in online health forums designed for (a) the general public, (b) D/deaf and hard of hearing (D/hh) people, and (c) people with autism spectrum disorder (ASD).
Methods:
To understand CHL complexity differences among these groups, we examined the posts of users who had written more than four sentences in three health forums: 12,560 posts from 3,756 users in Yahoo!Answers, 25,545 posts from 1,623 users in AllDeaf, and 26,484 posts from 2,751 users in WrongPlanet. For each user, we calculated syntax-level, text-level, term-level, and semantic-level complexity scores as well as the overall CHELCS, and compared the scores of the three user groups (i.e., D/hh, ASD, and public) using two-sample Kolmogorov–Smirnov tests and analysis of covariance (ANCOVA).
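The two-sample Kolmogorov–Smirnov test compares the full distributions of per-user scores between two groups. As a minimal illustrative sketch (not the authors' code), the test statistic D is the largest vertical gap between the two groups' empirical cumulative distribution functions. The function names and the toy score values below are hypothetical and are not data from the study.

```python
from bisect import bisect_right
from typing import Sequence


def ecdf(sorted_values: Sequence[float], x: float) -> float:
    """Fraction of observations in sorted_values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    Both empirical CDFs are right-continuous step functions that only jump
    at observed values, so evaluating them at every pooled observation is
    enough to find the maximum gap.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Hypothetical per-user complexity scores for two forums (toy values only).
public_scores = [0.62, 0.71, 0.55, 0.80]
asd_scores = [0.40, 0.52, 0.47, 0.58]
print(ks_statistic(public_scores, asd_scores))  # 0.75
```

In practice, `scipy.stats.ks_2samp` returns both this statistic and a p-value; the sketch above stops at the statistic to keep the underlying idea visible.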
Results:
The results suggest that participants in the public forum used more complex CHL than those in the ASD and D/hh forums, particularly more diverse semantics and more complex health terms. Between the latter two groups, ASD users used more complex words, whereas D/hh users used more complex syntax.
Conclusions:
Our results showed that users in the three online forums differed significantly in CHL complexity. The proposed CHELCS and its level-specific measurements helped to comprehensively quantify these differences. The results emphasize the importance of tailoring health content to consumer groups with varying CHL complexity.