Currently accepted at: Journal of Medical Internet Research

Date Submitted: May 7, 2018
Open Peer Review Period: May 11, 2018 - Jun 28, 2018
Date Accepted: Sep 23, 2018

This paper has been accepted and is currently in production.

It will appear shortly at DOI 10.2196/10986.

The text below is the final accepted version (not yet copyedited).

The final, peer-reviewed published version of this preprint is:

Palotti J, Zuccon G, Hanbury A. A Study of Web Page Understandability for Consumer Health Search. J Med Internet Res 2019;21(1):e10986

DOI: 10.2196/10986

PMID: 30698536

PMCID: 6372940

A Study of Web Page Understandability for Consumer Health Search

  • Joao Palotti
  • Guido Zuccon
  • Allan Hanbury



Background: Understandability plays a key role in ensuring that people accessing health information are capable of gaining insights that can assist them with their health concerns and choices. Access to unclear or misleading information has been shown to negatively affect the health decisions of the general public.


Objective: We investigated methods to estimate the understandability of health Web pages and used these to improve the retrieval of information for people seeking health advice on the Web.


Methods: Our investigation considered methods to automatically estimate the understandability of health information in Web pages. It provided a thorough evaluation of these methods against human assessments, as well as an analysis of the pre-processing factors that affect understandability estimations and their associated pitfalls. Furthermore, the lessons learnt for estimating Web page understandability were applied to the construction of retrieval methods that specifically target information understandable by the general public.
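To make the comparison between readability formulae and human assessments concrete, here is a minimal sketch that computes the SMOG Index for a piece of text and scores a set of estimates against human ratings with Pearson correlation. It is an illustration under stated assumptions, not the pipeline used in the study. The vowel-group syllable counter and the regex sentence splitter are crude hypothetical helpers, and the pages and human ratings in the usage lines are invented. Only the SMOG constants (1.0430, 30, 3.1291) come from McLaughlin's published formula. Note that the sentence splitter alone can shift the score, which is one concrete way pre-processing choices affect understandability estimates.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count runs of vowels, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text: str) -> float:
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    # Naive sentence splitting; real Web pages would first need HTML boilerplate,
    # menus, and list fragments removed, and that cleaning changes the result.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    n_sentences = max(len(sentences), 1)  # avoid division by zero on empty input
    return 1.0430 * math.sqrt(polysyllables * 30 / n_sentences) + 3.1291


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Usage with made-up pages and made-up human difficulty ratings (higher = harder).
pages = [
    "The cat sat. The dog ran.",
    "Take your medicine every day. Drink water.",
    "Hypertension is a chronic condition.",
]
human_difficulty = [1.0, 2.0, 4.0]
estimates = [smog_grade(p) for p in pages]
print(pearson(estimates, human_difficulty))
```

A learned estimator, such as the Gradient Boosting regressor mentioned in the results, would replace `smog_grade` with a model trained on many text features. The correlation against human judgments would be computed the same way.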


Results: We found that machine learning techniques were more suitable for estimating health Web page understandability than traditional readability formulae, which are often used as guidelines and benchmarks by health information providers on the Web (the larger difference was found for Pearson correlation: .602 using a Gradient Boosting regressor compared with .438 using the SMOG Index on the CLEF 2015 collection). Learning to rank effectively exploited these estimates to provide the general public with more understandable search results: H_RBP* reached 29.20, 22% higher than a BM25 baseline and 13% higher than the best system at CLEF 2016 (both P≤.001).
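The evaluation measure H_RBP* is not defined in this abstract. As a hedged illustration of the family it appears to belong to, here is a sketch of rank-biased precision (RBP) and an understandability-biased variant (uRBP) of the kind used in CLEF eHealth evaluations. In uRBP, each document's gain is its relevance multiplied by an understandability score in [0, 1]. The persistence parameter p = 0.8, the function names, and the example rankings are assumptions chosen for illustration. This is not the exact measure reported above.

```python
def rbp(relevance: list[float], p: float = 0.8) -> float:
    """Rank-biased precision: (1 - p) * sum over ranks k >= 1 of rel_k * p^(k-1)."""
    # enumerate starts at 0, so p ** k equals p^(rank - 1).
    return (1 - p) * sum(rel * p ** k for k, rel in enumerate(relevance))


def urbp(relevance: list[float], understandability: list[float], p: float = 0.8) -> float:
    """Understandability-biased RBP: each gain is relevance times understandability."""
    return (1 - p) * sum(
        rel * und * p ** k
        for k, (rel, und) in enumerate(zip(relevance, understandability))
    )


# Two hypothetical rankings of the same three relevant pages (relevance = 1).
relevance = [1.0, 1.0, 1.0]
hard_first = [0.2, 0.5, 0.9]  # made-up understandability scores, 1 = easiest
easy_first = [0.9, 0.5, 0.2]
print(rbp(relevance))  # the same for both rankings, since relevance is unchanged
print(urbp(relevance, hard_first), urbp(relevance, easy_first))
```

Moving the easier pages to the top leaves RBP unchanged but raises uRBP. An understandability-aware learning-to-rank model is meant to reward this kind of reordering.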


Conclusions: The findings reported in this article are important for specialised search services tailored to support the general public in seeking health advice on the Web, as they document and empirically validate state-of-the-art techniques and settings for this application domain.

Clinical Trial: Not required.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.