JMIR Preprints #30588: Predicting the Linguistic Accessibility of Chinese Health Translations: Using Machine Learning Algorithms


Predicting the Linguistic Accessibility of Chinese Health Translations: Using Machine Learning Algorithms

Meng Ji; Pierrette Bouillon

ABSTRACT

Background:

Linguistic accessibility has an important impact on the reception and utilization of translated health resources among multicultural and multilingual populations. The linguistic understandability of health translations has been understudied.

Objective:

Our study aimed to develop novel machine learning models for assessing the linguistic accessibility of health translations by comparing Chinese translations of World Health Organization health materials with original Chinese health resources developed by Chinese health authorities.

Methods:

Using natural language processing tools for assessing the readability of Chinese materials, we explored and compared the readability of Chinese health translations from the World Health Organization with that of original Chinese materials from the China Centre for Disease Control and Prevention.

Results:

Pairwise adjusted t tests showed that three new machine learning models achieved statistically significant improvements over the baseline logistic regression in terms of area under the receiver operating characteristic curve (AUC): C5.0 decision tree (p<0.001, 95% CI: -0.249, -0.152), random forest (p<0.001, 95% CI: 0.139, 0.239), and XGBoost tree (p<0.001, 95% CI: 0.099, 0.193). There was, however, no significant difference between the C5.0 decision tree and random forest (p=0.513). The extreme gradient boosting (XGBoost) tree was the best model, achieving statistically significant improvements over the C5.0 model (p=0.003) and the random forest model (p=0.006) at the Bonferroni-adjusted significance threshold of 0.008.
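
The adjusted threshold of 0.008 is consistent with a Bonferroni correction over all pairwise comparisons among the four models (logistic regression, C5.0, random forest, XGBoost). The following is a minimal sketch of that arithmetic, not the authors' code; it assumes a family-wise significance level of 0.05, which the abstract does not state, and the helper name `is_significant` is hypothetical.

```python
from itertools import combinations

# The four models named in the Results paragraph.
models = ["logistic regression", "C5.0 decision tree", "random forest", "XGBoost tree"]

# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
ALPHA = 0.05

# Every unordered pair of models is one comparison: C(4, 2) = 6.
pairs = list(combinations(models, 2))

# Bonferroni correction: divide alpha by the number of comparisons.
threshold = ALPHA / len(pairs)


def is_significant(p_value: float, cutoff: float) -> bool:
    """Return True when a p value falls below the adjusted cutoff."""
    return p_value < cutoff


# p values reported in the abstract for comparisons among the three new models.
reported = {
    ("C5.0 decision tree", "random forest"): 0.513,
    ("C5.0 decision tree", "XGBoost tree"): 0.003,
    ("random forest", "XGBoost tree"): 0.006,
}

print(f"{len(pairs)} comparisons, adjusted threshold = {threshold:.4f}")
for (model_a, model_b), p_value in reported.items():
    verdict = "significant" if is_significant(p_value, threshold) else "not significant"
    print(f"{model_a} vs {model_b}: p = {p_value} ({verdict})")
```

Under this assumption, six comparisons give a cutoff of about 0.0083, which rounds to the reported 0.008. The two XGBoost comparisons fall below it and the C5.0 versus random forest comparison does not.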

Conclusions:

The development of machine learning algorithms significantly improved the accuracy and reliability of current approaches to the evaluation of the linguistic accessibility of Chinese health information, especially Chinese health translations in relation to original health resources. Although the new algorithms developed were based on Chinese health resources, they can be adapted for other languages to advance current research in accessible health translation, communication, and promotion.

Citation

Please cite as:

Ji M, Bouillon P

Predicting the Linguistic Accessibility of Chinese Health Translations: Machine Learning Algorithm Development

JMIR Med Inform 2021;9(10):e30588

DOI: 10.2196/30588

PMID: 34617914

PMCID: 8532010


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 21, 2021

Date Accepted: Jul 2, 2021
