Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 18, 2022
Date Accepted: Oct 12, 2022
TESLEA: Medical Text Simplification using Reinforcement Learning
ABSTRACT
Background:
In most cases, the abstracts of articles in the medical domain are publicly available. Although anyone can access them, their complex medical vocabulary makes them hard for a wider audience to comprehend. Simplifying these abstracts is therefore essential to making medical research accessible to the general public.
Objective:
This paper aims to develop a deep learning model that converts complex medical text to a simpler version while maintaining the quality of the generated text.
Methods:
A text simplification approach using reinforcement learning and Transformer-based language models was developed. A relevance reward, a Flesch-Kincaid reward, and a lexical simplicity reward were optimized to simplify jargon-dense medical paragraphs while retaining the quality of the text. The model was trained on 3,568 complex-simple medical paragraphs and evaluated on 480 paragraphs using automated metrics and human annotation.
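As an illustration of the readability signal behind the Flesch-Kincaid reward, the sketch below computes the standard Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names `count_syllables` and `fkgl` are hypothetical, the vowel-group syllable counter is a rough heuristic assumed here for brevity, and the way the paper turns this score into a reward is not specified in this abstract.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a heuristic, assumed for illustration)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually does not form its own syllable.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid grade level; lower values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    complex_text = "Myocardial infarction necessitates immediate percutaneous intervention."
    simple_text = "A heart attack needs fast care."
    print(round(fkgl(complex_text), 2), round(fkgl(simple_text), 2))
```

Under this heuristic the jargon-dense sentence scores a much higher grade level than its plain-language counterpart, which is the direction a readability reward would be expected to penalize.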
Results:
The proposed method outperformed previous baselines on Flesch-Kincaid scores (11.84) and achieved performance comparable to other baselines when measured using ROUGE-1 (0.39), ROUGE-2 (0.11), and SARI (0.40) scores. Manual evaluation showed that percent agreement between human annotators was more than 70% when factors such as fluency, coherence, and adequacy were considered.
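Percent agreement, the statistic reported for the human evaluation, is the share of items on which two annotators gave the same label. A minimal sketch follows; the function name `percent_agreement` and the example labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def percent_agreement(ratings_a: Sequence, ratings_b: Sequence) -> float:
    """Return the percentage of items on which two annotators agree."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both annotators must rate the same items.")
    if not ratings_a:
        raise ValueError("At least one rated item is required.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


if __name__ == "__main__":
    # Made-up binary fluency judgments for five paragraphs.
    annotator_1 = [1, 1, 0, 1, 0]
    annotator_2 = [1, 0, 0, 1, 0]
    print(percent_agreement(annotator_1, annotator_2))  # 4 of 5 items match
```

Percent agreement does not correct for agreement expected by chance, so it is usually read alongside the number of categories the annotators could choose from.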
Conclusions:
A unique medical text simplification approach was successfully developed that leverages reinforcement learning and accurately simplifies complex medical paragraphs, thereby increasing their readability.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.