Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 30, 2019
Open Peer Review Period: Dec 30, 2019 - Jan 10, 2020
Date Accepted: Apr 3, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Using Natural Language Processing Techniques to Provide Personalized Health Education for Chronic Disease Patients: Implementation of A Knowledge-based Health Recommender System
ABSTRACT
Background:
Health education is an important intervention for improving the awareness and self-management abilities of patients with chronic diseases. The rapid development of information technologies has shifted patient education materials from traditional paper to electronic formats. The volume of educational material on the Internet is now enormous, and its quality is highly variable. Patients without a medical background may find it hard to identify the materials most valuable to them.
Objective:
The aim of this study is to develop a health recommender system to recommend appropriate educational materials to chronic disease patients.
Methods:
We implemented a knowledge-based recommender system using an ontology and several natural language processing (NLP) techniques. The development process was divided into two stages. In stage 1, we constructed an ontology for chronic disease patient education in order to understand and analyze patient data. In stage 2, we implemented an algorithm that generates recommendations based on the ontology. Patient data and educational materials were mapped to the ontology and converted into vectors of the same length; the recommendations were then generated based on the similarity of these vectors. We used keyword extraction algorithms and pre-trained word embeddings to preprocess the educational materials. Concretely, the term frequency-inverse document frequency (TF-IDF) and TextRank methods were adopted to extract keywords, and the word2vec model was adopted to train the word embeddings. We also proposed three strategies to improve keyword extraction performance. The evaluation was based on a manually assembled gold standard dataset covering 50 patients and 100 educational materials. Recommendation performance was assessed using the macro precision of the top-ranked documents.
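As a rough illustration of the stage 2 matching step, the sketch below maps a patient record and each educational material to a vector over ontology classes and ranks the materials by cosine similarity. The class names, the binary weighting, the exact-match mapping, and the toy data are all assumptions made for illustration; the abstract does not specify how vector entries are computed or which similarity measure the system uses.

```python
import math

# Hypothetical ontology classes (illustrative only, not the study's ontology).
ONTOLOGY_CLASSES = [
    "hypertension", "diabetes", "blood pressure", "blood glucose",
    "diet", "exercise", "smoking", "medication",
]


def to_ontology_vector(terms):
    """Map a list of terms to a binary vector over ONTOLOGY_CLASSES.

    Assumption: an entry is 1.0 when the class name appears among the terms.
    """
    term_set = {t.lower() for t in terms}
    return [1.0 if cls in term_set else 0.0 for cls in ONTOLOGY_CLASSES]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(patient_terms, materials, top_k=3):
    """Return the top_k (title, score) pairs, highest similarity first."""
    patient_vec = to_ontology_vector(patient_terms)
    scored = [
        (title, cosine_similarity(patient_vec, to_ontology_vector(keywords)))
        for title, keywords in materials.items()
    ]
    # Sort by descending score, breaking ties alphabetically by title.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy patient record and materials, each already reduced to extracted terms.
patient = ["hypertension", "blood pressure", "smoking"]
materials = {
    "Quitting smoking with high blood pressure": ["hypertension", "smoking", "blood pressure"],
    "Counting carbohydrates": ["diabetes", "diet", "blood glucose"],
    "Walking for heart health": ["hypertension", "exercise"],
}

ranking = recommend(patient, materials)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

In this toy setup, the material that shares all three classes with the patient ranks first and the one that shares none ranks last. A real pipeline would presumably fill the material vectors from the extracted keywords and word embeddings rather than from exact string matches.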
Results:
The constructed Chronic Disease Patient Education Ontology (CDPEO) consisted mainly of two levels. Level 1 included 5 terms (demographic, disease, physiological index, lifestyle, and medication), which describe the characteristics contained in the patient data and also correspond to the topics of the educational materials. Level 2 contained the detailed elements of each Level 1 class. The ontology vector was a 32-dimensional vector generated from the Level 2 classes. In the keyword extraction evaluation, the improved TextRank algorithm achieved the best precision, 53.2%, measured against the manual extraction results. In the recommendation evaluation, the improved TF-IDF method achieved the highest macro precision, 97%, for the top 1 recommendation.
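For readers unfamiliar with the metric, the sketch below shows one common way to compute macro precision at rank k: precision is computed per patient over the top k recommendations and then averaged across patients. This is an assumed reading of "macro precision", since the abstract does not give the formula; the function names and toy data are hypothetical and unrelated to the study's gold standard.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def macro_precision_at_k(rankings, gold, k):
    """Average precision@k over patients, weighting every patient equally."""
    scores = [
        precision_at_k(ranked, gold.get(patient_id, set()), k)
        for patient_id, ranked in rankings.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy rankings and gold-standard labels for two hypothetical patients.
rankings = {
    "p1": ["m1", "m2", "m3"],
    "p2": ["m4", "m1", "m2"],
}
gold = {
    "p1": {"m1", "m3"},
    "p2": {"m1"},
}

print(macro_precision_at_k(rankings, gold, k=1))
print(macro_precision_at_k(rankings, gold, k=3))
```

Under this definition, a macro precision of 97% at the top 1 recommendation means that the first recommended material was judged relevant for nearly all patients in the gold standard.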
Conclusions:
This study implemented a knowledge-based health recommender system to provide personalized health education for chronic disease patients. The system proved to be effective, and we learned from the study that efficient NLP techniques for preprocessing educational materials are crucial to such systems.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.