JMIR Preprints #83219: Automated Multi-Tier Tagging of Chinese Online Health Education Materials Using a Large Language Model: Development and Validation Study

Automated Multi-Tier Tagging of Chinese Online Health Education Materials Using a Large Language Model: Development and Validation Study

Jialin Meng;
Ruiming Dai;
Xiaolan Huang;
Yi Gu;
Shixing Yan;
Xiaoke Wang;
Jingrong Gao;
Tiantian Zhang

ABSTRACT

Background:

Effective precision health education and promotion depend on the efficient dissemination of health information. However, current health communication faces structural bottlenecks: information overload with poor matching of content to audiences, variable quality of health resources, and a lack of personalized services. These challenges impede large-scale targeted distribution and limit audience access. This study aimed to develop and validate an automated tagging system using a large language model (LLM) to enhance the efficiency and equity of health communication and promotion.

Objective:

This study aimed to develop, deploy, and validate an artificial intelligence-driven, multi-tier, automated content tagging system to address the core challenges in managing Chinese health education resources and provide a technical foundation for scalable precision health communication.

Methods:

We developed a health promotion taxonomy with 10 primary, 34 secondary, and 90,562 tertiary tags using a hybrid method combining a top-down approach (aligned with national standards and expert knowledge) and a bottom-up approach (corpus mining). We then built an automated tagging system for health promotion materials by fine-tuning the Baichuan2-7B LLM with Low-Rank Adaptation (LoRA) and integrating it with a named entity recognition model and a vector database (Chroma DB). Finally, we evaluated the system's performance.
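The idea of matching content against a three-tier taxonomy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fine-tuned LLM, the named entity recognition model, and Chroma DB are all replaced here by a toy character-bigram cosine similarity, and the tag names, the `resolve_tag` function, and the `min_score` threshold are hypothetical. The sketch only shows how a free-text candidate might be mapped to its nearest tertiary tag and then rolled up to the secondary and primary tiers.

```python
from collections import Counter
from math import sqrt

# Hypothetical three-tier taxonomy: tertiary tag -> (primary, secondary).
# These tag names are illustrative only and are not taken from the study.
TAXONOMY = {
    "type 2 diabetes": ("chronic disease", "diabetes"),
    "gestational diabetes": ("maternal health", "pregnancy care"),
    "hypertension": ("chronic disease", "cardiovascular disease"),
    "influenza vaccine": ("infectious disease", "immunization"),
}


def bigrams(text):
    """Character-bigram counts: a toy stand-in for a learned embedding."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve_tag(candidate, min_score=0.5):
    """Map a free-text candidate to the closest tertiary tag and its parents.

    Returns None when no tertiary tag is similar enough (assumed threshold).
    """
    query = bigrams(candidate)
    best, score = max(
        ((tag, cosine(query, bigrams(tag))) for tag in TAXONOMY),
        key=lambda pair: pair[1],
    )
    if score < min_score:
        return None
    primary, secondary = TAXONOMY[best]
    return {"primary": primary, "secondary": secondary, "tertiary": best}


if __name__ == "__main__":
    print(resolve_tag("Type 2 diabetes"))
    print(resolve_tag("hypertention"))  # misspelling still resolves to a tag
    print(resolve_tag("xyzq"))          # no similar tag, so None
```

In a production setting, the similarity search would presumably be delegated to the vector database and the candidates would come from the model's output; the threshold-based fallback to `None` is one possible way to flag items that need manual review.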

Results:

The final taxonomy included all 16 national priority health domains. The model achieved an overall tag automation rate of 94.8% on the test set, with rates of 97.38% for text-only resources and 89.55% for nontext resources. In a comparative analysis, the model-generated tags demonstrated a higher thematic relevance to the source content than the original manual annotations.

Conclusions:

A fine-tuned LLM can efficiently automate the assignment of a granular multilevel tagging system for Chinese health promotion resources. This approach provides a scalable solution to a key bottleneck in health-information management, establishing a technical foundation for advancing precise health communication and improving equitable access to health information.

Citation

Please cite as:

Meng J, Dai R, Huang X, Gu Y, Yan S, Wang X, Gao J, Zhang T

Automated Multitier Tagging of Chinese Online Health Education Resources Using a Large Language Model: Development and Validation Study

J Med Internet Res 2025;27:e83219

DOI: 10.2196/83219

PMID: 41251541

PMCID: 12756663

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review or community review (or an accepted or rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 29, 2025

Date Accepted: Nov 18, 2025

Date Submitted to PubMed: Nov 18, 2025
