Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Oct 25, 2025
Date Accepted: May 30, 2026
Predictive Modeling of Tuberculosis Outcomes Using Routine Surveillance Data: Retrospective Cohort Study in Chiang Mai, Thailand
ABSTRACT
Background:
Tuberculosis (TB) is the leading cause of death from infectious disease worldwide. In 2023, approximately 10.8 million new TB cases were diagnosed. With 8.2 million cases recorded regionally, Thailand is a high-burden country for both TB and TB/HIV, with an estimated 113,000 new TB cases each year and a treatment coverage rate of only 71% [1]. In Chiang Mai Province, large disparities in early detection persist, especially in rural and remote districts, which calls for innovative surveillance models that address equity concerns.
Objective:
This study aimed to develop and test a hybrid surveillance system that combines the SEIR (Susceptible-Exposed-Infectious-Recovered) epidemiological model with machine learning (ML) algorithms to improve TB risk forecasting for prevention and to support informed decision-making in Chiang Mai, Thailand.
Methods:
We conducted a retrospective cohort study using data mining of 5,557 TB cases registered in the National Tuberculosis Information Program (NTIP) from 2020 to 2024. A hybrid SEIR-ML model was developed that matched an individual algorithm to each stage of the disease: logistic regression (risk of infection), random forest (disease progression), Cox proportional hazards model (mortality), and accelerated failure time model (treatment delay). Model performance was measured using the area under the receiver operating characteristic curve (AUC), the Harrell C-index, and R2. Scenario simulations were conducted to estimate the potential impact of implementing the model on the system and to identify possible implementation problems.
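As an illustration of the mechanistic component named above, the following is a minimal sketch of a closed-population SEIR model integrated with forward Euler steps. It is not the authors' implementation: the function name `simulate_seir`, the parameter values, and the population size are hypothetical and chosen only to show the compartment flows, not fitted to NTIP data.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Return daily (S, E, I, R) tuples for a closed-population SEIR model.

    beta  : transmission rate per day (hypothetical value in the example)
    sigma : rate of progression from exposed to infectious per day
    gamma : rate of removal from the infectious compartment per day
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population, constant in this sketch
    steps_per_day = round(1 / dt)
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Illustrative run with made-up parameters (assumptions, not study estimates).
trajectory = simulate_seir(beta=0.3, sigma=0.1, gamma=0.05,
                           s0=9990, e0=0, i0=10, r0=0, days=365)
```

In a hybrid design of the kind described, outputs of the ML models (for example, predicted infection or progression risk) could inform the transition rates, but how the study couples the two components is not specified here.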
Results:
The integrated models showed high predictive performance across all dimensions: AUC 0.89 (95% CI 0.87-0.91) for infection and 0.91 (95% CI 0.89-0.93) for progression; C-index 0.86 (95% CI 0.84-0.88) for mortality; and R2 = 0.74 for treatment delay (all P<.001). HIV co-infection (HR 5.8) and HIV co-infection at ages above 65 years (HR 12.3) were strongly associated with mortality risk. Additionally, rural residence, older age, and health insurance were significantly associated with treatment delays (mean treatment delay: 12-18 days). Scenario modeling projected that implementing the proposed framework would increase early detection by 25%, improve treatment outcomes by 15%, and reduce mortality by 20%.
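For readers unfamiliar with the two discrimination metrics reported above, the sketch below shows one common way to compute them from scratch. It is a simplified illustration under stated assumptions, not the study's evaluation code: the function names `roc_auc` and `harrell_c_index` are hypothetical, the AUC uses the pairwise (Mann-Whitney) formulation, and the C-index ignores pairs with tied event times.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risk_scores):
    """Harrell C-index: concordance between risk scores and survival order.

    A pair is comparable when the subject with the shorter follow-up time
    had the event (events[a] is truthy). Higher risk should mean earlier event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for a in range(n):
        if not events[a]:
            continue  # a censored subject cannot anchor a comparable pair
        for b in range(n):
            if times[a] < times[b]:
                comparable += 1
                if risk_scores[a] > risk_scores[b]:
                    concordant += 1.0
                elif risk_scores[a] == risk_scores[b]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Both functions return a value between 0 and 1, where 0.5 corresponds to chance-level ranking and 1 to perfect ranking, which is the scale on which the values 0.89, 0.91, and 0.86 above are read.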
Conclusions:
Combining mechanistic SEIR modeling with ML-based risk prediction enhanced TB surveillance by enabling forecasting at both the population and individual levels. The framework also detected structural imbalances in health care access and could serve as a scalable, equity-focused decision-support tool for TB control in resource-limited settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.