Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jul 23, 2024
Open Peer Review Period: Jul 25, 2024 - Sep 19, 2024
Date Accepted: Oct 8, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Predicting Prefecture-Level Well-Being Indicators in Japan Using Search Volumes in Internet Search Engines: an Infodemiology Study
ABSTRACT
Background:
In recent years, the adoption of well-being indicators by national governments and international organizations has emerged as an important tool for evaluating state governance and societal progress. Traditionally, well-being has been gauged primarily through economic metrics such as Gross Domestic Product, which fall short of capturing multifaceted well-being, including socioeconomic inequalities, life satisfaction, and health status. Current well-being indicators, including both subjective and objective measures, offer a broader evaluation but face challenges such as high survey costs and difficulties in evaluating well-being at regional levels within countries. The emergence of web log data as an alternative source of well-being indicators offers the potential for more cost-effective, timely, and less biased assessments.
Objective:
Our study aimed to create a model using internet search data to predict well-being indicators at the regional level in Japan, providing policymakers with a more accessible and cost-effective tool for assessing public well-being and making informed decisions.
Methods:
This study used the Regional Well-Being Index (RWI) for Japan, which evaluates well-being across the 47 prefectures for the years 2010, 2013, 2016, and 2019, as the outcome variable. The RWI takes a comprehensive approach, integrating both subjective and objective indicators across 11 domains, including income, job, and life satisfaction. The predictor variables were z-score normalized relative search volume (RSV) data from Google Trends for words relevant to each RWI domain, collected for the same years. Unrelated words were excluded from the analysis to ensure relevance. The Elastic Net methodology was applied to build a model to predict RWI using RSVs, where α balances the ridge and lasso regression penalties and λ regulates their overall strength. The model was optimized by cross-validation, which determined the mix and strength of regularization that minimized prediction error. The root mean square error (RMSE) and the coefficient of determination (R²) were used to assess the model's predictive accuracy and fit.
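The modeling steps described above (z-scoring the predictors, fitting an Elastic Net with a mixing parameter α and a strength parameter λ, choosing both by cross-validation, and scoring with RMSE and R²) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software used, so the sketch assumes the glmnet-style objective (1/2n)·RSS + λ[(1 − α)/2·‖β‖₂² + α·‖β‖₁], a simple coordinate descent solver, a 5-fold split, and a small hypothetical tuning grid. All data are synthetic stand-ins, and every function name is invented for illustration.

```python
import math
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each predictor column (stand-in for normalizing the RSVs)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha, lam, n_iter=100, tol=1e-7):
    """Coordinate descent on the assumed glmnet-style objective:
    (1/2n)*RSS + lam * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1)."""
    n, p = len(X), len(X[0])
    x_means = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    cols = [[row[j] - x_means[j] for row in X] for j in range(p)]  # centered
    col_ms = [sum(c * c for c in col) / n for col in cols]
    resid = [v - y_mean for v in y]  # residuals with all coefficients at zero
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j, col in enumerate(cols):
            # Covariance of column j with the residual that excludes j's own fit.
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_ms[j] * beta[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


def predict(X, intercept, beta):
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def rmse(y_true, y_pred):
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cv_rmse(X, y, alpha, lam, k=5):
    """Mean held-out RMSE over k interleaved folds."""
    n = len(X)
    scores = []
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        b0, beta = fit_elastic_net([X[i] for i in train], [y[i] for i in train], alpha, lam)
        scores.append(rmse([y[i] for i in test], predict([X[i] for i in test], b0, beta)))
    return mean(scores)


# Synthetic data only: 188 rows mimic 47 prefectures x 4 years, 8 fake search terms.
random.seed(0)
true_beta = [2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
raw = [[random.gauss(50, 10) for _ in true_beta] for _ in range(188)]
Xz = zscore_columns(raw)
y = [5.0 + sum(b * v for b, v in zip(true_beta, row)) + random.gauss(0, 0.5) for row in Xz]

# Hypothetical tuning grid; pick the (alpha, lambda) pair with the lowest CV RMSE.
grid = [(a, l) for a in (0.2, 0.5, 0.8) for l in (0.01, 0.1, 0.5)]
best_alpha, best_lam = min(grid, key=lambda g: cv_rmse(Xz, y, *g))
b0, beta = fit_elastic_net(Xz, y, best_alpha, best_lam)
fitted = predict(Xz, b0, beta)
print(f"alpha={best_alpha}, lambda={best_lam}, "
      f"RMSE={rmse(y, fitted):.3f}, R2={r_squared(y, fitted):.3f}")
```

Parameter names differ between libraries. In scikit-learn's `ElasticNet`, for example, the mixing parameter is called `l1_ratio` and the strength parameter is called `alpha`, so the α and λ reported in an abstract like this one should be mapped with care when reproducing a fit.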
Results:
An analysis of Google Trends data yielded 275 words related to the RWI domains, and RSVs were collected for 211 words after filtering out irrelevant terms. The mean search frequencies for these words during 2010, 2013, 2016, and 2019 ranged from −1.587 to 3.902, with standard deviations between 0.053 and 3.025. The optimized Elastic Net model, with parameters α = 0.2 and λ = 0.537, showed an RMSE of 1.504 and an R² of 0.867, incorporating 1 to 11 variables per domain.
Conclusions:
This study demonstrates the effectiveness of using Internet search log data through the Elastic Net machine learning method to predict the RWI in Japanese prefectures with high accuracy, offering a rapid and cost-efficient alternative to traditional survey approaches. This study highlights the potential of this methodology to provide foundational data for evidence-based policymaking aimed at enhancing well-being.
Clinical Trial: Not applicable.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.