Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Jun 28, 2023
Open Peer Review Period: Jun 28, 2023 - Jul 20, 2023
Date Accepted: Nov 28, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
A method to generate new contextual variables through Web scraping, Text Mining and Spatial overlay analysis (WeTMS): An application to program evaluation in health research
ABSTRACT
Contextual variables representing the economic, political, or cultural characteristics of a specific area have crucial applications in public health research and program evaluation, such as evaluating policy implementation in local areas or explaining variability in health outcomes across populations. However, accessing context-level data can pose significant challenges where no monitoring systems exist. Although the Internet can serve as a major source of information, website data is often unstructured and not suitable for analysis. This study describes a novel research method that integrates web scraping, text mining, and spatial overlay analysis to convert unstructured website data into theoretically informed contextual variables. We first describe the method, introducing the techniques of web scraping, text mining, and spatial overlay analysis. The process is then explained step by step and applied to a real research case to generate contextual-level variables on health assets with the potential to foster social connections among older adults in the context of a large regional public health program. The method can also be useful in public health, health services research, health policy analysis, program evaluation, epidemiology, and other disciplines with an interest in contextual-level data where data is scarce, hard to obtain, or concerns emerging issues for which data has not yet been generated.
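As a rough illustration of how the three techniques named in the abstract could fit together, the sketch below extracts text from already-downloaded HTML, flags pages with a keyword dictionary, and counts flagged locations per area with a point-in-polygon test. It is a minimal sketch under stated assumptions, not the authors' implementation: the keyword list, the page records, the coordinates, and the area polygons are all hypothetical, and every function name is introduced here for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment (stand-in for the scraping step)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical keyword dictionary; a real study would derive terms from theory.
KEYWORDS = {"social_asset": ("choir", "club", "community centre", "volunteer")}


def classify(text):
    """Text-mining step: return the labels whose keywords occur in the text."""
    lowered = text.lower()
    return {label for label, terms in KEYWORDS.items()
            if any(term in lowered for term in terms)}


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_assets(pages, areas):
    """Spatial overlay step: count flagged pages falling inside each area."""
    counts = Counter({name: 0 for name in areas})
    for page in pages:
        if "social_asset" not in classify(extract_text(page["html"])):
            continue
        for name, polygon in areas.items():
            if point_in_polygon(page["x"], page["y"], polygon):
                counts[name] += 1
                break  # assign each location to at most one area
    return dict(counts)


# Hypothetical inputs: HTML snippets with made-up planar coordinates.
PAGES = [
    {"html": "<h1>Riverside Choir</h1><p>Weekly singing club for seniors.</p>",
     "x": 1.0, "y": 1.0},
    {"html": "<h1>Car wash</h1><p>Open daily.</p>", "x": 1.5, "y": 1.5},
    {"html": "<p>Volunteer gardening group</p>", "x": 5.0, "y": 1.0},
]
AREAS = {
    "area_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "area_b": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

RESULT = count_assets(PAGES, AREAS)
print(RESULT)
```

The per-area counts in `RESULT` play the role of a contextual variable. In practice, fetching pages, geocoding addresses, and overlaying points on real administrative boundaries would each need dedicated tooling; the simple substring matching and planar geometry used here are simplifications chosen to keep the example self-contained.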
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.