Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 17, 2025
Open Peer Review Period: Oct 17, 2025 - Dec 12, 2025
Date Accepted: Apr 7, 2026
Fusing Specialized Surveys of Rare Populations to Larger Surveys for Generalized Inference: A cross-sectional survey
ABSTRACT
Background:
Large, representative behavioral surveys are mainstays of pharmacoepidemiology, but they cover many drugs and capture few behaviors in detail. Smaller, targeted studies frequently measure drug-specific patterns, but without explicit assumptions about generalizability, the evidence they generate is narrow.
Objective:
In this study, we outline a theoretical framework based on data fusion and demonstrate effective combination of two surveys: a representative and a targeted survey about psychedelic drugs in the United States. Application of calibration weighting transports estimates from the smaller survey to the larger survey, fusing the data.
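As an illustration of calibration weighting, the sketch below uses raking (iterative proportional fitting), one common calibration technique; the abstract does not state which calibration method or variables were used. The variable names, categories, and target proportions are hypothetical, and the target margins stand in for estimates that would come from the larger survey.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=200, tol=1e-10):
    """Rake respondent weights so weighted margins match target proportions.

    rows:    list of dicts, one per respondent in the smaller survey.
    targets: {variable: {category: target proportion}}; each set of
             proportions must sum to 1 and every category must occur in rows.
    Returns one weight per row; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, margin in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this variable.
            current = defaultdict(float)
            for w, row in zip(weights, rows):
                current[row[var]] += w
            # Scale each category so its weighted share hits the target.
            factors = {cat: share * total / current[cat]
                       for cat, share in margin.items()}
            weights = [w * factors[row[var]] for w, row in zip(weights, rows)]
            max_shift = max(max_shift,
                            max(abs(f - 1.0) for f in factors.values()))
        if max_shift < tol:  # all margins already matched: stop early
            break
    return weights


def weighted_share(rows, weights, var, cat):
    """Weighted proportion of respondents with rows[var] == cat."""
    matched = sum(w for w, row in zip(weights, rows) if row[var] == cat)
    return matched / sum(weights)


# Hypothetical toy sample from the smaller survey (skews young and male).
rows = [
    {"age": "18-34", "sex": "F"},
    {"age": "18-34", "sex": "M"},
    {"age": "18-34", "sex": "M"},
    {"age": "18-34", "sex": "M"},
    {"age": "35+", "sex": "F"},
    {"age": "35+", "sex": "M"},
]
# Hypothetical margins standing in for the larger survey's estimates.
targets = {
    "age": {"18-34": 0.3, "35+": 0.7},
    "sex": {"F": 0.5, "M": 0.5},
}
weights = rake(rows, targets)
```

After raking, `weighted_share(rows, weights, "age", "18-34")` returns approximately the target of 0.3, so any outcome measured only in the smaller survey can be estimated with weights aligned to the larger survey's margins.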
Methods:
The psychedelic-focused enriched survey was sampled from a commercial online panel of adults in two waves, from 19 April to 25 June 2024 and from 24 January to 21 March 2025. The larger, representative survey was sampled from a different online panel and was also fielded twice, from 13 March to 6 May 2024 and from 14 August to 9 October 2024. Internal consistency (transport bias) and external validity (root-mean-square error, RMSE) metrics were calculated. External validity was assessed by comparing demographic, health, and substance use estimates to national probability-based benchmarks.
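The two metrics can be sketched as follows. The abstract does not give formulas, so this assumes that transport bias is the difference, in percentage points, between a fused estimate and the representative survey's estimate of the same quantity, and that RMSE is taken over a set of estimates compared with their benchmark values. All indicator names and numbers are hypothetical.

```python
import math


def transport_bias(fused_pct, reference_pct):
    """Signed difference in percentage points (assumed definition)."""
    return fused_pct - reference_pct


def rmse(estimates, benchmarks):
    """Root-mean-square error across the indicators listed in benchmarks."""
    squared = [(estimates[key] - benchmarks[key]) ** 2 for key in benchmarks]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical percentages, for illustration only.
fused_estimates = {"indicator_a": 12.0, "indicator_b": 30.0}
national_benchmarks = {"indicator_a": 9.0, "indicator_b": 34.0}

bias_a = transport_bias(fused_estimates["indicator_a"],
                        national_benchmarks["indicator_a"])
rmse_value = rmse(fused_estimates, national_benchmarks)
```

A signed bias shows the direction of any over- or underestimate for a single indicator, while RMSE summarizes the typical size of the error across all benchmarked indicators.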
Results:
A total of 2,048 weighting schemes were tested, each fusing a sample of adults who use psychedelic drugs to the representative sample. After fusion, transport bias was low, at <5 percentage points. RMSE compared to national benchmarks confirmed good external validity. After fusion, reasons for use were estimated for psilocybin, LSD, and MDMA: recreational use was most common (>90%), while medical use was less common (20-30%).
Conclusions:
This application of data fusion presents a rigorous but accessible methodology that allows small surveys to support national-level inference and generalizable estimates. Recreational reasons for use were more common than medical reasons across three psychedelic substances (psilocybin, LSD, MDMA), indicating that measuring behaviors in depth yields a meaningful, novel understanding of use beyond the simple prevalence estimates provided by existing representative surveys. The “fused survey design” is a methodologically valid path to generating estimates of these rare behaviors.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.