Data integration to improve real-world health outcomes research for non-small cell lung cancer in the United States: The RESOUNDS study
ABSTRACT
Background:
The integration of data from disparate sources could help alleviate data insufficiency in real-world studies, compensate for the inadequacies of single data sources and of short-duration, small-sample studies, and improve the utility of data for research.
Objective:
This study sought to describe and evaluate a process of integrating data from several complementary sources to conduct health outcomes research in patients with non-small cell lung cancer (NSCLC). The integrated data set was then used to describe patient demographics, clinical characteristics, treatment patterns, and mortality rates.
Methods:
This retrospective cohort study integrated data from four sources: administrative claims from the HealthCore Integrated Research Database, clinical data from a Cancer Care Quality Program (CCQP), clinical data from abstracted medical records (MR), and mortality data from the US Social Security Administration. Lung cancer patients who initiated 2nd-line (2L) therapy between 11/01/2015 and 04/13/2018 were identified in the claims and CCQP data. Eligible patients were aged 18 years or older and received atezolizumab, docetaxel, erlotinib, nivolumab, pembrolizumab, pemetrexed, or ramucirumab in the 2L setting. The main analysis cohort included patients with claims data plus data from at least one additional data source (CCQP or MR). Patients without integrated data (claims only) were reported separately as a claims-only cohort. Descriptive and univariate statistics were reported.
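The cohort assignment rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the `Patient` record, its field names, and the shared `patient_id` linkage key are hypothetical; the 2L window is assumed to be inclusive at both ends; and a patient is assumed eligible if any one of the listed agents appears in the 2L regimen.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional

# Eligibility parameters as stated in the abstract; treating both
# endpoints of the window as inclusive is an assumption.
WINDOW_START = date(2015, 11, 1)
WINDOW_END = date(2018, 4, 13)
ELIGIBLE_2L_AGENTS = frozenset({
    "atezolizumab", "docetaxel", "erlotinib", "nivolumab",
    "pembrolizumab", "pemetrexed", "ramucirumab",
})


@dataclass(frozen=True)
class Patient:
    """Hypothetical linked record; patient_id is an assumed cross-source key."""
    patient_id: str
    age_at_2l_start: int
    second_line_start: date
    second_line_agents: FrozenSet[str]
    has_claims: bool
    has_ccqp: bool
    has_mr: bool


def is_eligible(p: Patient) -> bool:
    """Adult, 2L start inside the study window, and at least one listed agent."""
    return (
        p.age_at_2l_start >= 18
        and WINDOW_START <= p.second_line_start <= WINDOW_END
        and not p.second_line_agents.isdisjoint(ELIGIBLE_2L_AGENTS)
    )


def assign_cohort(p: Patient) -> Optional[str]:
    """Return "main", "claims_only", or None if the patient is excluded."""
    if not p.has_claims or not is_eligible(p):
        return None  # both cohorts are built on claims data
    if p.has_ccqp or p.has_mr:
        return "main"  # claims plus at least one clinical source
    return "claims_only"
```

Under these assumptions, an eligible patient linked to claims and CCQP (or MR) data falls into the main analysis cohort, an eligible patient with claims alone falls into the claims-only cohort, and a patient with no claims record is excluded, because both cohorts are defined on top of claims.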
Results:
Data integration resulted in a main analysis cohort of 2,195 patients with NSCLC; 2,106 patients had CCQP data and 407 patients had MR data. The claims-only cohort included 931 eligible patients. In the main analysis cohort, mean (SD) age was 62.1 (9.27) years, 49% of patients were female, median length of follow-up was 6.8 months, and death was observed in 38%. In the claims-only cohort, mean (SD) age was 66.6 (12.69) years, 52% of patients were female, median length of follow-up was 8.6 months, and death was observed in 29%. The most frequent 2L treatment was immunotherapy (50%), followed by platinum-based regimens (22%) and single-agent chemotherapy (20%); mean (median) duration of 2L therapy was 5.6 (4.0) months. We describe challenges and lessons learned from the data integration process; benefits of the integrated data set include a richer set of clinical and outcome data to supplement the utilization metrics available in administrative claims.
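Because the CCQP and MR counts sum to more than the size of the main analysis cohort, some patients must have contributed both. Assuming that both counts refer to patients within the main analysis cohort, and that every patient in that cohort has CCQP data, MR data, or both (as the cohort definition requires), inclusion-exclusion gives the overlap. Here $C$ and $M$ denote the hypothetical sets of main-cohort patients with CCQP and MR data, respectively:

```latex
\begin{align*}
|C \cap M| &= |C| + |M| - |C \cup M| = 2106 + 407 - 2195 = 318, \\
|C \setminus M| &= 2106 - 318 = 1788, \qquad |M \setminus C| = 407 - 318 = 89.
\end{align*}
```

Under these assumptions, 1,788 + 89 + 318 = 2,195, which recovers the size of the main analysis cohort.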
Conclusions:
The management of patients with NSCLC requires care from a multidisciplinary team, which leads to the lack of a single aggregated data source in real-world settings. The availability of integrated clinical data from medical records, health plan claims, and other sources of clinical care may improve the ability to assess emerging treatments.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.