Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
The Continuity Trap in Data Science Health Research
ABSTRACT
Secondary use is now the ordinary course for data and biospecimens in health research. Clinical records collected for care become training data for prediction tools; archived images become foundation models; legacy biospecimens become renewable cell lines; and large corpora are repurposed to build health-related language models. Governance nevertheless continues to privilege the most obvious signals of persistence, such as provenance logs, repository approvals, broad-consent forms, locality-preserving architectures, and documented ingestion pipelines, as if they were sufficient to establish legitimacy. They are not. In this article, we define the Continuity Trap as a continuity-specific form of proxy closure: a review-stage governance error in which a salient continuity signal in one domain is treated as sufficient reason to stop inquiry into whether semantic, authorization, and relational continuity have also been preserved. The concept is narrower than generic proceduralism, ethics washing, or proxy failure, because it isolates a specific inferential mistake in secondary-use review. It is also distinct from Goodhart’s and Campbell’s laws, which describe the dynamic corruption of measures once they become targets; the Continuity Trap can occur at an earlier stage, even in good-faith review. Continuity of ethical governance in data science health research must therefore be assessed across the four domains that we previously developed in our Representational Veracity framework: provenance, semantics, authorization, and relational standing. These domains can diverge as data are linked, transformed, modeled, and redeployed. We use vignettes from polygenic risk scores, legacy induced pluripotent stem cell derivation, federated learning, and health-related large language models to illustrate the problem. The policy implication is not universal re-review, but triggered continuity review whenever visible continuity is likely to be overread.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.