Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 22, 2019
Date Accepted: Mar 23, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Towards the development of data governance standards for using clinical free-text data in health research
ABSTRACT
Background:
Free-text clinical data (such as outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be de-identified or anonymised before they can be reused for research, but there is a lack of established guidelines for de-identifying and using free-text information effectively without damaging data utility as a by-product.
Objective:
We set out to work towards data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient/public benefit.
Methods:
We outlined UK data protection legislation and regulations for context, and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text data.
Results:
We propose a set of recommendations, including the need: for authoritative guidance on data governance for the reuse of free-text data; to ensure public transparency in data flows and uses; to treat de-identified free-text data as potentially identifiable, with use limited to accredited data safe havens; and to commit to a culture of continuous improvement in understanding the relationship between the efficacy of de-identification and re-identification risks, so that this can be communicated to all stakeholders.
Conclusions:
By drawing together the findings of a combination of activities, our unique study has added new knowledge towards the development of data governance standards for the reuse of clinical free-text data for secondary purposes. Whilst working in accord with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.