Currently submitted to: Journal of Medical Internet Research

Date Submitted: Mar 4, 2026

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Assessing the Validity and Utility of LLM-Supported Qualitative Analysis of Statutory Policy Documents: A Comparative Study Using Integrated Care Board Joint Forward Plans

  • Soheila Ghasri; 
  • Jennifer Liddle; 
  • Sean Gill; 
  • Hannah O’Keefe; 
  • Gemma Frances Spiers; 
  • Chris Marshall; 
  • Usha Boolaky; 
  • Jane Mcdermott

ABSTRACT

Background:

Large language models (LLMs) are increasingly being used to accelerate qualitative research tasks such as document review and data extraction. Yet there is limited empirical evidence on how accurately these systems perform when applied to complex statutory health policy documents, which are often long, densely written, and designed for governance and assurance rather than analytic clarity. In England’s National Health Service (NHS), the Health and Care Act 2022 established Integrated Care Systems and introduced Integrated Care Board (ICB) Joint Forward Plans (JFPs). Rapid analysis of healthcare priorities and systematic mapping of unmet needs across ICBs can support the identification of regional variation and inform research, policy development, and innovation.

Objective:

To assess whether LLMs can support framework-based qualitative analysis of ICB JFPs by comparing LLM-assisted deductive data extraction with manual researcher-led extraction, focusing on accuracy and traceability to source text.

Methods:

We conducted a comparative evaluation of deductive qualitative data extraction undertaken by researchers and by three LLMs: ChatGPT (OpenAI), Grok (xAI), and Claude (Anthropic). A predefined analytical framework comprising 9 domains and 41 analytical questions was developed to guide both manual and automated analysis. Five JFPs were sampled from ICBs serving areas of high socioeconomic deprivation in England. Two researchers independently conducted manual extractions using structured spreadsheets, followed by cross-checking and consensus resolution. The same framework was operationalized as structured prompts using a Role–Action–Context–Execution approach and applied consistently across the subscription-tier versions of each model. Outputs were compared with manual extraction across the 41 analytical fields per document. Overall accuracy was defined as the proportion of fields showing agreement, partial agreement, or disagreement resolved in favor of the LLM, including cases where only the LLM identified relevant evidence.
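
For orientation, the stated accuracy metric reduces to a simple proportion. Writing n_agree, n_partial, and n_llm for the fields rated as agreement, partial agreement, and disagreement resolved in favor of the LLM (labels ours, not the authors'), and N for the total fields scored per model:

    Accuracy = (n_agree + n_partial + n_llm) / N

The prompt design can likewise be illustrated. The Python sketch below shows how a Role–Action–Context–Execution (RACE) prompt might be assembled for one analytical question; the function name, wording, and example question are hypothetical illustrations, not the study's actual prompts.

    # Minimal sketch of a RACE-structured extraction prompt (illustrative only;
    # the study's actual prompt wording is not reproduced here).
    def build_race_prompt(question: str, document_name: str) -> str:
        role = "You are a qualitative health policy researcher."
        action = ("Extract verbatim evidence from the Joint Forward Plan that "
                  f"answers this analytical question: {question}")
        context = (f"The document under analysis is {document_name}, a statutory "
                   "Integrated Care Board Joint Forward Plan from NHS England.")
        execution = ("Quote the relevant passage(s), cite the section, and reply "
                     "'No evidence found' if the plan does not address the "
                     "question. Do not paraphrase or infer beyond the text.")
        return "\n\n".join([role, action, context, execution])

    print(build_race_prompt(
        "How does the plan address unmet need in areas of high deprivation?",
        "Example ICB Joint Forward Plan",
    ))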

Results:

All three LLMs completed data extraction in 5 to 7 minutes per document, compared with approximately 6 hours per document for manual extraction. No hallucinated content was identified when LLM-only evidence was manually checked. Performance varied by model and by analytic domain. Grok achieved the highest overall accuracy, matching or outperforming manual extraction in 83.4% of fields, particularly in domains with explicit operational content (e.g., cross-cutting system capabilities, use of data and evidence, and cross-system comparison). ChatGPT achieved moderate overall accuracy (54.5%) and performed best where priorities, specificity, and key performance indicators were clearly signposted. Claude showed lower overall accuracy (37.1%) but performed relatively better in more narrative domains, including cross-system comparison and public and community engagement.
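
For scale: if all 41 fields were scored in each of the 5 plans, each model was assessed on 205 fields (an assumption; the abstract does not state the denominator), so Grok's 83.4% corresponds to roughly 0.834 × 205 ≈ 171 fields matching or exceeding manual extraction.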

Conclusions:

LLMs can substantially reduce the time required for framework-based data extraction from statutory health policy documents and can capture clearly stated, structured content. However, performance varies meaningfully across models and analytic domains, supporting a transparent human-in-the-loop approach in which LLMs assist with extraction while researchers retain responsibility for verification, interpretation, and synthesis.


Citation

Please cite as:

Ghasri S, Liddle J, Gill S, O’Keefe H, Frances Spiers G, Marshall C, Boolaky U, Mcdermott J

Assessing the Validity and Utility of LLM-Supported Qualitative Analysis of Statutory Policy Documents: A Comparative Study Using Integrated Care Board Joint Forward Plans

JMIR Preprints. 04/03/2026:94639

DOI: 10.2196/preprints.94639

URL: https://preprints.jmir.org/preprint/94639


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.