JMIR Preprints #104400: Auditing Citation Grounding in LLM-Generated OCT Reports Using Public Data: Evaluation Framework Study

Auditing Citation Grounding in LLM-Generated OCT Reports Using Public Data: Evaluation Framework Study

Chia-Wei Sun

ABSTRACT

Background:

Evidence tags and structured schemas are often used to make large language model (LLM)-generated clinical text appear grounded. However, the presence of an evidence tag does not by itself establish that the tagged sentence is supported by the cited evidence. Health AI evaluation therefore needs methods that separate formatting compliance from semantic consistency and clinical truth.

Objective:

This study aimed to evaluate a public-data audit framework for citation-grounded optical coherence tomography (OCT) report generation and to test whether schema input adds measurable audit value beyond explicit citation instructions.

Methods:

Using the first 50 parseable public MORG report excerpts, we derived English evidence schemas and ran a four-arm computational ablation: free text without citation, free text with citation, schema without citation, and schema with citation. A local Gemma 4 model generated reports. A deterministic scrutinizer measured evidence-tag presence, invalid tags, lexical alignment, and evidence-field coverage. Three local LLM judges screened sentence-level semantic consistency, and a distractor test probed scope control. Results were analyzed descriptively.
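The four deterministic checks named above can be illustrated with a minimal sketch. This is not the study's scrutinizer: the tag syntax (`[E1]`, `[E2]`, ...), the `audit_report` function, the schema layout, and the token-overlap definition of lexical alignment are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Assumed tag syntax: an evidence-field id in square brackets, e.g. "[E2]".
TAG_PATTERN = re.compile(r"\[(E\d+)\]")


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def audit_report(sentences, schema):
    """Audit one generated report against its evidence schema.

    sentences: list of generated sentences (strings).
    schema: dict mapping a hypothetical evidence-field id to its evidence text.
    """
    tagged = 0          # sentences carrying at least one tag
    invalid = 0         # tags that point at no schema field
    alignments = []     # one overlap score per valid (sentence, tag) pair
    cited = set()       # schema fields cited at least once

    for sentence in sentences:
        tags = TAG_PATTERN.findall(sentence)
        if not tags:
            continue
        tagged += 1
        body = tokens(TAG_PATTERN.sub(" ", sentence))
        for tag in tags:
            if tag not in schema:
                invalid += 1
                continue
            cited.add(tag)
            evidence = tokens(schema[tag])
            # Assumed metric: share of evidence tokens repeated in the sentence.
            score = len(evidence & body) / len(evidence) if evidence else 0.0
            alignments.append(score)

    return {
        "tag_presence": tagged / len(sentences) if sentences else 0.0,
        "invalid_tags": invalid,
        "lexical_alignment": sum(alignments) / len(alignments) if alignments else 0.0,
        "field_coverage": len(cited) / len(schema) if schema else 0.0,
    }
```

Because every check is a pure function of the generated text and the schema, rerunning it on the same inputs gives the same scores, which is what separates this layer from the LLM judges. Note that none of these scores says whether a tagged sentence is actually supported by its evidence; that is the gap the semantic-consistency screening is meant to address.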

Results:

Citation-enabled arms achieved complete tag presence, whereas no-citation arms produced no tagged sentences. Schema plus citation improved lexical alignment compared with free text plus citation (0.857 vs 0.617) and improved mean evidence-field coverage (0.996 vs 0.952). The free-text citation arm missed 8 of 157 derived evidence fields, compared with 1 of 157 in the schema-citation arm. Across 226 tagged sentences, exact pairwise judge agreement ranged from 0.942 to 0.978 and Gwet AC1 ranged from 0.941 to 0.977. Distractor uptake was 0/50.
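For readers unfamiliar with the two agreement statistics reported above, the following sketch computes exact pairwise agreement and Gwet AC1 for two raters using the standard two-rater formula. The function names, the label values, and the toy data are hypothetical; the abstract does not describe the judges' label set or the study's actual computation.

```python
from collections import Counter


def exact_agreement(a, b):
    """Share of items on which two raters give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gwet_ac1(a, b, categories=None):
    """Gwet AC1 for two raters and nominal labels.

    categories: the full label set; if omitted, the labels observed in
    a and b are used (an assumption that matters when a label is unused).
    """
    pa = exact_agreement(a, b)
    n = len(a)
    if categories is None:
        categories = set(a) | set(b)
    q = len(categories)
    if q < 2:
        # With a single category, chance agreement is undefined;
        # returning raw agreement is a convention chosen for this sketch.
        return pa
    count_a, count_b = Counter(a), Counter(b)
    # pi_k: mean of the two raters' marginal proportions for category k.
    pe = sum(
        pi * (1 - pi)
        for pi in ((count_a[k] + count_b[k]) / (2 * n) for k in categories)
    ) / (q - 1)
    return (pa - pe) / (1 - pe)


# Toy data: 10 sentences, labels heavily skewed toward "consistent".
judge_1 = ["consistent"] * 9 + ["inconsistent"]
judge_2 = ["consistent"] * 8 + ["inconsistent"] * 2
print(exact_agreement(judge_1, judge_2))
print(gwet_ac1(judge_1, judge_2))
```

The chance-agreement term in AC1 shrinks as one label comes to dominate, so with very skewed labels AC1 stays close to raw agreement instead of collapsing the way Cohen kappa can.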

Conclusions:

Citation instructions drove tag behavior, while schema input improved auditability and completeness checks. The framework should be interpreted as a language-layer health AI evaluation method, not as image-interpretation validation, clinical grounding, or deployment safety evidence.

Citation

Please cite as:

Sun CW

Auditing Citation Grounding in LLM-Generated OCT Reports Using Public Data: Evaluation Framework Study

JMIR Preprints. 11/06/2026:104400

DOI: 10.2196/preprints.104400

URL: https://preprints.jmir.org/preprint/104400

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR AI

Date Submitted: Jun 11, 2026

Open Peer Review Period: Jun 19, 2026 - Aug 14, 2026

(currently open for review)
