Currently submitted to: JMIR Human Factors
Date Submitted: Mar 18, 2026
Open Peer Review Period: Apr 7, 2026 - Jun 2, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Sentence-Level Provenance for AI Medical Record Summarization: Formative Usability Evaluation of a Click-to-Inspect Interface
ABSTRACT
Background:
Large language models (LLMs) can generate fluent summaries of longitudinal medical records, but in high-stakes clinical settings, verification burden remains a barrier to trust. Existing provenance mechanisms, such as document-level citations and section references, often require manual search within long, fragmented notes, which limits their usefulness in time-constrained clinical workflows.
Objective:
To design and evaluate a sentence-level provenance interface (“click-to-inspect”) that enables rapid verification of AI-generated longitudinal medical record summaries at the level of individual statements.
Methods:
We developed and tested a web-based interface in which every sentence in an AI-generated longitudinal patient summary is clickable and linked to a semantically matched source sentence in the originating clinical note. Clicking a sentence opens the source note in a side-by-side view, scrolls to the matched passage, and highlights it in context. Formative usability testing was conducted with 46 clinician interactions using synthetic longitudinal patient charts. Participants included medical students, residents, and attending physicians across multiple specialties including internal medicine, dermatology, radiology, plastic surgery, anesthesiology, interventional radiology, obstetrics-gynecology, and family medicine. Usability was assessed using the System Usability Scale (SUS) and Net Promoter Score (NPS), alongside qualitative feedback.
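The linking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sentences are semantically matched, so this example substitutes a simple bag-of-words cosine similarity as a lexical stand-in, and all function names (`split_sentences`, `best_source_match`) are hypothetical. It returns the character offsets an interface could use to scroll to and highlight the matched passage.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter returning (start, end, sentence) with character offsets.

    Assumption: sentences end at '.', '!' or '?'. Real clinical notes
    (abbreviations, decimals, lists) would need a more careful splitter.
    """
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = m.group()
        sentence = raw.strip()
        if sentence:
            start = m.start() + (len(raw) - len(raw.lstrip()))
            spans.append((start, start + len(sentence), sentence))
    return spans


def bag_of_words(sentence):
    """Lowercased token counts; a lexical stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_source_match(summary_sentence, note_text):
    """Return the note sentence most similar to the summary sentence.

    The result carries the offsets needed to highlight the passage in context.
    Returns None if the note contains no sentences.
    """
    query = bag_of_words(summary_sentence)
    best = None
    for start, end, sentence in split_sentences(note_text):
        score = cosine(query, bag_of_words(sentence))
        if best is None or score > best["score"]:
            best = {"start": start, "end": end, "text": sentence, "score": score}
    return best


# Synthetic example text, invented for illustration only.
note = (
    "Patient seen for follow-up. "
    "Metformin was increased to 1000 mg twice daily. "
    "Return in three months."
)
match = best_source_match("Metformin dose increased to 1000 mg twice daily.", note)
print(match["text"], match["start"], match["end"])
```

A production system would presumably swap `bag_of_words` and `cosine` for an embedding model and might apply a minimum-score threshold so that unsupported summary sentences are flagged rather than linked to a weak match.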
Results:
Clinicians reported high usability (mean SUS score 86.25, SD 7.77; 95% CI 83.96–88.54) and a positive overall experience (NPS 35; 22/46 promoters, 18/46 passives, 6/46 detractors). Participants described rapid access to supporting evidence as critical for trust calibration during first-pass chart review. Qualitative feedback identified friction in traditional citation-based interfaces and supported sentence-level inspectability as a low-friction verification mechanism.
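The reported NPS can be checked from the promoter and detractor counts with a few lines of arithmetic. The sketch below uses the standard NPS definition (percentage of promoters minus percentage of detractors) and, as an assumption, a normal-approximation 95% interval for the SUS mean. The abstract does not state how the authors computed their interval, so the second function is only a rough cross-check and its output differs slightly from the reported bounds.

```python
import math


def net_promoter_score(promoters, passives, detractors):
    """NPS = % promoters minus % detractors, on a -100 to 100 scale."""
    total = promoters + passives + detractors
    return 100.0 * (promoters - detractors) / total


def normal_ci_95(mean, sd, n):
    """Normal-approximation 95% CI for a mean (assumed method, z = 1.96)."""
    margin = 1.96 * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Counts and summary statistics taken from the Results paragraph.
nps = net_promoter_score(promoters=22, passives=18, detractors=6)
low, high = normal_ci_95(mean=86.25, sd=7.77, n=46)

print(round(nps))  # 35, matching the reported NPS
print(round(low, 2), round(high, 2))  # roughly 84.0 and 88.5
```

The NPS reproduces exactly after rounding (16/46 is about 34.8). The approximate interval is marginally narrower than the reported 83.96–88.54, which would be consistent with the authors using a slightly larger critical value, such as one from a t distribution.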
Conclusions:
Sentence-level provenance transforms AI-generated summaries from static narratives into interactive verification tools. By enabling rapid, selective inspection of individual claims during longitudinal chart review, this approach may reduce verification burden and support calibrated reliance in high-risk clinical contexts.
Clinical Trial: NA
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.