Currently submitted to: Journal of Medical Internet Research
Date Submitted: Jun 10, 2026
Open Peer Review Period: Jun 11, 2026 - Aug 6, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Toward Citation-Guided Sensor Metadata Enrichment: A Large Language Model-based Approach
ABSTRACT
Background:
Sensor metadata is critical for exposure health research because it supports accurate sensor identification, deployment, data integration, interoperability, and reproducibility. Yet it is often fragmented across heterogeneous sources, such as scientific literature and manufacturer guides, and key specifications are frequently reported only indirectly through citation chains, which makes reference tracing essential for enriching and completing metadata.
Objective:
To address this fragmentation, we developed and evaluated an automated, citation-aware pipeline based on large language models (LLMs). The pipeline enriches sensor metadata extracted from a primary article by identifying citation markers associated with sensor mentions and extracting additional metadata from the referenced sources.
Methods:
We extended our prior LLM-based metadata extraction approach by (i) detecting sensor mentions in full-text articles, (ii) capturing citation markers near those mentions, (iii) resolving the markers to full bibliographic entries in the reference list, and (iv) retrieving the cited papers, extracting sensor metadata that may be absent from the primary document, and using it to enrich and complete the base metadata.
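Steps (ii) and (iii) can be illustrated with a minimal rule-based sketch. It assumes numeric, bracketed citation markers (for example "[3,5-6]") and a reference list already parsed into a dictionary keyed by citation number. The function names, the 150-character window, and the sensor name "AQ-100" are hypothetical, and because the described pipeline relies on an LLM, this stands in for, rather than reproduces, the authors' method.

```python
import re

# Assumption: numeric bracketed markers such as "[4]", "[2,5]" or "[7-9]".
MARKER_RE = re.compile(r"\[(\d+(?:\s*[-–,]\s*\d+)*)\]")


def expand_marker(body: str) -> list[int]:
    """Turn the inside of a marker such as '2, 5-7' into [2, 5, 6, 7]."""
    numbers: list[int] = []
    for part in re.split(r"\s*,\s*", body):
        if re.search(r"[-–]", part):
            # A range like '5-7' expands to every number it covers.
            start, end = (int(x) for x in re.split(r"\s*[-–]\s*", part))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


def markers_near_mention(text: str, mention: str, window: int = 150) -> list[int]:
    """Step (ii), hypothetical: collect citation numbers found within
    `window` characters on either side of each occurrence of `mention`."""
    found: list[int] = []
    for hit in re.finditer(re.escape(mention), text):
        lo = max(0, hit.start() - window)
        hi = min(len(text), hit.end() + window)
        # Pattern.finditer accepts pos/endpos to restrict the search region.
        for marker in MARKER_RE.finditer(text, lo, hi):
            for n in expand_marker(marker.group(1)):
                if n not in found:  # keep first-seen order, no duplicates
                    found.append(n)
    return found


def resolve(numbers: list[int], references: dict[int, str]) -> dict[int, str]:
    """Step (iii), hypothetical: map citation numbers to full bibliographic
    entries; numbers missing from the reference list are skipped."""
    return {n: references[n] for n in numbers if n in references}


# Hypothetical example text and reference list (not taken from the paper).
text = ("Particulate matter was measured with the AQ-100 monitor [3,5-6]. "
        "Earlier work [12] used other sensors.")
references = {n: f"Reference entry {n} (placeholder)" for n in (3, 5, 6, 12)}

numbers = markers_near_mention(text, "AQ-100", window=20)
print(numbers)  # [3, 5, 6]
print(resolve(numbers, references))
```

A fixed character window is the simplest proxy for "nearby"; widening it to the default 150 characters in this example also picks up the unrelated marker [12], so restricting matches to the same sentence would trade some recall for precision. Author-year styles such as "(Smith et al., 2020)" would need a different matcher.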
Results:
Across 20 primary papers, the citation extraction component achieved 74.2% precision, 92.0% recall, an 82.1% F1-score, and 69.7% accuracy, and every extracted bibliographic entry was correctly matched to its source reference. This component increased the number of extracted sensors by about 261%, yielding 94 additional sensors overall.
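The reported figures are internally consistent, which a short calculation can check. The F1-score is the harmonic mean of precision and recall. The 69.7% accuracy matches TP/(TP+FP+FN), a common definition when true negatives are undefined; this is an inference from the numbers, not a definition stated in the abstract. Likewise, assuming the 261% gain is relative to the sensors found in the primary papers alone, the implied baseline is about 36 sensors.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_without_tn(precision: float, recall: float) -> float:
    """TP / (TP + FP + FN) written in terms of precision P and recall R.

    Fixing TP = 1 gives FP = 1/P - 1 and FN = 1/R - 1, so the ratio is
    1 / (1/P + 1/R - 1). Assumes no true negatives are counted.
    """
    return 1 / (1 / precision + 1 / recall - 1)


p, r = 0.742, 0.920
print(f"F1 = {f1(p, r):.1%}")                              # F1 = 82.1%
print(f"TP/(TP+FP+FN) = {accuracy_without_tn(p, r):.1%}")  # TP/(TP+FP+FN) = 69.7%

# Assumption: "about 261%" is measured against the primary-paper baseline.
additional = 94
implied_baseline = additional / 2.61
print(f"implied baseline = {implied_baseline:.1f} sensors")  # implied baseline = 36.0 sensors
print(f"gain over 36 sensors = {additional / 36:.1%}")       # gain over 36 sensors = 261.1%
```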
Conclusions:
The citation-guided pipeline improved sensor discovery and metadata completeness, supporting the development of richer and more complete sensor metadata repositories.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.