Currently submitted to: Journal of Medical Internet Research

Date Submitted: Jun 10, 2026
Open Peer Review Period: Jun 11, 2026 - Aug 6, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Toward Citation-Guided Sensor Metadata Enrichment: A Large Language Model-based Approach

  • Fatemeh Shah-Mohammadi
  • Sunho Im
  • Urvi Varma
  • Julio Facelli
  • Mollie Cummins
  • Ramkiran Gouripeddi

ABSTRACT

Background:

Sensor metadata is critical for exposure health research because it supports accurate sensor identification, deployment, data integration, interoperability, and reproducibility. Yet it is often fragmented across heterogeneous sources, such as scientific literature and manufacturer guides. Key specifications are frequently reported only indirectly, through citation chains, which makes reference tracing essential for enriching and completing metadata.

Objective:

To address this bottleneck, we developed and evaluated an automated, citation-aware pipeline based on a large language model (LLM). The pipeline enriches sensor metadata extracted from a primary article by identifying sensor-related citation markers and extracting additional metadata from the referenced sources.

Methods:

We extended our prior LLM-based metadata extraction approach by (i) detecting sensor mentions in full-text articles, (ii) capturing nearby citation markers, (iii) resolving those markers to full bibliographic entries in the reference list, and (iv) retrieving the cited papers and extracting additional sensor metadata that may be absent from the primary document, which is then used to enrich and complete the base metadata.
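Steps (ii) and (iii) above can be illustrated with a minimal sketch. This is not the authors' implementation: the pipeline described here is LLM-based, whereas the sketch swaps in plain regular expressions as a stand-in, assumes numeric bracketed markers such as [3] or [3, 5], and treats "nearby" as "in the same sentence". The sample text, the sensor name "AcmeSense X1", the reference entries, and the function names are all hypothetical.

```python
import re

# Hypothetical article text and reference list, invented for illustration only.
TEXT = (
    "PM2.5 was measured with the AcmeSense X1 monitor [3]. "
    "Temperature was logged separately [7]. "
    "A second AcmeSense X1 unit was co-located at the site [3, 5]."
)
REFERENCES = {
    3: "Doe J. Evaluation of a low-cost particle monitor. 2019.",
    5: "Roe A. Field calibration of optical sensors. 2021.",
    7: "Poe B. Ambient temperature logging methods. 2018.",
}

# Assumption: citations are numeric and bracketed, e.g. [3] or [3, 5].
MARKER = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def markers_near(text, sensor_name):
    """Step (ii): collect citation numbers that share a sentence with the sensor."""
    found = []
    for sent in sentences(text):
        if sensor_name.lower() in sent.lower():
            for group in MARKER.findall(sent):
                for num in group.split(","):
                    n = int(num)
                    if n not in found:
                        found.append(n)
    return found


def resolve(markers, references):
    """Step (iii): map each marker to its full bibliographic entry, if present."""
    return {n: references[n] for n in markers if n in references}


markers = markers_near(TEXT, "AcmeSense X1")
resolved = resolve(markers, REFERENCES)
for number, entry in resolved.items():
    print(number, entry)
```

In this toy input, reference 7 is left out because its marker never shares a sentence with the sensor mention. A real pipeline would also need to handle author-year citation styles, numeric ranges, and markers that sit in a neighboring sentence, which is where an LLM-based detector is more flexible than a fixed pattern.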

Results:

Across 20 primary papers, the citation extraction component achieved 74.2% precision, 92.0% recall, an F1-score of 82.1%, and 69.7% accuracy, and all extracted bibliographic entries were correctly matched to their source references. This component increased the number of extracted sensors by about 261%, yielding 94 additional sensors overall.
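The reported metrics can be cross-checked with a short calculation. The sketch below recomputes F1 from the stated precision and recall. It also tests one possible reading of the accuracy figure, namely TP / (TP + FP + FN) with no true negatives counted. That reading is an assumption and is not stated in the abstract. The baseline sensor count derived at the end is likewise an inference from "about 261%" and "94 additional sensors", not a reported number.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_without_true_negatives(precision, recall):
    """TP / (TP + FP + FN), rewritten in terms of precision and recall.

    Since 1/P = (TP + FP)/TP and 1/R = (TP + FN)/TP,
    it follows that 1/P + 1/R - 1 = (TP + FP + FN)/TP.
    Assumption: no true negatives are counted in the accuracy.
    """
    return 1 / (1 / precision + 1 / recall - 1)


precision, recall = 0.742, 0.920  # values reported in the Results

f1 = f1_score(precision, recall)
acc = accuracy_without_true_negatives(precision, recall)

# Inferred, not reported: the baseline that makes 94 extra sensors a ~261% gain.
additional_sensors = 94
relative_increase = 2.61
baseline_estimate = additional_sensors / relative_increase

print(f"F1: {f1 * 100:.1f}%")
print(f"Accuracy (no true negatives): {acc * 100:.1f}%")
print(f"Estimated baseline sensor count: {baseline_estimate:.1f}")
```

Under these assumptions, the recomputed F1 rounds to the reported 82.1%, the no-true-negative accuracy rounds to the reported 69.7%, and the implied baseline is roughly 36 sensors before citation-guided enrichment.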

Conclusions:

The developed citation-guided pipeline improved sensor discovery and metadata completeness, thereby supporting the development of richer, more complete sensor metadata repositories.


Citation

Please cite as:

Shah-Mohammadi F, Im S, Varma U, Facelli J, Cummins M, Gouripeddi R

Toward Citation-Guided Sensor Metadata Enrichment: A Large Language Model-based Approach

JMIR Preprints. 10/06/2026:104322

DOI: 10.2196/preprints.104322

URL: https://preprints.jmir.org/preprint/104322


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.