Accepted for/Published in: JMIR Research Protocols

Date Submitted: Mar 19, 2024
Date Accepted: Nov 27, 2024

The final, peer-reviewed published version of this preprint can be found here:

Patient-Related Metadata Reported in Sequencing Studies of SARS-CoV-2: Protocol for a Scoping Review and Bibliometric Analysis

O'Connor K, Weissenbacher D, Elyanderani A, Lautenbach E, Scotch M, Gonzalez-Hernandez G

JMIR Res Protoc 2025;14:e58567

DOI: 10.2196/58567

PMID: 40262134

PMCID: 12056431

Patient-Related Metadata Reported in Sequencing Studies of SARS-CoV-2: Protocol for a Scoping Review and Bibliometric Analysis

  • Karen O'Connor; 
  • Davy Weissenbacher; 
  • Amir Elyanderani; 
  • Ebbing Lautenbach; 
  • Matthew Scotch; 
  • Graciela Gonzalez-Hernandez

ABSTRACT

Background:

There has been an unprecedented effort to sequence the SARS-CoV-2 virus and examine its molecular evolution. This effort has been facilitated by publicly accessible databases, notably the Global Initiative on Sharing All Influenza Data (GISAID) and GenBank, which together hold millions of SARS-CoV-2 sequence records. Genomic epidemiology, however, seeks to go beyond phylogenetic analysis by linking genetic information to patient characteristics and disease outcomes, enabling a more complete understanding of transmission dynamics and disease impact. Although these repositories include fields for patient-related metadata for each sequence, these demographic and clinical fields are seldom populated. The extent and quality of patient-related metadata reported in published sequencing studies remain largely unexplored.

Objective:

In our review, we aim to quantitatively assess the extent and quality of patient-related metadata, including demographic, clinical, and geographic information, in articles reporting original whole-genome sequencing of SARS-CoV-2. We will perform a comprehensive bibliometric analysis to identify differences and patterns between articles that include patient metadata and those that do not. Finally, we will evaluate how accurately and reliably a machine learning classifier identifies relevant articles for inclusion in the scoping review, with the goal of making the study selection process more efficient.
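
The abstract does not specify how the classifier will be evaluated. As a minimal sketch, assuming that reviewers' manual screening decisions serve as the reference standard and that precision, recall, and F1 are the metrics of interest, the comparison could be computed as follows (the function name `screening_metrics` is hypothetical):

```python
from typing import Sequence


def screening_metrics(predicted: Sequence[bool], reviewed: Sequence[bool]) -> dict[str, float]:
    """Compare classifier decisions with manual reviewer decisions.

    Hypothetical helper: True means "include in the review" in both sequences,
    and the reviewer decisions are assumed to be the reference standard.
    """
    if len(predicted) != len(reviewed):
        raise ValueError("predicted and reviewed must have the same length")
    # True positives: classifier and reviewers both include the article.
    tp = sum(p and r for p, r in zip(predicted, reviewed))
    # False positives: classifier includes an article the reviewers excluded.
    fp = sum(p and not r for p, r in zip(predicted, reviewed))
    # False negatives: classifier misses an article the reviewers included.
    fn = sum(r and not p for p, r in zip(predicted, reviewed))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For screening, recall is usually the metric to watch, since a relevant article the classifier misses is lost to the review, whereas a false positive only costs reviewer time.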

Methods:

Articles in the NIH's LitCovid collection will be automatically classified to identify those reporting that SARS-CoV-2 sequences were deposited in public repositories, and an independent search of PubMed Central will be conducted for validation. Data extraction will be conducted using Covidence. The extracted data will be synthesized and summarized to quantify the availability of patient metadata in the published literature on SARS-CoV-2 sequencing studies. For the bibliometric analysis, relevant data points, such as author affiliations and citation metrics, will be extracted.
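
To illustrate what quantifying the availability of patient metadata might involve, the sketch below computes, for each patient-related field, the fraction of included studies that report it. The field names in `PATIENT_FIELDS` and the function `reporting_rates` are hypothetical placeholders rather than the protocol's extraction form, and a field is assumed to count as reported whenever its extracted value is non-empty.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional

# Hypothetical field names for illustration only; the actual extraction form may differ.
PATIENT_FIELDS = ("age", "sex", "clinical_outcome", "vaccination_status", "location")


def _is_reported(value: Optional[str]) -> bool:
    """Treat missing values (None) and blank strings as 'not reported'."""
    return value is not None and str(value).strip() != ""


def reporting_rates(studies: Iterable[Mapping[str, Optional[str]]]) -> dict[str, float]:
    """Return, for each field in PATIENT_FIELDS, the fraction of studies reporting it."""
    counts = Counter()
    n_studies = 0
    for study in studies:
        n_studies += 1
        for field in PATIENT_FIELDS:
            # A key absent from the study record counts the same as a blank value.
            if _is_reported(study.get(field)):
                counts[field] += 1
    return {
        field: (counts[field] / n_studies if n_studies else 0.0)
        for field in PATIENT_FIELDS
    }
```

Each study is represented as a mapping of extracted values, for example one row per study from an exported extraction table (an assumed format); missing keys and blank strings are both treated as not reported.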

Results:

We will summarize and narratively describe our findings, using tables, graphs, and charts where applicable, covering the number of sequences in the included studies, the distribution of those sequences across repositories, and the quantity and types of patient metadata reported in the studies.

Conclusions:

This scoping review will describe the extent and types of patient-related metadata reported in SARS-CoV-2 genomic sequencing studies, identify gaps in the reporting of patient metadata, and make recommendations for improving the quality and consistency of reporting in this area. The bibliometric analysis will examine trends and patterns in the reporting of patient-related metadata, including differences by study type or geographic region. Co-occurrence networks of author keywords will also be presented. These insights may help improve the reporting of patient metadata, increasing the utility of sequence metadata and facilitating future research on infectious diseases.
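
The abstract does not name the software used to build the keyword co-occurrence networks. As a minimal, tool-agnostic sketch under that caveat, the weighted edges of such a network can be counted by pairing every two distinct author keywords that appear in the same article. The name `keyword_cooccurrence` is hypothetical, and the case and whitespace normalization is an assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def keyword_cooccurrence(articles: Iterable[Iterable[str]]) -> Counter:
    """Count weighted edges: (keyword_a, keyword_b) -> number of articles listing both."""
    edges = Counter()
    for keywords in articles:
        # Assumed normalization: strip whitespace, lowercase, drop empty strings,
        # and deduplicate so a keyword repeated within one article counts once.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting first means each pair is always stored as (smaller, larger).
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges
```

The resulting counter maps each keyword pair to the number of articles that list both. After converting each item to a `(keyword_a, keyword_b, weight)` triple, it can be passed to a graph library, for example networkx's `Graph.add_weighted_edges_from`, for layout and clustering.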


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.