Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 15, 2021
Date Accepted: Mar 1, 2022
Date Submitted to PubMed: Mar 14, 2022
Pre2Pub: An algorithm for tracking the path from preprint to journal
ABSTRACT
Background:
The current COVID-19 pandemic underscores the importance of preprints, as they allow research results to be communicated rapidly, without the delay of peer review. To fully integrate this type of publication into library information systems, we developed preVIEW, a publicly available, central search engine for COVID-19 preprints that clearly distinguishes this source from peer-reviewed publications. The relationship between a preprint and its corresponding journal version should be stored as metadata in both versions so that duplicates can be easily identified and information overload for researchers is reduced.
Objective:
In this work, we investigate the extent to which information on the relationship between a preprint and its corresponding journal publication is present in the published metadata, how it can be completed, and how it can be used in preVIEW to identify preprints that have already been published in a journal and to filter these duplicates from search results.
Methods:
We first analyze the information available from the preprint servers themselves and the information that can be retrieved via Crossref. We then develop the Pre2Pub algorithm to find the corresponding peer-reviewed article for each preprint. We integrate the results from these different sources into our search engine preVIEW, present the information in the result set overview, and add corresponding filter options.
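One of the sources mentioned above is Crossref. The sketch below shows how a preprint-to-journal link could be read from a Crossref work record. It assumes the input is the `message` object returned by the Crossref REST API for a preprint DOI (`GET https://api.crossref.org/works/{doi}`) and that the link, when deposited, appears under `relation["is-preprint-of"]`. The function name `journal_dois_from_crossref` and the example DOIs are hypothetical, and this is not the Pre2Pub algorithm itself.

```python
from typing import Any


def journal_dois_from_crossref(work: dict[str, Any]) -> list[str]:
    """Return journal-article DOIs linked from a Crossref preprint record.

    Assumption: `work` is the "message" object of a Crossref REST API
    response for a preprint DOI. Records without a deposited relation
    simply yield an empty list, which is the gap Pre2Pub tries to fill.
    """
    relations = work.get("relation", {})
    links = relations.get("is-preprint-of", [])
    # Keep only DOI-typed identifiers and lower-case them,
    # because DOIs are case-insensitive.
    return sorted({link["id"].lower() for link in links if link.get("id-type") == "doi"})


# Hypothetical preprint record with one deposited journal link.
preprint = {
    "DOI": "10.1101/2020.01.01.000001",
    "relation": {
        "is-preprint-of": [
            {"id-type": "doi", "id": "10.1000/EXAMPLE.123", "asserted-by": "subject"}
        ]
    },
}
print(journal_dois_from_crossref(preprint))  # ['10.1000/example.123']
```

Parsing is kept separate from fetching so that the same function can be applied to cached or bulk-downloaded Crossref records.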
Results:
Preprints have found their place in publication workflows; however, the link from a preprint to its corresponding journal publication is not completely covered in the metadata of the preprint servers or in Crossref. Our Pre2Pub algorithm finds about 28% more related journal articles with a precision of 99.27%. We integrate this information transparently into preVIEW so that researchers can use it in their searches.
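Once a preprint is linked to its journal version, from server metadata, Crossref, or Pre2Pub, hiding duplicates in a result list becomes a simple filter. The sketch below assumes a hypothetical `Hit` record with a `journal_doi` field. It illustrates the idea only and does not reflect preVIEW's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One search result; field names are hypothetical, not preVIEW's schema."""
    doi: str
    is_preprint: bool
    journal_doi: Optional[str] = None  # linked journal version, if one is known


def hide_republished(hits: list[Hit]) -> list[Hit]:
    """Drop preprints that already have a linked journal version."""
    return [h for h in hits if not (h.is_preprint and h.journal_doi)]


hits = [
    Hit("10.1000/j1", is_preprint=False),
    Hit("10.1101/p1", is_preprint=True, journal_doi="10.1000/j1"),
    Hit("10.1101/p2", is_preprint=True),
]
print([h.doi for h in hide_republished(hits)])  # ['10.1000/j1', '10.1101/p2']
```

The link is applied as a filter at query time instead of deleting records. This keeps republished preprints available when a user switches the filter off.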
Conclusions:
The relationship between a preprint and its journal version is valuable information that helps researchers limit their reading of preprints to information that is not already known to them. As long as there is no transparent and complete way to store this relationship in metadata, the Pre2Pub algorithm is a suitable extension for retrieving this information.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.