JMIR Preprints #68603: After One Year, Where are Large Language Models Headed: A Thematic Analysis using Bibliometric Methodology

After One Year, Where are Large Language Models Headed: A Thematic Analysis using Bibliometric Methodology

Ethan Bernstein;
Anya Ramsamooj;
Kelsey Leann Millar;
Zachary C Lum

ABSTRACT

Background:

Since the release of ChatGPT and other large language models (LLMs), there has been a significant increase in academic publications exploring their capabilities and implications across various fields, such as Medicine, Education, and Technology.

Objective:

This study aims to identify the most influential academic works on LLMs published in the past year and to categorize their research types and thematic focuses within different professional fields. The study also evaluates the ability of AI tools, such as ChatGPT, to accurately classify academic research.

Methods:

We conducted a bibliometric analysis using Clarivate’s Web of Science (WOS) to extract the top 100 most cited articles on LLMs. Articles were manually categorized by field, journal, author, and research type. ChatGPT-4 was used to generate categorizations for the same articles, and its performance was compared to human classifications. Statistical analyses were performed to determine the prevalence of research fields and to evaluate the accuracy of AI-generated classifications.
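The two calculations described here, field prevalence and human-versus-AI concordance, can be illustrated with a minimal sketch. It assumes that agreement is measured as simple percent agreement between paired labels; the abstract does not state which statistic was used. The function names and the toy labels are hypothetical and are not drawn from the study's data.

```python
from collections import Counter


def field_prevalence(labels):
    """Return each label's share of all labels, as a percentage."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


def percent_agreement(human, model):
    """Return the percentage of items where both raters gave the same label."""
    if len(human) != len(model):
        raise ValueError("both label lists must have the same length")
    if not human:
        raise ValueError("label lists must not be empty")
    matches = sum(h == m for h, m in zip(human, model))
    return 100.0 * matches / len(human)


# Hypothetical toy labels for five articles (not the study's data).
human_fields = ["Medicine", "Medicine", "Education", "Technology", "Medicine"]
model_fields = ["Medicine", "Education", "Education", "Technology", "Medicine"]

prevalence = field_prevalence(human_fields)
agreement = percent_agreement(human_fields, model_fields)
print(prevalence)
print(f"Agreement: {agreement:.0f}%")
```

Percent agreement is easy to read but does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be a reasonable alternative when label distributions are skewed.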

Results:

Medicine emerged as the predominant field among the top-cited articles, accounting for 43%, followed by Education (26%) and Technology (15%). Medical literature primarily focused on clinical applications of LLMs, limitations of AI in healthcare, and the role of AI in medical education. In Education, research was centered around ethical concerns and potential applications of AI for teaching and learning. ChatGPT demonstrated high concordance with human reviewers, achieving an agreement rating of 86% for research types and 92% for fields of study.

Conclusions:

While LLMs like ChatGPT exhibit considerable potential in aiding research categorization, human oversight remains essential to address issues such as hallucinations, outdated information, and biases in AI-generated outputs. This study highlights the transformative potential of LLMs across multiple sectors and emphasizes the importance of continuous ethical evaluation and iterative improvement of AI systems to maximize their benefits while minimizing risks.

Citation

Please cite as:

Bernstein E, Ramsamooj A, Millar KL, Lum ZC

Identification and Categorization of the Top 100 Articles and the Future of Large Language Models: Thematic Analysis Using Bibliometric Analysis

JMIR AI 2025;4:e68603

DOI: 10.2196/68603

PMID: 40864888

PMCID: 12384689

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR AI

Date Submitted: Nov 10, 2024

Date Accepted: Jun 28, 2025
