
Accepted for/Published in: JMIR Formative Research

Date Submitted: May 18, 2025
Date Accepted: Oct 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

Information Extraction of Doctoral Theses Using Two Different Large Language Models vs Health Services Researchers: Development and Usability Study

Cittadino J, Traulsen P, Schmahl T, Wewetzer L, Cummerow J, Flägel K, Götz K, Steinhäuser J

JMIR Form Res 2025;9:e77707

DOI: 10.2196/77707

PMID: 41370789

PMCID: 12694942

Information extraction of doctoral theses using two different LLMs versus health services researchers: a feasibility study

  • Jonas Cittadino; 
  • Pia Traulsen; 
  • Teresa Schmahl; 
  • Larisa Wewetzer; 
  • Julia Cummerow; 
  • Kristina Flägel; 
  • Katja Götz; 
  • Jost Steinhäuser

ABSTRACT

Background:

The “Archive of German language general practice” (ADAM) stores about 800 paper-based doctoral theses dating from 1965 to today. While these have been grouped into different categories, no deeper systematic process of information extraction (IE) has been performed yet. Recently developed large language models (LLMs) such as ChatGPT have been attributed the potential to assist in the IE of medical documents. However, there are concerns about hallucination by LLMs. Furthermore, their use on non-recent doctoral theses has not yet been reported.

Objective:

To analyze whether LLMs can help extract information from doctoral theses, using GPT-4o and gemini-1.5-flash on the paper-based doctoral theses in ADAM.

Methods:

We randomly selected ten doctoral theses from between 1965 and 2022. After preprocessing, we used two different LLM pipelines, based on models from OpenAI and Google, respectively. The pipelines were used to extract dissertation characteristics and to generate uniform abstracts. In addition, one pooled human-generated abstract was written. Blinded raters were then asked to evaluate the LLM-generated abstracts in comparison with the human-generated ones.
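The pipelines themselves are not specified in this abstract. As a minimal sketch of how the characteristic-extraction step of such a pipeline could be structured, the snippet below builds a prompt that asks a model for strict JSON and then parses the reply. The prompt wording, the field names, and the `parse_characteristics` helper are illustrative assumptions, not the authors' actual code; no API call is made here.

```python
import json

# Hypothetical field names; the study extracts institute name and place,
# thesis title, author name(s), and year of publication.
FIELDS = ["institute_name", "institute_place", "title", "authors", "year"]

def build_extraction_prompt(thesis_text: str) -> str:
    """Assemble an IE prompt asking the model for a single JSON object."""
    return (
        "Extract the following characteristics from the doctoral thesis "
        f"below and answer with a single JSON object with keys {FIELDS}:\n\n"
        + thesis_text
    )

def parse_characteristics(model_reply: str) -> dict:
    """Parse the model's JSON reply; raise if expected keys are missing."""
    data = json.loads(model_reply)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise ValueError(f"model reply lacks fields: {missing}")
    return data

# Example with a canned reply standing in for the model's output:
reply = (
    '{"institute_name": "Institut für Allgemeinmedizin", '
    '"institute_place": "Lübeck", "title": "Beispieltitel", '
    '"authors": ["M. Mustermann"], "year": 1987}'
)
chars = parse_characteristics(reply)
```

In a real pipeline, `model_reply` would come from a GPT-4o or gemini-1.5-flash call on the preprocessed (e.g., OCR-scanned) thesis text, and the parsed dictionary would be stored as the dissertation's characteristics.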

Results:

Relevant dissertation characteristics and keywords could be extracted for all theses (n=10): name and place of the institute, title of the thesis, name(s) of the author(s), and year of publication. An abstract could be generated using GPT-4o for all but one doctoral thesis, while gemini-1.5-flash provided abstracts in all cases (n=10). Translation from German into English resulted in no loss of information in any case (n=10). The modality of abstract generation showed no influence on the raters' evaluations in an ANOVA (P=.23). Creation of LLM-generated abstracts was estimated to be 24 to 36 times faster than creation by humans.
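The rating data behind the ANOVA are not given in this abstract. For illustration only, the one-way ANOVA F statistic used in such a comparison can be computed in plain Python as below; the p-value (such as the reported P=.23) then follows from the F distribution with the returned degrees of freedom. The toy rating data are an assumption, not the study's data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = MS_between / MS_within, where each MS is a sum of squares
    divided by its degrees of freedom.
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: weighted squared deviations
    # of the group means from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: squared deviations of each value
    # from its own group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy ratings for three abstract modalities (e.g., GPT-4o, gemini, human):
f, dfb, dfw = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A large P value, as reported in the study, indicates that the rater scores did not differ detectably between the LLM-generated and human-generated abstracts.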

Conclusions:

An accumulating body of unpublished doctoral theses makes it difficult to extract relevant information. Recently, great hopes have been placed in LLMs such as ChatGPT, which to date have not been used for IE from “historic” medical documents. This feasibility study shows that both models, GPT-4o and gemini-1.5-flash, helped to accurately simplify and condense doctoral theses into relevant information, with LLM-generated abstracts being perceived as similar to human-generated ones while taking about thirty times less time to create. Thus, LLMs can be used, when applied cautiously, to extract relevant information and produce accurate abstracts from the doctoral theses in ADAM. Taken together, this information could help to better search the scientific background of family medicine from the last 60 years, supporting other researchers and thus strengthening research in family medicine.


 Citation

Please cite as:

Cittadino J, Traulsen P, Schmahl T, Wewetzer L, Cummerow J, Flägel K, Götz K, Steinhäuser J

Information Extraction of Doctoral Theses Using Two Different Large Language Models vs Health Services Researchers: Development and Usability Study

JMIR Form Res 2025;9:e77707

DOI: 10.2196/77707

PMID: 41370789

PMCID: 12694942


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.