Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 7, 2024
Date Accepted: Mar 16, 2025
Enhancing BERT with Frame Semantics to Extract Clinically Relevant Information from German Mammography Reports: Algorithm Development and Validation
ABSTRACT
Background:
Structured reporting is essential for improving the clarity and accuracy of radiological information. Despite its benefits, the European Society of Radiology (ESR) notes that it is not widely adopted. This raises the need for automatic methods to extract relevant information from unstructured radiology reports and thereby create structured reports automatically.
Objective:
This study explores combining a Bidirectional Encoder Representations from Transformers (BERT) architecture with the linguistic concept of frame semantics to extract and normalize information from free-text mammography reports.
Methods:
After creating an annotated corpus of 210 German reports for fine-tuning, we generate several BERT-model variants by applying three pre-training strategies on hospital data. Afterwards, a fact extraction pipeline is built, comprising an extractive question answering model and a sequence labelling model. We evaluate all model variants quantitatively using common evaluation metrics (model perplexity, squad_v2, seqeval) and perform qualitative evaluation of the whole pipeline by clinicians on a manually created synthetic dataset of 21 reports.
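As an illustration of two of the metrics named above, the following is a minimal sketch, not the authors' evaluation code. It assumes perplexity is computed as the exponential of the mean negative log-likelihood per token, and it uses a simplified SQuAD-style token-overlap F1 that lowercases and splits on whitespace only, omitting the punctuation and article normalization of the official squad_v2 script. The function names `perplexity` and `token_f1` and the example strings are hypothetical.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative natural-log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def token_f1(prediction, reference):
    """Simplified SQuAD-style token-overlap F1 between predicted and gold answers."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # SQuAD v2 convention: an unanswerable question has an empty gold answer,
    # so the score is 1.0 only if the prediction is empty as well.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(perplexity([math.log(0.25)] * 4))
    print(token_f1("Herd links", "Herd"))
```

Lower perplexity indicates that the language model assigns higher probability to held-out text, while the token-level F1 gives partial credit when a predicted answer span overlaps the gold span without matching it exactly.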
Results:
Our system is capable of extracting 14 fact types and 40 entities from the clinical findings section of mammography reports. Further pre-training on hospital data reduced model perplexity, although it had no significant impact on the two downstream tasks. We achieved averaged F1 scores of >90 % and >80 % for question answering and sequence labelling, respectively. Qualitative evaluation of the pipeline based on synthetic data shows overall precision of 96.1 % and 99.6 % for facts and entities, respectively.
Conclusions:
The proposed BERT-based framework incorporating frame semantics effectively extracts structured information from unstructured radiology reports. This system shows promise for advancing automated structured reporting in radiology, supporting improved clarity and usability of radiological data.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.