
Currently accepted at: Journal of Medical Internet Research

Date Submitted: Oct 23, 2025
Date Accepted: Feb 28, 2026
Date Submitted to PubMed: Mar 1, 2026

This paper has been accepted and is currently in production.

It will appear shortly at DOI 10.2196/86365.

The final accepted version (not yet copyedited) is available in this tab.

An "ahead-of-print" version has been submitted to PubMed; see PMID: 41764068.

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Context-Aware Sentence Classification of Radiology Reports Using Synthetic Data

  • Tomohiro Kikuchi; 
  • Yosuke Yamagishi; 
  • Kohei Yamamoto; 
  • Toshiaki Akashi; 
  • Harushi Mori; 
  • Hisaki Makimoto; 
  • Takahide Kohro

ABSTRACT

Background:

Vision-language models (VLMs) for radiology require large-scale image–text pairs. However, free-text reports mix background information, findings, and continuation sentences. Manual annotation is labor-intensive, and the direct use of clinical reports raises privacy concerns.

Objective:

We aimed to develop a context-aware sentence classification model for Japanese radiology reports using synthetic and automatically annotated data and validate it using multi-institutional clinical reports.

Methods:

Synthetic Japanese radiology reports were generated using the OpenAI API (GPT-4.1); sentence-level annotations were performed using GPT-4.1-mini in four categories: context, positive findings, negative findings, and continuation. After filtering, 3,104 reports were divided into training (2,670), validation (334), and testing (100) sets. For external validation, 280 reports dated October 1, 2024, were sampled from seven institutions in the Japan Medical Image Database and annotated by two radiologists. Large language models (Qwen3 and LLaMA 3.2) and Japanese text classification models (BERT base Japanese, ModernBERT-Ja-130M, and JMedRoBERTa) were fine-tuned and evaluated for accuracy, macro-F1, and positive predictive value for label 1 (PPV_1).
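The abstract does not specify how sentence context is encoded. As an illustrative sketch only (the function name, context window size, and `[SEP]` separator below are assumptions, not the authors' scheme), one common way to make sentence classification "context-aware" is to prepend each target sentence with its preceding sentences before passing the string to a classifier:

```python
def build_context_inputs(sentences, window=2, sep=" [SEP] "):
    """Pair each sentence with up to `window` preceding sentences as context.

    Returns one classifier input string per sentence; the target sentence
    is always last, so a model can use its local report context.
    """
    inputs = []
    for i, sentence in enumerate(sentences):
        context = sentences[max(0, i - window):i]
        inputs.append(sep.join(context + [sentence]))
    return inputs

# Hypothetical three-sentence report (English for readability):
report = [
    "CT of the chest without contrast.",          # context sentence
    "A 12 mm nodule is seen in the right lobe.",  # positive finding
    "No pleural effusion.",                       # negative finding
]
print(build_context_inputs(report, window=1))
```

With `window=1`, the second input becomes the first sentence plus the second, so a continuation or finding sentence is never classified in isolation.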

Results:

For the internal test set (1,124 sentences), all models performed well: accuracy, 0.939–0.950; macro-F1, 0.924–0.940; and PPV_1, 0.904–0.953. For the external dataset (3,477 sentences), accuracy declined to 0.783–0.812 and macro-F1 to 0.761–0.790. Qwen3-4B showed the best performance (PPV_1 = 0.952).
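The reported metrics have standard definitions: accuracy is the fraction of correctly classified sentences, macro-F1 averages per-class F1 scores over all four labels, and PPV for a label is its precision. A minimal sketch of these computations, assuming (hypothetically) that label 1 corresponds to "positive finding":

```python
LABELS = ["context", "positive_finding", "negative_finding", "continuation"]

def evaluate(y_true, y_pred, labels=LABELS, target="positive_finding"):
    """Return (accuracy, macro-F1, PPV of `target`) for two label lists."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    f1_scores = []
    ppv_target = 0.0
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0  # PPV
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
        if lab == target:
            ppv_target = precision
    # Macro-F1 weights every class equally, regardless of class frequency.
    return accuracy, sum(f1_scores) / len(f1_scores), ppv_target
```

Macro-averaging matters here because finding sentences are typically rarer than context sentences, so a frequency-weighted average would mask poor performance on the minority labels.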

Conclusions:

The model trained solely on synthetic reports showed robust performance in real-world Japanese radiology reports. This approach enables the efficient extraction of finding-level sentences and supports the large-scale construction of image–text pairs for Japanese VLM development.


Citation

Please cite as:

Kikuchi T, Yamagishi Y, Yamamoto K, Akashi T, Mori H, Makimoto H, Kohro T

Context-Aware Sentence Classification of Radiology Reports Using Synthetic Data

JMIR Preprints. 23/10/2025:86365

DOI: 10.2196/preprints.86365

URL: https://preprints.jmir.org/preprint/86365

PMID: 41764068




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.