
Currently accepted at: Journal of Medical Internet Research

Date Submitted: Oct 23, 2025
Date Accepted: Feb 28, 2026
Date Submitted to PubMed: Mar 1, 2026

This paper has been accepted and is currently in production.

It will appear shortly at DOI 10.2196/86365.

The final accepted version (not yet copyedited) is available in this tab.

An "ahead-of-print" version has been submitted to PubMed; see PMID: 41764068.

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Context-Aware Sentence Classification of Radiology Reports Using Synthetic Data

  • Tomohiro Kikuchi; 
  • Yosuke Yamagishi; 
  • Kohei Yamamoto; 
  • Toshiaki Akashi; 
  • Harushi Mori; 
  • Hisaki Makimoto; 
  • Takahide Kohro

ABSTRACT

Background:

Vision-language models (VLMs) for radiology require large-scale image–text pairs. However, free-text reports mix background information, findings, and continuation sentences. Manual annotation is labor-intensive, and the direct use of clinical reports raises privacy concerns.

Objective:

We aimed to develop a context-aware sentence classification model for Japanese radiology reports using synthetic and automatically annotated data and validate it using multi-institutional clinical reports.

Methods:

Synthetic Japanese radiology reports were generated using the OpenAI API (GPT-4.1); sentence-level annotations were performed using GPT-4.1-mini in four categories: context, positive findings, negative findings, and continuation. After filtering, 3,104 reports were divided into training (2,670), validation (334), and testing (100) sets. For external validation, 280 reports dated October 1, 2024, were sampled from seven institutions in the Japan Medical Image Database and annotated by two radiologists. Large language models (Qwen3 and LLaMA 3.2) and Japanese text classification models (BERT base Japanese, ModernBERT-Ja-130M, and JMedRoBERTa) were fine-tuned and evaluated for accuracy, macro-F1, and positive predictive value for label 1 (PPV_1).
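The abstract does not specify how sentence context is encoded. As an illustrative sketch only (the function name, context window size, and `[SEP]` separator below are assumptions, not the authors' scheme), one common way to make sentence classification "context-aware" is to prepend each target sentence with its preceding sentences before passing the string to a classifier:

```python
def build_context_inputs(sentences, window=2, sep=" [SEP] "):
    """Pair each sentence with up to `window` preceding sentences as context.

    Returns one classifier input string per sentence; the target sentence
    is always last, so a model can use its local report context.
    """
    inputs = []
    for i, sentence in enumerate(sentences):
        context = sentences[max(0, i - window):i]
        inputs.append(sep.join(context + [sentence]))
    return inputs

# Hypothetical three-sentence report (English for readability):
report = [
    "CT of the chest without contrast.",          # context sentence
    "A 12 mm nodule is seen in the right lobe.",  # positive finding
    "No pleural effusion.",                       # negative finding
]
print(build_context_inputs(report, window=1))
```

With `window=1`, the second input becomes the first sentence plus the second, so a continuation or finding sentence is never classified in isolation.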

Results:

For the internal test set (1,124 sentences), all models performed well: accuracy, 0.939–0.950; macro-F1, 0.924–0.940; and PPV_1, 0.904–0.953. For the external dataset (3,477 sentences), accuracy declined to 0.783–0.812 and macro-F1 to 0.761–0.790. Qwen3-4B showed the best performance (PPV_1 = 0.952).
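The reported metrics have standard definitions: accuracy is the fraction of correctly classified sentences, macro-F1 averages per-class F1 scores over all four labels, and PPV for a label is its precision. A minimal sketch of these computations, assuming (hypothetically) that label 1 corresponds to "positive finding":

```python
LABELS = ["context", "positive_finding", "negative_finding", "continuation"]

def evaluate(y_true, y_pred, labels=LABELS, target="positive_finding"):
    """Return (accuracy, macro-F1, PPV of `target`) for two label lists."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    f1_scores = []
    ppv_target = 0.0
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0  # PPV
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
        if lab == target:
            ppv_target = precision
    # Macro-F1 weights every class equally, regardless of class frequency.
    return accuracy, sum(f1_scores) / len(f1_scores), ppv_target
```

Macro-averaging matters here because finding sentences are typically rarer than context sentences, so a frequency-weighted average would mask poor performance on the minority labels.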

Conclusions:

The model trained solely on synthetic reports showed robust performance in real-world Japanese radiology reports. This approach enables the efficient extraction of finding-level sentences and supports the large-scale construction of image–text pairs for Japanese VLM development.


Citation

Please cite as:

Kikuchi T, Yamagishi Y, Yamamoto K, Akashi T, Mori H, Makimoto H, Kohro T

Context-Aware Sentence Classification of Radiology Reports Using Synthetic Data

JMIR Preprints. 23/10/2025:86365

DOI: 10.2196/preprints.86365

URL: https://preprints.jmir.org/preprint/86365

PMID: 41764068




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.