Currently submitted to: Journal of Medical Internet Research

Date Submitted: Oct 20, 2025
Open Peer Review Period: Oct 20, 2025 - Dec 15, 2025

NOTE: This is an unreviewed Preprint

Warning: This is an unreviewed preprint. Readers are warned that the document has not been peer-reviewed by expert/patient reviewers or an academic editor, may contain misleading claims, and is likely to undergo changes before final publication, if accepted, or may have been rejected/withdrawn (a note "no longer under consideration" will appear above).

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Extracting and Classifying Drug Discontinuations from Estonian Electronic Health Records: Development and Validation Study

  • Hendrik Šuvalov
  • Nikita Umov
  • Markus Haug
  • Sven Laur
  • Marek Oja
  • Sirli Tamm
  • Sulev Reisberg
  • Jaak Vilo
  • Raivo Kolde

ABSTRACT

Background:

Drug adherence is crucial for chronic disease management, yet treatment discontinuation remains common due to factors such as side effects, inefficacy, or cost. These reasons are often recorded only in free-text clinical notes, making large-scale analysis difficult. While large language models (LLMs) can interpret such unstructured data more effectively than traditional natural language processing methods, few studies have systematically categorized reasons for discontinuation or identified whether the decision was initiated by the patient or the clinician, especially in low-resource languages like Estonian.

Objective:

To assess the ability of LLMs to extract and classify reasons for drug discontinuation and identify who initiated it using Estonian electronic health records, and to characterize the observed discontinuation patterns and initiators for statins and antidiabetic medications.

Methods:

We combined prescription data with free-text anamneses from a 10% sample of the Estonian population (2012–2019). LLMs (Llama-3.1-70B and GPT-4o) were applied to extract discontinuation phrases and reasons, classify them into a clinician-developed taxonomy, and identify who discontinued the treatment. Performance was evaluated on 100 randomly chosen cases per drug group.
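
As an illustration of the validation step only, drawing a fixed-size random sample per drug group might look like the sketch below. The abstract does not describe how the authors sampled, so the function name `sample_validation_cases`, the seed, and the integer case identifiers are all assumptions; only the sample size of 100 per group comes from the text.

```python
import random


def sample_validation_cases(cases_by_group, n_per_group=100, seed=0):
    """Draw up to n_per_group cases per drug group, without replacement.

    Hypothetical helper: the abstract only states that 100 randomly
    chosen cases per drug group were evaluated.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for group, cases in cases_by_group.items():
        k = min(n_per_group, len(cases))  # guard against small groups
        sample[group] = rng.sample(cases, k)
    return sample


# Placeholder case identifiers; real inputs would be extracted EHR cases.
cases_by_group = {
    "statins": list(range(1000, 1300)),
    "antidiabetics": list(range(2000, 2700)),
}
validation = sample_validation_cases(cases_by_group)
print({group: len(cases) for group, cases in validation.items()})
```

Fixing the seed keeps the validation set stable across reruns, which matters when human reviewers judge the same cases that the models are scored on.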

Results:

Extraction yielded 625 antidiabetic drug and 233 statin discontinuation cases. Validation confirmed high accuracy, with 93–98% of extracted phrases and 95–96% of extracted reasons judged correct. Classification of discontinuation reasons achieved weighted F1 scores of 0.808–0.836, while classification of who initiated discontinuation achieved weighted F1 scores of 0.645–0.774. Adverse reactions were the most frequent reason overall, accounting for ~70% of discontinuations for statins and ~44% for antidiabetic drugs. For antidiabetic drugs, treatment inefficacy and contraindications were more common. Patients more often stopped due to adverse reactions or non-medical reasons, while physicians more often initiated discontinuation for contraindications.
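
For readers unfamiliar with the metric, a weighted F1 score is the mean of per-class F1 scores, weighted by how often each class occurs in the reference labels. The sketch below is a minimal plain-Python version of that standard definition; the label names and toy data are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of reference cases per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total  # weight each class by its support
    return score


# Toy labels (hypothetical category names, not the authors' taxonomy).
y_true = ["adverse_reaction", "adverse_reaction", "inefficacy", "contraindication"]
y_pred = ["adverse_reaction", "inefficacy", "inefficacy", "contraindication"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.75
```

Because each class is weighted by its support, frequent categories such as adverse reactions dominate the score, so a weighted F1 can look strong even when rare categories are classified poorly.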

Conclusions:

LLMs can accurately extract and classify medication discontinuation reasons and initiators from Estonian clinical narratives. Both local and proprietary models performed well, enabling scalable analyses that complement structured health records. This demonstrates the potential of LLMs to unlock information from clinical notes, turning this underutilized EHR component into a valuable resource for monitoring treatment patterns and detecting adverse event signals.


 Citation

Please cite as:

Šuvalov H, Umov N, Haug M, Laur S, Oja M, Tamm S, Reisberg S, Vilo J, Kolde R

Extracting and Classifying Drug Discontinuations from Estonian Electronic Health Records: Development and Validation Study

JMIR Preprints. 20/10/2025:86183

DOI: 10.2196/preprints.86183

URL: https://preprints.jmir.org/preprint/86183

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.