Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 16, 2025
Date Accepted: Feb 17, 2026

The final, peer-reviewed published version of this preprint can be found here:

Code-Based Versus AutoML Methods for Pill Recognition in Clinical Settings: Comparative Performance Study

Ashraf AR, Rádli R, Vörösházi Z, Fittler A

JMIR Med Inform 2026;14:e79160

DOI: 10.2196/79160

PMID: 41961529

PMCID: 13068000

Code-Based Versus AutoML Methods for Pill Recognition in Clinical Settings: Comparative Performance Study

  • Amir Reza Ashraf
  • Richárd Rádli
  • Zsolt Vörösházi
  • András Fittler

ABSTRACT

Background:

Visual identification and verification of medications during dispensing and administration are prone to human error, particularly in high-pressure and high-volume clinical settings. Misidentification can lead to medication errors, posing risks to patient safety and burdening healthcare systems. Recent advances in computer vision and object detection offer promising solutions for automated solid oral dosage form (pill) recognition. However, comprehensive studies comparing code-based and no-code (AutoML) approaches for pill recognition are lacking.

Objective:

We aimed to evaluate and compare performance, cost, usability, and deployment feasibility of pill recognition models developed with Ultralytics YOLO11 and three cloud-based AutoML platforms (Amazon Rekognition Custom Labels, Google Vertex AI AutoML Vision, and Microsoft Azure Custom Vision) using multiple datasets, including real-world clinical images.

Methods:

Five training subsets of increasing size (1230, 3450, 7380, 14,400, and 26,880 images) covering 30 commonly dispensed medications were used to train models on YOLO11 and the three AutoML platforms. Models were evaluated on six datasets from different environments: clinical images from three hospitals, a verification dataset, a laboratory dataset, and an exhaustive testing set. Performance metrics, including accuracy, precision, recall, and mean average precision (mAP), were calculated. We also evaluated the influence of training data size on performance and benchmarked training time, platform costs, and platform limitations.
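As a rough illustration of how count-based metrics of this kind relate to one another, the following Python sketch computes precision, recall, and false negative rate from detection tallies. It assumes the standard textbook definitions, which the abstract does not spell out. The class name `DetectionCounts` and the example counts are hypothetical and are not taken from the study. Accuracy and mAP are left out because their exact definitions for object detection (for example, matching thresholds) are not stated here.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical tallies of detection outcomes for one test dataset."""
    true_positives: int   # pills detected and labeled correctly
    false_positives: int  # detections that do not match a real pill or label
    false_negatives: int  # pills that were present but missed

    def precision(self) -> float:
        # Share of reported detections that were correct.
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    def recall(self) -> float:
        # Share of pills actually present that were found.
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    def false_negative_rate(self) -> float:
        # Share of pills actually present that were missed (1 - recall).
        denom = self.true_positives + self.false_negatives
        return self.false_negatives / denom if denom else 0.0


# Illustrative counts only (an assumption, not study data): a detector that
# rarely reports a wrong pill but misses most of the pills that are present.
counts = DetectionCounts(true_positives=26, false_positives=0, false_negatives=74)
print(f"precision={counts.precision():.2f}")
print(f"recall={counts.recall():.2f}")
print(f"false_negative_rate={counts.false_negative_rate():.2f}")
```

Under these definitions, precision and false negative rate use different denominators, so a model can score well on precision while still missing a large share of pills.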

Results:

No single platform dominated across all test environments. On the verification dataset (optimal conditions), accuracy ranged from 80.83% (YOLO11) to 91.60% (Google Vertex AI) when models were trained with the full training dataset. YOLO11 showed consistent performance improvement on the verification dataset with increasing training data (accuracy: 63.06% to 80.83%) and achieved near-perfect precision and mAP scores (0.95-1.00). Google Vertex AI exceeded 90% accuracy on three training subsets but showed unpredictable declines. Amazon Rekognition maintained near-perfect precision (0.92-1.00) but had the highest false negative rates (up to 0.74), missing many pills. Microsoft Azure Custom Vision showed steady performance improvements (77.08% to 85.62% accuracy) but lagged behind other AutoML platforms, probably due to its older YOLOv2-based architecture. However, when models were tested on clinical datasets, accuracy fluctuated dramatically, ranging from 20.62% to 90.00% depending on dataset and platform. Training costs and times varied: YOLO11 (free/open-source), Microsoft Azure (US$9.50-US$28.60, with a user-defined training duration), Google Vertex AI (US$69.30, with consistent 2.5-3-hour training times), and Amazon Rekognition (US$5.43-US$43.89, with training time that scaled with dataset size and reached nearly 40 hours on the full 26,880-image dataset).

Conclusions:

Each platform offers distinct advantages and trade-offs: YOLO11 provides the highest flexibility and lowest platform costs but requires technical expertise, while AutoML platforms can offer high performance at higher cost but give users limited control, which adds unpredictability. The performance variations demonstrate that successful clinical deployment requires careful platform selection based on specific performance requirements, budget constraints, and available technical resources, followed by rigorous validation using real-world, representative data to ensure patient safety in clinical workflows.

