Currently submitted to: Journal of Medical Internet Research
Date Submitted: Mar 25, 2026
Open Peer Review Period: Mar 25, 2026 - May 20, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Standardized Performance Assessment Methodology and End-to-End Framework for Tactical Combat Casualty Care Autonomous Documentation Algorithms
ABSTRACT
Background:
The Military Healthcare System (MHS) mandates medical documentation at all echelons of care; however, care providers in high-intensity combat situations must prioritize lifesaving measures over record-keeping, leading to information gaps across the care continuum. Effective human-machine teaming (HMT) solutions designed to autonomously document care delivery will serve as future force multipliers in tactical combat casualty care (TCCC) environments.
Objective:
To address this challenge, the United States Army Institute of Surgical Research (USAISR) commenced an effort to prototype HMT systems designed to passively document care delivery within the TCCC environment. However, common artificial intelligence (AI) performance evaluation methods do not adequately represent the temporal, repetitive, and context-dependent nature of real-world TCCC delivery. It is therefore essential to conduct comprehensive assessments to verify that AI tools function in a timely, synchronized manner within operational workflows. During the initial prototyping phase, five algorithm developers were provided with annotated datasets from 75 TCCC simulations and given six months to develop their algorithms.
Methods:
To assess the algorithms, the research team performed evaluations on a reserved dataset. In the first phase of the assessment, a standardized, repeatable performance methodology and framework was used to evaluate individual algorithms that detect (1) injury location on a casualty, (2) medical objects visible in the scene, and (3) treatments administered by the care provider. Detection effectiveness was quantified with four metrics: modified accuracy, precision, recall, and F1 score. Algorithm processing efficiency was also evaluated by calculating lag-time scores. A final composite score was used to quantify performance differences among the algorithms within a given detection category. The second phase of the evaluation integrated multiple algorithms into a centralized orchestration framework to enable synchronized execution and consolidated outputs. System-level resource usage and throughput metrics were evaluated to characterize computational efficiency: memory consumption and central processing unit (CPU) and graphics processing unit (GPU) utilization were quantified, followed by benchmarking of the edge-compute orchestration framework.
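For illustration only, the sketch below shows one way the detection-effectiveness and composite metrics described above could be computed from per-simulation detection counts. This is not the authors' scoring implementation; the lag-time penalty, the 30-second cap, and the composite weights are hypothetical assumptions, since the abstract does not specify them.

from dataclasses import dataclass

@dataclass
class DetectionCounts:
    tp: int  # true positives: detections matching annotated ground truth
    fp: int  # false positives: detections with no matching ground truth
    fn: int  # false negatives: annotated events the algorithm missed

def precision(c: DetectionCounts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0

def recall(c: DetectionCounts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0

def f1(c: DetectionCounts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def lag_time_score(detection_time_s: float, ground_truth_time_s: float,
                   max_lag_s: float = 30.0) -> float:
    # Map detection latency to a 0-1 score; the linear penalty and the
    # 30-second cap are illustrative assumptions only.
    lag = max(0.0, detection_time_s - ground_truth_time_s)
    return max(0.0, 1.0 - lag / max_lag_s)

def composite_score(c: DetectionCounts, lag_score: float,
                    w_eff: float = 0.7, w_lag: float = 0.3) -> float:
    # Hypothetical weighted combination of effectiveness (F1) and lag-time scores.
    return w_eff * f1(c) + w_lag * lag_score

# Example for a single simulation (placeholder counts, not study data):
counts = DetectionCounts(tp=14, fp=9, fn=11)
print(f"precision={precision(counts):.2f} recall={recall(counts):.2f} f1={f1(counts):.2f}")
print(f"composite={composite_score(counts, lag_time_score(12.0, 5.0)):.2f}")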
Results:
Results are presented for a representative algorithm in each category, with an aggregation sketch following this paragraph. Medical Object Detection achieved the highest performance (mean F1≈0.42, range 0–0.71). Injury Detection and Localization showed lower performance (mean F1≈0.27, range 0–0.60), with higher recall than precision. Medical Procedure Detection yielded procedure-level mean F1 scores from 0.00 to 0.31 and simulation-level means from 0.00 to 0.33, with stronger results for the Nasopharyngeal Airway (NPA) and Chest Seal Application procedures (mean F1≈0.28 and 0.31, respectively). The results to date are preliminary and serve as illustrative examples of the evaluation framework's outputs.
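As a minimal illustration of how the per-simulation scores are summarized into the mean and range statistics reported above (the values below are placeholders, not study data):

from statistics import mean

# Placeholder per-simulation F1 scores for one detection category (not study data)
per_simulation_f1 = [0.00, 0.12, 0.35, 0.55, 0.42]

print(f"mean F1 = {mean(per_simulation_f1):.2f}, "
      f"range = {min(per_simulation_f1):.2f}-{max(per_simulation_f1):.2f}")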
Conclusions:
The preliminary results illustrate the evaluation framework's ability to produce standardized, end-to-end assessments across core algorithm functions. While algorithm performance to date is modest, the framework captures both variability and recurring patterns across simulations, thereby highlighting strengths, limitations, and areas requiring refinement. It enables reproducible, cross-dataset comparisons, allowing evaluators to quantify algorithm performance. By combining simulation-level evaluations with detection-specific performance aggregated across simulations, the framework supports targeted identification of underperforming areas and iterative, strategic AI model refinement.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.