Accepted for/Published in: JMIR Medical Education

Date Submitted: Aug 26, 2025
Date Accepted: Jan 31, 2026

The final, peer-reviewed published version of this preprint can be found here:

Susceptibility of Assessment Types to AI-Generated Content in Digital Health and Health Information Management Education: Quasi-Experimental Pilot Study

Wani TA, Liem M, Prasad N, Robinson K, Nexhip A, Tassos M, Gjorgioski S, Khan UR, Boyd J, Riley M

JMIR Med Educ 2026;12:e82988

DOI: 10.2196/82988

PMID: 41911020

Susceptibility of Assessment Types to AI-Generated Content: A Quasi-Experimental Pilot Study in Digital Health and Health Information Management Education

  • Tafheem Ahmad Wani; 
  • Michael Liem; 
  • Natasha Prasad; 
  • Kerin Robinson; 
  • Abbey Nexhip; 
  • Melanie Tassos; 
  • Stephanie Gjorgioski; 
  • Urooj Raza Khan; 
  • James Boyd; 
  • Merilyn Riley

ABSTRACT

Background:

Generative artificial intelligence (GenAI) tools, such as ChatGPT, are reshaping higher education and prompting urgent discussions about academic integrity. In Digital Health and Health Information Management (DIGHIM) programs, where assessment tasks often require a combination of technical proficiency, contextual reasoning, and professional judgment, the integration of GenAI presents unique opportunities and risks. These programs train graduates to work at the intersection of health, data, and technology, making it essential to understand how AI performs across the diverse assessment formats that reflect real-world professional competencies.

Objective:

This pilot study aimed to evaluate ChatGPT’s performance across diverse assessment types in DIGHIM education, to examine how task complexity influences the quality of AI-generated output, and to develop recommendations for ethical and effective AI integration in assessments.

Methods:

A pilot quasi-experimental design compared ChatGPT-generated responses with de-identified student submissions across five assessment types: digital health solution design, business case analysis, reflective assessment, SQL health database programming, and a health classification quiz. For each task, multiple AI submissions were produced using different prompting strategies (including rubric integration) and different models (ChatGPT-4.0 and o1 Preview). Blinded academic markers evaluated all submissions against standard rubrics, and descriptive statistics were used to compare performance.

Results:

ChatGPT’s performance varied considerably across assessment types. It achieved its highest accuracy in objective, rule-based tasks such as multiple-choice quiz items in health classification (mean 87.5%) and produced well-structured, coherent responses for reflective assessments (mean 69.4%), though these often lacked personalisation and nuanced industry context. In descriptive analytical tasks, such as digital health business cases and solution designs, ChatGPT produced logically structured work with reasonable use of evidence but failed to provide the deep contextualisation, domain-specific insights, or visual elements expected in DIGHIM practice. Technical assessments revealed the greatest limitations: SQL programming tasks averaged 42.3%, with persistent schema errors, incomplete queries, and weak interpretation of health data outputs, while scenario-based clinical coding scored just 7.1%, reflecting a lack of precision in applying ICD-10-AM rules and coding conventions. Structured prompting and rubric integration improved results, particularly in descriptive and reflective tasks (up to 80%), but the advanced ChatGPT o1 Preview model did not consistently outperform earlier versions.

Conclusions:

While ChatGPT demonstrates strong capability in structured, rule-based, and reflective tasks, it remains limited in technical accuracy, contextual reasoning, and application to Digital Health and Health Information Management contexts. To preserve academic integrity and ensure graduates are workforce-ready, assessment designs should emphasise critical thinking, ethical reasoning, and scenario-based problem-solving that reflect real-world DIGHIM practice. Integrating AI as a tool for critique, refinement, and validation, rather than as a replacement for student work, can help educators prepare students for responsible AI use in digital health and health information management professions.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.