Accepted for/Published in: JMIR Rehabilitation and Assistive Technologies
Date Submitted: Oct 30, 2025
Date Accepted: Feb 19, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Comparative Evaluation of AI Models for Low-Cost Assistive Software Design Using Natural Language Prompts: An exploratory study.
ABSTRACT
Background:
Artificial intelligence (AI) is redefining software creation by enabling non-technical users to program through natural-language interaction. This paradigm has major implications for the inclusive design of assistive products (APs), particularly in education and health contexts where technical expertise or budgets are limited. However, the variability in performance, accuracy, and coherence across AI models remains largely unexplored, especially regarding their capacity to autonomously generate functional assistive software using natural prompts.
Objective:
This study compared the performance of eight AI models—six free and two paid—in generating functional Python code to create a low-cost, personalized assistive software solution. The aim was to identify which models are most effective, accessible, and consistent for supporting non-technical professionals in developing inclusive digital tools.
Methods:
Eight large language models were evaluated: ChatGPT-Free (GPT-4.1 mini), ChatGPT-Pro (GPT-5), Gemini-Free, Gemini-Pro, Claude-Free, DeepSeek, and Copilot. Each was prompted, using standardized natural-language instructions without technical jargon, to design a Python program that converts an arcade gamepad into an adapted mouse-like controller. Sixteen progressively complex functions were requested through iterative prompts. No technical feedback was given—only the message “it doesn’t work, fix it”—to emulate a non-expert user scenario. Performance was measured by (1) the number of functions successfully implemented and (2) the mean number of prompts required per function. Each model was tested in isolated sessions to prevent contamination or memory bias.
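The two performance measures described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the `FunctionAttempt` record, the `summarize` helper, and the example log are hypothetical, and it assumes that the mean number of prompts is taken over the functions that ended up working, which the abstract does not specify.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class FunctionAttempt:
    """One requested function in a hypothetical session log."""
    name: str      # illustrative label, not taken from the study
    prompts: int   # prompts sent for this function (first request plus "fix it" retries)
    working: bool  # whether the function ended up working


def summarize(attempts: List[FunctionAttempt]) -> Tuple[int, Optional[float]]:
    """Return (functions implemented, mean prompts per implemented function).

    Assumption: the mean is computed only over functions that worked.
    Returns None for the mean when nothing was implemented.
    """
    done = [a for a in attempts if a.working]
    mean_prompts = mean(a.prompts for a in done) if done else None
    return len(done), mean_prompts


# Made-up example log; these values are NOT results from the study.
example_log = [
    FunctionAttempt("move_cursor", prompts=1, working=True),
    FunctionAttempt("left_click", prompts=2, working=True),
    FunctionAttempt("scroll", prompts=3, working=False),
    FunctionAttempt("drag", prompts=1, working=True),
]

implemented, avg_prompts = summarize(example_log)
print(f"implemented: {implemented}, mean prompts: {avg_prompts}")
```

Averaging over all requested functions instead, including those that never worked, would give a different figure, so the choice of denominator should be stated alongside any reported mean.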
Results:
Marked differences emerged between paid and free AI models. Gemini Pro (Google) implemented 14 of 16 functions with an average of 1.25 prompts, showing exceptional contextual understanding and internal coherence. ChatGPT Plus (GPT-5) achieved 11 of 16 functions with 1.31 prompts, generating functional but occasionally inconsistent code. Free models performed significantly worse, achieving between 0 and 4 functional outcomes. DeepSeek and Gemini Free performed best within the free tier, implementing up to 4 and 2 complete functions, respectively, before reaching prompt limits. Claude Free and Copilot produced no usable software. Paid models demonstrated superior stability, tolerance for natural language, and reduced conversational drift, while free alternatives suffered from interaction caps and lower model sophistication.
Conclusions:
Paid AI models—particularly Gemini Pro and ChatGPT Plus—exhibited the strongest potential for bridging the gap between healthcare or educational professionals and software development. Their ability to generate coherent, functional, and customizable code from plain language makes them valuable tools for low-cost, user-centered assistive technology design. Free models may still serve exploratory or educational purposes but require extensive supervision and time. Gemini Pro stands out as the most balanced option in terms of cost-effectiveness and usability. Overall, AI can act as an accessible interface enabling non-technical professionals to prototype inclusive digital solutions; however, human oversight, sound prompt design, and basic programming literacy remain essential to ensure reliability, ethical application, and sustainability of generated tools. Future research should assess multimodal and self-prompting capabilities in AI-based software creation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.