JMIR Preprints #86786: Comparative Evaluation of AI Models for Low-Cost Assistive Software Design Using Natural Language Prompts: An exploratory study.

Francesc Antoni Bañuls-Lapuerta;
Vicent Marti-Miralles;
Rómulo J. Gónzalez-García;
Gabriel Martínez-Rico

ABSTRACT

Background:

Artificial intelligence (AI) is redefining software creation by enabling non-technical users to program through natural-language interaction. This paradigm has major implications for the inclusive design of assistive products (APs), particularly in education and health contexts where technical expertise or budgets are limited. However, the variability in performance, accuracy, and coherence across AI models remains largely unexplored, especially regarding their capacity to autonomously generate functional assistive software using natural prompts.

Objective:

This study compared the performance of eight AI models—six free and two paid—in generating functional Python code to create a low-cost, personalized assistive software solution. The aim was to identify which models are most effective, accessible, and consistent for supporting non-technical professionals in developing inclusive digital tools.

Methods:

Eight large language models were evaluated: ChatGPT-Free (GPT-4.1 mini), ChatGPT-Pro (GPT-5), Gemini-Free, Gemini-Pro, Claude-Free, DeepSeek, and Copilot. Each was prompted, using standardized natural-language instructions without technical jargon, to design a Python program that converts an arcade gamepad into an adapted mouse-like controller. Sixteen progressively complex functions were requested through iterative prompts. No technical feedback was given—only the message “it doesn’t work, fix it”—to emulate a non-expert user scenario. Performance was measured by (1) the number of functions successfully implemented and (2) the mean number of prompts required per function. Each model was tested in isolated sessions to prevent contamination or memory bias.
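As an illustration of how the two performance measures could be tallied for a single model session, the following is a minimal Python sketch. It is not the authors' scoring procedure: the log format, the function names, and the choice to average prompts only over functions that ended up working are all assumptions made for this example, and the numbers in the sample log are hypothetical.

```python
from statistics import mean

TOTAL_FUNCTIONS = 16  # number of functions requested from each model in the study


def summarize_session(prompts_per_function, total_functions=TOTAL_FUNCTIONS):
    """Summarize one model session.

    prompts_per_function maps a function name to the number of prompts
    that were needed before that function worked. Functions that never
    worked are simply absent from the mapping (an assumption of this sketch).
    """
    implemented = len(prompts_per_function)
    # Measure 2: mean prompts per successfully implemented function.
    mean_prompts = mean(prompts_per_function.values()) if implemented else None
    return {
        "implemented": implemented,          # measure 1
        "total": total_functions,
        "mean_prompts_per_function": mean_prompts,
    }


# Hypothetical session log; names and counts are illustrative only.
example_log = {
    "move_cursor": 1,
    "left_click": 2,
    "right_click": 1,
    "scroll": 4,
}

summary = summarize_session(example_log)
print(summary)
```

Under a different convention, for example counting the prompts spent on functions that never worked, the second measure would come out higher, so the averaging rule should be stated alongside any reported value.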

Results:

Marked differences emerged between paid and free AI models. Gemini Pro (Google) implemented 14 of 16 functions with an average of 1.25 prompts per function, showing exceptional contextual understanding and internal coherence. ChatGPT Plus (GPT-5) achieved 11 of 16 functions with an average of 1.31 prompts per function, generating functional but occasionally inconsistent code. Free models performed markedly worse, achieving between 0 and 4 functional outcomes. DeepSeek and Gemini Free performed best within the free tier, implementing up to 4 and 2 complete functions, respectively, before reaching prompt limits. Claude Free and Copilot produced no usable software. Paid models demonstrated superior stability, tolerance for natural language, and reduced conversational drift, while free alternatives suffered from interaction caps and lower model sophistication.
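To put the reported counts on a common scale, they can be expressed as a share of the 16 requested functions. The short Python sketch below does only that arithmetic. The counts are taken from the Results paragraph above; the free-tier values for DeepSeek and Gemini Free are reported as upper bounds ("up to"), and models without an explicit count in the abstract are left out rather than guessed.

```python
TOTAL_FUNCTIONS = 16  # functions requested from each model

# Functions implemented, as reported in the Results paragraph.
reported = {
    "Gemini Pro": 14,
    "ChatGPT Plus (GPT-5)": 11,
    "DeepSeek": 4,      # reported as "up to 4"
    "Gemini Free": 2,   # reported as "up to 2"
    "Claude Free": 0,   # no usable software
    "Copilot": 0,       # no usable software
}


def completion_rate(implemented, total=TOTAL_FUNCTIONS):
    """Return the percentage of requested functions that were implemented."""
    return 100.0 * implemented / total


rates = {model: completion_rate(count) for model, count in reported.items()}

# Print models from highest to lowest completion rate.
for model, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {rate:.1f}%")
```

On these figures, Gemini Pro completed 87.5% of the requested functions and ChatGPT Plus 68.75%, while the best free model reached at most 25%.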

Conclusions:

Paid AI models—particularly Gemini Pro and ChatGPT Plus—exhibited the strongest potential for bridging the gap between healthcare or educational professionals and software development. Their ability to generate coherent, functional, and customizable code from plain language makes them valuable tools for low-cost, user-centered assistive technology design. Free models may still serve exploratory or educational purposes but require extensive supervision and time. Gemini Pro stands out as the most balanced option in terms of cost-effectiveness and usability. Overall, AI can act as an accessible interface enabling non-technical professionals to prototype inclusive digital solutions; however, human oversight, sound prompt design, and basic programming literacy remain essential to ensure reliability, ethical application, and sustainability of generated tools. Future research should assess multimodal and self-prompting capabilities in AI-based software creation.

Citation

Please cite as:

Bañuls-Lapuerta FA, Marti-Miralles V, Gónzalez-García RJ, Martínez-Rico G

Using Natural Language Prompts With AI Models for Low-Cost Assistive Software Design: Exploratory Comparative Evaluation

JMIR Rehabil Assist Technol 2026;13:e86786

DOI: 10.2196/86786

PMID: 41875212

PMCID: 13012223

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Rehabilitation and Assistive Technologies

Date Submitted: Oct 30, 2025

Date Accepted: Feb 19, 2026
