Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 5, 2025
Date Accepted: Oct 29, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Leveraging Machine Learning and Robotic Process Automation to identify and convert unstructured colonoscopy results into actionable data: a proof of concept
ABSTRACT
Background:
Effective colorectal cancer (CRC) screening is a cornerstone of preventive healthcare. Existing electronic health record (EHR) tools for CRC screening follow-up reminders do not adequately meet clinician needs. With rising patient volumes and a focus on quality, our health system sought a more efficient way to ensure accurate documentation of CRC screening follow-up intervals from inbound colonoscopy reports. We developed an integrated end-to-end workflow solution using machine learning (ML) and robotic process automation (RPA) to extract and update follow-up dates from unstructured data.
Objective:
To automate data extraction from external, free-text colonoscopy reports to identify and document recommended follow-up dates for CRC screening in structured fields within the EHR.
Methods:
As proof of concept, we outline the process development, validity, and implementation of an approach that integrates available tools to automate data retrieval and entry within the EHR of a large academic health system. The health system uses Epic Systems as its EHR platform. This proof-of-concept process study consisted of six stages: 1) identification of gaps in documenting recommendations for follow-up CRC screening from external colonoscopy reports; 2) defining process objectives; 3) identification of technologies; 4) creation of process architecture; 5) process validation; and 6) health system-wide implementation. Chart review was performed to validate process outcomes and estimate impact.
Results:
We developed an automated process with three primary steps that leveraged ML and RPA to create a fully orchestrated workflow that updates the CRC screening recall date based on colonoscopy reports received from external sources. Process validity was assessed with 690 scanned colonoscopy reports. From the organization-wide go-live date through December 31, 2024, the system processed 16,563 external colonoscopy reports. Of these, 35.3% (5841) had a follow-up date that met the relevant ML model threshold and were therefore identified as ready for RPA processing. This resulted in an estimated increase in documentation accuracy of 27.2%.
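As an illustration only, the threshold gate between the ML step and the RPA step can be sketched as follows. The record fields, the score, and the 0.90 cutoff are hypothetical assumptions; the abstract does not state how the model's threshold is defined or what value was used. The last line simply reproduces the reported share from the counts given in the Results.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Extraction:
    """Hypothetical output of the ML step for one scanned report."""
    report_id: str
    follow_up_date: Optional[date]  # None when no follow-up date was found
    score: float                    # hypothetical model score in [0, 1]


def ready_for_rpa(item: Extraction, threshold: float) -> bool:
    """Route a report to RPA only if a date was found and the score meets the threshold."""
    return item.follow_up_date is not None and item.score >= threshold


def share_ready(n_ready: int, n_total: int) -> float:
    """Percentage of processed reports routed to RPA, rounded to one decimal place."""
    return round(100 * n_ready / n_total, 1)


# Illustrative records only; the 0.90 threshold is an assumption, not the study's value.
batch = [
    Extraction("r1", date(2030, 1, 15), 0.97),
    Extraction("r2", date(2027, 6, 1), 0.55),
    Extraction("r3", None, 0.99),
]
queue = [e.report_id for e in batch if ready_for_rpa(e, threshold=0.90)]
print(queue)  # ['r1']

# Counts reported in the Results: 5841 of 16,563 reports met the threshold.
print(share_ready(5841, 16563))  # 35.3
```

In this sketch, reports that do not pass the gate are simply left out of the queue; how such reports are handled in the actual workflow is not described in the abstract.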
Conclusions:
Implementing an automated workflow to extract and update CRC screening follow-up dates from external colonoscopy reports is feasible and has the potential to improve the accuracy of patient recall based on recommendations while reducing clinician documentation burden. Expanding similar processes to other types of unstructured data could provide another mechanism to address the lack of data integration and improve reporting for quality measures within the EHR. Automated workflows leveraging ML and RPA offer practical solutions to overcome interoperability challenges and enhance the use of unstructured data within healthcare systems.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.