Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 6, 2022
Date Accepted: Apr 11, 2022
Date Submitted to PubMed: Apr 20, 2022
Predicting COVID symptoms from free text in medical records using Artificial Intelligence: a feasibility study
ABSTRACT
Background:
Electronic Medical Records (EMRs) have opened up opportunities to analyze clinical practice at large scale. Structured registries and coding procedures such as the International Classification of Primary Care (ICPC) have further improved these opportunities. However, a large part of the information about the state of the patient and the doctor's observations is still entered in free text fields. The main function of those fields is to report the doctor's line of thought, to remind oneself and colleagues of follow-up actions, and to allow later accountability for clinical decisions. These fields contain rich information that complements the coded fields, yet today they are hardly used for analysis.
Objective:
This study aimed to develop a prediction model approach to convert the free text information on COVID-related symptoms in out-of-hours care EMRs into usable symptom-based data that can be analysed at large scale. The design was a feasibility study, in which we examined the content of the raw data, the steps and methods for modelling, and the precision and accuracy of the models.
Methods:
A prediction model for 27 pre-identified COVID-relevant symptoms was developed on a dataset derived from the database of primary care out-of-hours consultations in Flanders. A multi-class, multi-label text classifier was developed. We tested two approaches: a classical machine learning text categorization approach, Binary Relevance, and a deep neural network approach with BERTje, including a domain-adapted version.
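To make the Binary Relevance idea concrete, the following is a minimal sketch in plain Python: one independent yes/no classifier is trained per symptom label, and the predicted label set is the union of the labels whose classifier says yes. The abstract does not name the base classifier, so the multinomial naive Bayes used here is a stand-in, and the English notes, the two labels, and all function names are invented for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real notes need more care)."""
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing for a single yes/no label."""

    def fit(self, docs, targets):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(targets)
        for tokens, y in zip(docs, targets):
            self.word_counts[y].update(tokens)
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, y):
        total = sum(self.word_counts[y].values()) + len(self.vocab)
        prior = (self.doc_counts[y] + 1) / (sum(self.doc_counts.values()) + 2)
        score = math.log(prior)
        for t in tokens:
            if t in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[y][t] + 1) / total)
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def fit_binary_relevance(texts, label_sets, labels):
    """Binary Relevance: train one independent binary model per label."""
    docs = [tokenize(t) for t in texts]
    return {
        label: BinaryNaiveBayes().fit(docs, [label in s for s in label_sets])
        for label in labels
    }


def predict_labels(models, text):
    """Return every label whose binary model fires on the text."""
    tokens = tokenize(text)
    return {label for label, model in models.items() if model.predict(tokens)}


# Hypothetical toy notes; the study data are not reproduced here.
texts = [
    "patient reports cough and fever",
    "dry cough since two days",
    "high fever no other complaints",
    "sprained ankle after fall",
    "fever and cough for a week",
    "headache and back pain",
]
label_sets = [{"cough", "fever"}, {"cough"}, {"fever"}, set(), {"cough", "fever"}, set()]
models = fit_binary_relevance(texts, label_sets, ["cough", "fever"])
print(predict_labels(models, "cough and fever"))
```

Because each label is modelled separately, this approach ignores correlations between symptoms; that simplicity is what makes it a common classical baseline.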
Results:
The standard BERTje model performed best on the data, reaching a macro-averaged F1 score of 0.58, which summarizes precision and recall, and an accuracy of 0.38. At the level of individual codes, the domain-adapted version of BERTje performs better on several of the less common objective codes, while BERTje reaches higher F1 scores for most other codes in general and for the least common labels in particular.
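The two reported metrics can behave quite differently in a multi-label setting, which the sketch below illustrates. It assumes that "accuracy" means exact-match (subset) accuracy, where a record counts as correct only if its full label set is predicted; the abstract does not state this. The three labels and the toy predictions are hypothetical and are not study data.

```python
def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A label that is never present and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def exact_match_accuracy(true_sets, pred_sets):
    """Share of records whose predicted label set equals the true set."""
    hits = sum(t == p for t, p in zip(true_sets, pred_sets))
    return hits / len(true_sets)


labels = ["cough", "fever", "dyspnea"]  # hypothetical label names
true_sets = [{"cough", "fever"}, {"cough"}, {"fever", "dyspnea"}, set()]
pred_sets = [{"cough"}, {"cough"}, {"fever"}, set()]

print(round(macro_f1(true_sets, pred_sets, labels), 3))
print(exact_match_accuracy(true_sets, pred_sets))
```

Under this assumption, a single missed label makes the whole record count as wrong for accuracy while only lowering one label's F1, and macro averaging gives rare labels the same weight as common ones.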
Conclusions:
The AI model BERTje can reliably predict COVID-related information from medical records using text mining of the free text fields generated in primary care settings. This feasibility study invites researchers to examine further possibilities for using primary care routine data.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.