Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Hybrid Partial Genetic Algorithm Classification Model: Evolving cost-effective algorithms for frailty screening
ABSTRACT
Background:
A commonly used method for measuring frailty is the accumulation of deficits expressed as a Frailty Index (FI). FIs can be readily adapted to many databases because the parameters to use are not prescribed, but rather reflect a subset of the extracted features (variables). Unfortunately, the structure of many databases does not permit direct extraction of a suitable subset, requiring additional effort to determine and verify the value of features for each record and thus significantly increasing cost. Our objective is to describe how an Artificial Intelligence (AI) optimisation technique, called partial genetic algorithms, can be used to refine the subset of features used to calculate the FI while favouring features that have the lowest cost of acquisition. This is a secondary analysis of a Queensland residential aged care database compiled from 10 facilities and 592 residents aged 75 years and over, using routinely collected administrative data and unstructured patient notes. The primary study derived an electronic Frailty Index (eFI) calculated from 35 suitable features. We then structurally modified a genetic algorithm to find an optimal predictor of the calculated eFI (0.21 threshold) from two sets of features. Partial genetic algorithms were used to optimise three underlying classification models: Logistic Regression, Decision Trees, and Support Vector Machines. Of the three underlying models, Logistic Regression produced the best models in almost all scenarios and feature set sizes. The best models were built using all the low-cost features and as few as 10 high-cost features, and performed well enough (sensitivity 85%, specificity 87%) to be considered candidates for a low-cost frailty screening test. A systematic approach for selecting an optimal, low-cost set of features, with performance comparable to the eFI for detecting frailty, has been demonstrated on an aged care database.
Partial genetic algorithms have proven useful in offering a trade-off between cost and accuracy to systematically identify frailty.
Objective:
Within the context of global population ageing, the number of older people who will live a significant proportion of their lives with frailty is growing rapidly (1). Frailty is problematic for older people and the societies in which they live, due to the elevated risks associated with the syndrome in terms of both poor health outcomes (2) and additional use of health and aged care services (3–6), leading to inflated health care costs (7–9). However, emerging research suggests that frailty is a highly dynamic (10,11) and potentially modifiable state given appropriate intervention (12,13). Screening for early detection has been proposed as one means of increasing the likelihood that the worst impacts of frailty can be lessened (3). There are two main approaches to identifying frailty: the Frailty Phenotype (FP) and the Frailty Index (FI) (14). However, these established approaches have known drawbacks, requiring significant time investment, face-to-face interaction and/or specific data items to be collected (15). Recently an electronic Frailty Index (eFI) has been proposed (16), which has the potential to achieve greater efficiencies than face-to-face models when applied to administrative data sets; however, the need to ensure a minimum set of items adhering to pre-specified criteria remains a barrier to implementation. For example, previous research has shown that although it is possible to construct an eFI from an aged care administrative data set, a significant proportion of the items required manual calculation to ensure accuracy and improve quality (17). Clearly, it would be preferable to identify automated techniques capable of delivering comparable accuracy and quality with greater efficiency. Consequently, this study aimed to apply a sophisticated genetic algorithm technique to identify an optimal predictor of the calculated eFI.
Methods:
Study Design, Participants and Setting
This retrospective study utilised a dataset previously compiled (18) from the administrative database of 10 residential aged care facilities located in Queensland, Australia. Participants were included if they were aged 75 years or older and had completed an Aged Care Funding Instrument (ACFI) assessment within the previous 3 years. The Human Research Ethics Committee of Torrens University Australia approved the initial study due to the administrative nature of the data set and its use for quality improvement purposes within the originating organisation. As this is a secondary study of the same data, the consent extended to this work as well.

Frailty Outcome Measure
An eFI had previously been calculated for this dataset (18), based on a formulation originally specified by Clegg et al (19). Care was taken to ensure the included deficits adhered to the criteria recommended by Searle and colleagues (20), which resulted in 32 of the 35 deficits being extracted from unstructured patient notes, with only 3 derived from the ACFI data. The binary frailty classification was derived using a threshold of 0.21 (i.e., frailty defined as FI > 0.21) (21).

Screening Test Construction
Genetic algorithms are an optimisation technique applied in machine learning to select the subset of available features used to construct a classification model. During training, a classification algorithm is tuned on a training set, and its success in attaining a generalised predictive algorithm is then verified by measuring the classification errors on a test set. Genetic algorithms leverage the observation that classification models often perform better when trained on a subset of the available features. Which subset to use, however, is not obvious. Genetic algorithms start with a population of randomly generated subsets of features, or chromosomes, each of which is independently used to generate a classification model.
The chromosomes that generated the best-performing models are allowed to combine, or breed, to form a new generation of the population, while the worst-performing ones are removed completely. The process continues until either a predefined number of generations has been trained or the performance of the models has plateaued. Once training is complete, the best-performing model is deployed, using only the naturally selected subset of the available features. While genetic algorithms are good at selecting an optimal subset of features, they select features solely to maximise the classification accuracy of the generated model. The cost of acquiring the various features is not factored into the choice, even if less expensive features perform nearly as well as their more expensive counterparts. In this study, the cost of a feature is the combination of the effort, the monetary cost, and the patient risk involved in capturing its values. We want to minimise the number of expensive features chosen to form the model, while allowing as many low-cost features as are necessary to achieve acceptable performance.

Figure 1: Genetic Algorithm configuration for training a single member of the population

To achieve the inclusion of low-cost features in the classification model, the standard genetic algorithm training configuration illustrated in Figure 1 is modified as illustrated in Figure 2.

Figure 2: Partial Genetic Algorithm configuration for training a single member of the population

This modification is performed every time a model is trained, for every member of the population trialled by the genetic algorithm. When the genetic algorithm trains a model, it passes a subset of the available training records to the classification model's training algorithm. The low-cost feature values for each record are added to the selected training records before training commences.
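The partial-GA procedure described above can be sketched in a few dozen lines. The following toy is an illustration only: the feature names and the placeholder fitness function are invented for the sketch, and a real implementation would score each chromosome with cross-validated model accuracy rather than the stand-in used here. The key structural modification is that the low-cost features are always included, so the chromosome only governs the high-cost subset.

```python
import random

random.seed(42)

# Hypothetical feature pools (names are illustrative, not the study's variables).
HIGH_COST = [f"note_item_{i}" for i in range(32)]   # extracted from patient notes
LOW_COST  = [f"acfi_item_{i}" for i in range(32)]   # routinely collected ACFI fields

def train_and_score(features):
    """Placeholder for training a classifier and measuring its accuracy.
    Toy fitness: pretend earlier-listed high-cost features are more informative."""
    return sum(1.0 / (1 + HIGH_COST.index(f)) for f in features if f in HIGH_COST)

def fitness(chromosome):
    # The partial GA's key step: low-cost features are ALWAYS included;
    # the chromosome only selects which high-cost features to add.
    selected_high = [f for f, bit in zip(HIGH_COST, chromosome) if bit]
    return train_and_score(LOW_COST + selected_high)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(c, rate=0.02):
    return [bit ^ (random.random() < rate) for bit in c]

def enforce_budget(c, k):
    # Re-impose the cap on high-cost features after crossover/mutation.
    ones = [i for i, b in enumerate(c) if b]
    for i in random.sample(ones, max(0, len(ones) - k)):
        c[i] = 0
    return c

def evolve(pop_size=20, generations=15, max_high_cost=10):
    def random_chromosome():
        bits = [0] * len(HIGH_COST)
        for i in random.sample(range(len(HIGH_COST)), max_high_cost):
            bits[i] = 1
        return bits

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]          # cull worst performers
        children = [enforce_budget(mutate(crossover(random.choice(survivors),
                                                    random.choice(survivors))),
                                   max_high_cost)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(sum(best), "high-cost features selected")
```

The budget cap (`max_high_cost`) here corresponds to the limit on high-cost features that the study varies from 1 to 32 per scenario.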
The genetic algorithm trains the classification model for each chromosome multiple times, using different subsets of the training records, and determines the performance of each model using records not used to train that instance. As with the training records, the low-cost features are added into the records used for determining a model's performance. The performance of the chromosome is the average performance of all the models built from different subsets of the training records. This process is called n-fold cross-validation, where n is the number of models built. In this study, we used 3-fold cross-validation. Three types of classification models were optimised using partial genetic algorithms: Logistic Regression, Support Vector Machines, and Decision Trees. These algorithms are popular choices for classification, having proven successful in generating generalised models for a wide range of applications. Logistic regression is a statistical modelling technique in which a linear combination of the input features, found during training, models the logarithm of the odds that a binary outcome is in the true state. Support Vector Machines (SVMs) aim to learn a multi-dimensional hyperplane that separates the classes of the records given for training. Predictions are made by placing the candidate record into the same multi-dimensional classification space and determining which side of the hyperplane it maps to. SVMs were developed in the 1990s and have since enjoyed success in many real-world applications, including pattern recognition (22), text classification (23), and bioinformatics. Decision Trees employ a divide-and-conquer strategy. A tree is formed of nodes, where each node compares a single input feature against a threshold if the feature is continuous, or a state if it is discrete.
The outcome of the comparison determines the choice of the next node, which either performs a new comparison or terminates the tree with a given classification. During training, the training records are used to find the comparison at each node that gains the most information by reducing the entropy in the outcomes by the greatest amount. Following training, predictions are made by feeding records into the root node and reading off the classification of the terminating node where the record exits the tree.
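The 3-fold cross-validation used to score each chromosome can be sketched as follows. The single-feature threshold classifier below stands in for the actual LR/SVM/Decision Tree models, and the data are invented: a single eFI-like score, labelled frail above the study's 0.21 cut-off.

```python
# Minimal n-fold cross-validation (n = 3, as in the study): average a model's
# accuracy over held-out folds.
def cross_validate(xs, ys, train_fn, k=3):
    folds = [list(range(i, len(xs), k)) for i in range(k)]   # interleaved folds
    scores = []
    for held_out in folds:
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train_fn([xs[i] for i in train_idx],
                         [ys[i] for i in train_idx])
        correct = sum(model(xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k    # a chromosome's fitness = mean held-out accuracy

def threshold_classifier(train_x, train_y):
    # Toy "training": predict frail when the score exceeds the training mean.
    cut = sum(train_x) / len(train_x)
    return lambda x: x > cut

# Toy data: an eFI-like score per resident, frail if it exceeds 0.21.
xs = [0.05, 0.10, 0.15, 0.18, 0.25, 0.30, 0.35, 0.40, 0.12, 0.28]
ys = [x > 0.21 for x in xs]
mean_accuracy = cross_validate(xs, ys, threshold_classifier, k=3)
```

In the partial GA, `cross_validate` would be called once per chromosome, with the fixed low-cost features merged into both the training and held-out records before each fold.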
Results:
Model Generation and Results
Of the 69 features considered, 34 were extracted directly from the ACFI assessment and 35 were the values used to calculate the eFI. Two of the ACFI features (PAS Score and Cornell Scale) were excluded because they had a high percentage of missing values (PAS Score 36%, Cornell Scale 42%). The remaining 32 ACFI assessment features had no missing values and were categorised as low cost of acquisition. Of the 35 features used to calculate the eFI, 32 were extracted by an automated keyword search of the unstructured patient notes, followed by manual inspection and verification by a clinician; these were categorised as high cost of acquisition. The remaining 3 eFI features were direct combinations of ACFI features; as their calculation could be fully automated, they were included with the low-cost features.

Four sets of low-cost features were considered:
• ACFI features + the low-cost eFI features
• The low-cost eFI features
• No low-cost features
• A set of features chosen from the low-cost features using genetic algorithms (a different set was found for each classification algorithm)

Twelve scenarios were trialled: each of the above 4 sets of low-cost features for each of the three classification algorithms. For each scenario, the partial genetic algorithm was used to optimise the classification algorithm with different limits placed on the number of high-cost features. The limits were varied sequentially from 1 to the number of candidate high-cost features, 32. The performance of the 32 algorithms generated for each scenario was plotted on a single graph; the graphs are shown in Figures 3 to 5.
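The experimental grid described above (4 low-cost feature sets × 3 classifiers × 32 high-cost limits) amounts to a nested loop over configurations. A sketch with a placeholder optimiser, all names hypothetical:

```python
# The 12 scenarios x 32 high-cost feature limits trialled in the study,
# expressed as a nested loop. run_partial_ga is a stand-in for the actual
# optimisation; only the bookkeeping is shown.
LOW_COST_SETS = [
    "ACFI + low-cost eFI features",
    "Low-cost eFI features only",
    "No low-cost features",
    "GA-selected low-cost features",
]
ALGORITHMS = ["Logistic Regression", "Support Vector Machine", "Decision Tree"]

def run_partial_ga(low_cost_set, algorithm, max_high_cost):
    """Placeholder: would return the best model found under this configuration."""
    return {"low_cost": low_cost_set, "algorithm": algorithm, "limit": max_high_cost}

results = [
    run_partial_ga(s, a, limit)
    for s in LOW_COST_SETS               # 4 low-cost feature sets
    for a in ALGORITHMS                  # x 3 classifiers = 12 scenarios
    for limit in range(1, 33)            # x 32 caps on high-cost features
]
print(len(results))  # 4 * 3 * 32 = 384 optimised models
```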
Figure 3: Logistic Regression optimised with Partial Genetic Algorithm
Figure 4: Support Vector Machine optimised with Partial Genetic Algorithm
Figure 5: Decision Tree optimised with Partial Genetic Algorithm

Comparing the graphs for each classification model, Logistic Regression outperforms Decision Trees in every scenario and SVM in almost all scenarios. Table I, Table II, and Table III give the numeric comparison of the 12 scenarios when 5, 10, and 15 of the high cost of acquisition features are used.

                                         Sensitivity  Specificity   PPA    NPA   Accuracy
ACFI + Low-Cost eFI
  Logistic Regression                        77.3         76.7      73.0   80.6    77.0
  Support Vector Machine                     77.3         71.7      71.7   77.3    74.8
  Decision Tree                              64.0         51.7      53.5   62.3    58.5
Low-Cost eFI
  Logistic Regression                        73.3         70.0      67.7   75.3    71.9
  Support Vector Machine                     74.7         66.7      67.8   73.7    71.1
  Decision Tree                              76.0         53.3      64.0   67.1    65.9
No Low-Cost Features
  Logistic Regression                        65.3         75.0      63.4   76.6    69.6
  Support Vector Machine                     64.0         75.0      62.5   76.2    68.9
  Decision Tree                              76.0         56.7      65.4   68.7    67.4
Genetically Selected Low-Cost Features
  Logistic Regression                        80.0         60.0      70.6   71.4    71.1
  Support Vector Machine                     77.3         70.0      71.2   76.3    74.1
  Decision Tree                              76.0         56.7      65.4   68.7    67.4
Table I: Performance of the 12 scenarios with 5 high-cost features

                                         Sensitivity  Specificity   PPA    NPA   Accuracy
ACFI + Low-Cost eFI
  Logistic Regression                        85.3         86.7      82.5   88.9    85.9
  Support Vector Machine                     85.3         80.0      81.4   84.2    83.0
  Decision Tree                              64.0         65.0      59.1   69.6    64.4
Low-Cost eFI
  Logistic Regression                        78.7         86.7      76.5   88.1    82.2
  Support Vector Machine                     77.3         80.0      73.9   82.9    78.5
  Decision Tree                              82.7         68.3      75.9   76.6    76.3
No Low-Cost Features
  Logistic Regression                        81.3         73.3      75.9   79.2    77.8
  Support Vector Machine                     74.7         66.7      67.8   73.7    71.1
  Decision Tree                              84.0         58.3      74.5   71.6    72.6
Genetically Selected Low-Cost Features
  Logistic Regression                        78.7         71.7      72.9   77.6    75.6
  Support Vector Machine                     82.7         70.0      76.4   77.5    77.0
  Decision Tree                              72.0         60.0      63.2   69.2    66.7
Table II: Performance of the 12 scenarios with 10 high-cost features

                                         Sensitivity  Specificity   PPA    NPA   Accuracy
ACFI + Low-Cost eFI
  Logistic Regression                        85.3         88.3      82.8   90.1    86.7
  Support Vector Machine                     82.7         81.7      79.0   84.9    82.2
  Decision Tree                              72.0         58.3      62.5   68.4    65.9
Low-Cost eFI
  Logistic Regression                        85.3         81.7      81.7   85.3    83.7
  Support Vector Machine                     86.7         86.7      83.9   89.0    86.7
  Decision Tree                              84.0         61.7      75.5   73.3    74.1
No Low-Cost Features
  Logistic Regression                        77.3         85.0      75.0   86.6    80.7
  Support Vector Machine                     77.3         85.0      75.0   86.6    80.7
  Decision Tree                              68.0         61.7      60.7   68.9    65.2
Genetically Selected Low-Cost Features
  Logistic Regression                        86.7         80.0      82.8   84.4    83.7
  Support Vector Machine                     70.7         86.7      70.3   86.9    77.8
  Decision Tree                              73.3         53.3      61.5   66.3    64.4
Table III: Performance of the 12 scenarios with 15 high-cost features

The option of 'No Low-Cost Features' was provided to determine how much predictive value the low-cost features add to the classification. As expected, this option performed the worst for all the classification algorithms, confirming that the low-cost features add value. Next, models were built using only the three low-cost eFI features as fixed features. This improved the accuracy of the logistic regression algorithm to 97% when almost all the eFI features were included (see Table IV). Although this is a good outcome, a model built using so many of the high-cost features was not the goal of this study.

Algorithm                   Sensitivity  Specificity   PPA    NPA   Accuracy
Logistic Regression             97.3         96.7      96.7   97.3    97.0
Support Vector Machine          86.7         95.0      85.1   95.6    90.4
Decision Tree                   72.0         68.3      66.1   74.0    70.4
Table IV: Performance of models based on all features

Genetic algorithms work by selecting an optimal subset of all the features made available to them. This characteristic motivated building a version of the models in two stages. In the first stage, a standard, non-partial genetic algorithm was used on the low-cost features to find an optimal combination.
These models performed so poorly (see Table V) that they could not be used without further improvement. The combination of features used to generate them (see Supplement 3: Low-Cost Features Selected for Models built with GA Selected Subset) was then employed as the fixed features in the partial genetic algorithm during the second stage. The performance of the second-stage models was surprisingly poor, showing no difference from the models built without any low-cost features, regardless of the classification model used.

Algorithm                   Sensitivity  Specificity   PPA    NPA   Accuracy
Logistic Regression             77.3         58.3      67.3   69.9    68.9
Support Vector Machine          77.3         58.3      67.3   69.9    68.9
Decision Tree                   61.3         65.0      57.4   68.7    63.0
Table V: Performance of models based only on low-cost features

Using all the low-cost features in a partial genetic algorithm gave the best overall results. As expected, it matched the 97% accuracy achieved by the models that used the low-cost eFI features when the model could select most of the high-cost eFI features. At 10 high-cost features, however, the extra low-cost features allowed the algorithm to increase its sensitivity from 78.7% to 85.3% without compromising the specificity, which remained at 86.7%.
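For reference, the metrics reported in Tables I–V can be computed from confusion-matrix counts as below. The paper does not define PPA and NPA; this sketch assumes they denote positive and negative predictive agreement (analogues of predictive values), and the counts used are illustrative, not taken from the study's test set.

```python
# Screening-test metrics from confusion-matrix counts, as percentages.
# Assumption: PPA/NPA are treated as positive/negative predictive agreement.
def screening_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": 100 * tp / (tp + fn),   # frail residents correctly flagged
        "specificity": 100 * tn / (tn + fp),   # non-frail correctly passed
        "ppa":         100 * tp / (tp + fp),   # agreement among positive calls
        "npa":         100 * tn / (tn + fn),   # agreement among negative calls
        "accuracy":    100 * (tp + tn) / (tp + fp + tn + fn),
    }

# Illustrative counts: 75 frail and 60 non-frail residents in a test set.
m = screening_metrics(tp=58, fp=14, tn=46, fn=17)
```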
Conclusions:
The value of screening tests lies in their cost-effective application. The main cost in applying a model-based screening test is the cost of acquiring the measures fed into the model. To derive useful screening tests using AI techniques, algorithms must be employed that favour the use of cheaper features over those that require more effort, or pose more patient risk, to acquire. What all aged care providers and their clinical advisers need is a screening tool that allows the efficient targeting of evidence-based interventions to those frail older people who will benefit most. At a time when the aged care sector and all providers are being asked by governments and national quality agencies to focus on this vulnerable group, it is crucial that we employ an efficient screening tool. This paper has shown how partial genetic algorithms can be used to determine an optimal subset of high-cost features to combine with cheap features to derive AI models that classify frailty, both in terms of which parameters to use and how many. This technique can be applied to any database. It does not guarantee that an adequate model will be found, but it does give a very good indication of whether there is sufficient information in the data to derive one. Partial genetic algorithms were demonstrated in this paper to derive a cost-effective screening test for frailty, but the method can be applied to any screening test where there is a disparity in the cost of measuring the required features.
Clinical Trial: NA
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.