JMIR Preprints #13995: Novel Machine Learning Method for Prediction Using Times Series Data: Initial Application to Prediction of On Road Exam Outcomes from Virtual Driving Test Data

Novel Machine Learning Method for Prediction Using Times Series Data: Initial Application to Prediction of On Road Exam Outcomes from Virtual Driving Test Data

David Grethlein;
Flaura Koplin Winston;
Elizabeth Walshe;
Sean Tanner;
Venk Kandadai;
Santiago Ontañón

ABSTRACT

Background:

A large midwestern state commissioned a virtual driving test (VDT) to assess preparedness in safe driving skills before the on-road license examination (ORE). Since July 2017, a pilot deployment of the VDT in state licensing centers (VDT pilot) has collected both VDT and ORE data from new license applicants, with the aim of creating a scoring algorithm.

Objective:

Leveraging data collected from the VDT pilot, this study aimed to develop and conduct an initial evaluation of a novel machine learning-based classifier using limited domain knowledge and minimal feature engineering to predict applicant pass/fail on the ORE. Such methods, if proven useful, could be applicable to classification of other time series data collected within medical and other settings.

Methods:

We analyzed an initial dataset comprising 4,308 drivers who completed both the VDT and the ORE, of whom 1,096 (25.4%) went on to fail the ORE. We studied two different approaches to constructing feature sets to use as input to machine learning (ML) algorithms: the standard method of reducing the time series data to a set of manually defined variables that summarize driving behavior, and a novel approach using time series clustering. We then fed these representations into different ML algorithms to compare their ability to predict a driver’s ORE outcome.
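To make the idea of clustering-based feature construction concrete, the following is a minimal sketch and not the authors' implementation, whose details are not given in this abstract. It assumes each drive is a single one-dimensional signal, cuts it into fixed-width windows, groups all windows with a plain k-means, and represents each drive by the fraction of its windows that fall in each cluster. The function names, the window width, the number of clusters, and the toy data are all hypothetical.

```python
import random

def windows(series, width):
    """Split a 1-D series into non-overlapping windows of equal width."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means with random initial centroids (illustrative only)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def cluster_histogram(series, centroids, width):
    """Feature vector: fraction of a drive's windows assigned to each cluster."""
    wins = windows(series, width)
    counts = [0] * len(centroids)
    for w in wins:
        counts[nearest(w, centroids)] += 1
    total = len(wins) or 1
    return [c / total for c in counts]

# Toy, made-up signals standing in for recorded driving data.
drives = [
    [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1],
    [1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9],
    [0.0, 0.1, 0.0, 0.1, 1.0, 0.9, 1.0, 0.9],
]
WIDTH = 4  # hypothetical window width
all_windows = [w for d in drives for w in windows(d, WIDTH)]
centroids = kmeans(all_windows, k=2)
features = [cluster_histogram(d, centroids, WIDTH) for d in drives]
```

Each row of `features` could then be passed to any standard classifier in place of manually defined summary variables; the appeal of this kind of representation is that it needs no domain knowledge about which driving behaviors matter.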

Results:

The new method using time series clustering performed similarly to the standard method in terms of overall accuracy (0.761 vs 0.762) and AUC (0.656 vs 0.682). However, the time series clustering method slightly outperformed the standard method in distinguishing failure from passing on the ORE: with the clustering method, those predicted to fail were about three times more likely to fail the ORE than those predicted to pass (risk ratio 3.07 [95% CI: 2.75, 3.43]), compared with a risk ratio of 2.68 [95% CI: 2.41, 2.99] for the standard variables method. Also, the time series clustering method with logistic regression produced the lowest ratio of false alarms (0.27).
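For readers unfamiliar with the risk ratio reported here, the sketch below shows how such a ratio and an approximate 95% confidence interval can be computed from a two-by-two table of predicted versus actual outcomes. It uses the common log-based (Katz) interval; the abstract does not state which interval method the authors used, so that choice is an assumption. The counts are invented for illustration and are not the study's data.

```python
import math

def risk_ratio(fail_pred_fail, n_pred_fail, fail_pred_pass, n_pred_pass, z=1.96):
    """Risk ratio of failing among predicted-fail vs predicted-pass drivers,
    with an approximate confidence interval on the log scale (Katz method)."""
    risk_pred_fail = fail_pred_fail / n_pred_fail
    risk_pred_pass = fail_pred_pass / n_pred_pass
    rr = risk_pred_fail / risk_pred_pass
    # Standard error of log(RR)
    se = math.sqrt(
        1 / fail_pred_fail - 1 / n_pred_fail
        + 1 / fail_pred_pass - 1 / n_pred_pass
    )
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts, NOT taken from the study:
# 100 drivers predicted to fail, of whom 60 failed;
# 400 drivers predicted to pass, of whom 80 failed.
rr, lower, upper = risk_ratio(60, 100, 80, 400)
```

With these made-up counts the failure risk is 0.6 in the predicted-fail group and 0.2 in the predicted-pass group, so the risk ratio is 3, which is the sense in which drivers predicted to fail are "three times more likely" to fail.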

Conclusions:

Our results provide initial evidence that the clustering method has utility for feature construction in classification tasks involving time series data when resources are limited to create multiple, domain-relevant variables.

Citation

Please cite as:

Grethlein D, Winston FK, Walshe E, Tanner S, Kandadai V, Ontañón S

Simulator Pre-Screening of Underprepared Drivers Prior to Licensing On-Road Examination: Clustering of Virtual Driving Test Time Series Data

J Med Internet Res 2020;22(6):e13995

DOI: 10.2196/13995

PMID: 32554384

PMCID: 7333075

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 12, 2019

Open Peer Review Period: Mar 15, 2019 - May 10, 2019

Date Accepted: Dec 16, 2019
