Accepted for/Published in: JMIR Mental Health

Date Submitted: Apr 24, 2023
Open Peer Review Period: May 15, 2023 - May 18, 2023
Date Accepted: Aug 10, 2023
The final, peer-reviewed published version of this preprint can be found here:

Patient Health Questionnaire-9 Item Pairing Predictiveness for Prescreening Depressive Symptomatology: Machine Learning Analysis

Glavin D, Grua EM, Nakamura CA, Scazufca M, Ribeiro dos Santos E, Wong GH, Hollingworth W, Peters TJ, Araya R, Van de Ven P

JMIR Ment Health 2023;10:e48444

DOI: 10.2196/48444

PMID: 37856186

PMCID: 10623235

A Machine Learning analysis of PHQ-9 item pairing predictiveness for pre-screening depressive symptomatology

  • Darragh Glavin
  • Eoin M. Grua
  • Carina A. Nakamura
  • Marcia Scazufca
  • Edinilza Ribeiro dos Santos
  • Gloria H.Y. Wong
  • William Hollingworth
  • Tim J. Peters
  • Ricardo Araya
  • Pepijn Van de Ven

ABSTRACT

Background:

Anhedonia and depressed mood are considered the cardinal symptoms of major depressive disorder. These are the first two items of the Patient Health Questionnaire-9 (PHQ-9) and make up the ultra-brief PHQ-2 questionnaire used for pre-screening depressive symptomatology. The pre-screening performance of alternative PHQ-9 item pairings is rarely compared with that of the PHQ-2.

Objective:

To use a data-driven machine learning (ML) approach on the PHQ-9 items to find and validate the most predictive two-item ultra-brief questionnaire for depressive symptomatology, and to test the generalisability of the best pairings found on the primary dataset using six external datasets from different populations, thereby validating their use as pre-screening instruments.

Methods:

All thirty-six possible PHQ-9 item pairings (each yielding scores of 0-6) were investigated using ML-based methods with logistic regression models. Evaluating every pairing gave each one an equal opportunity and avoided bias in item pairing selection. Performance was evaluated on the classification of depressive symptomatology, defined as a PHQ-9 score ≥10.
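The enumeration of pairings described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names (`all_pairings`, `pair_score`, `label`, `auc`) and the toy responses are hypothetical, and the sketch ranks pairings by the AUC of the raw two-item sum instead of fitting logistic regression models. It assumes the summed score is the single predictor; under that assumption, a logistic model with a positive coefficient is a monotonic transform of the sum and leaves the AUC unchanged.

```python
from itertools import combinations

N_ITEMS = 9   # PHQ-9 items, each scored 0-3
CUTOFF = 10   # depressive symptomatology defined as PHQ-9 total >= 10


def all_pairings():
    """Return every unordered pair of PHQ-9 item numbers (1-based)."""
    return list(combinations(range(1, N_ITEMS + 1), 2))


def pair_score(responses, pair):
    """Sum the two selected item responses, giving a score from 0 to 6."""
    i, j = pair
    return responses[i - 1] + responses[j - 1]


def label(responses):
    """Return 1 if the full PHQ-9 total meets the cutoff, else 0."""
    return int(sum(responses) >= CUTOFF)


def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative
    case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy responses (NOT study data): one row per respondent.
toy = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 2, 0, 1, 0, 0, 0],
    [2, 3, 2, 3, 1, 2, 1, 1, 0],
    [3, 3, 3, 3, 2, 2, 2, 1, 1],
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 2, 2, 2, 1, 1, 1, 1, 0],
]
labels = [label(r) for r in toy]

# Score every pairing on the same data so each gets an equal opportunity.
ranking = sorted(
    ((auc([pair_score(r, p) for r in toy], labels), p) for p in all_pairings()),
    reverse=True,
)
```

Each entry of `ranking` is an `(auc, (item_a, item_b))` tuple. In a real analysis the ranking would be computed on a training split and confirmed on held-out data, as the abstract describes.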

Results:

The ML-based phq2&4 (the depressed mood and low energy item pairing) and phq2&8 (the depressed mood and psychomotor retardation or agitation item pairing) performed best on the primary dataset training split. They generalised well on the primary dataset test split, with areas under the receiver operating characteristic curve (AUC) of 0.954 and 0.946, respectively, compared with an AUC of 0.942 for the PHQ-2. The phq2&4 had a higher AUC than the PHQ-2 on all six external datasets, and the phq2&8 had a higher AUC than the PHQ-2 on three. The logistic regression probability thresholds that maximised Youden’s index (sensitivity + specificity - 1, which weights the two equally) during cross-validation on the primary dataset were applied to the phq2&4 and phq2&8 models. The phq2&4 had the highest Youden’s index on two external datasets and the phq2&8 had the highest on another two. The PHQ-2 ≥2 also had the highest Youden’s index on two external datasets (on one of them jointly with the phq2&4), but its performance fluctuated the most. The PHQ-2 ≥3 had the highest Youden’s index on one external dataset. The sensitivity and specificity achieved by the phq2&4 and phq2&8 were more evenly balanced than those achieved by the PHQ-2 at the ≥2 and ≥3 cutpoints: the ≥2 cutpoint excessively favoured sensitivity over specificity, whereas the ≥3 cutpoint excessively favoured specificity over sensitivity.
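Threshold selection by Youden’s index can be illustrated with a short sketch. Youden’s index is sensitivity + specificity - 1. As a simplification, the sketch below picks an integer cutpoint on a 0-6 pair score, whereas the study selected logistic regression probability thresholds during cross-validation. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def sens_spec(scores, labels, cutpoint):
    """Sensitivity and specificity when score >= cutpoint is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutpoint)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutpoint)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutpoint)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutpoint)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def youden(scores, labels, cutpoint):
    """Youden's index J = sensitivity + specificity - 1."""
    sens, spec = sens_spec(scores, labels, cutpoint)
    return sens + spec - 1


def best_cutpoint(scores, labels, candidates=range(1, 7)):
    """Return the candidate cutpoint with the largest Youden's index
    (the lowest such cutpoint if several tie)."""
    return max(candidates, key=lambda c: youden(scores, labels, c))


# Hypothetical toy data (NOT study data): pair scores 0-6 and true labels.
scores = [0, 1, 2, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

chosen = best_cutpoint(scores, labels)
sensitivity, specificity = sens_spec(scores, labels, chosen)
```

Because Youden’s index weights sensitivity and specificity equally, the selected cutpoint tends toward a balanced trade-off. A fixed cutpoint chosen in advance can lean heavily toward one of the two measures on a given dataset.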

Conclusions:

The PHQ-2 was not superior to other PHQ-9 item pairings as a pre-screening instrument. Evaluating all item pairings showed that, when paired with the depressed mood item, the anhedonia item underperformed relative to alternative partner items. This suggests that the inclusion of the anhedonia item in ultra-brief questionnaires may be a relatively arbitrary choice. Using the PHQ-2 to pre-screen for depressive symptomatology could result in more misclassifications than alternatives such as the phq2&4 and phq2&8.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.