Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 30, 2019
Date Accepted: Feb 10, 2020

The final, peer-reviewed published version of this preprint can be found here:

Horne E, Tibble H, Sheikh A, Tsanas A

Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping

JMIR Med Inform 2020;8(5):e16452

DOI: 10.2196/16452

PMID: 32463370

PMCID: 7290450

Challenges of clustering multimodal clinical data: a review of applications in asthma subtyping

  • Elsie Horne
  • Holly Tibble
  • Aziz Sheikh
  • Athanasios Tsanas

ABSTRACT

Background:

In the current era of personalised medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make traditional cluster analysis methods difficult to apply.

Objective:

We aimed to review the research literature on applications of clustering multimodal clinical data to identify asthma subtypes, and to assess common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, so that they can be brought to the attention of the research community and avoided in future studies.

Methods:

We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies applying dissimilarity-based cluster analysis methods. We recorded the analytic methods used by each study at each step of the cluster analysis process.
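A dissimilarity-based method starts from a matrix of pairwise dissimilarities between patients. As a rough illustration of how such a matrix can be built for mixed-type features, the sketch below uses Gower's dissimilarity, which is one well-known measure for mixed numeric and categorical data; it is not named in this abstract and is used here only as an example. The patient records, feature names, and function name are hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def gower_dissimilarity(
    a: Sequence[Value],
    b: Sequence[Value],
    is_numeric: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Mean per-feature dissimilarity between two records (unweighted Gower)."""
    total = 0.0
    for x, y, numeric, rng in zip(a, b, is_numeric, ranges):
        if numeric:
            # Numeric feature: absolute difference scaled by the observed range.
            total += abs(x - y) / rng if rng > 0 else 0.0
        else:
            # Categorical feature: 0 if the categories match, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical records: (age in years, FEV1 % predicted, atopy, smoking status)
patients = [
    (34.0, 92.0, "atopic", "never"),
    (61.0, 58.0, "non-atopic", "former"),
    (29.0, 88.0, "atopic", "never"),
]
is_numeric = [True, True, False, False]

# Observed range of each numeric feature; unused (0.0) for categorical ones.
ranges: List[float] = [
    max(p[j] for p in patients) - min(p[j] for p in patients)
    if is_numeric[j]
    else 0.0
    for j in range(len(is_numeric))
]

# Full pairwise dissimilarity matrix, with values between 0 and 1.
matrix = [
    [gower_dissimilarity(p, q, is_numeric, ranges) for q in patients]
    for p in patients
]

for row in matrix:
    print([round(d, 3) for d in row])
```

In this toy example the first and third patients come out much closer to each other than either is to the second. A matrix like this could then be passed to a hierarchical or medoid-based algorithm that accepts precomputed dissimilarities.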

Results:

Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the clustering algorithms were mixed-type in 47 (75%) studies and continuous in 12 (19%) studies; the feature type was unclear in the remaining four (6%) studies. Twenty-three (37%) studies used hierarchical clustering with Ward’s linkage and 22 (35%) studies used k-means. Of these 45 studies, 39 had mixed-type features, but only five specified dissimilarity measures that could handle mixed-type features. Nine (14%) studies used a pre-clustering step to create small clusters to feed to a hierarchical method. The original sample sizes in these nine studies ranged from 84 to 349. The remaining nine studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), or multiple kernel k-means clustering (n=1), or had unclear methods (n=1). Fifty-four (86%) studies explained the methods used for determining the number of clusters; 24 (38%) studies tested whether their cluster solution was reproducible and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification.
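The abstract does not say how the reviewed studies assessed stability or reproducibility. One common way to quantify agreement between two cluster solutions (for example, the original solution and one obtained after resampling the data) is the adjusted Rand index. The sketch below is a minimal pure-Python version offered under that assumption; the function name and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]
) -> float:
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both labelings must cover the same items.")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # Fewer than two items: the partitions trivially agree.

    # Pairs of items placed together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # Degenerate case, e.g. both partitions are a single cluster.
    return (together_both - expected) / (maximum - expected)


# Hypothetical cluster labels for the same four patients from two runs.
original = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # Same grouping, different label names.
reshuffled = [0, 1, 0, 1]   # A different grouping.

print(adjusted_rand_index(original, relabelled))  # Identical grouping: 1.0
print(adjusted_rand_index(original, reshuffled))  # Approximately -0.5
```

Values near 1 indicate that the two solutions group patients in nearly the same way regardless of label names, while values near 0 or below indicate agreement no better than chance.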

Conclusions:

This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. While cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.

Clinical Trial: N/A


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.