Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 9, 2019
Open Peer Review Period: May 13, 2019 - Jul 8, 2019
Date Accepted: Aug 22, 2019
The final, peer-reviewed published version of this preprint can be found here:

Developing a Reproducible Microbiome Data Analysis Pipeline Using the Amazon Web Services Cloud for a Cancer Research Group: Proof-of-Concept Study

Bai J, Jhaney I, Wells J

JMIR Med Inform 2019;7(4):e14667

DOI: 10.2196/14667

PMID: 31710301

PMCID: 6913755

Developing a Reproducible Microbiome Data Analysis Pipeline Using the Amazon Web Services Cloud for a Cancer Research Group

  • Jinbing Bai
  • Ileen Jhaney
  • Jessica Wells

ABSTRACT

Background:

Cloud computing for microbiome data sets can significantly increase working efficiency and expedite the translation of research findings into clinical practice. The Amazon Web Services (AWS) cloud provides a valuable option for microbiome data storage, computation, and analysis.

Objective:

The purpose of this study was to develop a microbiome data analysis pipeline using the AWS cloud and to conduct a proof-of-concept test of microbiome data storage, processing, and analysis.

Methods:

A multidisciplinary team was formed to develop and test a reproducible microbiome data analysis pipeline with multiple AWS cloud services that can be used for storage, computation, and data analysis. The microbiome data analysis pipeline developed in AWS was tested by using two data sets: 19 vaginal microbiome samples and 50 gut microbiome samples.

Results:

Using AWS features, we developed a microbiome data analysis pipeline that included Amazon Simple Storage Service (S3) for microbiome sequence storage, Linux Elastic Compute Cloud (EC2) instances (i.e., servers) for data computation and analysis, and security keys to create and manage encryption for the pipeline. Bioinformatics and statistical tools (i.e., QIIME 2 and RStudio) were installed on the Linux EC2 instances to run microbiome statistical analyses. The pipeline was operated through command-line interfaces on Linux or Mac OS™ systems. Using this new pipeline, we successfully processed and analyzed 50 gut microbiome samples within 4 hours at a very low cost. Gut microbiome findings from the diversity, taxonomy, and abundance analyses were easily shared within our research team.
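
To make the flow described above concrete (sequences stored in S3, processed with QIIME 2 on an EC2 instance, results shared with the team), the following is a minimal sketch rather than the authors' actual scripts. The bucket name, key prefix, working directory, manifest file, and the choice of QIIME 2 steps (import followed by a demultiplexing summary, with paired-end reads assumed) are all hypothetical and used only for illustration. The function only builds the command lines; it does not run them.

```python
import shlex
from typing import List


def build_pipeline_commands(bucket: str, prefix: str, workdir: str) -> List[List[str]]:
    """Build commands for a download -> import -> summarize -> upload run.

    Nothing is executed here; the caller decides how to run each command.
    All paths and the paired-end manifest format are illustrative assumptions.
    """
    raw = f"{workdir}/raw"
    demux = f"{workdir}/demux.qza"
    summary = f"{workdir}/demux.qzv"
    return [
        # Copy raw sequence files from S3 to the EC2 instance's local disk.
        ["aws", "s3", "sync", f"s3://{bucket}/{prefix}", raw],
        # Import the reads into a QIIME 2 artifact (a manifest file is assumed to exist).
        ["qiime", "tools", "import",
         "--type", "SampleData[PairedEndSequencesWithQuality]",
         "--input-path", f"{workdir}/manifest.tsv",
         "--input-format", "PairedEndFastqManifestPhred33V2",
         "--output-path", demux],
        # Summarize read counts and quality so the team can review them.
        ["qiime", "demux", "summarize",
         "--i-data", demux,
         "--o-visualization", summary],
        # Copy the visualization back to S3 so it can be shared.
        ["aws", "s3", "cp", summary, f"s3://{bucket}/results/demux.qzv"],
    ]


if __name__ == "__main__":
    # Hypothetical bucket, prefix, and working directory.
    commands = build_pipeline_commands(
        "example-microbiome-bucket", "gut-samples", "/home/ubuntu/run1"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands before running them allows a dry-run review on the EC2 instance. In practice, each list could be passed to `subprocess.run(cmd, check=True)` once the AWS CLI and QIIME 2 are installed and credentials are configured.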

Conclusions:

Building a microbiome data analysis pipeline with the AWS cloud is feasible. This pipeline is highly reliable, computationally powerful, and cost-effective. Our AWS-based microbiome analysis pipeline (MAP-AWS) provides an efficient tool for conducting microbiome data analysis.

Clinical Trial: NA




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.