
Accepted for/Published in: JMIR Bioinformatics and Biotechnology

Date Submitted: Mar 4, 2022
Date Accepted: Aug 22, 2022

The final, peer-reviewed published version of this preprint can be found here:

Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization

Huang YS, Hsu C, Chune YC, Liao IC, Wang H, Lin YL, Hwu PWL, Lee NC, Lai F

JMIR Bioinform Biotech 2022;3(1):e37701

DOI: 10.2196/37701

PMCID: 11168239

Diagnosis of Single Nucleotide Variant in Whole Exome Sequencing Data for Patients with Inherited Diseases: Using AI Variant Prioritization

  • Yu-Shan Huang; 
  • Ching Hsu; 
  • Yu-Chang Chune; 
  • I-Cheng Liao; 
  • Hsin Wang; 
  • Yi-Lin Lin; 
  • Paul Wuh-Liang Hwu; 
  • Ni-Chung Lee; 
  • Feipei Lai

ABSTRACT

Background:

In recent years, rapid advances in next-generation sequencing (NGS) technology have made it possible to sequence an entire human genome in a short period of time. As a result, NGS is being widely introduced into clinical diagnostic practice, especially for the diagnosis of hereditary disorders. Processing a patient's DNA sequence data requires multiple tools and complex bioinformatics pipelines, and it produces exome data consisting of single-nucleotide variants (SNVs).

Objective:

To help physicians interpret the genetic variation information generated by NGS within a short period of time.

Methods:

We constructed a machine learning model that predicts disease-causing variants in exome data. We collected whole-exome sequencing and gene panel sequencing data as the training set, and we integrated variant annotations from multiple genetic databases for model training. The model ranks SNVs and outputs the most likely disease-causing candidates. For model testing, we collected whole-exome sequencing data from 108 patients with rare genetic disorders at National Taiwan University Hospital. As model input, we used each patient's sequencing data together with phenotypic information that a keyword extraction tool automatically extracted from the patient's electronic medical records.
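The abstract does not name the model, its features, or the keyword extraction tool, so the following is only a minimal Python sketch of the general ranking idea under stated assumptions. Each candidate SNV carries a few annotation values (the fields `pathogenicity_score`, `population_af`, and `phenotype_match` are hypothetical), phenotype terms are pulled from a clinical note by naive vocabulary matching (a stand-in for the unnamed extraction tool), and a hand-weighted `toy_score` function takes the place of a trained model to order the candidates so that the top k can be reviewed. All gene names and inputs are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Variant:
    # Hypothetical annotation fields; the study's actual feature set is not described here.
    variant_id: str
    gene: str
    pathogenicity_score: float    # e.g. an in-silico predictor score in [0, 1]
    population_af: float          # allele frequency in a population database
    phenotype_match: float = 0.0  # fraction of the gene's phenotype terms seen in the patient


def extract_phenotype_terms(note: str, vocabulary: Set[str]) -> Set[str]:
    """Naive keyword matching: return vocabulary terms that occur in the note (case-insensitive)."""
    text = note.lower()
    return {term for term in vocabulary if term.lower() in text}


def phenotype_overlap(patient_terms: Set[str], gene_terms: Set[str]) -> float:
    """Fraction of a gene's associated phenotype terms that the patient shows."""
    return len(patient_terms & gene_terms) / len(gene_terms) if gene_terms else 0.0


def toy_score(v: Variant) -> float:
    # Hypothetical stand-in for a trained model: favor predicted pathogenicity and
    # phenotype overlap, and penalize common variants (AF >= 1% counts as fully common).
    rarity = 1.0 - min(v.population_af * 100, 1.0)
    return 0.5 * v.pathogenicity_score + 0.3 * v.phenotype_match + 0.2 * rarity


def rank_variants(variants: List[Variant], score: Callable[[Variant], float],
                  top_k: int = 10) -> List[Variant]:
    """Sort candidates by score, highest first, and keep the top_k."""
    return sorted(variants, key=score, reverse=True)[:top_k]


# Usage with made-up inputs (hypothetical genes and gene-phenotype links).
vocab = {"seizure", "hypotonia", "developmental delay"}
note = "Infant with hypotonia and recurrent seizure episodes."
patient_terms = extract_phenotype_terms(note, vocab)
gene_phenotypes: Dict[str, Set[str]] = {
    "GENE_A": {"seizure", "hypotonia"},
    "GENE_B": {"developmental delay"},
}
candidates = [
    Variant("var1", "GENE_A", 0.9, 0.0001),
    Variant("var2", "GENE_B", 0.8, 0.2),
    Variant("var3", "GENE_C", 0.1, 0.0),
]
for v in candidates:
    v.phenotype_match = phenotype_overlap(patient_terms, gene_phenotypes.get(v.gene, set()))

ranked = rank_variants(candidates, toy_score, top_k=2)
print([v.variant_id for v in ranked])  # ['var1', 'var2']
```

Sorting on a single score is the simplest way to produce a ranked shortlist of the kind the abstract describes; in a real pipeline, `toy_score` would be replaced by a model trained on labeled variants, and the annotations would come from the genetic databases used for training.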

Results:

In 92.6% of cases, the model ranked the causative variant within the top 10 of an average of 741 candidate variants per patient remaining after filtering.
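The reported figure is a top-k hit rate: the share of patients whose causative variant appears within the first k positions of that patient's ranked candidate list. Below is a minimal Python sketch of the metric. The function name `top_k_hit_rate` and its input format (one 1-based rank per patient, or `None` if the causative variant was not ranked) are assumptions. The rank list is synthetic, not the study's data. It is chosen only to show that if the denominator is all 108 test patients, 92.6% corresponds to 100 patients (100/108 ≈ 0.926).

```python
from typing import Optional, Sequence


def top_k_hit_rate(causal_ranks: Sequence[Optional[int]], k: int = 10) -> float:
    """Fraction of patients whose causative variant is ranked within the top k.

    causal_ranks holds the 1-based rank of each patient's causative variant,
    or None if it was filtered out or otherwise not ranked.
    """
    if not causal_ranks:
        raise ValueError("need at least one patient")
    hits = sum(1 for r in causal_ranks if r is not None and r <= k)
    return hits / len(causal_ranks)


# Synthetic ranks (not the study's data): 100 patients hit within the top 10, 8 miss.
ranks = [1] * 100 + [None] * 8
rate = top_k_hit_rate(ranks, k=10)
print(f"{rate:.1%}")  # 92.6%
```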

Conclusions:

The model's ranking performance is comparable to that of manual interpretation, and the model has been put into use to support the clinical diagnosis of genetic diseases.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.