JMIR Preprints #37701: Diagnosis of Single Nucleotide Variant in Whole Exome Sequencing Data for Patients with Inherited Diseases: Using AI Variant Prioritization

Diagnosis of Single Nucleotide Variant in Whole Exome Sequencing Data for Patients with Inherited Diseases: Using AI Variant Prioritization

Yu-Shan Huang;
Ching Hsu;
Yu-Chang Chune;
I-Cheng Liao;
Hsin Wang;
Yi-Lin Lin;
Paul Wuh-Liang Hwu;
Ni-Chung Lee;
Feipei Lai

ABSTRACT

Background:

In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period of time. As a result, NGS technology is being widely introduced into clinical diagnostic practice, especially for the diagnosis of hereditary disorders. Processing a patient's DNA sequence data requires multiple tools and complex bioinformatics pipelines, and yields exome data on single-nucleotide variants (SNVs).

Objective:

To assist physicians in interpreting the genetic variation information generated by NGS in a short period of time.

Methods:

We constructed a machine learning model to predict disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing and gene panels as the training set, and integrated variant annotations from multiple genetic databases for model training. The model ranks SNVs and outputs the most likely disease-causing candidates. For model testing, we collected whole-exome sequencing data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We fed the sequencing data into our machine learning model together with phenotypic information that a keyword extraction tool extracted automatically from each patient's electronic medical records.
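
The ranking step described above can be illustrated with a minimal sketch. It assumes each candidate SNV already carries a model-predicted score; the `Variant` class, the `rank_variants` helper, the identifiers, and the score values are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str  # hypothetical identifier for a candidate SNV
    score: float     # hypothetical model-predicted likelihood of being disease-causing


def rank_variants(variants, top_n=10):
    """Sort candidates by descending score and keep the top_n entries."""
    return sorted(variants, key=lambda v: v.score, reverse=True)[:top_n]


# Illustrative candidates only; these are not variants or scores from the study.
candidates = [
    Variant("var_a", 0.12),
    Variant("var_b", 0.91),
    Variant("var_c", 0.47),
]
top = rank_variants(candidates, top_n=2)
print([v.variant_id for v in top])
```

How the scores themselves are produced (features, annotations, model type) is outside this sketch.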

Results:

In 92.6% of the cases, the model placed the causative variant in the top 10 of a ranked list that averaged 741 candidate variants per person after filtering.
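
A top-10 success rate of this kind could be computed as sketched below. The function name `top_k_hit_rate` and the example ranks are assumptions made for illustration, not data from the study. Note that 92.6% of 108 patients is consistent with 100 successful cases (100/108 ≈ 0.926), although the abstract does not state the count directly.

```python
def top_k_hit_rate(causal_ranks, k=10):
    """Fraction of cases whose causative variant is ranked within the top k.

    causal_ranks holds one 1-based rank per patient, or None when the
    causative variant is absent from that patient's ranked list.
    """
    hits = sum(1 for rank in causal_ranks if rank is not None and rank <= k)
    return hits / len(causal_ranks)


# Illustrative ranks only; these are not results from the study.
example_ranks = [1, 3, 12, 7, None, 2, 10, 25]
rate = top_k_hit_rate(example_ranks, k=10)
print(f"top-10 hit rate: {rate:.3f}")
```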

Conclusions:

The model's ranking performance matches that of manual analysis, and it has been used to support the clinical diagnosis of genetic diseases.

Citation

Please cite as:

Huang YS, Hsu C, Chune YC, Liao IC, Wang H, Lin YL, Hwu PWL, Lee NC, Lai F

Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization

JMIR Bioinform Biotech 2022;3(1):e37701

DOI: 10.2196/37701

PMCID: 11168239

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Bioinformatics and Biotechnology

Date Submitted: Mar 4, 2022

Date Accepted: Aug 22, 2022
