Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 26, 2019
Date Accepted: Feb 10, 2020
On Symptom Distribution Regularity of Insomnia based on Node2vec and Spectral Clustering
ABSTRACT
Background:
Recent advances in machine learning have led to significant progress in many research fields. In particular, knowledge discovery using these methods in Traditional Chinese Medicine (TCM) has become a hot topic. As one of the key clinical manifestations of patients, symptoms play a significant role in clinical diagnosis and treatment, and they evidently have underlying TCM mechanisms.
Objective:
We aim to explore the core symptoms and the potential regularity of symptoms in diagnosing insomnia, in order to reveal the key symptoms of insomnia, the hidden relationships among the symptoms, and their corresponding syndromes.
Methods:
An insomnia data set with 807 samples was extracted from real-world electronic medical records. After cleaning the data and selecting the subset relating to syndromes and symptoms, we constructed a symptom network analysis model using complex network theory. We then used four node centrality metrics to identify the core symptom nodes from multiple aspects. To explore the hidden relationships among symptoms, we trained an embedding representation for each symptom node in the network using the Skip-Gram model and node2vec. After obtaining the symptom vocabulary in vector form, we calculated the similarity between every pair of symptom embeddings and clustered the embeddings into five communities using the spectral clustering algorithm.
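The first stages of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the function names, the choice of degree centrality (the abstract does not name its four metrics), the values of p and q, and the use of cosine similarity are all hypothetical. Training Skip-Gram on the walks and running spectral clustering on the resulting similarity matrix are omitted here; they would typically be delegated to libraries such as gensim and scikit-learn.

```python
import itertools
import math
import random
from collections import defaultdict

# Hypothetical toy records: the set of symptoms noted for each patient.
records = [
    {"difficulty falling asleep", "dysphoria", "forgetfulness"},
    {"difficulty falling asleep", "night waking", "fatigue"},
    {"difficulty falling asleep", "dysphoria", "night waking"},
    {"forgetfulness", "fatigue"},
]

def build_cooccurrence_graph(records):
    """Weighted undirected graph; edge weight = records containing both symptoms."""
    graph = defaultdict(dict)
    for symptoms in records:
        for a, b in itertools.combinations(sorted(symptoms), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def node2vec_walk(graph, start, length, p, q, rng):
    """One second-order biased walk (return parameter p, in-out parameter q)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: no previous node, so sample by edge weight alone.
            weights = [graph[cur][x] for x in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for x in nbrs:
                w = graph[cur][x]
                if x == prev:              # step back to the previous node
                    weights.append(w / p)
                elif x in graph[prev]:     # stay at distance 1 from prev
                    weights.append(w)
                else:                      # move outward, distance 2 from prev
                    weights.append(w / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

graph = build_cooccurrence_graph(records)
centrality = degree_centrality(graph)

# Walks act as "sentences" of symptoms for a Skip-Gram model (not trained here).
rng = random.Random(0)
walks = [
    node2vec_walk(graph, node, length=10, p=1.0, q=0.5, rng=rng)
    for node in sorted(graph)
    for _ in range(5)
]
```

In this sketch, each walk plays the role of a sentence, so symptoms that tend to appear in the same neighbourhoods of the network end up with similar embeddings once a Skip-Gram model is trained on the walks.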
Results:
The top 5 core symptoms for insomnia diagnosis, identified using node centrality metrics, were difficulty falling asleep, easy waking at night, dysphoria and irascibility, forgetfulness, and spiritlessness and weakness. Symptom embeddings capturing the hidden relationships were constructed and can serve as a basic database for future insomnia research. The symptom network was divided into 5 communities, and the symptoms were accurately categorized into their corresponding syndromes.
Conclusions:
The experimental results indicate that the methodologies used in this manuscript can objectively and effectively find the key symptoms and the relationships among symptoms. The results also reveal the symptom distribution and symptom clusters of insomnia and provide valuable guidance for the clinical diagnosis and treatment of insomnia.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.