Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 26, 2019
Date Accepted: Feb 10, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
On Symptom Distribution Regularity of Insomnia based on Word2vec and Spectral Clustering Algorithm
ABSTRACT
Background:
Recent advances in machine learning have led to significant progress in many research fields. In particular, knowledge discovery with these methods in Traditional Chinese Medicine (TCM) has become a hot topic. As one of the key clinical manifestations of patients, symptoms play a significant role in clinical diagnosis and treatment, and they evidently have underlying TCM mechanisms.
Objective:
We attempt to identify the core symptoms and the potential regularity of symptoms in diagnosing insomnia, in order to reveal the key symptoms of insomnia, the hidden relationships among the symptoms, and their corresponding syndromes.
Methods:
An insomnia data set of 807 samples was extracted from real-world electronic medical records (EMRs). After cleaning the data and selecting the subset relating to syndromes and symptoms, we constructed a symptom network analysis model based on complex network theory. We then used four node centrality metrics to identify the core symptom nodes from multiple perspectives. To explore the hidden relationships between symptoms, we trained on each symptom node in the network with the Skip-Gram model in Word2vec to obtain symptom embeddings. With the symptom vocabulary represented as numerical vectors, we calculated the similarity between every pair of symptom embeddings and clustered the embeddings into five communities using the Spectral Clustering (SC) algorithm.
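The first and third steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the function names, and the choice of degree centrality (the abstract does not name its four centrality metrics) and cosine similarity (the abstract does not name its similarity measure) are all assumptions made for illustration. The Skip-Gram training and the spectral clustering step are not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy records: each set holds the symptoms noted for one patient.
records = [
    {"difficulty falling asleep", "dysphoria and irascibility", "forgetful"},
    {"difficulty falling asleep", "easy to wake up at night"},
    {"difficulty falling asleep", "spiritlessness and weakness", "forgetful"},
    {"easy to wake up at night", "dysphoria and irascibility"},
]


def cooccurrence_edges(records):
    """Count how often each unordered symptom pair shares a record."""
    edges = Counter()
    for symptoms in records:
        for a, b in combinations(sorted(symptoms), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly linked to."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Rank symptoms by degree centrality in the co-occurrence network.
centrality = degree_centrality(cooccurrence_edges(records))
for symptom, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {symptom}")
```

In this toy network, "difficulty falling asleep" co-occurs with every other symptom and so receives the highest degree centrality. In a full pipeline, `cosine_similarity` would be applied to the learned symptom embeddings to build the affinity matrix passed to a spectral clustering routine.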
Results:
The top 5 core symptoms for insomnia diagnosis, namely difficulty falling asleep, easy to wake up at night, dysphoria and irascibility, forgetful, and spiritlessness and weakness, were identified using the node centrality metrics. Symptom embeddings capturing the hidden relationships were constructed and can serve as a basic database for insomnia research. The symptom network was divided into 5 communities, and the symptoms were accurately categorized into their corresponding syndromes.
Conclusions:
The experimental results show that the methodologies used in this manuscript can objectively and effectively find the key symptoms and the relationships between symptoms. The results also reveal the symptom distribution and symptom clusters of insomnia and provide valuable guidance for the clinical diagnosis and treatment of insomnia.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.