Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 11, 2020
Date Accepted: Jun 21, 2021

The final, peer-reviewed published version of this preprint can be found here:

Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System

He K, Yao L, Zhang J, Li Y, Li C

J Med Internet Res 2021;23(8):e25670

DOI: 10.2196/25670

PMID: 34346903

PMCID: 8374669

Construction of Genealogical Knowledge Graphs From Obituaries: A Multitask Neural Network Extraction System

  • Kai He
  • Lixia Yao
  • JiaWei Zhang
  • Yufei Li
  • Chen Li

ABSTRACT

Background:

Genealogical information, such as family trees, is essential for many areas of biomedical research, such as disease heritability and risk prediction. Researchers have used information on policyholders and their dependents in medical claims data, and on emergency contacts in electronic health records (EHR), to infer family relationships at large scale. We have previously demonstrated that online obituaries can be a novel data source for building more complete and accurate family trees.

Objective:

To supplement EHR data with family relationships for biomedical research, we build an end-to-end information extraction system that uses a multitask artificial neural network model to construct genealogical knowledge graphs (GKGs) from online obituaries. GKGs are enriched family trees with detailed information, including age, gender, birth and death dates, and residence.

Methods:

Based on a predefined family relationship map consisting of 4 types of entities (e.g., people’s names, residence, and birth and death dates) and 71 types of relationships, we curate a corpus of 1,700 online obituaries from the Minneapolis-St Paul metropolitan area in Minnesota. We also apply data augmentation to generate additional synthetic data, which alleviates data scarcity for rare family relationships. We then build a multitask artificial neural network model that simultaneously detects names, extracts the relationships between them, and assigns attributes (e.g., birth and death dates, residence, age, and gender) to each individual. Finally, we assemble related GKGs into larger ones by identifying people who appear in multiple obituaries.
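The final assembly step can be illustrated with a minimal sketch. It assumes that each obituary has already been reduced to a list of (person, relation, person) triples and that a person can be matched across obituaries by a (name, birth year) key. Both the record layout and the matching key are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def assemble(obituaries):
    """Group per-obituary graphs that share at least one person key.

    `obituaries` is a list of graphs; each graph is a list of
    (person_a, relation, person_b) triples. The person key used here,
    (name, birth_year), is an assumption for illustration only.
    """
    parent = list(range(len(obituaries)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # person key -> index of the first obituary naming them
    for idx, triples in enumerate(obituaries):
        for a, _, b in triples:
            for person in (a, b):
                if person in first_seen:
                    # Same person in two obituaries: merge their graphs.
                    parent[find(idx)] = find(first_seen[person])
                else:
                    first_seen[person] = idx

    merged = defaultdict(set)
    for idx, triples in enumerate(obituaries):
        merged[find(idx)].update(triples)
    return list(merged.values())


obits = [
    [(("Ann Lee", 1930), "spouse", ("Bob Lee", 1928))],
    [(("Bob Lee", 1928), "father", ("Cal Lee", 1955))],
    [(("Dan Roe", 1940), "brother", ("Eve Roe", 1942))],
]
graphs = assemble(obits)
print(len(graphs))  # 2: the two Lee obituaries merge, the Roe one stays apart
```

Exact key matching is the simplest possible rule; real obituaries would likely need fuzzier matching on names and attributes, which this sketch does not attempt.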

Results:

Our system achieves a precision of 94.79%, a recall of 91.45%, and an F1 measure of 93.09% under 10-fold cross-validation. We also construct 12,407 GKGs, the largest of which spans 4 generations and 30 people.
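As a quick consistency check, the reported F1 measure matches the harmonic mean of the reported precision and recall. The snippet below assumes the standard F1 definition; how the authors aggregated scores across folds is not stated here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(94.79, 91.45), 2))  # 93.09
```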

Conclusions:

In this work, we discussed the value of GKGs for biomedical research, presented a new version of the corpus with a predefined family relationship map and augmented training data, and proposed a multitask deep neural system to construct and assemble GKGs. The results show that our system can extract GKGs from obituaries and demonstrate the potential of enriching EHR data for genetic research. We share the source code and system with the scientific community on GitHub; the corpus is withheld to protect privacy.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.