
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 6, 2021
Date Accepted: Oct 27, 2021

The final, peer-reviewed published version of this preprint can be found here:

A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Pan Y, Wang C, Hu B, Xiang Y, Wang X, Chen Q, Chen J, Du J


JMIR Med Inform 2021;9(12):e32698

DOI: 10.2196/32698

PMID: 34889749

PMCID: 8701710

MedTS: A BERT-based generation model to transform medical texts to SQL queries for electronic medical records

  • Youcheng Pan; 
  • Chenghao Wang; 
  • Baotian Hu; 
  • Yang Xiang; 
  • Xiaolong Wang; 
  • Qingcai Chen; 
  • Junjie Chen; 
  • Jingcheng Du

ABSTRACT

Background:

Electronic medical records (EMRs) are usually stored in relational databases that require structured query language (SQL) queries to retrieve information of interest. Writing such queries effectively is usually challenging for medical experts because of the expertise barrier. Moreover, existing text-to-SQL generation studies have not been fully adopted in the medical domain.
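To make the task concrete, the toy example below pairs a clinician's natural-language question with the SQL query a text-to-SQL system would need to produce. The table name, columns, and data are invented for illustration and are not taken from MIMICSQL:

```python
import sqlite3

# Toy EMR table (schema and rows are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE demographic (
        subject_id INTEGER,
        name TEXT,
        age INTEGER,
        diagnosis TEXT
    )
""")
conn.executemany(
    "INSERT INTO demographic VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 67, "pneumonia"),
        (2, "Bob", 54, "sepsis"),
        (3, "Carol", 71, "pneumonia"),
    ],
)

# Question: "How many patients older than 60 were diagnosed with pneumonia?"
# A text-to-SQL model must map that sentence to the query below.
sql = """
    SELECT COUNT(*) FROM demographic
    WHERE age > 60 AND diagnosis = 'pneumonia'
"""
count = conn.execute(sql).fetchone()[0]
print(count)  # → 2
```

The mismatch between the free-text question and the exact schema vocabulary (`demographic`, `age`, `diagnosis`) is the gap a text-to-SQL model has to bridge.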

Objective:

The objective of this study was to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL, in order to automatically transform medical texts into SQL queries for EMRs.

Methods:

In contrast to treating the SQL query as an ordinary word sequence, a syntax tree, introduced as an intermediate representation, better matches the tree-structured nature of SQL and can effectively reduce the search space during generation. We proposed a medical text-to-SQL model (MedTS) that employs a pretrained BERT model as the encoder and a grammar-based LSTM as the decoder to predict the tree-structured intermediate representation, which can then be straightforwardly transformed into the final SQL query. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared against five competitor methods.
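The idea of a tree-structured intermediate representation can be sketched as follows. This is a minimal, invented grammar; the node types and the `to_sql` flattening rules are assumptions for illustration, not the actual MedTS grammar, and the decoder's step-by-step action prediction is replaced by building the tree by hand:

```python
from dataclasses import dataclass, field
from typing import List

# Minimal, invented SQL syntax tree: these node types are an
# illustrative assumption, not the grammar used by MedTS.

@dataclass
class Column:
    name: str
    agg: str = ""          # optional aggregation, e.g. "COUNT"

@dataclass
class Condition:
    column: str
    op: str
    value: str

@dataclass
class Query:
    select: List[Column]
    table: str
    where: List[Condition] = field(default_factory=list)

def to_sql(q: Query) -> str:
    """Deterministically flatten the syntax tree into a SQL string."""
    cols = ", ".join(
        f"{c.agg}({c.name})" if c.agg else c.name for c in q.select
    )
    sql = f"SELECT {cols} FROM {q.table}"
    if q.where:
        preds = " AND ".join(
            f"{c.column} {c.op} {c.value}" if c.value.isdigit()
            else f"{c.column} {c.op} {c.value!r}"
            for c in q.where
        )
        sql += f" WHERE {preds}"
    return sql

# A grammar-based decoder would emit this tree action by action,
# constrained so only grammatical trees are reachable; here we
# construct one directly.
tree = Query(
    select=[Column("*", agg="COUNT")],
    table="demographic",
    where=[Condition("age", ">", "60"),
           Condition("diagnosis", "=", "pneumonia")],
)
print(to_sql(tree))
# → SELECT COUNT(*) FROM demographic WHERE age > 60 AND diagnosis = 'pneumonia'
```

Because the decoder chooses among grammar actions rather than arbitrary tokens, every decoded tree flattens to syntactically valid SQL, which is the search-space reduction the abstract refers to.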

Results:

Experimental results demonstrated that MedTS achieved accuracies of 0.770 and 0.888 on the test set in terms of logic form and execution, respectively, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance on each component of the generated SQL was relatively balanced and substantially improved.

Conclusions:

The proposed MedTS was effective and robust in improving the performance of medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.