JMIR Preprints #41818: Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

Current Preprint Settings

(as selected by the authors)

1. When the manuscript is submitted, allow peer review from:

(a) Anybody (open community peer review)
(b) Editor-selected reviewers (closed peer review)

2. When the manuscript is submitted, display the preprint PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

3. When the manuscript is accepted, display the accepted manuscript PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

Sungrim Moon;
Huan He;
Heling Jia;
Hongfang Liu;
Jungwei Fan

ABSTRACT

Background:

Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can have multiple answers to a single question and multiple focus points in one question, which are lacking in the existing datasets for development of artificial intelligence solutions.

Objective:

Create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions.

Methods:

We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multi-answer and multi-focus QA entries, which represent more complex and natural challenges in addition to the basic one-drug-one-reason cases. A baseline solution was developed and tested on the dataset.

Results:

The derived RxWhyQA dataset contains 96,939 QA entries. Among the answerable questions, 25% require multiple answers, and 2% ask about multiple drugs within one question. There are frequent cues observed around the answers in the text, and 90% of the drug and reason terms occur within the same or an adjacent sentence. The baseline EQA solution achieved a best f1-measure of 0.72 on the entire dataset, and on specific subsets, it was: 0.93 on the unanswerable questions, 0.48 on single-drug questions versus 0.60 on multi-drug questions, 0.54 on the single-answer questions versus 0.43 on multi-answer questions.

Conclusions:

The RxWhyQA dataset can be used to train and evaluate systems that need to handle multi-answer and multi-focus questions. Specifically, multi-answer EQA appears to be challenging and therefore warrants more investment in research. We created and shared a clinical EQA dataset with multi-answer and multi-focus questions that would channel future research efforts toward more realistic scenarios.

Citation

Please cite as:

Moon S, He H, Jia H, Liu H, Fan J

Extractive Clinical Question-Answering With Multianswer and Multifocus Questions: Data Set Development and Evaluation Study

JMIR AI 2023;2:e41818

DOI: 10.2196/41818

PMID: 38875580

PMCID: 11041481

Download PDF

Request queued. Please wait while the file is being generated. It may take some time.

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR AI

Date Submitted: Aug 9, 2022

Date Accepted: May 22, 2023

Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

ABSTRACT

Citation

Copyright