Resources

DoQA dataset

DoQA is a dataset for accessing domain-specific FAQs via conversational QA. It contains 2,437 information-seeking question/answer dialogues (10,917 questions in total) across three domains: cooking, travel and movies. The dialogues were created by crowd workers playing two roles: a user who asks questions about a given topic, and a domain expert who answers each question by selecting a short span of text from a long textual reply. DoQA enables the development and evaluation of conversational QA systems that help users access the knowledge buried in domain-specific FAQs. [see reference paper Campos et al., 2020]
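
The span-selection style of answering described above can be illustrated with a minimal sketch. The `SpanAnswer` class, the `find_span` helper and the example passage are all hypothetical and do not reflect the actual DoQA file format; they only show how an answer can be stored as character offsets into a longer reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanAnswer:
    """Hypothetical record: an answer stored as offsets into a passage."""
    start: int  # character offset of the first character (inclusive)
    end: int    # character offset just past the last character (exclusive)

    def text(self, passage: str) -> str:
        # Recover the answer string from the passage it was selected from.
        return passage[self.start:self.end]


def find_span(passage: str, answer_text: str) -> Optional[SpanAnswer]:
    """Locate the first occurrence of answer_text, or return None if absent."""
    start = passage.find(answer_text)
    if start == -1:
        return None
    return SpanAnswer(start, start + len(answer_text))


# Invented example passage, not taken from the dataset.
passage = "Let the dough rest for an hour. Then bake it at 220 degrees for 25 minutes."
span = find_span(passage, "bake it at 220 degrees")
print(span.text(passage))
```

Storing offsets rather than free text keeps every answer verifiable against the source passage, which is what makes span-based evaluation straightforward.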

ElkarHizketak dataset

ElkarHizketak is a low-resource conversational Question Answering (QA) dataset in Basque, created by Basque-speaking volunteers. It contains close to 400 dialogues and more than 1,600 questions and answers, and its small size presents a realistic low-resource scenario for conversational QA systems. The dataset is built on top of Wikipedia sections about popular people and organizations. Each dialogue involves two crowd workers: (1) a student who asks questions after reading a short introduction about the person, but without seeing the section text; and (2) a teacher who answers the questions by selecting a span of text from the section. [see reference paper Otegi et al., 2020]

OTTA corpus

The OTTA (Operation Trees and Token Assignment) corpus consists of complex natural language questions paired with representations of their corresponding structured queries. To speed up annotation, we introduced an intermediate representation of the structured queries, which we call Operation Trees (OTs). OTs follow a context-free grammar and are based on logical query plans that can easily be mapped to SPARQL or SQL, making our system more versatile. [see reference paper Deriu et al., 2020]
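
To make the idea of a tree-shaped query plan that maps to SQL more concrete, here is a minimal sketch. The node types (`Table`, `Filter`, `Project`) and the `to_sql` function are assumptions made for illustration; they are not the OT grammar or the node inventory used in the OTTA corpus, and the sketch does no quoting or escaping of identifiers.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Table:
    """Leaf node: read all rows of a table (hypothetical node type)."""
    name: str


@dataclass
class Filter:
    """Keep only rows of the child where `column op value` holds."""
    child: "Node"
    column: str
    op: str
    value: Union[int, str]


@dataclass
class Project:
    """Keep only the listed columns of the child."""
    child: "Node"
    columns: List[str]


Node = Union[Table, Filter, Project]


def to_sql(node: Node) -> str:
    """Translate a tree bottom-up into a nested SQL query string."""
    if isinstance(node, Table):
        return f"SELECT * FROM {node.name}"
    if isinstance(node, Filter):
        # String literals are single-quoted; numbers are emitted as-is.
        literal = f"'{node.value}'" if isinstance(node.value, str) else str(node.value)
        return f"SELECT * FROM ({to_sql(node.child)}) AS t WHERE {node.column} {node.op} {literal}"
    if isinstance(node, Project):
        cols = ", ".join(node.columns)
        return f"SELECT {cols} FROM ({to_sql(node.child)}) AS t"
    raise TypeError(f"unknown node type: {type(node).__name__}")


# "Which movie titles are from after 2010?" as an invented example tree.
tree = Project(Filter(Table("movies"), "year", ">", 2010), ["title"])
sql = to_sql(tree)
print(sql)
```

Because each node only describes one operation on its child, the same tree could in principle be walked by a second function that emits SPARQL instead of SQL.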

Knowledge Graph in the cooking domain

This knowledge graph has been derived from the English Wikibook on cooking. In addition, a set of 86 answerable and unanswerable questions over this knowledge graph is available. The question set can be used to benchmark a system's self-diagnosis of why a question cannot be answered from the knowledge graph: for each question for which no answer can be found, the element missing from the knowledge base is given.
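
One way such a benchmark could be scored is sketched below. The question identifiers, the labels for missing elements and the `diagnosis_accuracy` function are all invented for illustration; the actual format of the question set may differ.

```python
from typing import Dict, Optional


def diagnosis_accuracy(
    gold: Dict[str, Optional[str]],
    predicted: Dict[str, Optional[str]],
) -> float:
    """Fraction of unanswerable questions whose missing element is identified.

    `gold` maps a question id to the missing knowledge-base element,
    or to None when the question is answerable (hypothetical format).
    """
    unanswerable = {q: m for q, m in gold.items() if m is not None}
    if not unanswerable:
        return 0.0
    correct = sum(1 for q, m in unanswerable.items() if predicted.get(q) == m)
    return correct / len(unanswerable)


# Invented toy data: q1 is answerable, q2 and q3 are not.
gold = {"q1": None, "q2": "ingredient:saffron", "q3": "relation:cooking_time"}
predicted = {"q1": None, "q2": "ingredient:saffron", "q3": "ingredient:rice"}

score = diagnosis_accuracy(gold, predicted)
print(score)
```

Only unanswerable questions enter the score here, so a system cannot inflate its result by answering the easy, answerable questions correctly.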