Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool

Lately, discourse structure has received considerable attention because of the benefits its application brings to several NLP tasks, such as opinion mining, summarization, question answering, and text simplification. To analyze texts automatically, discourse parsers typically perform two tasks: i) identifying basic discourse units (text segmentation), and ii) linking discourse units by means of discourse relations, building trees or graphs as structures. The resulting discourse structures are, in general terms, accurate for intra-sentence discourse relations, but they fail to capture the right inter-sentence relations. Detecting the main discourse unit, or central unit, helps discourse analyzers (and manual annotators) improve their results in rhetorical labelling. With this in mind, we propose to build the first two steps of a discourse parser following a top-down strategy: i) finding discourse units, and ii) detecting the central unit. The third and last step, assigning rhetorical relations, is left for immediate future work. In line with this strategy, this paper presents a tool consisting of a discourse segmenter and an automatic central discourse unit detector. The results obtained outperform those of previous tools developed for Basque: a gain of 0.12 F1 in segmentation and 0.071 F1 in central unit detection.
Authors:
Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta
Public files:
Year:
2019
Article reference:
PLoS ONE 14(9): e0221639
ISBN or ISSN (journals, conferences, books or book chapters):
https://doi.org/10.1371/journal.pone.0221639

Fine-grained publication type (argitalpen_sailkapen_ohia):