3rd South American International Conference on Industrial Engineering and Operations Management

Natural Language Processing to Improve Information Retrieval for Meeting Minutes Written in Brazilian Portuguese

Ovídio Francisco
Publisher: IEOM Society International
Track: Artificial Intelligence

A multi-thematic document covers many subjects in a continuous text. Meeting minutes are a typical example: they concentrate important information such as guidelines and decisions, and are therefore often used as an information source. Most information retrieval techniques do not deal well with multi-thematic documents, since these documents contain unstructured data, lack metadata, and have little subject delimitation. Furthermore, it is hard to assign them a main subject or to point to a specific snippet. This poses two main challenges: first, knowing where one subject gives way to another, and second, identifying those subjects. To address these needs, we used text segmentation and topic extraction methods. Text segmentation splits a document into segments, each containing a coherent subject, while topic extraction methods aim to group and describe those segments. Many researchers evaluate these methods using long texts, such as concatenations of documents or transcriptions of discourse and multi-party meetings, written exclusively in English. In this work, we collected a corpus of meeting minutes written in Brazilian Portuguese, which is a less studied language; the minutes also have a more formal and succinct style than such texts. The result is a structure formed by segments represented by descriptors and grouped by topics, which adds extra information about the subject that each segment deals with. Finally, we present a method that connects the text segmentation and topic extraction methods to improve the performance of information retrieval techniques, and we provide an annotated corpus for this domain.
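
To make the idea of segmenting minutes and describing each segment concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a simple lexical-cohesion heuristic (loosely in the spirit of segmenters such as TextTiling): a boundary is placed wherever the vocabulary of neighbouring sentence windows stops overlapping, and each segment is then described by its most frequent terms. The stopword list, the `window` and `threshold` values, the function names, and the sample minutes are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "o", "as", "os", "de", "do", "da", "e", "em", "que", "foi", "um", "uma"}


def tokenize(sentence):
    """Lowercase the sentence, keep runs of letters, and drop stopwords."""
    words = re.findall(r"[^\W\d_]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def cosine(c1, c2):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Group sentences into segments, cutting before sentence i when the
    windows on either side of the gap share too little vocabulary."""
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    segments, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[i])
    segments.append(current)
    return segments


def descriptors(seg, k=3):
    """Describe a segment by its k most frequent non-stopword terms."""
    counts = Counter(w for s in seg for w in tokenize(s))
    return [w for w, _ in counts.most_common(k)]


# Hypothetical minutes, invented for illustration only.
minutes = [
    "O colegiado aprovou o orçamento do laboratório.",
    "O orçamento do laboratório inclui novos equipamentos.",
    "A comissão discutiu o calendário de defesas.",
    "O calendário de defesas foi aprovado pela comissão.",
]

segments_found = segment(minutes)
for seg in segments_found:
    print(descriptors(seg), "->", len(seg), "sentences")
```

In a retrieval setting, the descriptors of each segment could be indexed alongside the segment text, so that a query returns the relevant passage of the minutes instead of the whole document. A topic extraction step would additionally group segments across documents, which this sketch does not attempt.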

Keywords: Multi-thematic, Text Segmentation, Topic Extraction, Information Retrieval


Published in: 3rd South American International Conference on Industrial Engineering and Operations Management

Date of Conference: May 10-12, 2022

ISBN: 978-1-7923-9159-0
ISSN/E-ISSN: 2169-8767