EUROPA: A Legal Multilingual Keyphrase Generation Dataset
Authors:
Olivier Salaün,
Frédéric Piedboeuf,
Guillaume Le Berre,
David Alfonso Hermelo,
Philippe Langlais
Abstract:
Keyphrase generation has primarily been explored within the context of academic research articles, with a particular focus on scientific domains and the English language. In this work, we present EUROPA, a dataset for multilingual keyphrase generation in the legal domain. It is derived from legal judgments from the Court of Justice of the European Union (EU), and contains instances in all 24 EU official languages. We run multilingual models on our corpus and analyze the results, showing room for improvement on a domain-specific multilingual corpus such as the one we present.
Submitted 14 June, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
The state of OAI-PMH repositories in Canadian Universities
Authors:
Frédéric Piedboeuf,
Guillaume Le Berre,
David Alfonso-Hermelo,
Olivier Charbonneau,
Philippe Langlais
Abstract:
This article presents a study of the current state of University Institutional Repositories (UIRs) in Canada. UIRs are vital to sharing information and documents, mainly Electronic Theses and Dissertations (ETDs), and theoretically allow anyone, anywhere, to access the documents contained within the repository. Despite calls for consistent and shareable metadata in these repositories, our literature review shows inconsistencies in UIRs, including incorrect use of metadata fields and the omission of crucial information, which make the systematic analysis of UIRs complex. Nonetheless, we collected the data of 57 Canadian UIRs with the aim of analyzing Canadian data and assessing the quality of its UIRs. This was surprisingly difficult due to the lack of information about the UIRs, and we attempt to ease future collection efforts by organizing vital information that is difficult to find, starting with the addresses of UIRs. We furthermore present and analyze the main characteristics of the UIRs we managed to collect, using this dataset to create recommendations for future practitioners.
Submitted 18 November, 2023;
originally announced November 2023.