Natural Language Processing and Language Technologies for the Basque Language

Keywords: Basque, natural language processing, language technologies, language digitalisation

Abstract

The presence of a language in the digital domain is crucial for its survival, as online communication and digital language resources have become the standard in recent decades and will gain more importance in the coming years. Developing the advanced systems now considered basic for efficient digital communication (e.g. machine translation systems, text-to-speech and speech-to-text converters, and digital assistants) requires digitalising linguistic resources and creating tools. In the case of Basque, scholars have spent the last forty years studying the creation of digital linguistic resources and of the tools that allow the development of those systems. In this paper, we present an overview of the natural language processing and language technology resources developed for Basque, their impact on the process of making Basque a “digital language”, and their applications and challenges in multilingual communication. More precisely, we present the well-known products for Basque, together with the basic tools and resources behind the products we use every day. We also hope that this survey will serve as a guide for other minority languages that are making their way towards digitalisation.

Received: 05 April 2022
Accepted: 20 May 2022

Author Biographies

Itziar Gonzalez-Dios, HiTZ Basque Center for Language Technologies-Ixa NLP Group, University of the Basque Country, Spain

Assistant professor at the Faculty of Engineering in Bilbao, in the Department of Basque Language and Communication, and researcher at the HiTZ center (Ixa group) of the University of the Basque Country (UPV/EHU). She received her PhD in Language Analysis and Processing (computational linguistics) in 2016, her M.A. in the same field in 2011 and her B.A. in German Philology in 2010, all of them at the University of the Basque Country (UPV/EHU). She has published over 45 international peer-reviewed articles and conference papers in Natural Language Processing, mainly in the areas of readability assessment and automatic text simplification. Her research also focuses on developing lexical, semantic and terminological resources for less-resourced languages. She has participated in national and international research projects. She has also served as a reviewer for various international journals, conferences and workshops and has experience organizing international scientific conferences, workshops and hackathons. She is fluent in Spanish, Basque, English, German, French and Italian.

Begoña Altuna, University of the Basque Country, Spain

Postdoctoral researcher at the HiTZ center (Ixa group) of the University of the Basque Country (UPV/EHU). She has a PhD in Language Analysis and Processing (University of the Basque Country, 2018), as well as a degree in Basque Philology (University of Deusto, 2011) and a master’s degree in Language Analysis and Processing (University of the Basque Country, 2013). She has completed two stays as a visiting researcher at the Fondazione Bruno Kessler (Trento, Italy), the first as a predoctoral researcher (2016) and the second as a postdoctoral researcher (2020-2022) funded by a Basque Government postdoctoral fellowship. Her main line of research is the analysis of temporal information in Basque, the creation of annotated corpora and the development of tools for the extraction of temporal information. In addition, she collaborates in research on the analysis of neural networks and in the field of digital humanities. She is the author of more than 20 peer-reviewed publications in international journals and conferences in the field of natural language processing. She has participated in several national and international projects, with particular responsibility in the European Clinical Case Corpus (E3C) project. She has also organized seminars and workshops and is a reviewer for several international conferences.

Published
2022-07-22
How to Cite
Gonzalez-Dios, Itziar, and Begoña Altuna. 2022. “Natural Language Processing and Language Technologies for the Basque Language”. Deusto Journal of European Studies, no. 04 (July), 203-30. https://doi.org/10.18543/ced.2477.