-
eFontes. Part of Speech Tagging and Lemmatization of Medieval Latin Texts.A Cross-Genre Survey
Authors:
Krzysztof Nowak,
Jędrzej Ziębura,
Krzysztof Wróbel,
Aleksander Smywiński-Pohl
Abstract:
This study introduces the eFontes models for automatic linguistic annotation of Medieval Latin texts, focusing on lemmatization, part-of-speech tagging, and morphological feature determination. Using the Transformers library, these models were trained on Universal Dependencies (UD) corpora and the newly developed eFontes corpus of Polish Medieval Latin. The research evaluates the models' performan…
▽ More
This study introduces the eFontes models for automatic linguistic annotation of Medieval Latin texts, focusing on lemmatization, part-of-speech tagging, and morphological feature determination. Using the Transformers library, these models were trained on Universal Dependencies (UD) corpora and the newly developed eFontes corpus of Polish Medieval Latin. The research evaluates the models' performance, addressing challenges such as orthographic variations and the integration of Latinized vernacular terms. The models achieved high accuracy rates: lemmatization at 92.60%, part-of-speech tagging at 83.29%, and morphological feature determination at 88.57%. The findings underscore the importance of high-quality annotated corpora and propose future enhancements, including extending the models to Named Entity Recognition.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
Recurrent and Convolutional Neural Networks in Classification of EEG Signal for Guided Imagery and Mental Workload Detection
Authors:
Filip Postepski,
Grzegorz M. Wojcik,
Krzysztof Wrobel,
Andrzej Kawiak,
Katarzyna Zemla,
Grzegorz Sedek
Abstract:
The Guided Imagery technique is reported to be used by therapists all over the world in order to increase the comfort of patients suffering from a variety of disorders from mental to oncology ones and proved to be successful in numerous of ways. Possible support for the therapists can be estimation of the time at which subject goes into deep relaxation. This paper presents the results of the inves…
▽ More
The Guided Imagery technique is reported to be used by therapists all over the world in order to increase the comfort of patients suffering from a variety of disorders from mental to oncology ones and proved to be successful in numerous of ways. Possible support for the therapists can be estimation of the time at which subject goes into deep relaxation. This paper presents the results of the investigations of a cohort of 26 students exposed to Guided Imagery relaxation technique and mental task workloads conducted with the use of dense array electroencephalographic amplifier. The research reported herein aimed at verification whether it is possible to detect differences between those two states and to classify them using deep learning methods and recurrent neural networks such as EEGNet, Long Short-Term Memory-based classifier, 1D Convolutional Neural Network and hybrid model of 1D Convolutional Neural Network and Long Short-Term Memory. The data processing pipeline was presented from the data acquisition, through the initial data cleaning, preprocessing and postprocessing. The classification was based on two datasets: one of them using 26 so-called cognitive electrodes and the other one using signal collected from 256 channels. So far there have not been such comparisons in the application being discussed. The classification results are presented by the validation metrics such as: accuracy, recall, precision, F1-score and loss for each case. It turned out that it is not necessary to collect signals from all electrodes as classification of the cognitive ones gives the results similar to those obtained for the full signal and extending input to 256 channels does not add much value. In Disscussion there were proposed an optimal classifier as well as some suggestions concerning the prospective development of the project.
△ Less
Submitted 28 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
On multidimensional generalization of binary search
Authors:
Dariusz Dereniowski,
Przemysław Gordinowicz,
Karolina Wróbel
Abstract:
This work generalizes the binary search problem to a $d$-dimensional domain $S_1\times\cdots\times S_d$, where $S_i=\{0, 1, \ldots,n_i-1\}$ and $d\geq 1$, in the following way. Given $(t_1,\ldots,t_d)$, the target element to be found, the result of a comparison of a selected element $(x_1,\ldots,x_d)$ is the sequence of inequalities each stating that either $t_i < x_i$ or $t_i>x_i$, for…
▽ More
This work generalizes the binary search problem to a $d$-dimensional domain $S_1\times\cdots\times S_d$, where $S_i=\{0, 1, \ldots,n_i-1\}$ and $d\geq 1$, in the following way. Given $(t_1,\ldots,t_d)$, the target element to be found, the result of a comparison of a selected element $(x_1,\ldots,x_d)$ is the sequence of inequalities each stating that either $t_i < x_i$ or $t_i>x_i$, for $i\in\{1,\ldots,d\}$, for which at least one is correct, and the algorithm does not know the coordinate $i$ on which the correct direction to the target is given. Among other cases, we show asymptotically almost matching lower and upper bounds of the query complexity to be in $Ω(n^{d-1}/d)$ and $O(n^d)$ for the case of $n_i=n$. In particular, for fixed $d$ these bounds asymptotically do match. This problem is equivalent to the classical binary search in case of one dimension and shows interesting differences for higher dimensions. For example, if one would impose that each of the $d$ inequalities is correct, then the search can be completed in $\log_2\max\{n_1,\ldots,n_d\}$ queries. In an intermediate model when the algorithm knows which one of the inequalities is correct the sufficient number of queries is $\log_2(n_1\cdot\ldots\cdot n_d)$. The latter follows from a graph search model proposed by Emamjomeh-Zadeh et al. [STOC 2016].
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
Convolutional neural network compression for natural language processing
Authors:
Krzysztof Wróbel,
Marcin Pietroń,
Maciej Wielgosz,
Michał Karwatowski,
Kazimierz Wiatr
Abstract:
Convolutional neural networks are modern models that are very efficient in many classification tasks. They were originally created for image processing purposes. Then some trials were performed to use them in different domains like natural language processing. The artificial intelligence systems (like humanoid robots) are very often based on embedded systems with constraints on memory, power consu…
▽ More
Convolutional neural networks are modern models that are very efficient in many classification tasks. They were originally created for image processing purposes. Then some trials were performed to use them in different domains like natural language processing. The artificial intelligence systems (like humanoid robots) are very often based on embedded systems with constraints on memory, power consumption etc. Therefore convolutional neural network because of its memory capacity should be reduced to be mapped to given hardware. In this paper, results are presented of compressing the efficient convolutional neural networks for sentiment analysis. The main steps are quantization and pruning processes. The method responsible for map** compressed network to FPGA and results of this implementation are presented. The described simulations showed that 5-bit width is enough to have no drop in accuracy from floating point version of the network. Additionally, significant memory footprint reduction was achieved (from 85% up to 93%).
△ Less
Submitted 28 May, 2018;
originally announced May 2018.
-
Improving text classification with vectors of reduced precision
Authors:
Krzysztof Wróbel,
Maciej Wielgosz,
Marcin Pietroń,
Michał Karwatowski,
Aleksander Smywiński-Pohl
Abstract:
This paper presents the analysis of the impact of a floating-point number precision reduction on the quality of text classification. The precision reduction of the vectors representing the data (e.g. TF-IDF representation in our case) allows for a decrease of computing time and memory footprint on dedicated hardware platforms. The impact of precision reduction on the classification quality was per…
▽ More
This paper presents the analysis of the impact of a floating-point number precision reduction on the quality of text classification. The precision reduction of the vectors representing the data (e.g. TF-IDF representation in our case) allows for a decrease of computing time and memory footprint on dedicated hardware platforms. The impact of precision reduction on the classification quality was performed on 5 corpora, using 4 different classifiers. Also, dimensionality reduction was taken into account. Results indicate that the precision reduction improves classification accuracy for most cases (up to 25% of error reduction). In general, the reduction from 64 to 4 bits gives the best scores and ensures that the results will not be worse than with the full floating-point representation.
△ Less
Submitted 20 June, 2017;
originally announced June 2017.