Search | arXiv e-print repository

MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish

Authors: Stefan Bott, Horacio Saggion, Nelson Peréz Rojas, Martin Solis Salazar, Saul Calderon Ramirez

Abstract: Automatic lexical simplification is a task to substitute lexical items that may be unfamiliar and difficult to understand with easier and more common words. This paper presents MultiLS-SP/CA, a novel dataset for lexical simplification in Spanish and Catalan. This dataset represents the first of its kind in Catalan and a substantial addition to the sparse data on automatic lexical simplification wh… ▽ More Automatic lexical simplification is a task to substitute lexical items that may be unfamiliar and difficult to understand with easier and more common words. This paper presents MultiLS-SP/CA, a novel dataset for lexical simplification in Spanish and Catalan. This dataset represents the first of its kind in Catalan and a substantial addition to the sparse data on automatic lexical simplification which is available for Spanish. Specifically, MultiLS-SP is the first dataset for Spanish which includes scalar ratings of the understanding difficulty of lexical items. In addition, we describe experiments with this dataset, which can serve as a baseline for future work on the same data. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: Submitted to the 40th edition of the SEPLN Conference. Under Revision

arXiv:2312.10195 [pdf, other]

SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation

Authors: David C. Jeong, Hongji Liu, Saunder Salazar, Jessie Jiang, Christopher A. Kitts

Abstract: While recent two-stage many-to-one deep learning models have demonstrated great success in 3D human pose estimation, such models are inefficient ways to detect 3D key points in a sequential video relative to one-shot and many-to-many models. Another key drawback of two-stage and many-to-one models is that errors in the first stage will be passed onto the second stage. In this paper, we introduce S… ▽ More While recent two-stage many-to-one deep learning models have demonstrated great success in 3D human pose estimation, such models are inefficient ways to detect 3D key points in a sequential video relative to one-shot and many-to-many models. Another key drawback of two-stage and many-to-one models is that errors in the first stage will be passed onto the second stage. In this paper, we introduce SoloPose, a novel one-shot, many-to-many spatio-temporal transformer model for kinematic 3D human pose estimation of video. SoloPose is further fortified by HeatPose, a 3D heatmap based on Gaussian Mixture Model distributions that factors target key points as well as kinematically adjacent key points. Finally, we address data diversity constraints with the 3D AugMotion Toolkit, a methodology to augment existing 3D human pose datasets, specifically by projecting four top public 3D human pose datasets (Humans3.6M, MADS, AIST Dance++, MPI INF 3DHP) into a novel dataset (Humans7.1M) with a universal coordinate system. Extensive experiments are conducted on Human3.6M as well as the augmented Humans7.1M dataset, and SoloPose demonstrates superior results relative to the state-of-the-art approaches. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: 8 pages, 6 figures

arXiv:2212.07432 [pdf, other]

Counterfactual Explanations for Support Vector Machine Models

Authors: Sebastian Salazar, Samuel Denton, Ansaf Salleb-Aouissi

Abstract: We tackle the problem of computing counterfactual explanations -- minimal changes to the features that flip an undesirable model prediction. We propose a solution to this question for linear Support Vector Machine (SVMs) models. Moreover, we introduce a way to account for weighted actions that allow for more changes in certain features than others. In particular, we show how to find counterfactual… ▽ More We tackle the problem of computing counterfactual explanations -- minimal changes to the features that flip an undesirable model prediction. We propose a solution to this question for linear Support Vector Machine (SVMs) models. Moreover, we introduce a way to account for weighted actions that allow for more changes in certain features than others. In particular, we show how to find counterfactual explanations with the purpose of increasing model interpretability. These explanations are valid, change only actionable features, are close to the data distribution, sparse, and take into account correlations between features. We cast this as a mixed integer programming optimization problem. Additionally, we introduce two novel scale-invariant cost functions for assessing the quality of counterfactual explanations and use them to evaluate the quality of our approach with a real medical dataset. Finally, we build a support vector machine model to predict whether law students will pass the Bar exam using protected features, and used our algorithms to uncover the inherent biases of the SVM. △ Less

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2106.08981 [pdf, other]

doi 10.1016/j.physleta.2021.127756

Nonequilibrium thermodynamics of self-supervised learning

Authors: Domingos S. P. Salazar

Abstract: Self-supervised learning (SSL) of energy based models has an intuitive relation to equilibrium thermodynamics because the softmax layer, map** energies to probabilities, is a Gibbs distribution. However, in what way SSL is a thermodynamic process? We show that some SSL paradigms behave as a thermodynamic composite system formed by representations and self-labels in contact with a nonequilibrium… ▽ More Self-supervised learning (SSL) of energy based models has an intuitive relation to equilibrium thermodynamics because the softmax layer, map** energies to probabilities, is a Gibbs distribution. However, in what way SSL is a thermodynamic process? We show that some SSL paradigms behave as a thermodynamic composite system formed by representations and self-labels in contact with a nonequilibrium reservoir. Moreover, this system is subjected to usual thermodynamic cycles, such as adiabatic expansion and isochoric heating, resulting in a generalized Gibbs ensemble (GGE). In this picture, we show that learning is seen as a demon that operates in cycles using feedback measurements to extract negative work from the system. As applications, we examine some SSL algorithms using this idea. △ Less

Submitted 16 June, 2021; originally announced June 2021.

Comments: 6 pages, 1 figure

Showing 1–4 of 4 results for author: Salazar, S