Search | arXiv e-print repository

Enhancing Text-to-SQL Translation for Financial System Design

Authors: Yewei Song, Saad Ezzini, Xunzhu Tang, Cedric Lothritz, Jacques Klein, Tegawendé Bissyandé, Andrey Boytsov, Ulrick Ble, Anne Goujon

Abstract: Text-to-SQL, the task of translating natural language questions into SQL queries, is part of various business processes. Its automation, which is an emerging challenge, will empower software practitioners to seamlessly interact with relational databases using natural language, thereby bridging the gap between business needs and software capabilities. In this paper, we consider Large Language Model… ▽ More Text-to-SQL, the task of translating natural language questions into SQL queries, is part of various business processes. Its automation, which is an emerging challenge, will empower software practitioners to seamlessly interact with relational databases using natural language, thereby bridging the gap between business needs and software capabilities. In this paper, we consider Large Language Models (LLMs), which have achieved state of the art for various NLP tasks. Specifically, we benchmark Text-to-SQL performance, the evaluation methodologies, as well as input optimization (e.g., prompting). In light of the empirical observations that we have made, we propose two novel metrics that were designed to adequately measure the similarity between SQL queries. Overall, we share with the community various findings, notably on how to select the right LLM on Text-to-SQL tasks. We further demonstrate that a tree-based edit distance constitutes a reliable metric for assessing the similarity between generated SQL queries and the oracle for benchmarking Text2SQL approaches. This metric is important as it relieves researchers from the need to perform computationally expensive experiments such as executing generated queries as done in prior works. Our work implements financial domain use cases and, therefore contributes to the advancement of Text2SQL systems and their practical adoption in this domain. △ Less

Submitted 8 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 10 pages, ICSE-SEIP 2024

arXiv:2308.10542 [pdf, other]

Learning Weakly Convex Regularizers for Convergent Image-Reconstruction Algorithms

Authors: Alexis Goujon, Sebastian Neumayer, Michael Unser

Abstract: We propose to learn non-convex regularizers with a prescribed upper bound on their weak-convexity modulus. Such regularizers give rise to variational denoisers that minimize a convex energy. They rely on few parameters (less than 15,000) and offer a signal-processing interpretation as they mimic handcrafted sparsity-promoting regularizers. Through numerical experiments, we show that such denoisers… ▽ More We propose to learn non-convex regularizers with a prescribed upper bound on their weak-convexity modulus. Such regularizers give rise to variational denoisers that minimize a convex energy. They rely on few parameters (less than 15,000) and offer a signal-processing interpretation as they mimic handcrafted sparsity-promoting regularizers. Through numerical experiments, we show that such denoisers outperform convex-regularization methods as well as the popular BM3D denoiser. Additionally, the learned regularizer can be deployed to solve inverse problems with iterative schemes that provably converge. For both CT and MRI reconstruction, the regularizer generalizes well and offers an excellent tradeoff between performance, number of parameters, guarantees, and interpretability when compared to other data-driven approaches. △ Less

Submitted 20 December, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

MSC Class: 26B25; 47A52; 49N45; 68U10; 65D07; 68T05; 90C26

arXiv:2303.01347 [pdf, other]

Letz Translate: Low-Resource Machine Translation for Luxembourgish

Authors: Yewei Song, Saad Ezzini, Jacques Klein, Tegawende Bissyande, Clément Lefebvre, Anne Goujon

Abstract: Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research in multilingual models have shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in co… ▽ More Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research in multilingual models have shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related or share the same linguistic root as the target LRL. For our evaluation, we consider Luxembourgish as the LRL that shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30\% faster and perform only 4\% lower compared to the large state-of-the-art NLLB model. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: The associated model is published on HuggingFace: https://huggingface.co/etamin/Letz-Translate-OPUS-LB-EN The Dictionary used in this paper is available in Github: https://github.com/Etamin/Ltz_dictionary

arXiv:2211.12461 [pdf, other]

A Neural-Network-Based Convex Regularizer for Inverse Problems

Authors: Alexis Goujon, Sebastian Neumayer, Pakshal Bohra, Stanislas Ducotterd, Michael Unser

Abstract: The emergence of deep-learning-based methods to solve image-reconstruction problems has enabled a significant increase in reconstruction quality. Unfortunately, these new methods often lack reliability and explainability, and there is a growing interest to address these shortcomings while retaining the boost in performance. In this work, we tackle this issue by revisiting regularizers that are the… ▽ More The emergence of deep-learning-based methods to solve image-reconstruction problems has enabled a significant increase in reconstruction quality. Unfortunately, these new methods often lack reliability and explainability, and there is a growing interest to address these shortcomings while retaining the boost in performance. In this work, we tackle this issue by revisiting regularizers that are the sum of convex-ridge functions. The gradient of such regularizers is parameterized by a neural network that has a single hidden layer with increasing and learnable activation functions. This neural network is trained within a few minutes as a multistep Gaussian denoiser. The numerical experiments for denoising, CT, and MRI reconstruction show improvements over methods that offer similar reliability guarantees. △ Less

Submitted 25 August, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

arXiv:2210.16222 [pdf, other]

Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions

Authors: Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser

Abstract: Lipschitz-constrained neural networks have several advantages over unconstrained ones and can be applied to a variety of problems, making them a topic of attention in the deep learning community. Unfortunately, it has been shown both theoretically and empirically that they perform poorly when equipped with ReLU activation functions. By contrast, neural networks with learnable 1-Lipschitz linear sp… ▽ More Lipschitz-constrained neural networks have several advantages over unconstrained ones and can be applied to a variety of problems, making them a topic of attention in the deep learning community. Unfortunately, it has been shown both theoretically and empirically that they perform poorly when equipped with ReLU activation functions. By contrast, neural networks with learnable 1-Lipschitz linear splines are known to be more expressive. In this paper, we show that such networks correspond to global optima of a constrained functional optimization problem that consists of the training of a neural network composed of 1-Lipschitz linear layers and 1-Lipschitz freeform activation functions with second-order total-variation regularization. Further, we propose an efficient method to train these neural networks. Our numerical experiments show that our trained networks compare favorably with existing 1-Lipschitz neural architectures. △ Less

Submitted 19 December, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

arXiv:2208.07787 [pdf, other]

Delaunay-Triangulation-Based Learning with Hessian Total-Variation Regularization

Authors: Mehrsa Pourya, Alexis Goujon, Michael Unser

Abstract: Regression is one of the core problems tackled in supervised learning. Rectified linear unit (ReLU) neural networks generate continuous and piecewise-linear (CPWL) map**s and are the state-of-the-art approach for solving regression problems. In this paper, we propose an alternative method that leverages the expressivity of CPWL functions. In contrast to deep neural networks, our CPWL parameteriz… ▽ More Regression is one of the core problems tackled in supervised learning. Rectified linear unit (ReLU) neural networks generate continuous and piecewise-linear (CPWL) map**s and are the state-of-the-art approach for solving regression problems. In this paper, we propose an alternative method that leverages the expressivity of CPWL functions. In contrast to deep neural networks, our CPWL parameterization guarantees stability and is interpretable. Our approach relies on the partitioning of the domain of the CPWL function by a Delaunay triangulation. The function values at the vertices of the triangulation are our learnable parameters and identify the CPWL function uniquely. Formulating the learning scheme as a variational problem, we use the Hessian total variation (HTV) as regularizer to favor CPWL functions with few affine pieces. In this way, we control the complexity of our model through a single hyperparameter. By develo** a computational framework to compute the HTV of any CPWL function parameterized by a triangulation, we discretize the learning problem as the generalized least absolute shrinkage and selection operator (LASSO). Our experiments validate the usage of our method in low-dimensional scenarios. △ Less

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2206.08615 [pdf, other]

On the Number of Regions of Piecewise Linear Neural Networks

Authors: Alexis Goujon, Arian Etemadi, Michael Unser

Abstract: Many feedforward neural networks (NNs) generate continuous and piecewise-linear (CPWL) map**s. Specifically, they partition the input domain into regions on which the map** is affine. The number of these so-called linear regions offers a natural metric to characterize the expressiveness of CPWL NNs. The precise determination of this quantity is often out of reach in practice, and bounds have b… ▽ More Many feedforward neural networks (NNs) generate continuous and piecewise-linear (CPWL) map**s. Specifically, they partition the input domain into regions on which the map** is affine. The number of these so-called linear regions offers a natural metric to characterize the expressiveness of CPWL NNs. The precise determination of this quantity is often out of reach in practice, and bounds have been proposed for specific architectures, including for ReLU and Maxout NNs. In this work, we generalize these bounds to NNs with arbitrary and possibly multivariate CPWL activation functions. We first provide upper and lower bounds on the maximal number of linear regions of a CPWL NN given its depth, width, and the number of linear regions of its activation functions. Our results rely on the combinatorial structure of convex partitions and confirm the distinctive role of depth which, on its own, is able to exponentially increase the number of regions. We then introduce a complementary stochastic framework to estimate the average number of linear regions produced by a CPWL NN. Under reasonable assumptions, the expected density of linear regions along any 1D path is bounded by the product of depth, width, and a measure of activation complexity (up to a scaling factor). This yields an identical role to the three sources of expressiveness: no exponential growth with depth is observed anymore. △ Less

Submitted 20 December, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2204.06233 [pdf, other]

Approximation of Lipschitz Functions using Deep Spline Neural Networks

Authors: Sebastian Neumayer, Alexis Goujon, Pakshal Bohra, Michael Unser

Abstract: Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation func… ▽ More Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise $1$-Lipschitz activation functions in the sense that no other weight constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings. △ Less

Submitted 13 April, 2022; originally announced April 2022.

Showing 1–8 of 8 results for author: Goujon, A