-
TQCompressor: improving tensor decomposition methods in neural networks via permutations
Authors:
V. Abronin,
A. Naumov,
D. Mazur,
D. Bystrov,
K. Tsarova,
Ar. Melnikov,
I. Oseledets,
S. Dolgov,
R. Brasher,
M. Perelshtein
Abstract:
We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement makes it possible to reduce loss in model expressivity which is usually associ…
▽ More
We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement makes it possible to reduce loss in model expressivity which is usually associated with factorization. We demonstrate this method applied to the GPT-2$_{small}$. The result of the compression is TQCompressedGPT-2 model, featuring 81 mln. parameters compared to 124 mln. in the GPT-2$_{small}$. We make TQCompressedGPT-2 publicly available. We further enhance the performance of the TQCompressedGPT-2 through a training strategy involving multi-step knowledge distillation, using only a 3.1% of the OpenWebText. TQCompressedGPT-2 surpasses DistilGPT-2 and KnGPT-2 in comparative evaluations, marking an advancement in the efficient and effective deployment of models in resource-constrained environments.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
A weighted subspace exponential kernel for support tensor machines
Authors:
Kirandeep Kour,
Sergey Dolgov,
Peter Benner,
Martin Stoll,
Max Pfeffer
Abstract:
High-dimensional data in the form of tensors are challenging for kernel classification methods. To both reduce the computational complexity and extract informative features, kernels based on low-rank tensor decompositions have been proposed. However, what decisive features of the tensors are exploited by these kernels is often unclear. In this paper we propose a novel kernel that is based on the T…
▽ More
High-dimensional data in the form of tensors are challenging for kernel classification methods. To both reduce the computational complexity and extract informative features, kernels based on low-rank tensor decompositions have been proposed. However, what decisive features of the tensors are exploited by these kernels is often unclear. In this paper we propose a novel kernel that is based on the Tucker decomposition. For this kernel the Tucker factors are computed based on re-weighting of the Tucker matrices with tuneable powers of singular values from the HOSVD decomposition. This provides a mechanism to balance the contribution of the Tucker core and factors of the data. We benchmark support tensor machines with this new kernel on several datasets. First we generate synthetic data where two classes differ in either Tucker factors or core, and compare our novel and previously existing kernels. We show robustness of the new kernel with respect to both classification scenarios. We further test the new method on real-world datasets. The proposed kernel has demonstrated a higher test accuracy than the state-of-the-art tensor train multi-way multi-level kernel, and a significantly lower computational time.
△ Less
Submitted 16 February, 2023;
originally announced February 2023.
-
Deep importance sampling using tensor trains with application to a priori and a posteriori rare event estimation
Authors:
Tiangang Cui,
Sergey Dolgov,
Robert Scheichl
Abstract:
We propose a deep importance sampling method that is suitable for estimating rare event probabilities in high-dimensional problems. We approximate the optimal importance distribution in a general importance sampling problem as the pushforward of a reference distribution under a composition of order-preserving transformations, in which each transformation is formed by a squared tensor-train decompo…
▽ More
We propose a deep importance sampling method that is suitable for estimating rare event probabilities in high-dimensional problems. We approximate the optimal importance distribution in a general importance sampling problem as the pushforward of a reference distribution under a composition of order-preserving transformations, in which each transformation is formed by a squared tensor-train decomposition. The squared tensor-train decomposition provides a scalable ansatz for building order-preserving high-dimensional transformations via density approximations. The use of composition of maps moving along a sequence of bridging densities alleviates the difficulty of directly approximating concentrated density functions. To compute expectations over unnormalized probability distributions, we design a ratio estimator that estimates the normalizing constant using a separate importance distribution, again constructed via a composition of transformations in tensor-train format. This offers better theoretical variance reduction compared with self-normalized importance sampling, and thus opens the door to efficient computation of rare event probabilities in Bayesian inference problems. Numerical experiments on problems constrained by differential equations show little to no increase in the computational complexity with the event probability going to zero, and allow to compute hitherto unattainable estimates of rare event probabilities for complex, high-dimensional posterior densities.
△ Less
Submitted 24 May, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Conditional Deep Inverse Rosenblatt Transports
Authors:
Tiangang Cui,
Sergey Dolgov,
Olivier Zahm
Abstract:
We present a novel offline-online method to mitigate the computational burden of Bayesian inference, particularly in the regime where the posterior densities are computationally demanding to evaluate while real-time inference results are needed. In the offline phase, the proposed method learns the joint law of the parameter random variables and the observable random variables in the tensor-train (…
▽ More
We present a novel offline-online method to mitigate the computational burden of Bayesian inference, particularly in the regime where the posterior densities are computationally demanding to evaluate while real-time inference results are needed. In the offline phase, the proposed method learns the joint law of the parameter random variables and the observable random variables in the tensor-train (TT) format. Then, in the online phase, the resulting order-preserving transport can be conditioned on newly observed data to characterize the posterior random variables in real-time. Compared with the state-of-the-art normalizing flows techniques, our proposed method relies on function approximation, for which we can provide a thorough performance analysis. The function approximation perspective allows us to significantly improve the capability of transport maps in challenging problems with high-dimensional observations and high-dimensional parameters. Capitalizing on this, we present novel heuristics to either reorder or reparametrize the variables to enhance the approximation power of TT. We then integrate the TT-based transport maps and the parameter reordering/reparametrization into a layered composite map to further improve the performance of the resulting inference. We demonstrate the efficiency of the proposed method on various statistical learning tasks involving ordinary differential equations (ODEs) and partial differential equations (PDEs).
△ Less
Submitted 28 January, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Deep composition of tensor-trains using squared inverse Rosenblatt transports
Authors:
Tiangang Cui,
Sergey Dolgov
Abstract:
Characterising intractable high-dimensional random variables is one of the fundamental challenges in stochastic computation. The recent surge of transport maps offers a mathematical foundation and new insights for tackling this challenge by coupling intractable random variables with tractable reference random variables. This paper generalises the functional tensor-train approximation of the invers…
▽ More
Characterising intractable high-dimensional random variables is one of the fundamental challenges in stochastic computation. The recent surge of transport maps offers a mathematical foundation and new insights for tackling this challenge by coupling intractable random variables with tractable reference random variables. This paper generalises the functional tensor-train approximation of the inverse Rosenblatt transport recently developed by Dolgov et al. (Stat Comput 30:603--625, 2020) to a wide class of high-dimensional non-negative functions, such as unnormalised probability density functions. First, we extend the inverse Rosenblatt transform to enable the transport to general reference measures other than the uniform measure. We develop an efficient procedure to compute this transport from a squared tensor-train decomposition which preserves the monotonicity. More crucially, we integrate the proposed order-preserving functional tensor-train transport into a nested variable transformation framework inspired by the layered structure of deep neural networks. The resulting deep inverse Rosenblatt transport significantly expands the capability of tensor approximations and transport maps to random variables with complicated nonlinear interactions and concentrated density functions. We demonstrate the efficiency of the proposed approach on a range of applications in statistical learning and uncertainty quantification, including parameter estimation for dynamical systems and inverse problems constrained by partial differential equations.
△ Less
Submitted 18 October, 2021; v1 submitted 14 July, 2020;
originally announced July 2020.
-
Efficient Structure-preserving Support Tensor Train Machine
Authors:
Kirandeep Kour,
Sergey Dolgov,
Martin Stoll,
Peter Benner
Abstract:
An increasing amount of collected data are high-dimensional multi-way arrays (tensors), and it is crucial for efficient learning algorithms to exploit this tensorial structure as much as possible. The ever-present curse of dimensionality for high dimensional data and the loss of structure when vectorizing the data motivates the use of tailored low-rank tensor classification methods. In the presenc…
▽ More
An increasing amount of collected data are high-dimensional multi-way arrays (tensors), and it is crucial for efficient learning algorithms to exploit this tensorial structure as much as possible. The ever-present curse of dimensionality for high dimensional data and the loss of structure when vectorizing the data motivates the use of tailored low-rank tensor classification methods. In the presence of small amounts of training data, kernel methods offer an attractive choice as they provide the possibility for a nonlinear decision boundary. We develop the Tensor Train Multi-way Multi-level Kernel (TT-MMK), which combines the simplicity of the Canonical Polyadic decomposition, the classification power of the Dual Structure-preserving Support Vector Machine, and the reliability of the Tensor Train (TT) approximation. We show by experiments that the TT-MMK method is usually more reliable computationally, less sensitive to tuning parameters, and gives higher prediction accuracy in the SVM classification when benchmarked against other state-of-the-art techniques.
△ Less
Submitted 3 August, 2021; v1 submitted 12 February, 2020;
originally announced February 2020.