-
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Authors:
Nikola Zubić,
Federico Soldá,
Aurelio Sulser,
Davide Scaramuzza
Abstract:
Deep learning models have achieved significant success across various applications but continue to struggle with tasks requiring complex reasoning over sequences, such as function composition and compositional tasks. Despite advancements, models like Structured State Space Models (SSMs) and Transformers underperform in deep compositionality tasks due to inherent architectural and training limitati…
▽ More
Deep learning models have achieved significant success across various applications but continue to struggle with tasks requiring complex reasoning over sequences, such as function composition and compositional tasks. Despite advancements, models like Structured State Space Models (SSMs) and Transformers underperform in deep compositionality tasks due to inherent architectural and training limitations. Maintaining accuracy over multiple reasoning steps remains a primary challenge, as current models often rely on shortcuts rather than genuine multi-step reasoning, leading to performance degradation as task complexity increases. Existing research highlights these shortcomings but lacks comprehensive theoretical and empirical analysis for SSMs. Our contributions address this gap by providing a theoretical framework based on complexity theory to explain SSMs' limitations. Moreover, we present extensive empirical evidence demonstrating how these limitations impair function composition and algorithmic task performance. Our experiments reveal significant performance drops as task complexity increases, even with Chain-of-Thought (CoT) prompting. Models frequently resort to shortcuts, leading to errors in multi-step reasoning. This underscores the need for innovative solutions beyond current deep learning paradigms to achieve reliable multi-step reasoning and compositional task-solving in practical applications.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
Scalar and Matrix Chernoff Bounds from $\ell_{\infty}$-Independence
Authors:
Tali Kaufman,
Rasmus Kyng,
Federico Soldá
Abstract:
We present new scalar and matrix Chernoff-style concentration bounds for a broad class of probability distributions over the binary hypercube $\{0,1\}^n$. Motivated by recent tools developed for the study of mixing times of Markov chains on discrete distributions, we say that a distribution is $\ell_\infty$-independent when the infinity norm of its influence matrix $\mathcal{I}$ is bounded by a co…
▽ More
We present new scalar and matrix Chernoff-style concentration bounds for a broad class of probability distributions over the binary hypercube $\{0,1\}^n$. Motivated by recent tools developed for the study of mixing times of Markov chains on discrete distributions, we say that a distribution is $\ell_\infty$-independent when the infinity norm of its influence matrix $\mathcal{I}$ is bounded by a constant. We show that any distribution which is $\ell_\infty$-independent satisfies a matrix Chernoff bound that matches the matrix Chernoff bound for independent random variables due to Tropp. Our matrix Chernoff bound is a broad generalization and strengthening of the matrix Chernoff bound of Kyng and Song (FOCS'18). Using our bound, we can conclude as a corollary that a union of $O(\log|V|)$ random spanning trees gives a spectral graph sparsifier of a graph with $|V|$ vertices with high probability, matching results for independent edge sampling, and matching lower bounds from Kyng and Song.
△ Less
Submitted 6 January, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Coreset-based Strategies for Robust Center-type Problems
Authors:
Andrea Pietracaprina,
Geppino Pucci,
Federico Soldà
Abstract:
Given a dataset $V$ of points from some metric space, the popular $k$-center problem requires to identify a subset of $k$ points (centers) in $V$ minimizing the maximum distance of any point of $V$ from its closest center. The \emph{robust} formulation of the problem features a further parameter $z$ and allows up to $z$ points of $V$ (outliers) to be disregarded when computing the maximum distance…
▽ More
Given a dataset $V$ of points from some metric space, the popular $k$-center problem requires to identify a subset of $k$ points (centers) in $V$ minimizing the maximum distance of any point of $V$ from its closest center. The \emph{robust} formulation of the problem features a further parameter $z$ and allows up to $z$ points of $V$ (outliers) to be disregarded when computing the maximum distance from the centers. In this paper, we focus on two important constrained variants of the robust $k$-center problem, namely, the Robust Matroid Center (RMC) problem, where the set of returned centers are constrained to be an independent set of a matroid of rank $k$ built on $V$, and the Robust Knapsack Center (RKC) problem, where each element $i\in V$ is given a positive weight $w_i<1$ and the aggregate weight of the returned centers must be at most 1. We devise coreset-based strategies for the two problems which yield efficient sequential, MapReduce, and Streaming algorithms. More specifically, for any fixed $ε>0$, the algorithms return solutions featuring a $(3+ε)$-approximation ratio, which is a mere additive term $ε$ away from the 3-approximations achievable by the best known polynomial-time sequential algorithms for the two problems. Moreover, the algorithms obliviously adapt to the intrinsic complexity of the dataset, captured by its doubling dimension $D$. For wide ranges of the parameters $k,z,ε, D$, we obtain a sequential algorithm with running time linear in $|V|$, and MapReduce/Streaming algorithms with few rounds/passes and substantially sublinear local/working memory.
△ Less
Submitted 18 February, 2020;
originally announced February 2020.