Search | arXiv e-print repository

arXiv:2405.19585 [pdf, other]

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Authors: Elizabeth Collins-Woodfin, Inbar Seroussi, Begoña García Malaxechebarría, Andrew W. Mackenzie, Elliot Paquette, Courtney Paquette

Abstract: We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm -- on the least squares problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning rate with SGD. Moreover we exactly characterize the limiting learning rate (as time goes to infinity) for line search in the setting where the data covariance has only two distinct eigenvalues. For noiseless targets, we further demonstrate that the AdaGrad-Norm learning rate converges to a deterministic constant inversely proportional to the average eigenvalue of the data covariance matrix, and identify a phase transition when the covariance density of eigenvalues follows a power law distribution.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.15074 [pdf, other]

4+3 Phases of Compute-Optimal Neural Scaling Laws

Authors: Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington

Abstract: We consider the three parameter solvable neural scaling model introduced by Maloney, Roberts, and Sully. The model has three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.02912 [pdf, ps, other]

Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems

Authors: Tomás González, Cristóbal Guzmán, Courtney Paquette

Abstract: We study the problem of differentially-private (DP) stochastic (convex-concave) saddle-points in the polyhedral setting. We propose $(\varepsilon, δ)$-DP algorithms based on stochastic mirror descent that attain nearly dimension-independent convergence rates for the expected duality gap, a type of guarantee that was known before only for bilinear objectives. For convex-concave and first-order-smooth stochastic objectives, our algorithms attain a rate of $\sqrt{\log(d)/n} + (\log(d)^{3/2}/[n\varepsilon])^{1/3}$, where $d$ is the dimension of the problem and $n$ the dataset size. Under an additional second-order-smoothness assumption, we improve the rate on the expected gap to $\sqrt{\log(d)/n} + (\log(d)^{3/2}/[n\varepsilon])^{2/5}$. Under this additional assumption, we also show, by using bias-reduced gradient estimators, that the duality gap is bounded by $\log(d)/\sqrt{n} + \log(d)/[n\varepsilon]^{1/2}$ with constant success probability. This result provides evidence of the near-optimality of the approach. Finally, we show that combining our methods with acceleration techniques from online learning leads to the first algorithm for DP Stochastic Convex Optimization in the polyhedral setting that is not based on Frank-Wolfe methods. For convex and first-order-smooth stochastic objectives, our algorithms attain an excess risk of $\sqrt{\log(d)/n} + \log(d)^{7/10}/[n\varepsilon]^{2/5}$, and when additionally assuming second-order-smoothness, we improve the rate to $\sqrt{\log(d)/n} + \log(d)/\sqrt{n\varepsilon}$. Instrumental to all of these results are various extensions of the classical Maurey Sparsification Lemma, which may be of independent interest.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.05468 [pdf, other]

Implicit Diffusion: Efficient Optimization through Stochastic Sampling

Authors: Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet

Abstract: We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions. Doing so allows us to modify the outcome distribution of sampling processes by optimizing over their parameters. We introduce a general framework for first-order optimization of these processes, that performs jointly, in a single loop, optimization and sampling steps. This approach is inspired by recent advances in bilevel optimization and automatic implicit differentiation, leveraging the point of view of sampling as optimization over the space of probability distributions. We provide theoretical guarantees on the performance of our method, as well as experimental results demonstrating its effectiveness. We apply it to training energy-based models and finetuning denoising diffusions.

Submitted 22 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 38 pages, 16 figures. Updated with additional experiments

arXiv:2311.14863 [pdf, ps, other]

Geometric interactions between bricks and $τ$-rigidity

Authors: Kaveh Mousavand, Charles Paquette

Abstract: For finite dimensional algebras over algebraically closed fields, we prove some new results on the interactions between bricks and $τ$-rigid modules, and also on their geometric counterparts -- brick components and generically $τ$-reduced components. First, we give a new characterization of the locally representation-directed algebras and show that these are exactly the algebras for which every brick is $τ$-rigid (hence, all indecomposable modules are bricks). We then consider analogous phenomena using some irreducible components of module varieties: the indecomposable components, brick components, and indecomposable generically $τ$-reduced components. For an algebra $A$, we prove that if two of these three sets of components coincide, then every indecomposable $τ$-rigid $A$-module is a brick. For brick-infinite algebras, we also provide a new construction that yields, in some cases, a rational ray outside of the $τ$-tilting fan (also called $g$-vector fan). For tame and $E$-tame algebras, we obtain new results on some open conjectures on $τ$-tilting fans and bricks.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 29 pages

MSC Class: 16P10; 16G20; 16D80; 16G60

arXiv:2310.16767 [pdf, ps, other]

Inversion Sets and Quotient Root Systems

Authors: Ivan Dimitrov, Cole Gigliotti, Etan Ossip, Charles Paquette, David Wehlau

Abstract: We provide a recursive description of all decompositions of the positive roots $R^+$ of a quotient root system $R$ into disjoint unions of inversion sets. Our description is type-independent and generalizes the analogous result for type $\mathbb A$ root systems in [USRA]. The main tool is the notion of an inflation of a subset of a quotient root system. This new notion allows us to treat all root systems (and their quotients) uniformly. We also obtain some numerical results about the number of special decompositions. The new sequences we obtain may be considered as extensions of Catalan numbers.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Preliminary Version

MSC Class: 17B22

arXiv:2310.09591 [pdf, ps, other]

Idempotents in the group algebra of the infinite dihedral group

Authors: Ivan Dimitrov, Charles Paquette, David Wehlau, Tianyuan Xu

Abstract: We prove that over an algebraically closed field $\mathbb{K}$ of characteristic different from $2$, the group algebra $R=\mathbb{K} D_\infty$ of the infinite dihedral group $D_\infty$ has exactly six conjugacy classes of involutions (equivalently, of idempotents). This allows us to recover the fact that $R$ admits exactly four non-isomorphic indecomposable projective modules of the form $eR$ where $e$ is an idempotent, a result that was first established by Berman and Buzási.

Submitted 14 October, 2023; originally announced October 2023.

Comments: 6 pages

MSC Class: Primary: 20C07; Secondary: 16D40; 16S34; 16U40

arXiv:2308.08977 [pdf, other]

Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models

Authors: Elizabeth Collins-Woodfin, Courtney Paquette, Elliot Paquette, Inbar Seroussi

Abstract: We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. logistic regression, phase retrieval) with general data-covariance. In particular, we demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations that describes a wide class of statistics, such as the risk and other measures of sub-optimality. This equivalence holds with overwhelming probability when the model parameter count grows proportionally to the number of data. This framework allows us to obtain learning rate thresholds for stability of SGD as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we illustrate this theory on some standard examples and show numerical simulations which give an excellent match to the theory.

Submitted 17 August, 2023; originally announced August 2023.

Comments: Preliminary version

arXiv:2306.08263 [pdf, ps, other]

Semi-Invariant Rings: UFD and Codimension One Orbits

Authors: Charles Paquette, Deepanshu Prasad, David Wehlau

Abstract: Let $A$ be a finite dimensional associative $\mathbb{K}$-algebra over an algebraically closed field $\mathbb{K}$ of characteristic zero. To $A$, we can associate its basic form that is given by a quiver $Q = (Q_0, Q_1)$ with an admissible ideal $R$. For a dimension vector $β$, we consider an irreducible component $\mathcal{C}$ of the module variety of $β$-dimensional representations of $A$. The reductive group ${\rm GL}_β(\mathbb{K}):= \prod_{i \in Q_0}{\rm GL}_{β_i}(\mathbb{K})$ acts on $\mathcal{C}$ by change of basis, and has a unique closed orbit. We consider the corresponding ring of semi-invariants ${\rm SI}(Q, \mathcal{C})$. We prove that if $\mathcal{C}$ is factorial and has maximal orbits of codimension one, then ${\rm SI}(Q, \mathcal{C})$ is a complete intersection and is not multiplicity free. If $\mathcal{C}$ is not factorial, then this conclusion does not necessarily hold. We present examples showing that the codimension of the complete intersection can be arbitrarily large. Finally, we interpret our results in the case of hereditary algebras.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2209.05696 [pdf, ps, other]

Biserial algebras and generic bricks

Authors: Kaveh Mousavand, Charles Paquette

Abstract: We consider generic bricks and use them in the study of arbitrary biserial algebras over algebraically closed fields. For a biserial algebra $Λ$, we show that $Λ$ is brick-infinite if and only if it admits a generic brick, that is, there exists a generic $Λ$-module $G$ with $End_Λ(G)=k(x)$. Furthermore, we give an explicit numerical condition for brick-infiniteness of biserial algebras: If $Λ$ is of rank $n$, then $Λ$ is brick-infinite if and only if there exists an infinite family of bricks of length $d$, for some $2\leq d\leq 2n$. This also results in an algebro-geometric realization of $τ$-tilting finiteness of this family: $Λ$ is $τ$-tilting finite if and only if $Λ$ is brick-discrete, meaning that in every representation variety $mod(Λ, \underline{d})$, there are only finitely many orbits of bricks. Our results rely on our full classification of minimal brick-infinite biserial algebras in terms of quivers and relations. This is the modern analogue of the recent classification of minimal representation-infinite (special) biserial algebras, given by Ringel. In particular, we show that every minimal brick-infinite biserial algebra is gentle and admits exactly one generic brick. Furthermore, we describe the spectrum of such algebras, which is very similar to that of a tame hereditary algebra. In other words, $Brick(Λ)$ is the disjoint union of a unique generic brick with a countable infinite set of bricks of finite length, and a family of bricks of the same finite length parametrized by the ground field.

Submitted 12 September, 2022; originally announced September 2022.

Comments: 26 pages

MSC Class: 16G20; 16G60; 16D80; 05E10

arXiv:2206.09901 [pdf, other]

Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime

Authors: Leonardo Cunha, Gauthier Gidel, Fabian Pedregosa, Damien Scieur, Courtney Paquette

Abstract: The recently developed average-case analysis of optimization methods allows a more fine-grained and representative convergence analysis than usual worst-case results. In exchange, this analysis requires a more precise hypothesis over the data generating process, namely assuming knowledge of the expected spectral distribution (ESD) of the random matrix associated with the problem. This work shows that the concentration of eigenvalues near the edges of the ESD determines a problem's asymptotic average complexity. This a priori information on this concentration is a more grounded assumption than complete knowledge of the ESD. This approximate concentration is effectively a middle ground between the coarseness of the worst-case scenario convergence and the restrictive previous average-case analysis. We also introduce the Generalized Chebyshev method, asymptotically optimal under a hypothesis on this concentration and globally optimal when the ESD follows a Beta distribution. We compare its performance to classical optimization algorithms, such as gradient descent or Nesterov's scheme, and we show that, in the average-case context, Nesterov's method is universally nearly optimal asymptotically.

Submitted 22 June, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: To be published in ICML 2022

arXiv:2206.07252 [pdf, other]

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Authors: Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Abstract: Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly in terms of a Volterra integral equation. These results yield precise formulas for the learning and risk trajectories, which reveal a mechanism of implicit conditioning that explains the efficiency of SGD relative to GD. We also prove that the noise from SGD negatively impacts generalization performance, ruling out the possibility of any type of implicit regularization in this context. Finally, we show how to adapt the HSGD formalism to include streaming SGD, which allows us to produce an exact prediction for the excess risk of multi-pass SGD relative to that of streaming SGD (bootstrap risk).

Submitted 14 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2205.07069

arXiv:2206.01029 [pdf, other]

Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions

Authors: Kiwon Lee, Andrew N. Cheng, Courtney Paquette, Elliot Paquette

Abstract: We analyze the dynamics of large batch stochastic gradient descent with momentum (SGD+M) on the least squares problem when both the number of samples and dimensions are large. In this setting, we show that the dynamics of SGD+M converge to a deterministic discrete Volterra equation as dimension increases, which we analyze. We identify a stability measurement, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate the algorithm. When the batch size exceeds this ICR, SGD+M converges linearly at a rate of $\mathcal{O}(1/\sqrtκ)$, matching optimal full-batch momentum (in particular performing as well as a full-batch but with a fraction of the size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single batch SGD rate. We give explicit choices for the learning rate and momentum parameter in terms of the Hessian spectra that achieve this performance.

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2205.08917 [pdf, ps, other]

Representations of free products of semisimple algebras via quivers

Authors: Andrew Buchanan, Ivan Dimitrov, Olivia Grace, Charles Paquette, David Wehlau, Tianyuan Xu

Abstract: Let $\mathbb{K}$ denote an algebraically closed field and $A$ a free product of finitely many semisimple associative $\mathbb{K}$-algebras. We associate to $A$ a finite acyclic quiver $Γ$ and show that the category of finite dimensional $A$-modules is equivalent to a full subcategory of the category ${\rm rep}(Γ)$ of finite dimensional representations of $Γ$. Under this equivalence, the simple $A$-modules correspond exactly to the $θ$-stable representations of $Γ$ for some stability parameter $θ$. This gives us necessary conditions for an $A$-module to be simple, conditions which are also sufficient if the module is in general position. Even though there are indecomposable modules that are not simple, we prove that a module in general position is always semisimple. We also discuss the construction of arbitrary finite dimensional modules using nilpotent representations of quivers. Finally, we apply our results to the case of a free product of finite groups when $\mathbb{K}$ has characteristic zero.

Submitted 18 May, 2022; originally announced May 2022.

Comments: 25 pages

MSC Class: 16G20; 16D60 (Primary) 16S10; 16E60; 16G60; 20E06 (Secondary)

arXiv:2205.07069 [pdf, other]

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties

Authors: Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Abstract: We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares problem with $\ell^2$-regularization. We show that homogenized SGD is the high-dimensional equivalence of SGD -- for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples $n$ and number of features $d$ are polynomially related ($d^c < n < d^{1/c}$ for some $c > 0$). By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the generalization performance of SGD in terms of a solution of a Volterra integral equation. Further we provide the exact value of the limiting excess risk in the case of quadratic losses when trained by SGD. The analysis is formulated for data matrices and target vectors that satisfy a family of resolvent conditions, which can roughly be viewed as a weak (non-quantitative) form of delocalization of sample-side singular vectors of the data. Several motivating applications are provided including sample covariance matrices with independent samples and random features with non-generative model targets.

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2106.03696 [pdf, other]

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Authors: Courtney Paquette, Elliot Paquette

Abstract: We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of loss values produced by these algorithms which is expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. For contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum. By introducing hyperparameters that depend on the number of samples, we propose a new algorithm sDANA (stochastic dimension adjusted Nesterov acceleration) which obtains an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without adjusting parameters.

Submitted 25 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: 39 pages, 7 figures

arXiv:2103.12700 [pdf, ps, other]

Minimal ($τ$-)tilting infinite algebras

Authors: Kaveh Mousavand, Charles Paquette

Abstract: Motivated by a new conjecture on the behavior of bricks, we start a systematic study of minimal $τ$-tilting infinite algebras. In particular, we treat minimal $τ$-tilting infinite algebras as a modern counterpart of minimal representation infinite algebras and show some of the fundamental similarities and differences between these families. We then relate our studies to the classical tilting theory and observe that this modern approach can provide fresh impetus to the study of some old problems. We further show that in order to verify the conjecture it is sufficient to treat those minimal $τ$-tilting infinite algebras where almost all bricks are faithful. Finally, we also prove that minimal extending bricks have open orbits, and consequently obtain a simple proof of the brick analogue of the First Brauer-Thrall Conjecture, recently shown by Schroll and Treffinger using some different techniques.

Submitted 28 April, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: Version 2 (18 pages): A short section is added to the end of the previous version in which we prove that all minimal extending bricks of functorially finite torsion classes have open orbits. From this we deduce a simple proof of the brick-analogue of the first Brauer-Thrall conjecture, recently shown by Schroll and Treffinger

MSC Class: 16D80; 16G20; 16G60; 05E10

arXiv:2102.04396 [pdf, other]

SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality

Authors: Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette

Abstract: We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both number of samples and dimensions are large. This framework applies to any fixed stepsize and the finite sum setting. Using this new framework, we show that the dynamics of SGD on a least squares problem with random data become deterministic in the large sample and dimensional limit. Furthermore, the limiting dynamics are governed by a Volterra integral equation. This model predicts that SGD undergoes a phase transition at an explicitly given critical stepsize that ultimately affects its convergence rate, which we also verify experimentally. Finally, when input data is isotropic, we provide explicit expressions for the dynamics and average-case convergence rates (i.e., the complexity of an algorithm averaged over all possible inputs). These rates show significant improvement over the worst-case complexities.

Submitted 8 February, 2021; originally announced February 2021.

arXiv:2101.06851 [pdf, ps, other]

Subregular $J$-rings of Coxeter systems via quiver path algebras

Authors: Ivan Dimitrov, Charles Paquette, David Wehlau, Tianyuan Xu

Abstract: We study the subregular $J$-ring $J_C$ of a Coxeter system $(W,S)$, a subring of Lusztig's $J$-ring. We prove that $J_C$ is isomorphic to a quotient of the path algebra of the double quiver of $(W,S)$ by a suitable ideal that we associate to a family of Chebyshev polynomials. As applications, we use quiver representations to study the category mod-$A_K$ of finite dimensional right modules of the algebra $A_K=K\otimes_{\mathbb{Z}} J_C$ over an algebraically closed field $K$ of characteristic zero. Our results include classifications of Coxeter systems for which mod-$A_K$ is semisimple, has finitely many simple modules up to isomorphism, or has a bound on the dimensions of simple modules. Incidentally, we show that every group algebra of a free product of finite cyclic groups is Morita equivalent to the algebra $A_K$ for a suitable Coxeter system; this allows us to specialize the classifications to the module categories of such group algebras.

Submitted 17 January, 2021; originally announced January 2021.

Comments: 49 pages, 7 figures

MSC Class: Primary: 20C08; 16G20; Secondary: 16D60; 20C07; 20E06

arXiv:2006.07285 [pdf, other]

doi 10.1112/tlm3.12025

Completions of discrete cluster categories of type $\mathbb{A}$

Authors: Charles Paquette, Emine Yildirim

Abstract: We complete the discrete cluster categories of type $\mathbb{A}$ as defined by Igusa and Todorov, by embedding such a discrete cluster category inside a larger one, and then taking a certain Verdier quotient. The resulting category is a Hom-finite Krull-Schmidt triangulated category containing the discrete cluster category as a full subcategory. The objects and Hom-spaces in this new category can be described geometrically, even though the category is not $2$-Calabi-Yau and Ext-spaces are not always symmetric. We describe all cluster-tilting subcategories. Given such a subcategory, we define a cluster character that takes values in a ring with infinitely many indeterminates. Our cluster character is new in that it takes into account infinite dimensional sub-representations of infinite dimensional ones. We show that it satisfies the multiplication formula and also the exchange formula, provided that the objects being exchanged satisfy some local Calabi-Yau conditions.

Submitted 6 January, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: 33 pages, 11 figures

MSC Class: 18E30; 16G20

arXiv:2006.04299 [pdf, other]

Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis

Authors: Courtney Paquette, Bart van Merriënboer, Elliot Paquette, Fabian Pedregosa

Abstract: Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, we provide the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest this universality property holds for a more general class of algorithms and problems.

Submitted 2 October, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

arXiv:2003.10312 [pdf, ps, other]

A termination criterion for stochastic gradient descent for binary classification

Authors: Sina Baghal, Courtney Paquette, Stephen A. Vavasis

Abstract: We propose a new, simple, and computationally inexpensive termination test for constant step-size stochastic gradient descent (SGD) applied to binary classification on the logistic and hinge loss with homogeneous linear predictors. Our theoretical results support the effectiveness of our stopping criterion when the data is Gaussian distributed. This presence of noise allows for the possibility of non-separable data. We show that our test terminates in a finite number of iterations and when the noise in the data is not too large, the expected classifier at termination nearly minimizes the probability of misclassification. Finally, numerical experiments indicate for both real and synthetic data sets that our termination test exhibits a good degree of predictability on accuracy and running time.

Submitted 23 March, 2020; originally announced March 2020.

arXiv:1903.08497 [pdf, ps, other]

Potential-based analyses of first-order methods for constrained and composite optimization

Authors: Courtney Paquette, Stephen Vavasis

Abstract: We propose potential-based analyses for first-order algorithms applied to constrained and composite minimization problems. We first propose "idealized" frameworks for algorithms in the strongly and non-strongly convex cases and argue based on a potential that methods following the framework achieve the best possible rate. Then we show that the geometric descent (GD) algorithm by Bubeck et al. as extended to the constrained and composite setting by Chen et al. achieves this rate using the potential-based analysis for the strongly convex case. Next, we extend the GD algorithm to the case of non-strongly convex problems. We show using a related potential-based argument that our extension achieves the best possible rate in this case as well. The new GD algorithm achieves the best possible rate in the nonconvex case also. We also analyze accelerated gradient using the new potentials. We then turn to the special case of a quadratic function with a single ball constraint, the famous trust-region subproblem. For this case, the first-order trust-region Lanczos method by Gould et al. finds the optimal point in an increasing sequence of Krylov spaces. Our results for the general case immediately imply convergence rates for their method in both the strongly convex and non-strongly convex cases. We also establish the same convergence rates for their method using arguments based on Chebyshev polynomial approximation. To the best of our knowledge, no convergence rate has previously been established for the trust-region Lanczos method.

Submitted 20 March, 2019; originally announced March 2019.

MSC Class: 65K10

arXiv:1902.00317 [pdf, ps, other]

Idempotent reduction for the finitistic dimension conjecture

Authors: Diego Bravo, Charles Paquette

Abstract: In this note, we prove that if $Λ$ is an Artin algebra with a simple module $S$ of finite projective dimension, then the finiteness of the finitistic dimension of $Λ$ implies that of $(1-e)Λ(1-e)$ where $e$ is the primitive idempotent supporting $S$. We derive some consequences of this. In particular, we recover a result of Green-Solberg-Psaroudakis: if $Λ$ is the quotient of a path algebra by an admissible ideal $I$ whose defining relations do not involve a certain arrow $α$, then the finitistic dimension of $Λ$ is finite if and only if the finitistic dimension of $Λ/ΛαΛ$ is finite.

Submitted 4 November, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: 9 pages

MSC Class: 16E10; 16G20

arXiv:1807.07994 [pdf, ps, other]

A Stochastic Line Search Method with Convergence Rate Analysis

Authors: Courtney Paquette, Katya Scheinberg

Abstract: For deterministic optimization, line-search methods augment algorithms by providing stability and improved efficiency. We adapt a classical backtracking Armijo line-search to the stochastic optimization setting. While traditional line-search relies on exact computations of the gradient and values of the objective function, our method assumes that these values are available up to some dynamically adjusted accuracy which holds with some sufficiently large, but fixed, probability. We show the expected number of iterations to reach a near stationary point matches the worst-case efficiency of typical first-order methods, while for convex and strongly convex objectives, it achieves rates of deterministic gradient descent in function values.

Submitted 20 July, 2018; originally announced July 2018.

arXiv:1803.02461 [pdf, other]

Subgradient methods for sharp weakly convex functions

Authors: Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee, Courtney Paquette

Abstract: Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization, and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.

Submitted 6 March, 2018; originally announced March 2018.

Comments: 16 pages, 3 figures

MSC Class: 65K05; 65K10; 90C15; 90C30

arXiv:1711.03247 [pdf, other]

The nonsmooth landscape of phase retrieval

Authors: Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette

Abstract: We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that as the number of Gaussian measurements increases, the stationary points converge to a codimension two set, at a controlled rate. Experiments on image recovery problems illustrate the developed algorithm and theory.

Submitted 6 January, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

Comments: 42 pages, 15 figures

MSC Class: 65K10; 90C06

arXiv:1710.07239 [pdf, ps, other]

Generators versus projective generators in abelian categories

Authors: Charles Paquette

Abstract: Let $\mathcal{A}$ be an essentially small abelian category. We prove that if $\mathcal{A}$ admits a generator $M$ with ${\rm End}_{\mathcal{A}}(M)$ right artinian, then $\mathcal{A}$ admits a projective generator. If $\mathcal{A}$ is further assumed to be Grothendieck, then this implies that $\mathcal{A}$ is equivalent to a module category. When $\mathcal{A}$ is Hom-finite over a field $k$, the existence of a generator is the same as the existence of a projective generator, and in case there is such a generator, $\mathcal{A}$ has to be equivalent to the category of finite dimensional right modules over a finite dimensional $k$-algebra. We also show that when $\mathcal{A}$ is a length category, then there is a one-to-one correspondence between exact abelian extension closed subcategories of $\mathcal{A}$ and collections of Hom-orthogonal Schur objects in $\mathcal{A}$.

Submitted 19 October, 2017; originally announced October 2017.

Comments: 10 pages

MSC Class: 16G20; 18E15; 18E10

arXiv:1703.10993 [pdf, other]

Catalyst Acceleration for Gradient-Based Non-Convex Optimization

Authors: Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui

Abstract: We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them on weakly convex objectives, which covers a large class of non-convex functions typically appearing in machine learning and signal processing. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves near-optimal convergence rate in function values. These properties are achieved without assuming any knowledge about the convexity of the objective, by automatically adapting to the unknown weak convexity constant. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks.

Submitted 31 December, 2018; v1 submitted 31 March, 2017; originally announced March 2017.

arXiv:1703.08725 [pdf, ps, other]

Homological behavior of idempotent subalgebras and Ext algebras

Authors: Colin Ingalls, Charles Paquette

Abstract: Let $A$ be a (left and right) Noetherian ring that is semiperfect. Let $e$ be an idempotent of $A$ and consider the ring $Γ:=(1-e)A(1-e)$ and the semi-simple right $A$-module $S_e : = eA/e{\rm rad}A$. In this paper, we investigate the relationship between the global dimensions of $A$ and $Γ$, by using the homological properties of $S_e$. More precisely, we consider the Yoneda ring $Y(e):={\rm Ext}^*_A(S_e,S_e)$ of $e$. We prove that if $Y(e)$ is artinian of finite global dimension, then $A$ has finite global dimension if and only if so is $Γ$. We also investigate the situation where both $A,Γ$ have finite global dimension. When $A$ is Koszul and finite dimensional, this implies that $Y(e)$ has finite global dimension. We end the paper with a reduction technique to compute the Cartan determinant of artin algebras. We prove that if $Y(e)$ has finite global dimension, then the Cartan determinants of $A$ and $Γ$ coincide. This provides a new way to approach the long-standing Cartan determinant conjecture.

Submitted 11 October, 2017; v1 submitted 25 March, 2017; originally announced March 2017.

Comments: 14 pages

MSC Class: 16E10; 16G10

arXiv:1703.06174 [pdf, other]

Group actions on cluster algebras and cluster categories

Authors: Charles Paquette, Ralf Schiffler

Abstract: We introduce admissible group actions on cluster algebras, cluster categories and quivers with potential and study the resulting orbit spaces. The orbit space of the cluster algebra has the structure of a generalized cluster algebra. This generalized cluster structure is different from those introduced by Chekhov-Shapiro and Lam-Pylyavskyy. For group actions on cluster algebras from surfaces, we describe the generalized cluster structure of the orbit space in terms of a triangulated orbifold. In this case, we give a complete list of exchange polynomials, and we classify the algebras of rank 1 and 2. We also show that every admissible group action on a cluster category induces a precovering from the cluster category to the cluster category of orbits. Moreover this precovering is dense if the categories are of finite type.

Submitted 20 December, 2018; v1 submitted 17 March, 2017; originally announced March 2017.

Comments: 51 pages

MSC Class: 16G20; 16G60; 18E30; 13F60

arXiv:1605.05719 [pdf, ps, other]

Isotropic Schur roots

Authors: Charles Paquette, Jerzy Weyman

Abstract: In this paper, we study the isotropic Schur roots of an acyclic quiver $Q$ with $n$ vertices. We study the perpendicular category $\mathcal{A}(d)$ of a dimension vector $d$ and give a complete description of it when $d$ is an isotropic Schur root $δ$. This is done by using exceptional sequences and by defining a subcategory $\mathcal{R}(Q,δ)$ attached to the pair $(Q,δ)$. The latter category is always equivalent to the category of representations of a connected acyclic quiver $Q_{\mathcal{R}}$ of tame type, having a unique isotropic Schur root, say $δ_{\mathcal{R}}$. The understanding of the simple objects in $\mathcal{A}(δ)$ allows us to get a finite set of generators for the ring of semi-invariants SI$(Q,δ)$ of $Q$ of dimension vector $δ$. The relations among these generators come from the representation theory of the category $\mathcal{R}(Q,δ)$ and from a beautiful description of the cone of dimension vectors of $\mathcal{A}(δ)$. Indeed, we show that SI$(Q,δ)$ is isomorphic to the ring of semi-invariants SI$(Q_{\mathcal{R}},δ_{\mathcal{R}})$ to which we adjoin variables. In particular, using a result of Skowroński and Weyman, the ring SI$(Q,δ)$ is a polynomial ring or a hypersurface. Finally, we provide an algorithm for finding all isotropic Schur roots of $Q$. This is done by an action of the braid group $B_{n-1}$ on some exceptional sequences. This action admits finitely many orbits, each such orbit corresponding to an isotropic Schur root of a tame full subquiver of $Q$.

Submitted 18 May, 2016; originally announced May 2016.

Comments: 31 pages

MSC Class: 16G20

arXiv:1605.00125 [pdf, ps, other]

Efficiency of minimizing compositions of convex functions and smooth maps

Authors: Dmitriy Drusvyatskiy, Courtney Paquette

Abstract: We consider global efficiency of algorithms for minimizing a sum of a convex function and a composition of a Lipschitz convex function with a smooth map. The basic algorithm we rely on is the prox-linear method, which in each iteration solves a regularized subproblem formed by linearizing the smooth map. When the subproblems are solved exactly, the method has efficiency $\mathcal{O}(\varepsilon^{-2})$, akin to gradient descent for smooth minimization. We show that when the subproblems can only be solved by first-order methods, a simple combination of smoothing, the prox-linear method, and a fast-gradient scheme yields an algorithm with complexity $\widetilde{\mathcal{O}}(\varepsilon^{-3})$. The technique readily extends to minimizing an average of $m$ composite functions, with complexity $\widetilde{\mathcal{O}}(m/\varepsilon^{2}+\sqrt{m}/\varepsilon^{3})$ in expectation. We round off the paper with an inertial prox-linear method that automatically accelerates in presence of convexity.

Submitted 14 August, 2017; v1 submitted 30 April, 2016; originally announced May 2016.

MSC Class: 97N60; 90C25; 90C06; 90C30

arXiv:1508.04353 [pdf, ps, other]

Irreducible morphisms and locally finite dimensional representations

Authors: Charles Paquette

Abstract: Let $\mathcal{A}$ be a Hom-finite additive Krull-Schmidt $k$-category where $k$ is an algebraically closed field. Let ${\rm mod} \mathcal{A}$ denote the category of locally finite dimensional $\mathcal{A}$-modules, that is, the category of covariant functors $\mathcal{A} \to {\rm mod} k$. We prove that an irreducible monomorphism in ${\rm mod} \mathcal{A}$ has a finitely generated cokernel, and that an irreducible epimorphism in ${\rm mod} \mathcal{A}$ has a finitely co-generated kernel. Using this, we get that an almost split sequence in ${\rm mod} \mathcal{A}$ has to start with a finitely co-presented module and end with a finitely presented one. Finally, we apply our results in the study of ${\rm rep}(Q)$, the category of locally finite dimensional representations of a strongly locally finite quiver. We describe all possible shapes of the Auslander-Reiten quiver of ${\rm rep}(Q)$.

Submitted 4 January, 2016; v1 submitted 18 August, 2015; originally announced August 2015.

Comments: 17 pages, 1 figure

MSC Class: 16G20; 16G70; 16D90

arXiv:1505.06062 [pdf, ps, other]

Cluster categories of type $\mathbb{A}_\infty^\infty$ and triangulations of the infinite strip

Authors: Shiping Liu, Charles Paquette

Abstract: We first study the (canonical) orbit category of the bounded derived category of finite dimensional representations of a quiver with no infinite path, and we pay particular attention to the case where the quiver is of infinite Dynkin type. In particular, its Auslander-Reiten components are explicitly described. When the quiver is of type $\mathbb{A}_\infty$ or $\mathbb{A}_\infty^\infty$, we show that this orbit category is a cluster category, that is, its cluster-tilting subcategories form a cluster structure. When the quiver is of type $\mathbb{A}_\infty^\infty$, we shall give a geometrical description of the cluster structure of the cluster category by using triangulations of the infinite strip in the plane. In particular, we shall show that the cluster-tilting subcategories are precisely given by compact triangulations.

Submitted 22 May, 2015; originally announced May 2015.

Comments: 41 pages

MSC Class: 16G20; 16G70

arXiv:1503.02054 [pdf, ps, other]

Accumulation points of real Schur roots

Authors: Charles Paquette

Abstract: Let $k$ be an algebraically closed field and $Q$ be an acyclic quiver with $n$ vertices. Consider the category ${\rm rep}(Q)$ of finite dimensional representations of $Q$ over $k$. The exceptional representations of $Q$, that is, the indecomposable objects of ${\rm rep}(Q)$ without self-extensions, correspond to the so-called real Schur roots of the usual root system attached to $Q$. These roots are special elements of the Grothendieck group $\mathbb{Z}^n$ of ${\rm rep}(Q)$. When we identify the dimension vectors of the representations (that is, the non-negative vectors of $\mathbb{Z}^n$) up to positive multiple, we see that the real Schur roots can accumulate in some directions of $\mathbb{R}^n \supset \mathbb{Z}^n$. This paper is devoted to the study of these accumulation points. After giving new properties of the canonical decomposition of dimension vectors, we show how to use this decomposition to describe the rational accumulation points. Finally, we study the irrational accumulation points and we give a complete description of them in case $Q$ is of weakly hyperbolic type.

Submitted 6 March, 2015; originally announced March 2015.

Comments: 27 pages

MSC Class: 16G20

arXiv:1405.5429 [pdf, ps, other]

Homological dimensions for co-rank one idempotent subalgebras

Authors: Colin Ingalls, Charles Paquette

Abstract: Let $k$ be an algebraically closed field and $A$ be a (left and right) Noetherian associative $k$-algebra. Assume further that $A$ is either positively graded or semiperfect (this includes the class of finite dimensional $k$-algebras, and $k$-algebras that are finitely generated modules over a Noetherian central Henselian ring). Let $e$ be a primitive idempotent of $A$, which we assume is of degree $0$ if $A$ is positively graded. We consider the idempotent subalgebra $Γ= (1-e)A(1-e)$ and $S_e$ the simple right $A$-module $S_e = eA/e{\rm rad}A$, where ${\rm rad}A$ is the Jacobson radical of $A$, or the graded Jacobson radical of $A$ if $A$ is positively graded. In this paper, we relate the homological dimensions of $A$ and $Γ$, using the homological properties of $S_e$. First, if $S_e$ has no self-extensions of any degree, then the global dimension of $A$ is finite if and only if that of $Γ$ is. On the other hand, if the global dimensions of both $A$ and $Γ$ are finite, then $S_e$ cannot have self-extensions of degree greater than one, provided $A/{\rm rad}A$ is finite dimensional.

Submitted 22 August, 2015; v1 submitted 21 May, 2014; originally announced May 2014.

Comments: 24 pages

MSC Class: 16E10; 16G10

arXiv:1212.1424 [pdf, ps, other]

Semi-stable subcategories for Euclidean quivers

Authors: Colin Ingalls, Charles Paquette, Hugh Thomas

Abstract: In this paper, we study the semi-stable subcategories of the category of representations of a Euclidean quiver, and the possible intersections of these subcategories. Contrary to the Dynkin case, we find out that the intersection of semi-stable subcategories may not be semi-stable. However, only a finite number of exceptions occur, and we give a description of these subcategories. Moreover, one can attach a simplicial fan in $\mathbb{Q}^n$ to any acyclic quiver $Q$, and this simplicial fan allows one to completely determine the canonical presentation of any element in $\mathbb{Z}^n$. This fan has a nice description in the Dynkin and Euclidean cases: it is described using an arrangement of convex codimension-one subsets of $\mathbb{Q}^n$, each such subset being indexed by a real Schur root or a set of quasi-simple objects. This fan also characterizes when two different stability conditions give rise to the same semi-stable subcategory.

Submitted 13 March, 2015; v1 submitted 6 December, 2012; originally announced December 2012.

Comments: 39 pages

MSC Class: 16G20; 05E10

arXiv:1208.5032 [pdf, ps, other]

Standard components of a Krull-Schmidt category

Authors: Shiping Liu, Charles Paquette

Abstract: We provide criteria for an Auslander-Reiten component having sections of a Krull-Schmidt category to be standard. Specializing to the category of finitely presented representations of a strongly locally finite quiver and its bounded derived category, we obtain many new types of standard Auslander-Reiten components. An application to the module category of a finite-dimensional algebra yields some interesting results.

Submitted 24 August, 2012; originally announced August 2012.

Comments: 14 pages

MSC Class: 16G70; 16G20; 16G10

arXiv:1203.5009 [pdf, ps, other]

Almost split sequences and approximations

Authors: Shiping Liu, Puiman Ng, Charles Paquette

Abstract: Let A be an exact category, that is, an extension-closed full subcategory of an abelian category. Firstly, we give some necessary and sufficient conditions for A to have almost split sequences. Then, we study when an almost split sequence in A induces an almost split sequence in an exact subcategory C of A. In case A has almost split sequences and C is Hom-finite Krull-Schmidt, this provides a necessary and sufficient condition for C to have almost split sequences. Finally, we show two applications of these results.

Submitted 18 March, 2012; originally announced March 2012.

Comments: 16 pages

MSC Class: 16G70; 16G20; 18E10; 18E40

arXiv:1201.4833 [pdf, ps, other]

On the Auslander-Reiten quiver of the representations of an infinite quiver

Authors: Charles Paquette

Abstract: Let Q be a strongly locally finite quiver and denote by rep(Q) the category of locally finite dimensional representations of Q over some fixed field k. The main purpose of this paper is to get a better understanding of rep(Q) by means of its Auslander-Reiten quiver. To achieve this goal, we define a category C(Q) which is a full, abelian and Hom-finite subcategory of rep(Q) containing all the almost split sequences of rep(Q). We give a complete description of the Auslander-Reiten quiver of C(Q) by describing its connected components. Finally, we prove that these connected components are also connected components of the Auslander-Reiten quiver of rep(Q). We end the paper by giving a conjecture describing the Auslander-Reiten components of rep(Q) that cannot be obtained as Auslander-Reiten components of C(Q).

Submitted 6 September, 2012; v1 submitted 23 January, 2012; originally announced January 2012.

Comments: 28 pages, 1 figure

MSC Class: 16G20; 16G70

arXiv:1109.3176 [pdf, ps, other]

Representation Theory of an Infinite Quiver

Authors: Raymundo Bautista, Shiping Liu, Charles Paquette

Abstract: This paper deals with the representation theory of a locally finite quiver in which the number of paths between any two given vertices is finite. We first study some properties of the finitely presented or co-presented representations, and then construct in the category of locally finite dimensional representations some almost split sequences which start with a finitely co-presented representation and end with a finitely presented representation. Furthermore, we obtain a general description of the shapes of the Auslander-Reiten components of the category of finitely presented representations and prove that the number of regular Auslander-Reiten components is infinite if and only if the quiver is not of finite or infinite Dynkin type. In the infinite Dynkin case, we shall give a complete list of the indecomposable representations and an explicit description of the Auslander-Reiten components. Finally, we apply these results to study the Auslander-Reiten theory in the derived category of bounded complexes of finitely presented representations.

Submitted 14 September, 2011; originally announced September 2011.

Comments: 66 pages

arXiv:1104.1195 [pdf, ps, other]

A non-existence theorem for almost split sequences

Authors: Charles Paquette

Abstract: Let k be a field, Q a quiver with countably many vertices and I an ideal of kQ such that kQ/I has finite dimensional Hom-spaces. In this note, we prove that there is no almost split sequence ending at an indecomposable not finitely presented representation of the bound quiver (Q,I). We then get that an indecomposable representation M of (Q,I) is the ending term of an almost split sequence if and only if it is finitely presented and not projective. The dual results are also true.

Submitted 6 April, 2011; originally announced April 2011.

Comments: 8 pages

MSC Class: 16G70

arXiv:1103.5361 [pdf, ps, other]

A proof of the strong no loop conjecture

Authors: Kiyoshi Igusa, Shiping Liu, Charles Paquette

Abstract: The strong no loop conjecture states that a simple module of finite projective dimension over an artin algebra has no non-zero self-extension. The main result of this paper establishes this well known conjecture for finite dimensional algebras over an algebraically closed field.

Submitted 28 March, 2011; originally announced March 2011.

Comments: 9 pages

MSC Class: 16G10

Journal ref: Advances in Mathematics 228 (2011) 2731-2742

arXiv:1011.3788 [pdf, ps, other]

doi 10.1088/1751-8113/44/36/368001

Comment on an information theoretic approach to the study of non-equilibrium steady states

Authors: Glenn C. Paquette

Abstract: We argue that there is a fundamental problem regarding the analysis that serves as the foundation for the papers {\it Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states} [R. Dewar, J. Phys. A: Math. Gen. {\bf 36} (2003), 631-641] and {\it Maximum entropy production and the fluctuation theorem} [R. Dewar, J. Phys. A: Math. Gen. {\bf 38} (2005), L371-L381]. In particular, we demonstrate that this analysis is based on an assumption that is physically unrealistic and that, hence, the results obtained in those papers cannot be regarded as physically meaningful.

Submitted 16 November, 2010; originally announced November 2010.

Journal ref: J. Phys. A: Math. Theor. 44 (2011) 368001

arXiv:0905.3565 [pdf, ps, other]

Thermodynamics of non-equilibrium steady states

Authors: Glenn C. Paquette

Abstract: We consider the problem of constructing a thermodynamic theory of non-equilibrium steady states as a formal extension of the equilibrium theory. Specifically, studying a particular system, we attempt to construct a phenomenological theory describing the interplay between heat and mechanical work that takes place during operations through which the system undergoes transitions between non-equilibrium steady states. We find that, in contrast to the case of the equilibrium theory, apparently, there exists no systematic way within a phenomenological formulation to describe the work done by the system during such operations. With this observation, we conclude that the attempt to construct a thermodynamic theory of non-equilibrium steady states in analogy to the equilibrium theory has limited prospects for success and that the pursuit of such a theory should be directed elsewhere.

Submitted 11 June, 2009; v1 submitted 21 May, 2009; originally announced May 2009.

arXiv:cond-mat/9308037 [pdf, ps, other]

doi 10.1103/PhysRevLett.72.76

Structural Stability and Renormalization Group for Propagating Fronts

Authors: G. C. Paquette, Lin-Yuan Chen, Nigel Goldenfeld, Y. Oono

Abstract: A solution to a given equation is structurally stable if it suffers only an infinitesimal change when the equation (not the solution) is perturbed infinitesimally. We have found that structural stability can be used as a velocity selection principle for propagating fronts. We give examples, using numerical and renormalization group methods.

Submitted 31 August, 1993; originally announced August 1993.

Comments: 14 pages, uiucmac.tex, no figures

Report number: P-93-08-072

Showing 1–47 of 47 results for author: Paquette, C