Search | arXiv e-print repository

Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes

Authors: Vinzenz Thoma, Barna Pasztor, Andreas Krause, Giorgia Ramponi, Yifan Hu

Abstract: In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs that (potentially multiple) followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as model design for MDPs, tax design, reward shaping and dynamic mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence. Notably, HPGD only utilizes observations of the followers' trajectories. Therefore, it allows followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. We further consider the setting when the leader can influence the training of followers and propose an accelerated algorithm. We empirically demonstrate the performance of our algorithm.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 54 pages, 18 Figures

arXiv:2405.04329 [pdf, ps, other]

On the $K$-theory of $\mathbf{Z}/p^n$

Authors: Benjamin Antieau, Achim Krause, Thomas Nikolaus

Abstract: We give an explicit algebraic description, based on prismatic cohomology, of the algebraic K-groups of rings of the form $O_K/I$ where $K$ is a p-adic field and $I$ is a non-trivial ideal in the ring of integers $O_K$; this class includes the rings $\mathbf{Z}/p^n$ where $p$ is a prime. The algebraic description allows us to describe a practical algorithm to compute individual K-groups as well as to obtain several theoretical results: the vanishing of the even K-groups in high degrees, the determination of the orders of the odd K-groups in high degrees, and the degree of nilpotence of $v_1$ acting on the mod $p$ syntomic cohomology of $\mathbf{Z}/p^n$.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2402.06562 [pdf, other]

Safe Guaranteed Exploration for Non-linear Systems

Authors: Manish Prajapat, Johannes Köhler, Matteo Turchetta, Andreas Krause, Melanie N. Zeilinger

Abstract: Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. Based on this framework we propose an efficient algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC improves efficiency by incorporating three techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2312.12971 [pdf, ps, other]

Witt vectors with coefficients and TR

Authors: Emanuele Dotto, Achim Krause, Thomas Nikolaus, Irakli Patchkoria

Abstract: We give a new construction of $p$-typical Witt vectors with coefficients in terms of ghost maps and show that this construction is isomorphic to the one defined in terms of formal power series from the authors' previous paper. We show that our construction recovers Kaledin's polynomial Witt vectors in the case of vector spaces over a perfect field of characteristic $p$. We then identify the components of the $p$-typical TR with coefficients, originally defined by Lindenstrauss and McCarthy and later reworked by the second and third authors in joint work with McCandless, with the $p$-typical Witt vectors with coefficients. This extends a celebrated result of Hesselholt and Hesselholt-Madsen relating the components of TR with the Witt vectors. As an application, we give an algebraic description of the components of the Hill-Hopkins-Ravenel norm for cyclic $p$-groups in terms of $p$-typical Witt vectors with coefficients.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 51 pages

arXiv:2311.16706 [pdf, ps, other]

Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm

Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause

Abstract: Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schrödinger equation" of (Claisse et al. 2023).

Submitted 28 November, 2023; originally announced November 2023.
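
For readers unfamiliar with the Sinkhorn iterates mentioned in the abstract above, the following is a minimal sketch of the classical discrete-time algorithm for entropy-regularized optimal transport between two histograms. It is not taken from the paper and does not implement its continuous-time variant; the function name `sinkhorn`, the regularization strength `eps`, and the toy cost matrix are all illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, eps=1.0, iters=200):
    """Classical Sinkhorn iterations (illustrative sketch, not the paper's method).

    cost: n x m cost matrix, a: source histogram, b: target histogram,
    eps: entropic regularization strength (assumed value).
    Returns the approximate transport plan as an n x m list of lists.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows to match a, then columns to match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem (hypothetical data): two bins on each side
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.3, 0.7]
plan = sinkhorn(cost, a, b)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

After convergence the row sums of `plan` approximate `a` and the column sums approximate `b`; smaller values of `eps` bring the plan closer to unregularized optimal transport but typically slow convergence.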

arXiv:2311.02374 [pdf, other]

Riemannian stochastic optimization methods avoid strict saddle points

Authors: Ya-Ping Hsieh, Mohammad Reza Karimi, Andreas Krause, Panayotis Mertikopoulos

Abstract: Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.

Submitted 4 November, 2023; originally announced November 2023.

Comments: 27 pages, 3 figures

MSC Class: Primary 62L20; 37N40; secondary 90C15; 90C48
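
To make the retraction-based gradient methods named in the abstract above more concrete, here is a small sketch of gradient descent on the unit sphere for the principal-component objective f(x) = -xᵀAx. It is a noise-free simplification (the paper studies stochastic variants), and the function name `riemannian_gd_sphere`, the step size `lr`, and the matrix `A` are assumptions chosen for illustration only.

```python
import math

def riemannian_gd_sphere(A, x0, lr=0.1, iters=200):
    """Retraction-based gradient descent on the unit sphere (illustrative sketch).

    Minimizes f(x) = -x^T A x, whose minimizers are leading eigenvectors of A.
    Deterministic gradients are used here; a stochastic method would replace
    the exact gradient with a noisy estimate.
    """
    n = len(x0)
    norm = math.sqrt(sum(v * v for v in x0))
    x = [v / norm for v in x0]
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        egrad = [-2.0 * v for v in Ax]  # Euclidean gradient of f
        inner = sum(g * v for g, v in zip(egrad, x))
        # Riemannian gradient: project the Euclidean gradient onto the tangent space at x
        rgrad = [g - inner * v for g, v in zip(egrad, x)]
        y = [v - lr * g for v, g in zip(x, rgrad)]
        # Retraction: renormalize to map the step back onto the sphere
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Hypothetical 2x2 example whose leading eigenvector is (1, 0)
A = [[2.0, 0.0], [0.0, 1.0]]
x = riemannian_gd_sphere(A, [1.0, 1.0])
```

The renormalization step is the cheap retraction that stands in for the exact exponential map; starting exactly at the other eigenvector (0, 1), a saddle of f on the sphere, this noise-free iteration would stay there, which is the kind of behavior the avoidance result addresses for stochastic methods.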

arXiv:2310.19848 [pdf, other]

Efficient Exploration in Continuous-time Model-based Reinforcement Learning

Authors: Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause

Abstract: Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy (MSS), since in continuous time we not only must decide how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as our proposed adaptive MSS over standard baselines, on several applications.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.18535 [pdf, other]

Contextual Stochastic Bilevel Optimization

Authors: Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

Abstract: We introduce contextual stochastic bilevel optimization (CSBO) -- a stochastic bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level decision variable. This framework extends classical stochastic bilevel optimization when the lower-level decision maker responds optimally not only to the decision of the upper-level decision maker but also to some side information and when there are multiple or even infinitely many followers. It captures important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Due to the presence of contextual information, existing single-loop methods for classical stochastic bilevel optimization are unable to converge. To overcome this challenge, we introduce an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, our method matches existing lower bounds. For meta-learning, the complexity of our method does not depend on the number of tasks. Numerical experiments further validate our theoretical results.

Submitted 27 October, 2023; originally announced October 2023.

Comments: The paper is accepted by NeurIPS 2023

arXiv:2310.12770 [pdf, ps, other]

Prismatic cohomology relative to $δ$-rings

Authors: Benjamin Antieau, Achim Krause, Thomas Nikolaus

Abstract: We develop prismatic and syntomic cohomology relative to a $δ$-ring. This simultaneously generalizes Bhatt and Scholze's absolute and relative prismatic cohomology and shows that the latter, which was defined relative to a prism, is in fact independent of the prism structure and only depends on the underlying $δ$-ring. We give several possible definitions of our new version of prismatic cohomology: a site theoretic definition, one using prismatic crystals, and a stack theoretic definition. These are equivalent under mild syntomicity hypotheses. As an application, we note how the theory of prismatic cohomology of filtered rings arises naturally in this context.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2308.01795 [pdf, ps, other]

A note on quadratic forms

Authors: Fabian Hebestreit, Achim Krause, Maxime Ramzi

Abstract: For a field extension $L/K$ we consider maps that are quadratic over $L$ but whose polarisation is only bilinear over $K$. Our main result is that all such are automatically quadratic forms over $L$ in the usual sense if and only if $L/K$ is formally unramified. In particular, this shows that over finite and number fields, one of the axioms in the standard definition of quadratic forms is superfluous.

Submitted 6 February, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: 13 pages, v3: Minor changes, to appear in Bulletin of the LMS

Report number: CPH-GEOTOP-DNRF151

arXiv:2302.07686 [pdf, ps, other]

Polygonic spectra and TR with coefficients

Authors: Achim Krause, Jonas McCandless, Thomas Nikolaus

Abstract: We introduce the notion of a polygonic spectrum which is designed to axiomatize the structure on topological Hochschild homology $\mathrm{THH}(R,M)$ of an $\mathbb{E}_1$-ring $R$ with coefficients in an $R$-bimodule $M$. For every polygonic spectrum $X$, we define a spectrum $\mathrm{TR}(X)$ as the mapping spectrum from the polygonic version of the sphere spectrum $\mathbb{S}$ to $X$. In particular if applied to $X = \mathrm{THH}(R,M)$ this gives a conceptual definition of $\mathrm{TR}(R,M)$. Every cyclotomic spectrum gives rise to a polygonic spectrum and we prove that TR agrees with the classical definition of TR in this case. We construct Frobenius and Verschiebung maps on $\mathrm{TR}(X)$ by exhibiting $\mathrm{TR}(X)$ as the $\mathbb{Z}$-fixedpoints of a quasifinitely genuine $\mathbb{Z}$-spectrum. The notion of quasifinitely genuine $\mathbb{Z}$-spectra is a new notion that we introduce and discuss inspired by a similar notion over $\mathbb{Z}$ introduced by Kaledin. Besides the usual coherences for genuine spectra, this notion additionally encodes that $\mathrm{TR}(X)$ admits certain infinite sums of Verschiebung maps.

Submitted 15 February, 2023; originally announced February 2023.

Comments: 61 pages, comments are welcome

Report number: MPIM-Bonn-2023

arXiv:2301.09943 [pdf, other]

Learning To Dive In Branch And Bound

Authors: Max B. Paulus, Andreas Krause

Abstract: Primal heuristics are important for solving mixed integer linear programs, because they find feasible solutions that facilitate branch and bound search. A prominent group of primal heuristics are diving heuristics. They iteratively modify and resolve linear programs to conduct a depth-first search from any node in the search tree. Existing divers rely on generic decision rules that fail to exploit structural commonality between similar problem instances that often arise in practice. Therefore, we propose L2Dive to learn specific diving heuristics with graph neural networks: We train generative models to predict variable assignments and leverage the duality of linear programs to make diving decisions based on the model's predictions. L2Dive is fully integrated into the open-source solver SCIP. We find that L2Dive outperforms standard divers to find better feasible solutions on a range of combinatorial optimization problems. For real-world applications from server load balancing and neural network verification, L2Dive improves the primal-dual integral by up to 7% (35%) on average over a tuned (default) solver baseline and reduces average solving time by 20% (29%).

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2210.13867 [pdf, ps, other]

A Dynamical System View of Langevin-Based Non-Convex Sampling

Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause

Abstract: Non-convex sampling is a key challenge in machine learning, central to non-convex optimization in deep learning as well as to approximate probabilistic inference. Despite its significance, theoretically there remain many important challenges: Existing guarantees (1) typically only hold for the averaged iterates rather than the more desirable last iterates, (2) lack convergence metrics that capture the scales of the variables such as Wasserstein distances, and (3) mainly apply to elementary schemes such as stochastic gradient Langevin dynamics. In this paper, we develop a new framework that lifts the above issues by harnessing several tools from the theory of dynamical systems. Our key result is that, for a large class of state-of-the-art sampling schemes, their last-iterate convergence in Wasserstein distances can be reduced to the study of their continuous-time counterparts, which is much better understood. Coupled with standard assumptions of MCMC sampling, our theory immediately yields the last-iterate Wasserstein convergence of many advanced sampling schemes such as proximal, randomized mid-point, and Runge-Kutta integrators. Beyond existing methods, our framework also motivates more efficient schemes that enjoy the same rigorous guarantees.

Submitted 13 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: typos corrected, references added

MSC Class: 62D05

arXiv:2210.06380 [pdf, other]

Near-Optimal Multi-Agent Learning for Safe Coverage Control

Authors: Manish Prajapat, Matteo Turchetta, Melanie N. Zeilinger, Andreas Krause

Abstract: In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density. In practice, the density is rarely known $\textit{a priori}$, further complicating the original NP-hard problem. Moreover, in many applications, agents cannot visit arbitrary locations due to $\textit{a priori}$ unknown safety constraints. In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety. We first propose a conditionally linear submodular coverage function that facilitates theoretical analysis. Utilizing this structure, we develop MacOpt, a novel algorithm that efficiently trades off the exploration-exploitation dilemma due to partial observability, and show that it achieves sublinear regret. Next, we extend results on single-agent safe exploration to our multi-agent setting and propose SafeMac for safe coverage and exploration. We analyze SafeMac and give first of its kind results: near optimal coverage in finite time while provably guaranteeing safety. We extensively evaluate our algorithms on synthetic and real problems, including a bio-diversity monitoring task under safety constraints, where SafeMac outperforms competing methods.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2207.10415 [pdf, other]

Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning

Authors: Ilnura Usmanova, Yarden As, Maryam Kamgarpour, Andreas Krause

Abstract: Optimizing noisy functions online, when evaluating the objective requires experiments on a deployed system, is a crucial task arising in manufacturing, robotics and many others. Often, constraints on safe inputs are unknown ahead of time, and we only obtain noisy information, indicating how close we are to violating the constraints. Yet, safety must be guaranteed at all times, not only for the final output of the algorithm. We introduce a general approach for seeking a stationary point in high dimensional non-linear stochastic optimization problems in which maintaining safety during learning is crucial. Our approach called LB-SGD is based on applying stochastic gradient descent (SGD) with a carefully chosen adaptive step size to a logarithmic barrier approximation of the original problem. We provide a complete convergence analysis of non-convex, convex, and strongly-convex smooth constrained problems, with first-order and zeroth-order feedback. Our approach yields efficient updates and scales better with dimensionality compared to existing approaches. We empirically compare the sample complexity and the computational cost of our method with existing safe learning approaches. Beyond synthetic benchmarks, we demonstrate the effectiveness of our approach on minimizing constraint violation in policy search tasks in safe reinforcement learning (RL).

Submitted 2 June, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 36 pages, 9 pages of appendix

arXiv:2206.13414 [pdf, other]

Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning

Authors: Max B. Paulus, Giulia Zarpellon, Andreas Krause, Laurent Charlin, Chris J. Maddison

Abstract: Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection - but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.

Submitted 27 June, 2022; originally announced June 2022.

Comments: ICML 2022

arXiv:2206.06795 [pdf, other]

Riemannian stochastic approximation algorithms

Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Panayotis Mertikopoulos, Andreas Krause

Abstract: We examine a wide class of stochastic approximation algorithms for solving (stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise naturally in the study of Riemannian optimization, game theory and optimal transport, but their behavior is much less understood compared to the Euclidean case because of the lack of a global linear structure on the manifold. We overcome this difficulty by introducing a suitable Fermi coordinate frame which allows us to map the asymptotic behavior of the Riemannian Robbins-Monro (RRM) algorithms under study to that of an associated deterministic dynamical system. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, despite the significant complications that arise due to the curvature and topology of the underlying manifold. We showcase the flexibility of the proposed framework by applying it to a range of retraction-based variants of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment for their convergence.

Submitted 27 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 33 pages, 2 figures; a one-page abstract of this paper was presented in COLT 2022

MSC Class: Primary 62L20; 37N40; secondary 90C15; 90C47; 90C48

arXiv:2205.13627 [pdf, other]

Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces

Authors: Mojmír Mutný, Andreas Krause

Abstract: Optimal experimental design seeks to determine the most informative allocation of experiments to infer an unknown statistical quantity. In this work, we investigate the optimal design of experiments for estimation of linear functionals in reproducing kernel Hilbert spaces (RKHSs). This problem has been extensively studied in the linear regression setting under an estimability condition, which allows estimating parameters without bias. We generalize this framework to RKHSs, and allow for the linear functional to be only approximately inferred, i.e., with a fixed bias. This scenario captures many important modern applications, such as estimation of gradient maps, integrals, and solutions to differential equations. We provide algorithms for constructing bias-aware designs for linear functionals. We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise, enabling us to certify estimation with bounded error with high probability.

Submitted 15 January, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Journal ref: NeurIPS 2022

arXiv:2204.03420 [pdf, ps, other]

On the K-theory of $\mathbb{Z}/p^n$ -- announcement

Authors: Benjamin Antieau, Achim Krause, Thomas Nikolaus

Abstract: We announce new methods for using prismatic cohomology to compute the K-groups of $\mathbb{Z}/p^n$ and related rings. We use computer algebra methods to compute these K-groups through a large range in specific cases and also obtain explicit formulas for their orders in large degrees.

Submitted 7 April, 2022; originally announced April 2022.

Comments: Comments welcome!

arXiv:2110.14296 [pdf, other]

Learning Stable Deep Dynamics Models for Partially Observed or Delayed Dynamical Systems

Authors: Andreas Schlaginhaufen, Philippe Wenk, Andreas Krause, Florian Dörfler

Abstract: Learning how complex dynamical systems evolve over time is a key challenge in system identification. For safety critical systems, it is often crucial that the learned model is guaranteed to converge to some equilibrium point. To this end, neural ODEs regularized with neural Lyapunov functions are a promising approach when states are fully observed. For practical applications however, partial observations are the norm. As we will demonstrate, initialization of unobserved augmented states can become a key problem for neural ODEs. To alleviate this issue, we propose to augment the system's state with its history. Inspired by state augmentation in discrete-time systems, we thus obtain neural delay differential equations. Based on classical time delay stability analysis, we then show how to ensure stability of the learned models, and theoretically analyze our approach. Our experiments demonstrate its applicability to stable system identification of partially observed systems and learning a stabilizing feedback policy in delayed feedback control.

Submitted 10 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Published at NeurIPS 2021

Journal ref: Advances in Neural Information Processing Systems, 2021

arXiv:2109.09835 [pdf, ps, other]

Fast Projection Onto Convex Smooth Constraints

Authors: Ilnura Usmanova, Maryam Kamgarpour, Andreas Krause, Kfir Yehuda Levy

Abstract: The Euclidean projection onto a convex set is an important problem that arises in numerous constrained optimization tasks. Unfortunately, in many cases, computing projections is computationally demanding. In this work, we focus on projection problems where the constraints are smooth and the number of constraints is significantly smaller than the dimension. The runtime of existing approaches to solving such problems is either cubic in the dimension or polynomial in the inverse of the target accuracy. Conversely, we propose a simple and efficient primal-dual approach, with a runtime that scales only linearly with the dimension, and only logarithmically in the inverse of the target accuracy. We empirically demonstrate its performance, and compare it with standard baselines.

Submitted 20 September, 2021; originally announced September 2021.

arXiv:2106.11609 [pdf, other]

Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models

Authors: Lenart Treven, Philippe Wenk, Florian Dörfler, Andreas Krause

Abstract: Differential equations in general and neural ODEs in particular are an essential technique in continuous-time system identification. While many deterministic learning algorithms have been designed based on numerical integration via the adjoint method, many downstream tasks such as active learning, exploration in reinforcement learning, robust control, or filtering require accurate estimates of predictive uncertainties. In this work, we propose a novel approach towards estimating epistemically uncertain neural ODEs, avoiding the numerical integration bottleneck. Instead of modeling uncertainty in the ODE parameters, we directly model uncertainties in the state space. Our algorithm - distributional gradient matching (DGM) - jointly trains a smoother and a dynamics model and matches their gradients via minimizing a Wasserstein loss. Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and in the context of neural ODEs, significantly more accurate.

Submitted 15 October, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: Published at NeurIPS 2021

Journal ref: Advances in Neural Information Processing Systems, 2021

arXiv:2106.07445 [pdf, other]

PopSkipJump: Decision-Based Attack for Probabilistic Classifiers

Authors: Carl-Johann Simon-Gabriel, Noman Ahmed Sheikh, Andreas Krause

Abstract: Most current classifiers are vulnerable to adversarial examples, small input perturbations that change the classification output. Many existing attack algorithms cover various settings, from white-box to black-box classifiers, but typically assume that the answers are deterministic and often fail when they are not. We therefore propose a new adversarial decision-based attack specifically designed for classifiers with probabilistic outputs. It is based on the HopSkipJump attack by Chen et al. (2019, arXiv:1904.02144v5), a strong and query efficient decision-based attack originally designed for deterministic classifiers. Our P(robabilisticH)opSkipJump attack adapts its amount of queries to maintain HopSkipJump's original output quality across various noise levels, while converging to its query efficiency as the noise level decreases. We test our attack on various noise models, including state-of-the-art off-the-shelf randomized defenses, and show that they offer almost no extra robustness to decision-based attacks. Code is available at https://github.com/cjsg/PopSkipJump .

Submitted 14 June, 2021; originally announced June 2021.

Comments: ICML'21. Code available at https://github.com/cjsg/PopSkipJump . 9 pages & 7 figures in main part, 14 pages & 10 figures in appendix

arXiv:2106.04443 [pdf, other]

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

Authors: Tobias Sutter, Andreas Krause, Daniel Kuhn

Abstract: Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Lastly, we demonstrate the versatility of our framework by demonstrating it on two rather distinct applications: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.

Submitted 26 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: 23 pages, 4 figures

Journal ref: NeurIPS 2021

arXiv:2011.09345 [pdf, ps, other]

Mapping spaces in homotopy coherent nerves

Authors: Fabian Hebestreit, Achim Krause

Abstract: We give a direct proof that middle mapping spaces in coherent nerves of Kan enriched categories have the same homotopy type as the original mapping spaces.

Submitted 18 November, 2020; originally announced November 2020.

Comments: 14 pages

MSC Class: 18N60; 55U10

arXiv:2008.05551 [pdf, ps, other]

The Picard group in equivariant homotopy theory via stable module categories

Authors: Achim Krause

Abstract: We develop a mechanism of "isotropy separation for compact objects" that explicitly describes an invertible $G$-spectrum through its collection of geometric fixed points and gluing data located in certain variants of the stable module category. As an application, we carry out a complete analysis of invertible G-spectra in the case $G=A_5$. A further application is given by showing that the Picard groups of $\mathrm{Sp}^G$ and a category of derived Mackey functors agree.

Submitted 12 August, 2020; originally announced August 2020.

Comments: 37 pages

arXiv:2003.02658 [pdf, other]

SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives

Authors: Emmanouil Angelis, Philippe Wenk, Bernhard Schölkopf, Stefan Bauer, Andreas Krause

Abstract: Gaussian processes are an important regression tool with excellent analytic properties which allow for direct integration of derivative observations. However, vanilla GP methods scale cubically in the amount of observations. In this work, we propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We then prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior. To furthermore illustrate the practical applicability of our method, we then apply it to ODIN, a recently developed algorithm for ODE parameter inference. In an extensive experiments section, all results are empirically validated, demonstrating the speed, accuracy, and practical applicability of this approach.

Submitted 5 March, 2020; originally announced March 2020.

arXiv:2002.01538 [pdf, ps, other]

Witt vectors with coefficients and characteristic polynomials over non-commutative rings

Authors: Emanuele Dotto, Achim Krause, Thomas Nikolaus, Irakli Patchkoria

Abstract: For a not-necessarily commutative ring R we define an abelian group W(R;M) of Witt vectors with coefficients in an R-bimodule M. These groups generalize the usual big Witt vectors of commutative rings and we prove that they have analogous formal properties and structure. One main result is that W(R) := W(R;R) is Morita invariant in R. For an R-linear endomorphism f of a finitely generated projective R-module we define a characteristic element $χ_f \in W(R)$. This element is a non-commutative analogue of the classical characteristic polynomial and we show that it has similar properties. The assignment $f \mapsto χ_f$ induces an isomorphism between a suitable completion of cyclic K-theory and W(R).

Submitted 4 February, 2020; originally announced February 2020.

Comments: 39 pages

arXiv:1912.09478 [pdf, other]

Log Barriers for Safe Non-convex Black-box Optimization

Authors: Ilnura Usmanova, Andreas Krause, Maryam Kamgarpour

Abstract: We address the problem of minimizing a smooth function $f^0(x)$ over a compact set $D$ defined by smooth functional constraints $f^i(x)\leq 0,~ i = 1,\ldots, m$ given noisy value measurements of $f^i(x)$. This problem arises in safety-critical applications, where certain parameters need to be adapted online in a data-driven fashion, such as in personalized medicine, robotics, manufacturing, etc. In such cases, it is important to ensure constraints are not violated while taking measurements and seeking the minimum of the cost function. We propose a new algorithm s0-LBM, which provides provably feasible iterates with high probability and applies to the challenging case of uncertain zero-th order oracle. We also analyze the convergence rate of the algorithm, and empirically demonstrate its effectiveness.

Submitted 19 December, 2019; originally announced December 2019.

Comments: under review

arXiv:1912.09466 [pdf, other]

Safe non-smooth black-box optimization with application to policy search

Authors: Ilnura Usmanova, Andreas Krause, Maryam Kamgarpour

Abstract: For safety-critical black-box optimization tasks, observations of the constraints and the objective are often noisy and available only for the feasible points. We propose an approach based on log barriers to find a local solution of a non-convex non-smooth black-box optimization problem $\min f^0(x)$ subject to $f^i(x)\leq 0,~ i = 1,\ldots, m$, at the same time, guaranteeing constraint satisfaction while learning an optimal solution with high probability. Our proposed algorithm exploits noisy observations to iteratively improve on an initial safe point until convergence. We derive the convergence rate and prove safety of our algorithm. We demonstrate its performance in an application to an iterative control design problem.

Submitted 23 February, 2021; v1 submitted 19 December, 2019; originally announced December 2019.

arXiv:1910.11561 [pdf, other]

Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling

Authors: Mojmír Mutný, Michał Dereziński, Andreas Krause

Abstract: We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix $\bM$ instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of $\bM$. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of matrix $\bM$, and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of $\bM$. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior or on par with uniform sampling.

Submitted 12 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

Journal ref: AISTATS 2020

arXiv:1907.11863 [pdf, ps, other]

Schauder Bases Having Many Good Block Basic Sequences

Authors: Cory A. Krause

Abstract: In the study of asymptotic geometry in Banach spaces, a basic sequence which gives rise to a spreading model has been called a good sequence. It is well known that every normalized basic sequence in a Banach space has a subsequence which is good. We investigate the assumption that every normalized block tree relative to a basis has a branch which is good. This combinatorial property turns out to be very strong and is equivalent to the space being $1$-asymptotic $\ell_p$ for some $1\leq p\leq\infty$. We also investigate the even stronger assumption that every block basic sequence of a basis is good. Finally, using the Hindman-Milliken-Taylor theorem, we prove a stabilization theorem which produces a basic sequence all of whose normalized constant coefficient block basic sequences are good, and we present an application of this stabilization.

Submitted 8 January, 2020; v1 submitted 27 July, 2019; originally announced July 2019.

Comments: 21 pages

MSC Class: Primary 46B03; 46B06; 46B25; 46B45; Secondary 05D10

arXiv:1907.03477 [pdf, ps, other]

Bökstedt periodicity and quotients of DVRs

Authors: Achim Krause, Thomas Nikolaus

Abstract: In this note we compute the topological Hochschild homology of quotients of DVRs. Along the way we give a short argument for Bökstedt periodicity and generalizations over various other bases. Our strategy also gives a very efficient way to redo the computations of THH (resp. logarithmic THH) of complete DVRs originally due to Lindenstrauss-Madsen (resp. Hesselholt-Madsen).

Submitted 8 July, 2019; originally announced July 2019.

Comments: 35 pages

arXiv:1903.04626 [pdf, other]

Safe Convex Learning under Uncertain Constraints

Authors: Ilnura Usmanova, Andreas Krause, Maryam Kamgarpour

Abstract: We address the problem of minimizing a convex smooth function $f(x)$ over a compact polyhedral set $D$ given a stochastic zeroth-order constraint feedback model. This problem arises in safety-critical machine learning applications, such as personalized medicine and robotics. In such cases, one needs to ensure constraints are satisfied while exploring the decision space to find optimum of the loss function. We propose a new variant of the Frank-Wolfe algorithm, which applies to the case of uncertain linear constraints. Using robust optimization, we provide the convergence rate of the algorithm while guaranteeing feasibility of all iterates, with high probability.

Submitted 9 December, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: 15 pages, 7 figures, AISTATS 2019

arXiv:1902.08480 [pdf, other]

AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs

Authors: Gabriele Abbati, Philippe Wenk, Michael A Osborne, Andreas Krause, Bernhard Schölkopf, Stefan Bauer

Abstract: Stochastic differential equations are an important modeling class in many disciplines. Consequently, there exist many methods relying on various discretization and numerical integration schemes. In this paper, we propose a novel, probabilistic model for estimating the drift and diffusion given noisy observations of the underlying stochastic system. Using state-of-the-art adversarial and moment matching inference techniques, we avoid the discretization schemes of classical approaches. This leads to significant improvements in parameter accuracy and robustness given random initial guesses. On four established benchmark systems, we compare the performance of our algorithms to state-of-the-art solutions based on extended Kalman filtering and Gaussian processes.

Submitted 28 May, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: Published at the Thirty-sixth International Conference on Machine Learning (ICML 2019)

arXiv:1902.06278 [pdf, other]

ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems

Authors: Philippe Wenk, Gabriele Abbati, Michael A Osborne, Bernhard Schölkopf, Andreas Krause, Stefan Bauer

Abstract: Parameter inference in ordinary differential equations is an important problem in many applied sciences and in engineering, especially in a data-scarce setting. In this work, we introduce a novel generative modeling approach based on constrained Gaussian processes and leverage it to build a computationally and data efficient algorithm for state and parameter inference. In an extensive set of experiments, our approach outperforms the current state of the art for parameter inference both in terms of accuracy and computational cost. It also shows promising results for the much more challenging problem of model selection.

Submitted 5 December, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

Comments: Published at the Thirty-fourth AAAI Conference on Artificial Intelligence

arXiv:1810.11050 [pdf, ps, other]

C-motivic modular forms

Authors: Bogdan Gheorghe, Daniel C. Isaksen, Achim Krause, Nicolas Ricka

Abstract: We construct a topological model for cellular, 2-complete, stable C-motivic homotopy theory that uses no algebro-geometric foundations. We compute the Steenrod algebra in this context, and we construct a "motivic modular forms" spectrum over C.

Submitted 25 October, 2018; originally announced October 2018.

MSC Class: Primary 14F42; 55N34; 55S10; Secondary 55Q45; 55T15

arXiv:1703.02100 [pdf, other]

Guarantees for Greedy Maximization of Non-submodular Functions with Applications

Authors: Andrew An Bian, Joachim M. Buhmann, Andreas Krause, Sebastian Tschiatschek

Abstract: We investigate the performance of the standard Greedy algorithm for cardinality constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. However, Greedy enjoys strong empirical performance for many important non-submodular functions, e.g., the Bayesian A-optimality objective in experimental design. We prove theoretical guarantees supporting the empirical performance. Our guarantees are characterized by a combination of the (generalized) curvature $α$ and the submodularity ratio $γ$. In particular, we prove that Greedy enjoys a tight approximation guarantee of $\frac{1}α(1- e^{-γα})$ for cardinality constrained maximization. In addition, we bound the submodularity ratio and curvature for several important real-world objectives, including the Bayesian A-optimality objective, the determinantal function of a square submatrix and certain linear programs with combinatorial constraints. We experimentally validate our theoretical findings for both synthetic and real-world applications.

Submitted 14 May, 2019; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: published at ICML 2017. First author is now known as Yatao Bian <[email protected]>. ORCID: https://orcid.org/0000-0002-2368-4084

arXiv:1702.03683 [pdf, ps, other]

Vanishing lines for modules over the motivic Steenrod algebra

Authors: Drew Heard, Achim Krause

Abstract: We study criteria for freeness and for the existence of a vanishing line for modules over certain Hopf subalgebras of the motivic Steenrod algebra over $\mathrm{Spec}(\mathbb{C})$ at the prime 2. These turn out to be determined by the vanishing of certain Margolis homology groups in the quotient Hopf algebra $\mathcal{A}/τ$.

Submitted 24 January, 2018; v1 submitted 13 February, 2017; originally announced February 2017.

Comments: Version to appear (with minor typesetting changes) in New York Journal of Mathematics

MSC Class: 14F42; 55S10

Journal ref: New York J. Math 24(2018) 183-199

arXiv:1605.00609 [pdf, other]

Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions

Authors: Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

Abstract: A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}φ_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$. Assuming $φ$'s, $\mathcal{S}$ to be unknown, there exists extensive work for estimating $f$ from its samples. In this work, we consider a generalized version of SPAMs, that also allows for the presence of a sparse number of second order interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, with $|\mathcal{S}_1| \ll d, |\mathcal{S}_2| \ll d^2$, the function $f$ is now assumed to be of the form: $\sum_{p \in \mathcal{S}_1}φ_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}φ_{(l,l^{\prime})} (x_l,x_{l^{\prime}})$. Assuming we have the freedom to query $f$ anywhere in its domain, we derive efficient algorithms that provably recover $\mathcal{S}_1,\mathcal{S}_2$ with finite sample bounds. Our analysis covers the noiseless setting where exact samples of $f$ are obtained, and also extends to the noisy setting where the queries are corrupted with noise. For the noisy setting in particular, we consider two noise models namely: i.i.d Gaussian noise and arbitrary but bounded noise. Our main methods for identification of $\mathcal{S}_2$ essentially rely on estimation of sparse Hessian matrices, for which we provide two novel compressed sensing based schemes.
Once $\mathcal{S}_1, \mathcal{S}_2$ are known, we show how the individual components $φ_p$, $φ_{(l,l^{\prime})}$ can be estimated via additional queries of $f$, with uniform error bounds. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.

Submitted 8 May, 2017; v1 submitted 2 May, 2016; originally announced May 2016.

Comments: To appear in Information and Inference: A Journal of the IMA. Made following changes after review process: (a) Corrected typos throughout the text. (b) Corrected choice of sampling distribution in Section 5, see eqs. (5.2), (5.3). (c) More detailed comparison with existing work in Section 8. (d) Added Section B in appendix on roots of cubic equation

arXiv:1408.0520 [pdf, ps, other]

Asymptotic Dynamics of Stochastic $p$-Laplace Equations on Unbounded Domains

Authors: Andrew Krause

Abstract: This thesis is concerned with the asymptotic behavior of solutions of stochastic $p$-Laplace equations driven by non-autonomous forcing on $\mathbb{R}^n$. Two cases are studied, with additive and multiplicative noise respectively. Estimates on the tails of solutions are used to overcome the non-compactness of Sobolev embeddings on unbounded domains, and prove asymptotic compactness of solution operators in $L^2(\mathbb{R}^n)$. Using this result we prove the existence and uniqueness of random attractors in each case. Additionally, we show the upper semicontinuity of the attractor for the multiplicative noise case as the intensity of the noise approaches zero.

Submitted 3 August, 2014; originally announced August 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1309.1211

MSC Class: 35B40 (Primary) 35B41; 37L30 (Secondary)

arXiv:1403.2278 [pdf, ps, other]

Bianchi's classification of 3-dimensional Lie algebras revisited

Authors: Manuel Glas, Panagiotis Konstantis, Achim Krause, Frank Loose

Abstract: We present Bianchi's proof on the classification of real (and complex) $3$-dimensional Lie algebras in a coordinate free version from a strictly representation theoretic point of view. Nearby we also compute the automorphism groups and from this the orbit dimensions of the corresponding orbits in the algebraic variety $X\subseteq\Lambda^2V^*\otimes V$ describing all Lie brackets on a fixed vector space $V$ of dimension $3$. Moreover we clarify which orbits lie in the closure of a given orbit and therefore the topology on the orbit space $X/G$ with $G=\mathrm{Aut}(V)$.

Submitted 10 March, 2014; originally announced March 2014.

arXiv:1309.1211 [pdf, ps, other]

doi 10.1016/j.jmaa.2014.03.037

Pullback Attractors of Non-autonomous Stochastic Degenerate Parabolic Equations on Unbounded Domains

Authors: Andrew Krause, Bixiang Wang

Abstract: This paper is concerned with pullback attractors of the stochastic p-Laplace equation defined on the entire space R^n. We first establish the asymptotic compactness of the equation in L^2(R^n) and then prove the existence and uniqueness of non-autonomous random attractors. This attractor is pathwise periodic if the non-autonomous deterministic forcing is time periodic. The difficulty of non-compactness of Sobolev embeddings on R^n is overcome by the uniform smallness of solutions outside a bounded domain.

Submitted 4 September, 2013; originally announced September 2013.

MSC Class: 35B40 (Primary) 35B41; 37L30 (Secondary)

arXiv:1010.5511 [pdf, other]

Efficient Minimization of Decomposable Submodular Functions

Authors: Peter Stobbe, Andreas Krause

Abstract: Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude.

Submitted 26 October, 2010; originally announced October 2010.

Comments: Expanded version of paper for Neural Information Processing Systems 2010

Showing 1–44 of 44 results for author: Krause, A