-
Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes
Authors:
Vinzenz Thoma,
Barna Pasztor,
Andreas Krause,
Giorgia Ramponi,
Yifan Hu
Abstract:
In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs that (potentially multiple) followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as model design for MDPs, tax design, reward shaping and dynamic mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence. Notably, HPGD only utilizes observations of the followers' trajectories. Therefore, it allows followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. We further consider the setting when the leader can influence the training of followers and propose an accelerated algorithm. We empirically demonstrate the performance of our algorithm.
Submitted 3 June, 2024;
originally announced June 2024.
-
On the $K$-theory of $\mathbf{Z}/p^n$
Authors:
Benjamin Antieau,
Achim Krause,
Thomas Nikolaus
Abstract:
We give an explicit algebraic description, based on prismatic cohomology, of the algebraic K-groups of rings of the form $O_K/I$ where $K$ is a p-adic field and $I$ is a non-trivial ideal in the ring of integers $O_K$; this class includes the rings $\mathbf{Z}/p^n$ where $p$ is a prime.
The algebraic description allows us to describe a practical algorithm to compute individual K-groups as well as to obtain several theoretical results: the vanishing of the even K-groups in high degrees, the determination of the orders of the odd K-groups in high degrees, and the degree of nilpotence of $v_1$ acting on the mod $p$ syntomic cohomology of $\mathbf{Z}/p^n$.
Submitted 7 May, 2024;
originally announced May 2024.
-
Safe Guaranteed Exploration for Non-linear Systems
Authors:
Manish Prajapat,
Johannes Köhler,
Matteo Turchetta,
Andreas Krause,
Melanie N. Zeilinger
Abstract:
Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. Based on this framework we propose an efficient algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC improves efficiency by incorporating three techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.
Submitted 9 February, 2024;
originally announced February 2024.
-
Witt vectors with coefficients and TR
Authors:
Emanuele Dotto,
Achim Krause,
Thomas Nikolaus,
Irakli Patchkoria
Abstract:
We give a new construction of $p$-typical Witt vectors with coefficients in terms of ghost maps and show that this construction is isomorphic to the one defined in terms of formal power series from the authors' previous paper. We show that our construction recovers Kaledin's polynomial Witt vectors in the case of vector spaces over a perfect field of characteristic $p$. We then identify the components of the $p$-typical TR with coefficients, originally defined by Lindenstrauss and McCarthy and later reworked by the second and third authors in joint work with McCandless, with the $p$-typical Witt vectors with coefficients. This extends a celebrated result of Hesselholt and Hesselholt-Madsen relating the components of TR with the Witt vectors. As an application, we give an algebraic description of the components of the Hill-Hopkins-Ravenel norm for cyclic $p$-groups in terms of $p$-typical Witt vectors with coefficients.
Submitted 20 December, 2023;
originally announced December 2023.
-
Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm
Authors:
Mohammad Reza Karimi,
Ya-Ping Hsieh,
Andreas Krause
Abstract:
Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schrödinger equation" of (Claisse et al. 2023).
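For orientation, the classical discrete Sinkhorn iterates that the continuous-time flow builds on can be sketched as below. This is not the paper's continuous-time scheme or its noise-robust variants; it assumes discrete marginals `a`, `b` and a cost matrix `C`, and the function name and default regularization `eps` are illustrative choices.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iters=500):
    """Classical Sinkhorn iterations for entropy-regularized optimal transport
    between discrete marginals a (n,) and b (m,) with cost matrix C (n, m)."""
    K = np.exp(-C / eps)              # Gibbs kernel of the cost
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows to match marginal a
        v = b / (K.T @ u)             # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]   # plan diag(u) K diag(v)

# Usage: uniform marginals on three points of a line, squared-distance cost
x = np.array([0.0, 1.0, 2.0])
a = np.full(3, 1.0 / 3.0)
P = sinkhorn(a, a, (x[:, None] - x[None, :]) ** 2)
```

Each half-step is an exact marginal projection, which is why the column sums match `b` exactly after every iteration while the row sums converge to `a`.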
Submitted 28 November, 2023;
originally announced November 2023.
-
Riemannian stochastic optimization methods avoid strict saddle points
Authors:
Ya-Ping Hsieh,
Mohammad Reza Karimi,
Andreas Krause,
Panayotis Mertikopoulos
Abstract:
Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.
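As a concrete instance of a retraction-based Riemannian stochastic gradient method, the sketch below estimates the top principal direction (online PCA) on the unit sphere: it projects a stochastic Euclidean gradient onto the tangent space and retracts by normalization. The objective, step-size schedule `step0 / (n + 10)` and function name are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def riemannian_sgd_pca(samples, x0, step0=0.5):
    """Retraction-based Riemannian SGD on the unit sphere for
    minimize f(x) = -E[(z^T x)^2] subject to ||x|| = 1 (online PCA)."""
    x = x0 / np.linalg.norm(x0)
    for n, z in enumerate(samples, start=1):
        egrad = -2.0 * (z @ x) * z             # stochastic Euclidean gradient
        rgrad = egrad - (x @ egrad) * x        # project onto tangent space at x
        x = x - (step0 / (n + 10)) * rgrad     # step along the tangent direction
        x = x / np.linalg.norm(x)              # retraction back onto the sphere
    return x

# Usage: data whose largest-variance direction is the first coordinate axis
rng = np.random.default_rng(0)
samples = rng.normal(size=(5000, 3)) * np.sqrt([4.0, 1.0, 0.25])
x = riemannian_sgd_pca(samples, np.ones(3))
```

Normalization is a cheap retraction compared with the exponential map; on the sphere this update coincides with Oja's rule followed by renormalization.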
Submitted 4 November, 2023;
originally announced November 2023.
-
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
Authors:
Lenart Treven,
Jonas Hübotter,
Bhavya Sukhija,
Florian Dörfler,
Andreas Krause
Abstract:
Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy (MSS), since in continuous time we not only must decide how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as our proposed adaptive MSS over standard baselines, on several applications.
Submitted 30 October, 2023;
originally announced October 2023.
-
Contextual Stochastic Bilevel Optimization
Authors:
Yifan Hu,
Jie Wang,
Yao Xie,
Andreas Krause,
Daniel Kuhn
Abstract:
We introduce contextual stochastic bilevel optimization (CSBO) -- a stochastic bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level decision variable. This framework extends classical stochastic bilevel optimization when the lower-level decision maker responds optimally not only to the decision of the upper-level decision maker but also to some side information and when there are multiple or even infinitely many followers. It captures important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Due to the presence of contextual information, existing single-loop methods for classical stochastic bilevel optimization are unable to converge. To overcome this challenge, we introduce an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, our method matches existing lower bounds. For meta-learning, the complexity of our method does not depend on the number of tasks. Numerical experiments further validate our theoretical results.
Submitted 27 October, 2023;
originally announced October 2023.
-
Prismatic cohomology relative to $δ$-rings
Authors:
Benjamin Antieau,
Achim Krause,
Thomas Nikolaus
Abstract:
We develop prismatic and syntomic cohomology relative to a $δ$-ring. This simultaneously generalizes Bhatt and Scholze's absolute and relative prismatic cohomology and shows that the latter, which was defined relative to a prism, is in fact independent of the prism structure and only depends on the underlying $δ$-ring. We give several possible definitions of our new version of prismatic cohomology: a site theoretic definition, one using prismatic crystals, and a stack theoretic definition. These are equivalent under mild syntomicity hypotheses. As an application, we note how the theory of prismatic cohomology of filtered rings arises naturally in this context.
Submitted 19 October, 2023;
originally announced October 2023.
-
A note on quadratic forms
Authors:
Fabian Hebestreit,
Achim Krause,
Maxime Ramzi
Abstract:
For a field extension $L/K$ we consider maps that are quadratic over $L$ but whose polarisation is only bilinear over $K$. Our main result is that all such maps are automatically quadratic forms over $L$ in the usual sense if and only if $L/K$ is formally unramified. In particular, this shows that over finite and number fields, one of the axioms in the standard definition of quadratic forms is superfluous.
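To make the two conditions concrete, here is the standard definition as we read it (a sketch; $V$ denotes an $L$-vector space and $q \colon V \to L$, notation not taken from the abstract). Being quadratic over $L$ means homogeneity of degree two, and the polarisation is the map $b_q$:

$$
q(\lambda v) = \lambda^{2}\, q(v) \quad (\lambda \in L,\ v \in V), \qquad
b_q(v, w) = q(v + w) - q(v) - q(w).
$$

A quadratic form over $L$ in the usual sense additionally requires $b_q$ to be $L$-bilinear; the maps considered here only require $b_q$ to be $K$-bilinear. When $K$ is a prime field ($\mathbb{Q}$ or $\mathbb{F}_p$), $K$-bilinearity is the same as biadditivity.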
Submitted 6 February, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Polygonic spectra and TR with coefficients
Authors:
Achim Krause,
Jonas McCandless,
Thomas Nikolaus
Abstract:
We introduce the notion of a polygonic spectrum which is designed to axiomatize the structure on topological Hochschild homology $\mathrm{THH}(R,M)$ of an $\mathbb{E}_1$-ring $R$ with coefficients in an $R$-bimodule $M$. For every polygonic spectrum $X$, we define a spectrum $\mathrm{TR}(X)$ as the mapping spectrum from the polygonic version of the sphere spectrum $\mathbb{S}$ to $X$. In particular if applied to $X = \mathrm{THH}(R,M)$ this gives a conceptual definition of $\mathrm{TR}(R,M)$.
Every cyclotomic spectrum gives rise to a polygonic spectrum and we prove that TR agrees with the classical definition of TR in this case. We construct Frobenius and Verschiebung maps on $\mathrm{TR}(X)$ by exhibiting $\mathrm{TR}(X)$ as the $\mathbb{Z}$-fixedpoints of a quasifinitely genuine $\mathbb{Z}$-spectrum. The notion of quasifinitely genuine $\mathbb{Z}$-spectra is a new notion that we introduce and discuss inspired by a similar notion over $\mathbb{Z}$ introduced by Kaledin. Besides the usual coherences for genuine spectra, this notion additionally encodes that $\mathrm{TR}(X)$ admits certain infinite sums of Verschiebung maps.
Submitted 15 February, 2023;
originally announced February 2023.
-
Learning To Dive In Branch And Bound
Authors:
Max B. Paulus,
Andreas Krause
Abstract:
Primal heuristics are important for solving mixed integer linear programs, because they find feasible solutions that facilitate branch and bound search. A prominent group of primal heuristics are diving heuristics. They iteratively modify and resolve linear programs to conduct a depth-first search from any node in the search tree. Existing divers rely on generic decision rules that fail to exploit structural commonality between similar problem instances that often arise in practice. Therefore, we propose L2Dive to learn specific diving heuristics with graph neural networks: We train generative models to predict variable assignments and leverage the duality of linear programs to make diving decisions based on the model's predictions. L2Dive is fully integrated into the open-source solver SCIP. We find that L2Dive outperforms standard divers at finding better feasible solutions on a range of combinatorial optimization problems. For real-world applications from server load balancing and neural network verification, L2Dive improves the primal-dual integral by up to 7% (35%) on average over a tuned (default) solver baseline and reduces average solving time by 20% (29%).
Submitted 24 January, 2023;
originally announced January 2023.
-
A Dynamical System View of Langevin-Based Non-Convex Sampling
Authors:
Mohammad Reza Karimi,
Ya-Ping Hsieh,
Andreas Krause
Abstract:
Non-convex sampling is a key challenge in machine learning, central to non-convex optimization in deep learning as well as to approximate probabilistic inference. Despite its significance, theoretically there remain many important challenges: Existing guarantees (1) typically only hold for the averaged iterates rather than the more desirable last iterates, (2) lack convergence metrics that capture the scales of the variables such as Wasserstein distances, and (3) mainly apply to elementary schemes such as stochastic gradient Langevin dynamics. In this paper, we develop a new framework that lifts the above issues by harnessing several tools from the theory of dynamical systems. Our key result is that, for a large class of state-of-the-art sampling schemes, their last-iterate convergence in Wasserstein distances can be reduced to the study of their continuous-time counterparts, which is much better understood. Coupled with standard assumptions of MCMC sampling, our theory immediately yields the last-iterate Wasserstein convergence of many advanced sampling schemes such as proximal, randomized mid-point, and Runge-Kutta integrators. Beyond existing methods, our framework also motivates more efficient schemes that enjoy the same rigorous guarantees.
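For reference, the elementary scheme that the framework generalizes can be sketched as the unadjusted Langevin algorithm (ULA), i.e. an Euler-Maruyama discretization of the Langevin diffusion, run here on a non-convex double-well potential. The potential, step size and function name are illustrative assumptions; the paper's advanced integrators (proximal, randomized mid-point, Runge-Kutta) are not shown.

```python
import numpy as np

def ula(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretization of
    dX = -grad U(X) dt + sqrt(2) dW, whose stationary density is proportional to exp(-U)."""
    noise = rng.standard_normal(n_steps)
    scale = np.sqrt(2.0 * step)
    xs = np.empty(n_steps)
    x = float(x0)
    for k in range(n_steps):
        x = x - step * grad_U(x) + scale * noise[k]
        xs[k] = x
    return xs

# Non-convex double-well potential U(x) = (x^2 - 1)^2 with modes at x = -1 and x = +1
grad_U = lambda x: 4.0 * x * (x ** 2 - 1.0)

rng = np.random.default_rng(0)
xs = ula(grad_U, 1.0, 0.01, 200_000, rng)
```

With a fixed step size the chain samples a slightly biased approximation of exp(-U); by symmetry of the potential, roughly half of the iterates should land in each well once the chain has crossed the barrier many times.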
Submitted 13 March, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Near-Optimal Multi-Agent Learning for Safe Coverage Control
Authors:
Manish Prajapat,
Matteo Turchetta,
Melanie N. Zeilinger,
Andreas Krause
Abstract:
In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density. In practice, the density is rarely known $\textit{a priori}$, further complicating the original NP-hard problem. Moreover, in many applications, agents cannot visit arbitrary locations due to $\textit{a priori}$ unknown safety constraints. In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety. We first propose a conditionally linear submodular coverage function that facilitates theoretical analysis. Utilizing this structure, we develop MacOpt, a novel algorithm that efficiently trades off the exploration-exploitation dilemma due to partial observability, and show that it achieves sublinear regret. Next, we extend results on single-agent safe exploration to our multi-agent setting and propose SafeMac for safe coverage and exploration. We analyze SafeMac and give first of its kind results: near optimal coverage in finite time while provably guaranteeing safety. We extensively evaluate our algorithms on synthetic and real problems, including a bio-diversity monitoring task under safety constraints, where SafeMac outperforms competing methods.
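As background for the submodular structure mentioned above, the sketch below shows the classical greedy rule for a monotone submodular coverage objective when the density is known: each agent location is chosen by its marginal coverage gain. This is not MacOpt or SafeMac, which must learn the unknown density and respect unknown safety constraints; the disk-coverage model, `radius` parameter and function names are hypothetical.

```python
import numpy as np

def coverage(selected, points, density, radius):
    """Density mass of all points within `radius` of at least one selected location."""
    if not selected:
        return 0.0
    centers = points[selected]                                   # (k, d)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    covered = (dists <= radius).any(axis=1)
    return float(density[covered].sum())

def greedy_coverage(points, density, radius, k):
    """Pick k agent locations greedily by marginal coverage gain."""
    selected = []
    for _ in range(k):
        base = coverage(selected, points, density, radius)
        gains = [coverage(selected + [j], points, density, radius) - base
                 if j not in selected else -np.inf
                 for j in range(len(points))]
        selected.append(int(np.argmax(gains)))                   # ties: lowest index
    return selected

# Usage: ten equally weighted points on a line, two agents with unit sensing radius
pts = np.arange(10, dtype=float)[:, None]
dens = np.ones(10)
sel = greedy_coverage(pts, dens, 1.0, 2)
```

For monotone submodular objectives under a cardinality constraint, this greedy rule is known to achieve a $(1 - 1/e)$ fraction of the optimum, which is why it is a common baseline for the NP-hard coverage problem.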
Submitted 12 October, 2022;
originally announced October 2022.
-
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning
Authors:
Ilnura Usmanova,
Yarden As,
Maryam Kamgarpour,
Andreas Krause
Abstract:
Optimizing noisy functions online, when evaluating the objective requires experiments on a deployed system, is a crucial task arising in manufacturing, robotics and many other fields. Often, constraints on safe inputs are unknown ahead of time, and we only obtain noisy information, indicating how close we are to violating the constraints. Yet, safety must be guaranteed at all times, not only for the final output of the algorithm.
We introduce a general approach for seeking a stationary point in high dimensional non-linear stochastic optimization problems in which maintaining safety during learning is crucial. Our approach called LB-SGD is based on applying stochastic gradient descent (SGD) with a carefully chosen adaptive step size to a logarithmic barrier approximation of the original problem. We provide a complete convergence analysis of non-convex, convex, and strongly-convex smooth constrained problems, with first-order and zeroth-order feedback. Our approach yields efficient updates and scales better with dimensionality compared to existing approaches.
We empirically compare the sample complexity and the computational cost of our method with existing safe learning approaches. Beyond synthetic benchmarks, we demonstrate the effectiveness of our approach on minimizing constraint violation in policy search tasks in safe reinforcement learning (RL).
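The core idea of descending on a logarithmic barrier while never leaving the feasible set can be sketched as below. This is a simplified stand-in, not LB-SGD: it uses exact gradients and Armijo backtracking (infeasible trial points get barrier value infinity and are rejected) instead of the paper's stochastic gradients and adaptive step-size rule. The function name, `eta`, `step0` and the toy problem are illustrative assumptions.

```python
import numpy as np

def log_barrier_gd(f, grad_f, gs, grad_gs, x0, eta=0.01, step0=0.1, n_iters=2000):
    """Gradient descent on the log-barrier surrogate
        B(x) = f(x) - eta * sum_i log(-g_i(x)),
    started from a strictly feasible x0 (all g_i(x0) < 0)."""

    def barrier(x):
        gvals = np.array([g(x) for g in gs])
        if np.any(gvals >= 0):
            return np.inf                      # outside the feasible set
        return f(x) - eta * np.sum(np.log(-gvals))

    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        gvals = [g(x) for g in gs]
        # gradient of -eta * log(-g(x)) is (eta / -g(x)) * grad g(x)
        grad = grad_f(x) + sum((eta / -gv) * dg(x) for gv, dg in zip(gvals, grad_gs))
        t, b_x = step0, barrier(x)
        # Armijo backtracking: shrink t until sufficient decrease (and hence feasibility)
        while barrier(x - t * grad) > b_x - 0.5 * t * (grad @ grad):
            t *= 0.5
        x = x - t * grad
    return x

# Usage: minimize ||x - (2, 0)||^2 subject to ||x||^2 - 1 <= 0, starting at the origin
c = np.array([2.0, 0.0])
x = log_barrier_gd(lambda x: np.sum((x - c) ** 2), lambda x: 2.0 * (x - c),
                   [lambda x: x @ x - 1.0], [lambda x: 2.0 * x], np.zeros(2))
```

Every accepted iterate is strictly feasible, and the barrier keeps the result slightly inside the boundary: with `eta = 0.01` the minimizer sits near `(0.995, 0)` rather than the constrained optimum `(1, 0)`.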
Submitted 2 June, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning
Authors:
Max B. Paulus,
Giulia Zarpellon,
Andreas Krause,
Laurent Charlin,
Chris J. Maddison
Abstract:
Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection - but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.
Submitted 27 June, 2022;
originally announced June 2022.
-
Riemannian stochastic approximation algorithms
Authors:
Mohammad Reza Karimi,
Ya-Ping Hsieh,
Panayotis Mertikopoulos,
Andreas Krause
Abstract:
We examine a wide class of stochastic approximation algorithms for solving (stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise naturally in the study of Riemannian optimization, game theory and optimal transport, but their behavior is much less understood compared to the Euclidean case because of the lack of a global linear structure on the manifold. We overcome this difficulty by introducing a suitable Fermi coordinate frame which allows us to map the asymptotic behavior of the Riemannian Robbins-Monro (RRM) algorithms under study to that of an associated deterministic dynamical system. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, despite the significant complications that arise due to the curvature and topology of the underlying manifold. We showcase the flexibility of the proposed framework by applying it to a range of retraction-based variants of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment for their convergence.
Submitted 27 December, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces
Authors:
Mojmír Mutný,
Andreas Krause
Abstract:
Optimal experimental design seeks to determine the most informative allocation of experiments to infer an unknown statistical quantity. In this work, we investigate the optimal design of experiments for {\em estimation of linear functionals in reproducing kernel Hilbert spaces (RKHSs)}. This problem has been extensively studied in the linear regression setting under an estimability condition, which allows estimating parameters without bias. We generalize this framework to RKHSs, and allow for the linear functional to be only approximately inferred, i.e., with a fixed bias. This scenario captures many important modern applications, such as estimation of gradient maps, integrals, and solutions to differential equations. We provide algorithms for constructing bias-aware designs for linear functionals. We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise, enabling us to certify estimation with bounded error with high probability.
Submitted 15 January, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
On the K-theory of $\mathbb{Z}/p^n$ -- announcement
Authors:
Benjamin Antieau,
Achim Krause,
Thomas Nikolaus
Abstract:
We announce new methods for using prismatic cohomology to compute the K-groups of $\mathbb{Z}/p^n$ and related rings. We use computer algebra methods to compute these K-groups through a large range in specific cases and also obtain explicit formulas for their orders in large degrees.
Submitted 7 April, 2022;
originally announced April 2022.
-
Learning Stable Deep Dynamics Models for Partially Observed or Delayed Dynamical Systems
Authors:
Andreas Schlaginhaufen,
Philippe Wenk,
Andreas Krause,
Florian Dörfler
Abstract:
Learning how complex dynamical systems evolve over time is a key challenge in system identification. For safety critical systems, it is often crucial that the learned model is guaranteed to converge to some equilibrium point. To this end, neural ODEs regularized with neural Lyapunov functions are a promising approach when states are fully observed. For practical applications however, partial observations are the norm. As we will demonstrate, initialization of unobserved augmented states can become a key problem for neural ODEs. To alleviate this issue, we propose to augment the system's state with its history. Inspired by state augmentation in discrete-time systems, we thus obtain neural delay differential equations. Based on classical time delay stability analysis, we then show how to ensure stability of the learned models, and theoretically analyze our approach. Our experiments demonstrate its applicability to stable system identification of partially observed systems and learning a stabilizing feedback policy in delayed feedback control.
Submitted 10 December, 2021; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Fast Projection Onto Convex Smooth Constraints
Authors:
Ilnura Usmanova,
Maryam Kamgarpour,
Andreas Krause,
Kfir Yehuda Levy
Abstract:
The Euclidean projection onto a convex set is an important problem that arises in numerous constrained optimization tasks. Unfortunately, in many cases, computing projections is computationally demanding. In this work, we focus on projection problems where the constraints are smooth and the number of constraints is significantly smaller than the dimension. The runtime of existing approaches to solving such problems is either cubic in the dimension or polynomial in the inverse of the target accuracy. In contrast, we propose a simple and efficient primal-dual approach whose runtime scales only linearly with the dimension and only logarithmically in the inverse of the target accuracy. We empirically demonstrate its performance and compare it with standard baselines.
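A minimal sketch of the primal-dual idea in the simplest possible case, assuming a single smooth constraint given by an axis-aligned ellipsoid $\{x : \sum_i a_i x_i^2 \le 1\}$ with $a_i > 0$. This is our own illustration, not the authors' algorithm, which handles several general smooth constraints. Stationarity of the Lagrangian gives $x_i(\lambda) = y_i / (1 + 2\lambda a_i)$, and the constraint value is decreasing in $\lambda \ge 0$, so the multiplier can be found by bisection: each step costs $O(d)$ and the number of steps grows like $\log(1/\varepsilon)$. The function name `project_onto_ellipsoid` is hypothetical.

```python
import numpy as np


def project_onto_ellipsoid(y, a, tol=1e-12, max_iter=200):
    """Euclidean projection of y onto {x : sum_i a_i * x_i**2 <= 1}, with a_i > 0."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)

    def x_of(lam):
        # Minimizer of the Lagrangian for a fixed multiplier lam >= 0.
        return y / (1.0 + 2.0 * lam * a)

    def constraint(lam):
        x = x_of(lam)
        return float(np.dot(a, x * x) - 1.0)

    if constraint(0.0) <= 0.0:
        return y.copy()  # y is already feasible, so it is its own projection

    # Bracket the multiplier: double hi until x_of(hi) is feasible.
    lo, hi = 0.0, 1.0
    while constraint(hi) > 0.0:
        lo, hi = hi, 2.0 * hi
    # Bisection: the constraint value is decreasing in lam.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if constraint(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return x_of(hi)  # the hi end of the bracket is always feasible
```

For example, projecting $(3, 4)$ onto the unit circle ($a = (1, 1)$) gives $\lambda = 2$ and $x = (3, 4)/5 = (0.6, 0.8)$.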
Submitted 20 September, 2021;
originally announced September 2021.
-
Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models
Authors:
Lenart Treven,
Philippe Wenk,
Florian Dörfler,
Andreas Krause
Abstract:
Differential equations in general and neural ODEs in particular are an essential technique in continuous-time system identification. While many deterministic learning algorithms have been designed based on numerical integration via the adjoint method, many downstream tasks such as active learning, exploration in reinforcement learning, robust control, or filtering require accurate estimates of predictive uncertainties. In this work, we propose a novel approach towards estimating epistemically uncertain neural ODEs, avoiding the numerical integration bottleneck. Instead of modeling uncertainty in the ODE parameters, we directly model uncertainties in the state space. Our algorithm, distributional gradient matching (DGM), jointly trains a smoother and a dynamics model and matches their gradients via minimizing a Wasserstein loss. Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and in the context of neural ODEs, significantly more accurate.
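To make the matching loss concrete: if the smoother's and the dynamics model's gradient estimates are both represented as Gaussians with diagonal covariance (an assumption made here for illustration), the squared 2-Wasserstein distance has the closed form $\|m_p - m_q\|^2 + \|\sigma_p - \sigma_q\|^2$. The sketch below only evaluates such a loss; the function names and the `(T, state_dim)` array layout are hypothetical, and training the smoother and dynamics model is not shown.

```python
import numpy as np


def w2_squared_diag_gaussians(mean_p, std_p, mean_q, std_q):
    """Squared 2-Wasserstein distance between N(mean_p, diag(std_p**2)) and
    N(mean_q, diag(std_q**2)); closed form valid for diagonal covariances."""
    mean_p, std_p = np.asarray(mean_p, float), np.asarray(std_p, float)
    mean_q, std_q = np.asarray(mean_q, float), np.asarray(std_q, float)
    return float(np.sum((mean_p - mean_q) ** 2) + np.sum((std_p - std_q) ** 2))


def gradient_matching_loss(smoother_means, smoother_stds, model_means, model_stds):
    """Average W2^2 over time points; each argument has shape (T, state_dim).
    Hypothetical layout: smoother_* describe the smoother's state derivatives,
    model_* the dynamics model's predicted derivatives at the same times."""
    losses = [w2_squared_diag_gaussians(mp, sp, mq, sq)
              for mp, sp, mq, sq in zip(smoother_means, smoother_stds,
                                        model_means, model_stds)]
    return float(np.mean(losses))
```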
Submitted 15 October, 2021; v1 submitted 22 June, 2021;
originally announced June 2021.
-
PopSkipJump: Decision-Based Attack for Probabilistic Classifiers
Authors:
Carl-Johann Simon-Gabriel,
Noman Ahmed Sheikh,
Andreas Krause
Abstract:
Most current classifiers are vulnerable to adversarial examples, small input perturbations that change the classification output. Many existing attack algorithms cover various settings, from white-box to black-box classifiers, but typically assume that the answers are deterministic and often fail when they are not. We therefore propose a new adversarial decision-based attack specifically designed for classifiers with probabilistic outputs. It is based on the HopSkipJump attack by Chen et al. (2019, arXiv:1904.02144v5), a strong and query-efficient decision-based attack originally designed for deterministic classifiers. Our P(robabilisticH)opSkipJump attack adapts its number of queries to maintain HopSkipJump's original output quality across various noise levels, while converging to its query efficiency as the noise level decreases. We test our attack on various noise models, including state-of-the-art off-the-shelf randomized defenses, and show that they offer almost no extra robustness to decision-based attacks. Code is available at https://github.com/cjsg/PopSkipJump.
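HopSkipJump locates the decision boundary with a binary search between a clean input and an adversarial one; with a randomized classifier, every comparison in that search becomes a noisy query. A minimal sketch of one way to denoise it, assuming a fixed number of repeated queries and a majority vote per step. PopSkipJump instead adapts the number of queries to the noise level, which is not reproduced here. The function names and the toy classifier in the test are hypothetical.

```python
import numpy as np


def majority_is_adversarial(query, x, true_label, n_repeats):
    """Query a randomized classifier n_repeats times at x and report whether
    the majority of returned labels differs from true_label."""
    votes = sum(query(x) != true_label for _ in range(n_repeats))
    return votes > n_repeats / 2


def noisy_boundary_search(query, x_clean, x_adv, true_label,
                          n_repeats=51, n_steps=30):
    """Bisection on the segment from x_clean to x_adv, using majority votes to
    denoise each comparison; returns a point judged adversarial by the vote."""
    x_clean = np.asarray(x_clean, dtype=float)
    x_adv = np.asarray(x_adv, dtype=float)
    lo, hi = 0.0, 1.0  # t = 0 is x_clean, t = 1 is x_adv
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        x_mid = (1.0 - mid) * x_clean + mid * x_adv
        if majority_is_adversarial(query, x_mid, true_label, n_repeats):
            hi = mid  # still adversarial: move the upper end towards x_clean
        else:
            lo = mid
    return (1.0 - hi) * x_clean + hi * x_adv
```

With a fixed vote count, comparisons close to the boundary remain unreliable, so the result is only accurate up to a band whose width depends on the noise level.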
Submitted 14 June, 2021;
originally announced June 2021.
-
Robust Generalization despite Distribution Shift via Minimum Discriminating Information
Authors:
Tobias Sutter,
Andreas Krause,
Daniel Kuhn
Abstract:
Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Lastly, we demonstrate the versatility of our framework by applying it to two distinct applications: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.
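In its simplest form, the principle of minimum discriminating information selects, among all distributions consistent with the prior knowledge, the one closest in KL divergence to the empirical distribution. For a single moment constraint $\mathbb{E}_Q[\phi] = c$, the solution is an exponential tilting $q_i \propto \exp(\theta\,\phi(x_i))$ of the samples. A sketch under those assumptions (one scalar moment constraint, uniform empirical weights, bisection on $\theta$ inside a fixed bracket); the distributionally robust layer of the paper is not shown, and `mdi_reweight` is a hypothetical name.

```python
import numpy as np


def mdi_reweight(phi_values, target, lo=-50.0, hi=50.0, iters=200):
    """Sample weights q minimizing KL(q || uniform) subject to
    sum_i q_i * phi_i = target, via exponential tilting q_i ~ exp(theta * phi_i).
    Assumes the optimal theta lies in [lo, hi]."""
    phi = np.asarray(phi_values, dtype=float)
    if not (phi.min() < target < phi.max()):
        raise ValueError("target must lie strictly inside the range of phi")

    def tilted(theta):
        z = theta * phi
        w = np.exp(z - z.max())  # subtract the max for numerical stability
        return w / w.sum()

    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        # The tilted mean is increasing in theta (its derivative is a variance).
        if np.dot(tilted(theta), phi) < target:
            lo = theta
        else:
            hi = theta
    return tilted(0.5 * (lo + hi))
```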
Submitted 26 October, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Mapping spaces in homotopy coherent nerves
Authors:
Fabian Hebestreit,
Achim Krause
Abstract:
We give a direct proof that middle mapping spaces in coherent nerves of Kan enriched categories have the same homotopy type as the original mapping spaces.
Submitted 18 November, 2020;
originally announced November 2020.
-
The Picard group in equivariant homotopy theory via stable module categories
Authors:
Achim Krause
Abstract:
We develop a mechanism of "isotropy separation for compact objects" that explicitly describes an invertible $G$-spectrum through its collection of geometric fixed points and gluing data located in certain variants of the stable module category. As an application, we carry out a complete analysis of invertible $G$-spectra in the case $G=A_5$. A further application is given by showing that the Picard groups of $\mathrm{Sp}^G$ and a category of derived Mackey functors agree.
Submitted 12 August, 2020;
originally announced August 2020.
-
SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives
Authors:
Emmanouil Angelis,
Philippe Wenk,
Bernhard Schölkopf,
Stefan Bauer,
Andreas Krause
Abstract:
Gaussian processes are an important regression tool with excellent analytic properties which allow for direct integration of derivative observations. However, vanilla GP methods scale cubically in the number of observations. In this work, we propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We then prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior. To further illustrate the practical applicability of our method, we apply it to ODIN, a recently developed algorithm for ODE parameter inference. In an extensive experiments section, all results are empirically validated, demonstrating the speed, accuracy, and practical applicability of this approach.
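Quadrature Fourier features replace random frequencies with deterministic quadrature nodes. A one-dimensional sketch for the RBF kernel, assuming Gauss-Hermite nodes from NumPy's `numpy.polynomial.hermite.hermgauss`: since $k(x,y) = \mathbb{E}_{\omega \sim \mathcal{N}(0, \ell^{-2})}[\cos(\omega(x-y))]$, a weighted sum over the nodes yields features with $\phi(x)^\top\phi(y) \approx k(x,y)$, and differentiating the features in $x$ yields approximations of the derivative covariances needed for derivative observations. This is our own minimal illustration, not the SLEIPNIR code; `qff_features` is a hypothetical name, and for a fixed number of nodes the accuracy degrades as $|x - y|/\ell$ grows.

```python
import numpy as np


def qff_features(x, lengthscale=1.0, n_nodes=30):
    """Quadrature Fourier features for the 1-D RBF kernel
    k(x, y) = exp(-(x - y)**2 / (2 * lengthscale**2)).

    Returns (features, d_features), each of shape (len(x), 2 * n_nodes);
    d_features holds the derivatives of the features with respect to x."""
    # Gauss-Hermite rule: integral of exp(-t**2) g(t) dt ~ sum_i w_i g(t_i).
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables to omega ~ N(0, 1 / lengthscale**2).
    omega = np.sqrt(2.0) * t / lengthscale
    scale = np.sqrt(w / np.sqrt(np.pi))  # normalized weights sum to 1
    phase = np.asarray(x, dtype=float)[:, None] * omega[None, :]
    feats = np.hstack([scale * np.cos(phase), scale * np.sin(phase)])
    d_feats = np.hstack([-scale * omega * np.sin(phase),
                         scale * omega * np.cos(phase)])
    return feats, d_feats
```

Here `F @ F.T` approximates the kernel matrix and `F @ dF.T` approximates $\partial k(x, y)/\partial y = (x - y)\,k(x, y)/\ell^2$.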
Submitted 5 March, 2020;
originally announced March 2020.
-
Witt vectors with coefficients and characteristic polynomials over non-commutative rings
Authors:
Emanuele Dotto,
Achim Krause,
Thomas Nikolaus,
Irakli Patchkoria
Abstract:
For a not-necessarily commutative ring R we define an abelian group W(R;M) of Witt vectors with coefficients in an R-bimodule M. These groups generalize the usual big Witt vectors of commutative rings and we prove that they have analogous formal properties and structure. One main result is that W(R) := W(R;R) is Morita invariant in R.
For an R-linear endomorphism f of a finitely generated projective R-module we define a characteristic element $χ_f \in W(R)$. This element is a non-commutative analogue of the classical characteristic polynomial and we show that it has similar properties. The assignment $f \mapsto χ_f$ induces an isomorphism between a suitable completion of cyclic K-theory and W(R).
Submitted 4 February, 2020;
originally announced February 2020.
-
Log Barriers for Safe Non-convex Black-box Optimization
Authors:
Ilnura Usmanova,
Andreas Krause,
Maryam Kamgarpour
Abstract:
We address the problem of minimizing a smooth function $f^0(x)$ over a compact set $D$ defined by smooth functional constraints $f^i(x)\leq 0,~ i = 1,\ldots, m$ given noisy value measurements of $f^i(x)$. This problem arises in safety-critical applications, where certain parameters need to be adapted online in a data-driven fashion, such as in personalized medicine, robotics, manufacturing, etc. In such cases, it is important to ensure constraints are not violated while taking measurements and seeking the minimum of the cost function. We propose a new algorithm, s0-LBM, which provides provably feasible iterates with high probability and applies to the challenging case of an uncertain zeroth-order oracle. We also analyze the convergence rate of the algorithm, and empirically demonstrate its effectiveness.
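The log-barrier idea can be sketched as follows, under simplifying assumptions that differ from the paper: exact (noise-free) function values, central finite differences as the zeroth-order gradient estimate, plain gradient descent with Armijo backtracking, and a fixed decreasing schedule for the barrier weight $\eta$. The objective, the disc constraint and all function names are hypothetical. The sketch only shows that every accepted iterate stays strictly feasible, because the barrier is infinite outside the feasible set; it is not s0-LBM itself.

```python
import numpy as np

C = np.array([2.0, 2.0])


def f0(x):
    """Hypothetical objective: squared distance to the point C."""
    return float(np.sum((x - C) ** 2))


def f1(x):
    """Hypothetical constraint f1(x) <= 0: the closed unit disc."""
    return float(np.sum(x ** 2) - 1.0)


def barrier(x, eta):
    slack = -f1(x)
    if slack <= 0.0:
        return np.inf  # outside the feasible set
    return f0(x) - eta * np.log(slack)


def fd_grad(fun, x, h=1e-7):
    """Zeroth-order gradient estimate from function values only."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g


def log_barrier_descent(x0, etas=(1e-1, 1e-2, 1e-3), iters=500):
    x = np.asarray(x0, dtype=float)
    assert f1(x) < 0.0, "start from a strictly feasible point"
    for eta in etas:  # shrink the barrier weight step by step
        phi = lambda z: barrier(z, eta)
        for _ in range(iters):
            g = fd_grad(phi, x)
            step = 1.0
            # Armijo backtracking; infeasible trial points have phi = inf.
            while phi(x - step * g) > phi(x) - 0.5 * step * g.dot(g):
                step *= 0.5
                if step < 1e-12:
                    break
            if step < 1e-12:
                break  # no further progress at this eta
            x = x - step * g
    return x
```

Starting from the origin, the iterates approach the constrained minimizer $(1/\sqrt{2}, 1/\sqrt{2})$ from inside the disc as $\eta$ shrinks.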
Submitted 19 December, 2019;
originally announced December 2019.
-
Safe non-smooth black-box optimization with application to policy search
Authors:
Ilnura Usmanova,
Andreas Krause,
Maryam Kamgarpour
Abstract:
For safety-critical black-box optimization tasks, observations of the constraints and the objective are often noisy and available only for the feasible points. We propose an approach based on log barriers to find a local solution of a non-convex non-smooth black-box optimization problem $\min f^0(x)$ subject to $f^i(x)\leq 0,~ i = 1,\ldots, m$, while guaranteeing, with high probability, constraint satisfaction throughout the learning process. Our proposed algorithm exploits noisy observations to iteratively improve on an initial safe point until convergence. We derive the convergence rate and prove safety of our algorithm. We demonstrate its performance in an application to an iterative control design problem.
Submitted 23 February, 2021; v1 submitted 19 December, 2019;
originally announced December 2019.
-
Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling
Authors:
Mojmír Mutný,
Michał Dereziński,
Andreas Krause
Abstract:
We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix $\mathbf{M}$ instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of $\mathbf{M}$. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of the matrix $\mathbf{M}$ and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of $\mathbf{M}$. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior to or on par with uniform sampling.
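For small problems, determinantal block sampling can be written out by brute force: enumerate all blocks $S$ of size $k$ and sample each with probability proportional to $\det(\mathbf{M}_{S,S})$. The normalizer $\sum_{|S|=k}\det(\mathbf{M}_{S,S})$ equals the $k$-th elementary symmetric polynomial of the eigenvalues of $\mathbf{M}$, which hints at why only the spectrum enters such expressions. The sketch assumes $\mathbf{M}$ is symmetric positive definite; enumerating all $\binom{n}{k}$ blocks is only practical for small $n$, so this is an illustration rather than a practical sampler or the paper's method. All function names are hypothetical.

```python
import itertools

import numpy as np


def determinantal_block_distribution(M, k):
    """Enumerate all size-k coordinate blocks S and return them with
    P(S) = det(M[S, S]) / Z and the normalizer Z.
    Assumes M is symmetric positive definite; brute force."""
    n = M.shape[0]
    blocks = list(itertools.combinations(range(n), k))
    dets = np.array([np.linalg.det(M[np.ix_(S, S)]) for S in blocks])
    Z = dets.sum()
    return blocks, dets / Z, Z


def sample_block(M, k, rng):
    """Draw one block with probability proportional to its determinant."""
    blocks, probs, _ = determinantal_block_distribution(M, k)
    return blocks[rng.choice(len(blocks), p=probs)]


def elementary_symmetric(values, k):
    """e_k(values): sum over all k-subsets of the product of their entries."""
    return sum(np.prod(c) for c in itertools.combinations(values, k))
```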
Submitted 12 February, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Schauder Bases Having Many Good Block Basic Sequences
Authors:
Cory A. Krause
Abstract:
In the study of asymptotic geometry in Banach spaces, a basic sequence which gives rise to a spreading model has been called a good sequence. It is well known that every normalized basic sequence in a Banach space has a subsequence which is good. We investigate the assumption that every normalized block tree relative to a basis has a branch which is good. This combinatorial property turns out to be very strong and is equivalent to the space being $1$-asymptotic $\ell_p$ for some $1\leq p\leq\infty$. We also investigate the even stronger assumption that every block basic sequence of a basis is good. Finally, using the Hindman-Milliken-Taylor theorem, we prove a stabilization theorem which produces a basic sequence all of whose normalized constant coefficient block basic sequences are good, and we present an application of this stabilization.
Submitted 8 January, 2020; v1 submitted 27 July, 2019;
originally announced July 2019.
-
Bökstedt periodicity and quotients of DVRs
Authors:
Achim Krause,
Thomas Nikolaus
Abstract:
In this note we compute the topological Hochschild homology of quotients of DVRs. Along the way we give a short argument for Bökstedt periodicity and generalizations over various other bases. Our strategy also gives a very efficient way to redo the computations of THH (resp. logarithmic THH) of complete DVRs originally due to Lindenstrauss-Madsen (resp. Hesselholt-Madsen).
Submitted 8 July, 2019;
originally announced July 2019.
-
Safe Convex Learning under Uncertain Constraints
Authors:
Ilnura Usmanova,
Andreas Krause,
Maryam Kamgarpour
Abstract:
We address the problem of minimizing a convex smooth function $f(x)$ over a compact polyhedral set $D$ given a stochastic zeroth-order constraint feedback model. This problem arises in safety-critical machine learning applications, such as personalized medicine and robotics. In such cases, one needs to ensure constraints are satisfied while exploring the decision space to find the optimum of the loss function. We propose a new variant of the Frank-Wolfe algorithm, which applies to the case of uncertain linear constraints. Using robust optimization, we provide the convergence rate of the algorithm while guaranteeing feasibility of all iterates, with high probability.
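For reference, a sketch of the vanilla Frank-Wolfe iteration, assuming a known feasible set (the probability simplex, where the linear minimization step simply picks the vertex with the smallest gradient coordinate) and the standard step size $2/(k+2)$. The uncertain constraints, their robust treatment and the zeroth-order feedback from the paper are not modeled here; `frank_wolfe_simplex` is a hypothetical name.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, n_iters=1000):
    """Vanilla Frank-Wolfe over the probability simplex."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0               # linear minimization oracle: best vertex
        gamma = 2.0 / (k + 2.0)             # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x
```

Each iterate is a convex combination of feasible points, so it remains feasible without any projection; when the constraints are only known through noisy measurements, as in the abstract, this property no longer comes for free.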
Submitted 9 December, 2019; v1 submitted 11 March, 2019;
originally announced March 2019.
-
AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs
Authors:
Gabriele Abbati,
Philippe Wenk,
Michael A Osborne,
Andreas Krause,
Bernhard Schölkopf,
Stefan Bauer
Abstract:
Stochastic differential equations are an important modeling class in many disciplines. Consequently, there exist many methods relying on various discretization and numerical integration schemes. In this paper, we propose a novel, probabilistic model for estimating the drift and diffusion given noisy observations of the underlying stochastic system. Using state-of-the-art adversarial and moment matching inference techniques, we avoid the discretization schemes of classical approaches. This leads to significant improvements in parameter accuracy and robustness given random initial guesses. On four established benchmark systems, we compare the performance of our algorithms to state-of-the-art solutions based on extended Kalman filtering and Gaussian processes.
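The moment-matching side of this abstract relies on the maximum mean discrepancy (MMD). A minimal sketch of the standard unbiased estimate of $\mathrm{MMD}^2$ between two samples, assuming a Gaussian (RBF) kernel with a fixed bandwidth; how the paper embeds this into drift and diffusion estimation is not shown, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased MMD^2 estimate between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) to make the within-sample terms unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * kxy.mean())
```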
Submitted 28 May, 2019; v1 submitted 22 February, 2019;
originally announced February 2019.
-
ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems
Authors:
Philippe Wenk,
Gabriele Abbati,
Michael A Osborne,
Bernhard Schölkopf,
Andreas Krause,
Stefan Bauer
Abstract:
Parameter inference in ordinary differential equations is an important problem in many applied sciences and in engineering, especially in a data-scarce setting. In this work, we introduce a novel generative modeling approach based on constrained Gaussian processes and leverage it to build a computationally and data efficient algorithm for state and parameter inference. In an extensive set of experiments, our approach outperforms the current state of the art for parameter inference both in terms of accuracy and computational cost. It also shows promising results for the much more challenging problem of model selection.
Submitted 5 December, 2019; v1 submitted 17 February, 2019;
originally announced February 2019.
-
C-motivic modular forms
Authors:
Bogdan Gheorghe,
Daniel C. Isaksen,
Achim Krause,
Nicolas Ricka
Abstract:
We construct a topological model for cellular, 2-complete, stable C-motivic homotopy theory that uses no algebro-geometric foundations. We compute the Steenrod algebra in this context, and we construct a "motivic modular forms" spectrum over C.
Submitted 25 October, 2018;
originally announced October 2018.
-
Guarantees for Greedy Maximization of Non-submodular Functions with Applications
Authors:
Andrew An Bian,
Joachim M. Buhmann,
Andreas Krause,
Sebastian Tschiatschek
Abstract:
We investigate the performance of the standard Greedy algorithm for cardinality-constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. However, Greedy enjoys strong empirical performance for many important non-submodular functions, e.g., the Bayesian A-optimality objective in experimental design. We prove theoretical guarantees supporting the empirical performance. Our guarantees are characterized by a combination of the (generalized) curvature $\alpha$ and the submodularity ratio $\gamma$. In particular, we prove that Greedy enjoys a tight approximation guarantee of $\frac{1}{\alpha}(1- e^{-\gamma\alpha})$ for cardinality-constrained maximization. In addition, we bound the submodularity ratio and curvature for several important real-world objectives, including the Bayesian A-optimality objective, the determinantal function of a square submatrix and certain linear programs with combinatorial constraints. We experimentally validate our theoretical findings for both synthetic and real-world applications.
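A sketch of the standard Greedy algorithm together with the approximation factor $\frac{1}{\alpha}(1 - e^{-\gamma\alpha})$. The toy objective is a hypothetical coverage function; it is submodular, so $\gamma = 1$, and its curvature satisfies $\alpha \le 1$. Because the factor decreases in $\alpha$, plugging in $\alpha = 1$ gives the classical lower bound $1 - 1/e$. For a non-submodular objective such as Bayesian A-optimality, `f` would be replaced by that objective; the names `greedy_max`, `greedy_guarantee`, `SETS` and `coverage` are illustrative.

```python
import itertools
import math


def greedy_max(f, ground_set, k):
    """Standard Greedy for max f(S) subject to |S| <= k: repeatedly add the
    element with the largest marginal gain f(S + e) - f(S)."""
    S = set()
    for _ in range(k):
        candidates = [e for e in ground_set if e not in S]
        if not candidates:
            break
        best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        S.add(best)
    return S


def greedy_guarantee(alpha, gamma):
    """Approximation factor (1 / alpha) * (1 - exp(-gamma * alpha)), alpha > 0."""
    return (1.0 - math.exp(-gamma * alpha)) / alpha


# Hypothetical toy instance: a coverage function (submodular, so gamma = 1).
SETS = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 5}, 4: {2, 6, 8}}


def coverage(S):
    """Number of distinct items covered by the sets indexed by S."""
    return len(set().union(*(SETS[i] for i in S)))
```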
Submitted 14 May, 2019; v1 submitted 6 March, 2017;
originally announced March 2017.
-
Vanishing lines for modules over the motivic Steenrod algebra
Authors:
Drew Heard,
Achim Krause
Abstract:
We study criteria for freeness and for the existence of a vanishing line for modules over certain Hopf subalgebras of the motivic Steenrod algebra over $\mathrm{Spec}(\mathbb{C})$ at the prime 2. These turn out to be determined by the vanishing of certain Margolis homology groups in the quotient Hopf algebra $\mathcal{A}/τ$.
Submitted 24 January, 2018; v1 submitted 13 February, 2017;
originally announced February 2017.
-
Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions
Authors:
Hemant Tyagi,
Anastasios Kyrillidis,
Bernd Gärtner,
Andreas Krause
Abstract:
A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM) if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$, where $\mathcal{S} \subset [d]$ and $|\mathcal{S}| \ll d$. When the $\phi_l$'s and $\mathcal{S}$ are unknown, there is extensive work on estimating $f$ from its samples. In this work, we consider a generalized version of SPAMs that also allows for a sparse set of second-order interaction terms. For some $\mathcal{S}_1 \subset [d]$ and $\mathcal{S}_2 \subset {[d] \choose 2}$ with $|\mathcal{S}_1| \ll d$ and $|\mathcal{S}_2| \ll d^2$, the function $f$ is now assumed to be of the form $f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}\phi_{(l,l^{\prime})} (x_l,x_{l^{\prime}})$. Assuming we are free to query $f$ anywhere in its domain, we derive efficient algorithms that provably recover $\mathcal{S}_1$ and $\mathcal{S}_2$ with finite sample bounds. Our analysis covers the noiseless setting, where exact samples of $f$ are obtained, and extends to the noisy setting, where the queries are corrupted by noise. For the noisy setting in particular, we consider two noise models: i.i.d. Gaussian noise and arbitrary but bounded noise. Our main methods for identifying $\mathcal{S}_2$ rely on estimating sparse Hessian matrices, for which we provide two novel compressed-sensing-based schemes. Once $\mathcal{S}_1$ and $\mathcal{S}_2$ are known, we show how the individual components $\phi_p$ and $\phi_{(l,l^{\prime})}$ can be estimated via additional queries of $f$, with uniform error bounds. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.
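Why Hessian entries identify $\mathcal{S}_2$: for a pair $(l, l')$ that is not an interaction, the mixed partial $\partial^2 f / \partial x_l \partial x_{l'}$ vanishes identically. A naive baseline sketch under that observation, assuming noiseless queries and a generic query point (a true interaction whose mixed partial happens to vanish there would be missed). It spends $O(d^2)$ queries on finite differences; the compressed sensing schemes of the paper, which exploit sparsity to need far fewer queries, are not reproduced. Function names, step size and threshold are hypothetical.

```python
import itertools

import numpy as np


def mixed_partial(f, x, l, m, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_l dx_m) at x."""
    el = np.zeros_like(x)
    el[l] = h
    em = np.zeros_like(x)
    em[m] = h
    return (f(x + el + em) - f(x + el - em)
            - f(x - el + em) + f(x - el - em)) / (4.0 * h * h)


def detect_interactions(f, x, threshold=1e-4, h=1e-3):
    """Naive O(d^2)-query baseline: flag every pair whose mixed partial
    derivative at x is non-negligible."""
    d = x.size
    return {(l, m) for l, m in itertools.combinations(range(d), 2)
            if abs(mixed_partial(f, x, l, m, h)) > threshold}
```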
Submitted 8 May, 2017; v1 submitted 2 May, 2016;
originally announced May 2016.
-
Asymptotic Dynamics of Stochastic $p$-Laplace Equations on Unbounded Domains
Authors:
Andrew Krause
Abstract:
This thesis is concerned with the asymptotic behavior of solutions of stochastic $p$-Laplace equations driven by non-autonomous forcing on $\mathbb{R}^n$. Two cases are studied, with additive and multiplicative noise respectively. Estimates on the tails of solutions are used to overcome the non-compactness of Sobolev embeddings on unbounded domains, and prove asymptotic compactness of solution operators in $L^2(\mathbb{R}^n)$. Using this result we prove the existence and uniqueness of random attractors in each case. Additionally, we show the upper semicontinuity of the attractor for the multiplicative noise case as the intensity of the noise approaches zero.
Submitted 3 August, 2014;
originally announced August 2014.
-
Bianchi's classification of 3-dimensional Lie algebras revisited
Authors:
Manuel Glas,
Panagiotis Konstantis,
Achim Krause,
Frank Loose
Abstract:
We present Bianchi's proof of the classification of real (and complex) $3$-dimensional Lie algebras in a coordinate-free version from a strictly representation-theoretic point of view. Along the way, we also compute the automorphism groups and, from these, the dimensions of the corresponding orbits in the algebraic variety $X\subseteq\Lambda^2V^*\otimes V$ describing all Lie brackets on a fixed vector space $V$ of dimension $3$. Moreover, we clarify which orbits lie in the closure of a given orbit and therefore the topology on the orbit space $X/G$ with $G=\mathrm{Aut}(V)$.
Submitted 10 March, 2014;
originally announced March 2014.
-
Pullback Attractors of Non-autonomous Stochastic Degenerate Parabolic Equations on Unbounded Domains
Authors:
Andrew Krause,
Bixiang Wang
Abstract:
This paper is concerned with pullback attractors of the stochastic $p$-Laplace equation defined on the entire space $\mathbb{R}^n$. We first establish the asymptotic compactness of the equation in $L^2(\mathbb{R}^n)$ and then prove the existence and uniqueness of non-autonomous random attractors. This attractor is pathwise periodic if the non-autonomous deterministic forcing is time periodic. The difficulty of non-compactness of Sobolev embeddings on $\mathbb{R}^n$ is overcome by the uniform smallness of solutions outside a bounded domain.
Submitted 4 September, 2013;
originally announced September 2013.
-
Efficient Minimization of Decomposable Submodular Functions
Authors:
Peter Stobbe,
Andreas Krause
Abstract:
Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude.
Submitted 26 October, 2010;
originally announced October 2010.