Online Convex Optimization with Unbounded Memory

Authors: Raunak Kumar, Sarah Dean, Robert Kleinberg

Abstract: Online convex optimization (OCO) is a widely used framework in online learning. In each round, the learner chooses a decision in a convex set and an adversary chooses a convex loss function, and then the learner suffers the loss associated with their current decision. However, in many applications the learner's loss depends not only on the current decision but on the entire history of decisions until that point. The OCO framework and its existing generalizations do not capture this, and they can only be applied to many settings of interest after a long series of approximation arguments. They also leave open the question of whether the dependence on memory is tight because there are no non-trivial lower bounds. In this work we introduce a generalization of the OCO framework, "Online Convex Optimization with Unbounded Memory", that captures long-term dependence on past decisions. We introduce the notion of $p$-effective memory capacity, $H_p$, that quantifies the maximum influence of past decisions on present losses. We prove an $O(\sqrt{H_p T})$ upper bound on the policy regret and a matching (worst-case) lower bound. As a special case, we prove the first non-trivial lower bound for OCO with finite memory \citep{anavaHM2015online}, which could be of independent interest, and also improve existing upper bounds. We demonstrate the broad applicability of our framework by using it to derive regret bounds, and to improve and simplify existing regret bound derivations, for a variety of online learning problems including online linear control and an online variant of performative prediction.

Submitted 29 March, 2024; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:1705.01194 [pdf, ps, other]

The Lovász Theta Function for Random Regular Graphs and Community Detection in the Hard Regime

Authors: Jess Banks, Robert Kleinberg, Cristopher Moore

Abstract: We derive upper and lower bounds on the degree $d$ for which the Lovász $\vartheta$ function, or equivalently sum-of-squares proofs with degree two, can refute the existence of a $k$-coloring in random regular graphs $G_{n,d}$. We show that this type of refutation fails well above the $k$-colorability transition, and in particular everywhere below the Kesten-Stigum threshold. This is consistent with the conjecture that refuting $k$-colorability, or distinguishing $G_{n,d}$ from the planted coloring model, is hard in this region. Our results also apply to the disassortative case of the stochastic block model, adding evidence to the conjecture that there is a regime where community detection is computationally hard even though it is information-theoretically possible. Using orthogonal polynomials, we also provide explicit upper bounds on $\vartheta(\overline{G})$ for regular graphs of a given girth, which may be of independent interest.

Submitted 28 August, 2017; v1 submitted 2 May, 2017; originally announced May 2017.

arXiv:1703.04143 [pdf, ps, other]

Bernoulli Factories and Black-Box Reductions in Mechanism Design

Authors: Shaddin Dughmi, Jason Hartline, Robert Kleinberg, Rad Niazadeh

Abstract: We provide a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design for welfare maximization problems. Unlike prior results, our reduction achieves exact incentive compatibility for problems with multi-dimensional and continuous type spaces. The key technical barrier preventing exact incentive compatibility in prior black-box reductions is that repairing violations of incentive constraints requires understanding the distribution of the mechanism's output, which is typically #P-hard to compute. Reductions that instead estimate the output distribution by sampling inevitably suffer from sampling error, which typically precludes exact incentive compatibility. We overcome this barrier by employing and generalizing the computational model in the literature on $\textit{Bernoulli factories}$. In a Bernoulli factory problem, one is given a function mapping the bias of an "input coin" to that of an "output coin", and the challenge is to efficiently simulate the output coin given only sample access to the input coin. This is the key ingredient in designing an incentive compatible mechanism for bipartite matching, which can be used to make the approximately incentive compatible reduction of Hartline et al. (2015) exactly incentive compatible.
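
To make the Bernoulli factory notion concrete, here is a minimal Python sketch of the classical von Neumann procedure, which simulates a fair output coin (the constant function f(p) = 1/2) from an input coin of unknown bias p in (0, 1). It is only the textbook warm-up example and is not the construction used in the paper; the function name `von_neumann_fair_coin` and the `flip` callback are hypothetical.

```python
import random
from typing import Callable


def von_neumann_fair_coin(flip: Callable[[], int]) -> int:
    """Return a fair bit using only sample access to a biased coin.

    `flip` is a hypothetical callback returning 0 or 1 with an unknown,
    fixed bias p in (0, 1). The coin is flipped in pairs until the two
    outcomes differ; the first outcome of that pair is returned. The
    pairs (1, 0) and (0, 1) each occur with probability p * (1 - p),
    so the returned bit is unbiased.
    """
    while True:
        first, second = flip(), flip()
        if first != second:
            return first


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed input coin with bias 0.8 (illustrative only).
    biased = lambda: 1 if rng.random() < 0.8 else 0
    samples = [von_neumann_fair_coin(biased) for _ in range(10_000)]
    print(sum(samples) / len(samples))  # empirical frequency of heads
```

The procedure never needs to know p, which is the defining feature of a Bernoulli factory: only sample access to the input coin is used.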

Submitted 7 November, 2020; v1 submitted 12 March, 2017; originally announced March 2017.

Comments: Forthcoming in the Journal of the ACM (JACM) - Nov 2020; conference version appeared in Proc. 49th ACM Symposium on Theory of Computing (STOC 2017)

arXiv:1607.00047 [pdf, other]

doi 10.19086/da.3734

The Growth Rate of Tri-Colored Sum-Free Sets

Authors: Robert Kleinberg, Will Sawin, David E. Speyer

Abstract: Let $G$ be an abelian group. A tri-colored sum-free set in $G^n$ is a collection of triples $({\bf a}_i, {\bf b}_i, {\bf c}_i)$ in $G^n$ such that ${\bf a}_i+{\bf b}_j+{\bf c}_k=0$ if and only if $i=j=k$. Fix a prime $q$ and let $C_q$ be the cyclic group of order $q$. Let $θ= \min_{ρ>0} (1+ρ+\cdots + ρ^{q-1}) ρ^{-(q-1)/3}$. Blasiak, Church, Cohn, Grochow, Naslund, Sawin, and Umans (building on previous work of Croot, Lev and Pach, and of Ellenberg and Gijswijt) showed that a tri-colored sum-free set in $C_q^n$ has size at most $3 θ^n$. Between this paper and a paper of Pebody, we will show that, for any $δ> 0$, and $n$ sufficiently large, there are tri-colored sum-free sets in $C_q^n$ of size $(θ-δ)^n$. Our construction also works when $q$ is not prime.
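
The constant θ defined in this abstract can be evaluated numerically. The following Python sketch is an illustration only, not code from the paper: it minimizes the logarithm of $(1+ρ+\cdots+ρ^{q-1})ρ^{-(q-1)/3}$ over $t = \log ρ$ by ternary search. The function name `theta`, the search interval, and the iteration count are assumptions. The sketch relies on the objective being convex in $t$ (a log-sum-exp term minus a linear term) with its minimizer at some $ρ < 1$ when $q \ge 2$.

```python
import math


def theta(q: int, iters: int = 200) -> float:
    """Numerically evaluate min over rho > 0 of
    (1 + rho + ... + rho^(q-1)) * rho^(-(q-1)/3).

    The search runs over t = log(rho) in [-20, 0]; this interval is an
    assumption that holds for moderate q, since the minimizer satisfies
    rho < 1.
    """
    def log_f(t: float) -> float:
        rho = math.exp(t)
        return math.log(sum(rho ** i for i in range(q))) - (q - 1) / 3 * t

    lo, hi = -20.0, 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_f(m1) < log_f(m2):
            hi = m2  # minimizer lies left of m2
        else:
            lo = m1  # minimizer lies right of m1
    return math.exp(log_f((lo + hi) / 2))


if __name__ == "__main__":
    for q in (2, 3, 5):
        print(q, theta(q))
```

For $q = 2$ the minimum is attained at $ρ = 1/2$, giving $θ = 3/2^{2/3}$, which offers a quick sanity check on the routine.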

Submitted 6 July, 2018; v1 submitted 30 June, 2016; originally announced July 2016.

Comments: 10 pages, published in Discrete Analysis

Journal ref: Discrete Analysis 2018:12

arXiv:1605.08416 [pdf, ps, other]

A nearly tight upper bound on tri-colored sum-free sets in characteristic 2

Authors: Robert Kleinberg

Abstract: A tri-colored sum-free set in an abelian group $H$ is a collection of ordered triples in $H^3$, $\{(a_i,b_i,c_i)\}_{i=1}^m$, such that the equation $a_i+b_j+c_k=0$ holds if and only if $i=j=k$. Using a variant of the lemma introduced by Croot, Lev, and Pach in their breakthrough work on arithmetic-progression-free sets, we prove that the size of any tri-colored sum-free set in $\mathbb{F}_2^n$ is bounded above by $6 {n \choose \lfloor n/3 \rfloor}$. This upper bound is tight, up to a factor subexponential in $n$: there exist tri-colored sum-free sets in $\mathbb{F}_2^n$ of size greater than ${n \choose \lfloor n/3 \rfloor} \cdot 2^{-\sqrt{16 n / 3}}$ for all sufficiently large $n$.

Submitted 26 May, 2016; originally announced May 2016.

arXiv:1502.02155 [pdf, ps, other]

Secretary Problems with Non-Uniform Arrival Order

Authors: Thomas Kesselheim, Robert Kleinberg, Rad Niazadeh

Abstract: For many online problems, it is known that the uniform arrival order enables the design of algorithms with much better performance guarantees than under worst-case. The quintessential example is the secretary problem. If the sequence of elements is presented in uniformly random order there is an algorithm that picks the maximum value with probability 1/e, whereas no non-trivial performance guarantee is possible if the elements arrive in worst-case order. This work initiates an investigation into relaxations of the random-ordering hypothesis in online algorithms, by focusing on the secretary problems. We present two sets of properties of distributions over permutations as sufficient conditions, called the block-independence property and uniform-induced-ordering property. We show these two are asymptotically equivalent by borrowing some techniques from the approximation theory. Moreover, we show they both imply the existence of secretary algorithms with constant probability of correct selection, approaching the optimal constant 1/e in the limit. We substantiate our idea by providing several constructions of distributions that satisfy block-independence. We also show that Θ(log log n) is the minimum entropy of any permutation distribution that permits constant probability of correct selection in the secretary problem with n elements. While our block-independence condition is sufficient for constant probability of correct selection, it is not necessary; however, we present complexity-theoretic evidence that no simple necessary and sufficient criterion exists. Finally, we explore the extent to which the performance guarantees of other algorithms are preserved when one relaxes the uniform random ordering assumption, obtaining a positive result for Kleinberg's multiple-choice secretary algorithm and a negative result for the weighted bipartite matching algorithm of Korula and Pal.
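
For reference, the classical 1/e stopping rule that the abstract takes as its baseline can be sketched in a few lines of Python. This is the textbook rule for uniformly random arrival order, not one of the algorithms developed in the paper for non-uniform orders. The function name `secretary_pick` and the fallback of accepting the last element are illustrative assumptions.

```python
import math
from typing import Sequence


def secretary_pick(values: Sequence[float]) -> int:
    """Return the index selected by the classical 1/e rule.

    The first floor(n / e) elements are observed without being accepted;
    afterwards the first element that beats every observed one is chosen.
    If no such element arrives, the last element is taken (an assumed
    fallback, since some element must be selected).
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    k = int(n / math.e)  # length of the observation phase
    best_seen = max(values[:k], default=float("-inf"))
    for i in range(k, n):
        if values[i] > best_seen:
            return i
    return n - 1


if __name__ == "__main__":
    print(secretary_pick([3, 1, 4, 1, 5, 9, 2, 6]))
```

The rule only compares elements with one another, so its guarantee depends entirely on the distribution over arrival orders, which is the quantity the paper relaxes.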

Submitted 7 February, 2015; originally announced February 2015.

Comments: To appear in Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC 2015)

arXiv:1201.4764 [pdf, other]

Matroid Prophet Inequalities

Authors: Robert Kleinberg, S. Matthew Weinberg

Abstract: Consider a gambler who observes a sequence of independent, non-negative random numbers and is allowed to stop the sequence at any time, claiming a reward equal to the most recent observation. The famous prophet inequality of Krengel, Sucheston, and Garling asserts that a gambler who knows the distribution of each random variable can achieve at least half as much reward, in expectation, as a "prophet" who knows the sampled values of each random variable and can choose the largest one. We generalize this result to the setting in which the gambler and the prophet are allowed to make more than one selection, subject to a matroid constraint. We show that the gambler can still achieve at least half as much reward as the prophet; this result is the best possible, since it is known that the ratio cannot be improved even in the original prophet inequality, which corresponds to the special case of rank-one matroids. Generalizing the result still further, we show that under an intersection of p matroid constraints, the prophet's reward exceeds the gambler's by a factor of at most O(p), and this factor is also tight. Beyond their interest as theorems about pure online algorithms or optimal stopping rules, these results also have applications to mechanism design. Our results imply improved bounds on the ability of sequential posted-price mechanisms to approximate Bayesian optimal mechanisms in both single-parameter and multi-parameter settings. In particular, our results imply the first efficiently computable constant-factor approximations to the Bayesian optimal revenue in certain multi-parameter settings.
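
The single-selection (rank-one) case mentioned in the abstract admits a simple threshold rule: accept the first observation that reaches half of the prophet's expected reward. The Python sketch below illustrates that standard rule for independent random variables with finite support; it does not implement the paper's matroid algorithm. The function names `expected_max` and `threshold_stop`, and the dictionary encoding of distributions, are assumptions.

```python
from typing import Dict, List, Optional, Sequence


def expected_max(dists: List[Dict[float, float]]) -> float:
    """Exact E[max_i X_i] for independent finite-support variables.

    Each distribution is a dict mapping a value to its probability
    (an assumed encoding). Uses P(max <= v) = prod_i P(X_i <= v).
    """
    support = sorted({v for d in dists for v in d})
    total, prev_cdf = 0.0, 0.0
    for v in support:
        cdf = 1.0
        for d in dists:
            cdf *= sum(p for x, p in d.items() if x <= v)
        total += v * (cdf - prev_cdf)  # v times P(max == v)
        prev_cdf = cdf
    return total


def threshold_stop(realized: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first observation >= threshold, or None if none qualifies."""
    for i, x in enumerate(realized):
        if x >= threshold:
            return i
    return None


if __name__ == "__main__":
    dists = [{1.0: 1.0}, {0.0: 0.5, 2.0: 0.5}]
    t = expected_max(dists) / 2  # half of the prophet's expected reward
    print(t, threshold_stop([1.0, 2.0], t))
```

In this toy instance the prophet expects 1.5, so the threshold is 0.75 and the gambler always stops at the first variable, collecting 1, which is at least half of the prophet's reward.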

Submitted 23 January, 2012; originally announced January 2012.

Comments: 18 pages

ACM Class: F.1.2; G.3

arXiv:1108.2489 [pdf, ps, other]

Lexicographic products and the power of non-linear network coding

Authors: Anna Blasiak, Robert Kleinberg, Eyal Lubetzky

Abstract: We introduce a technique for establishing and amplifying gaps between parameters of network coding and index coding. The technique uses linear programs to establish separations between combinatorial and coding-theoretic parameters and applies hypergraph lexicographic products to amplify these separations. This entails combining the dual solutions of the lexicographic multiplicands and proving that they are a valid dual of the product. Our result is general enough to apply to a large family of linear programs. This blend of linear programs and lexicographic products gives a recipe for constructing hard instances in which the gap between combinatorial or coding-theoretic parameters is polynomially large. We find polynomial gaps in cases in which the largest previously known gaps were only small constant factors or entirely unknown. Most notably, we show a polynomial separation between linear and non-linear network coding rates. This involves exploiting a connection between matroids and index coding to establish a previously unknown separation between linear and non-linear index coding rates. We also construct index coding problems with a polynomial gap between the broadcast rate and the trivial lower bound for which no gap was previously known.

Submitted 11 August, 2011; originally announced August 2011.

Comments: 29 pages

MSC Class: 94A29; 68P30; 90C35

arXiv:1004.1379 [pdf, other]

Index coding via linear programming

Authors: Anna Blasiak, Robert Kleinberg, Eyal Lubetzky

Abstract: Index Coding has received considerable attention recently motivated in part by real-world applications and in part by its connection to Network Coding. The basic setting of Index Coding encodes the problem input as an undirected graph and the fundamental parameter is the broadcast rate $β$, the average communication cost per bit for sufficiently long messages (i.e. the non-linear vector capacity). Recent nontrivial bounds on $β$ were derived from the study of other Index Coding capacities (e.g. the scalar capacity $β_1$) by Bar-Yossef et al (2006), Lubetzky and Stav (2007) and Alon et al (2008). However, these indirect bounds shed little light on the behavior of $β$: there was no known polynomial-time algorithm for approximating $β$ in a general network to within a nontrivial (i.e. $o(n)$) factor, and the exact value of $β$ remained unknown for any graph where Index Coding is nontrivial. Our main contribution is a direct information-theoretic analysis of the broadcast rate $β$ using linear programs, in contrast to previous approaches that compared $β$ with graph-theoretic parameters. This allows us to resolve the aforementioned two open questions. We provide a polynomial-time algorithm with a nontrivial approximation ratio for computing $β$ in a general network along with a polynomial-time decision procedure for recognizing instances with $β=2$. In addition, we pinpoint $β$ precisely for various classes of graphs (e.g. for various Cayley graphs of cyclic groups) thereby simultaneously improving the previously known upper and lower bounds for these graphs. Via this approach we construct graphs where the difference between $β$ and its trivial lower bound is linear in the number of vertices and ones where $β$ is uniformly bounded while its upper bound derived from the naive encoding scheme is polynomially worse.

Submitted 12 July, 2011; v1 submitted 8 April, 2010; originally announced April 2010.

Comments: 31 pages, 2 figures

MSC Class: 94A29; 90C35; 68P30; 05C35

arXiv:math/0603207 [pdf, ps, other]

doi 10.1007/s00211-007-0061-6

Fast matrix multiplication is stable

Authors: James Demmel, Ioana Dumitriu, Olga Holtz, Robert Kleinberg

Abstract: We perform forward error analysis for a large class of recursive matrix multiplication algorithms in the spirit of [D. Bini and G. Lotti, Stability of fast algorithms for matrix multiplication, Numer. Math. 36 (1980), 63--72]. As a consequence of our analysis, we show that the exponent of matrix multiplication (the optimal running time) can be achieved by numerically stable algorithms. We also show that new group-theoretic algorithms proposed in [H. Cohn, and C. Umans, A group-theoretic approach to fast matrix multiplication, FOCS 2003, 438--449] and [H. Cohn, R. Kleinberg, B. Szegedy and C. Umans, Group-theoretic algorithms for matrix multiplication, FOCS 2005, 379--388] are all included in the class of algorithms to which our analysis applies, and are therefore numerically stable. We perform detailed error analysis for three specific fast group-theoretic algorithms.

Submitted 7 December, 2006; v1 submitted 8 March, 2006; originally announced March 2006.

Comments: 19 pages; final version, expanded and updated to reflect referees' remarks; to appear in Numerische Mathematik

MSC Class: 65Y20; 65F30; 65G50; 68Q17; 68W40; 20C05; 20K01; 16S34; 43A30; 65T50

Journal ref: Numer. Math. 106 (2007), no. 2, 199-224

arXiv:math/0511460 [pdf, ps, other]

doi 10.1109/SFCS.2005.39

Group-theoretic algorithms for matrix multiplication

Authors: Henry Cohn, Robert Kleinberg, Balazs Szegedy, Christopher Umans

Abstract: We further develop the group-theoretic approach to fast matrix multiplication introduced by Cohn and Umans, and for the first time use it to derive algorithms asymptotically faster than the standard algorithm. We describe several families of wreath product groups that achieve matrix multiplication exponent less than 3, the asymptotically fastest of which achieves exponent 2.41. We present two conjectures regarding specific improvements, one combinatorial and the other algebraic. Either one would imply that the exponent of matrix multiplication is 2.

Submitted 17 November, 2005; originally announced November 2005.

Comments: 10 pages

Journal ref: Proceedings of the 46th Annual Symposium on Foundations of Computer Science, 23-25 October 2005, Pittsburgh, PA, IEEE Computer Society, pp. 379-388

arXiv:cond-mat/0502205 [pdf, ps, other]

Degree Distribution of Competition-Induced Preferential Attachment Graphs

Authors: N. Berger, C. Borgs, J. T. Chayes, R. M. D'Souza, R. D. Kleinberg

Abstract: We introduce a family of one-dimensional geometric growth models, constructed iteratively by locally optimizing the tradeoffs between two competing metrics, and show that this family is equivalent to a family of preferential attachment random graph models with upper cutoffs. This is the first explanation of how preferential attachment can arise from a more basic underlying mechanism of local competition. We rigorously determine the degree distribution for the family of random graph models, showing that it obeys a power law up to a finite threshold and decays exponentially above this threshold. We also rigorously analyze a generalized version of our graph process, with two natural parameters, one corresponding to the cutoff and the other a "fertility" parameter. We prove that the general model has a power-law degree distribution up to a cutoff, and establish monotonicity of the power as a function of the two parameters. Limiting cases of the general model include the standard preferential attachment model without cutoff and the uniform attachment model.

Submitted 8 February, 2005; v1 submitted 8 February, 2005; originally announced February 2005.

Comments: 24 pages, one figure. To appear in the journal: Combinatorics, Probability and Computing. Note, this is a long version, with complete proofs, of the paper "Competition-Induced Preferential Attachment" (cond-mat/0402268)

Showing 1–12 of 12 results for author: Kleinberg, R