
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

Authors: Thomas P. Zollo, Todd Morrill, Zhun Deng, Jake C. Snell, Toniann Pitassi, Richard Zemel

Abstract: The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we propose Prompt Risk Control, a lightweight framework for selecting a prompt based on rigorous upper bounds on families of informative risk measures. We offer methods for producing bounds on a diverse set of metrics, including quantities that measure worst-case responses and disparities in generation quality across the population of users. In addition, we extend the underlying statistical bounding techniques to accommodate the possibility of distribution shifts in deployment. Experiments on applications such as open-ended chat, medical question summarization, and code generation highlight how such a framework can foster responsible deployment by reducing the risk of the worst outcomes.

Submitted 27 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: 34 pages, 10 figures, published as conference paper at ICLR 2024, and accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023

arXiv:2309.13786 [pdf, other]

Distribution-Free Statistical Dispersion Control for Societal Applications

Authors: Zhun Deng, Thomas P. Zollo, Jake C. Snell, Toniann Pitassi, Richard Zemel

Abstract: Explicit finite-sample statistical guarantees on model performance are an important ingredient in responsible machine learning. Previous work has focused mainly on bounding either the expected loss of a predictor or the probability that an individual prediction will incur a loss value in a specified range. However, for many high-stakes applications, it is crucial to understand and control the dispersion of a loss distribution, or the extent to which different members of a population experience unequal effects of algorithmic decisions. We initiate the study of distribution-free control of statistical dispersion measures with societal implications and propose a simple yet flexible framework that allows us to handle a much richer class of statistical functionals beyond previous work. Our methods are verified through experiments in toxic comment detection, medical imaging, and film recommendation.

Submitted 6 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: Accepted by NeurIPS as spotlight (top 3% among submissions)

arXiv:2309.06554 [pdf, ps, other]

An improved protocol for ExactlyN with more than 3 players

Authors: Lianna Hambardzumyan, Toniann Pitassi, Suhail Sherif, Morgan Shirley, Adi Shraibman

Abstract: The ExactlyN problem in the number-on-forehead (NOF) communication setting asks $k$ players, each of whom can see every input but their own, if the $k$ input numbers add up to $N$. Introduced by Chandra, Furst and Lipton in 1983, ExactlyN is important for its role in understanding the strength of randomness in communication complexity with many players. It is also tightly connected to the field of combinatorics: its $k$-party NOF communication complexity is related to the size of the largest corner-free subset in $[N]^{k-1}$. In 2021, Linial and Shraibman gave more efficient protocols for ExactlyN for 3 players. As an immediate consequence, this also gave a new construction of larger corner-free subsets in $[N]^2$. Later that year Green gave a further refinement to their argument. These results represent the first improvements to the highest-order term for $k=3$ since the famous work of Behrend in 1946. In this paper we give a corresponding improvement to the highest-order term for all $k>3$, the first since Rankin in 1961. That is, we give a more efficient protocol for ExactlyN as well as larger corner-free sets in higher dimensions. Nearly all previous results in this line of research approached the problem from the combinatorics perspective, implicitly resulting in non-constructive protocols for ExactlyN. Approaching the problem from the communication complexity point of view and constructing explicit protocols for ExactlyN was key to the improvements in the $k=3$ setting. As a further contribution we provide explicit protocols for ExactlyN for any number of players which serves as a base for our improvement.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.16042 [pdf, ps, other]

Optimal Non-Adaptive Cell Probe Dictionaries and Hashing

Authors: Kasper Green Larsen, Rasmus Pagh, Giuseppe Persiano, Toniann Pitassi, Kevin Yeo, Or Zamir

Abstract: We present a simple and provably optimal non-adaptive cell probe data structure for the static dictionary problem. Our data structure supports storing a set of n key-value pairs from [u]x[u] using s words of space and answering key lookup queries in t = O(lg(u/n) / lg(s/n)) non-adaptive probes. This generalizes a solution to the membership problem (i.e., where no values are associated with keys) due to Buhrman et al. We also present matching lower bounds for the non-adaptive static membership problem in the deterministic setting. Our lower bound implies that both our dictionary algorithm and the preceding membership algorithm are optimal, and in particular that there is an inherent complexity gap in these problems between no adaptivity and one round of adaptivity (with which hashing-based algorithms solve these problems in constant time). Using the ideas underlying our data structure, we also obtain the first implementation of an n-wise independent family of hash functions with optimal evaluation time in the cell probe model.

Submitted 19 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Appears at ICALP 2024. This paper is a merge and revision of two previous reports [PY20] and [LPPZ23]

arXiv:2305.19320 [pdf, ps, other]

On the algebraic proof complexity of Tensor Isomorphism

Authors: Nicola Galesi, Joshua A. Grochow, Toniann Pitassi, Adrian She

Abstract: The Tensor Isomorphism problem (TI) has recently emerged as having connections to multiple areas of research within complexity and beyond, but the current best upper bound is essentially the brute force algorithm. Being an algebraic problem, TI (or rather, proving that two tensors are non-isomorphic) lends itself very naturally to algebraic and semi-algebraic proof systems, such as the Polynomial Calculus (PC) and Sum of Squares (SoS). For its combinatorial cousin Graph Isomorphism, essentially optimal lower bounds are known for approaches based on PC and SoS (Berkholz & Grohe, SODA '17). Our main results are an $Ω(n)$ lower bound on PC degree or SoS degree for Tensor Isomorphism, and a nontrivial upper bound for testing isomorphism of tensors of bounded rank. We also show that PC cannot perform basic linear algebra in sub-linear degree, such as comparing the rank of two matrices, or deriving $BA=I$ from $AB=I$. As linear algebra is a key tool for understanding tensors, we introduce a strictly stronger proof system, PC+Inv, which allows as derivation rules all substitution instances of the implication $AB=I \rightarrow BA=I$. We conjecture that even PC+Inv cannot solve TI in polynomial time either, but leave open getting lower bounds on PC+Inv for any system of equations, let alone those for TI. We also highlight many other open questions about proof complexity approaches to TI.

Submitted 30 May, 2023; originally announced May 2023.

Comments: Full version of extended abstract to appear in CCC '23

MSC Class: 03F20; 15A69; 68Q25; 13P15

ACM Class: F.2.2; F.4.1

arXiv:2303.12921 [pdf, ps, other]

Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization

Authors: Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, Satchit Sivakumar, Jessica Sorrell

Abstract: The notion of replicable algorithms was introduced in Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $δ$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.

Submitted 24 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: STOC 2023, minor typos fixed

arXiv:2212.13629 [pdf, other]

Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions

Authors: Jake C. Snell, Thomas P. Zollo, Zhun Deng, Toniann Pitassi, Richard Zemel

Abstract: Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantiles of the loss distribution incurred by a predictor. Our method takes advantage of the order statistics of the observed loss values rather than relying on the sample mean alone. We show that a quantile is an informative way of quantifying predictive performance, and that our framework applies to a variety of quantile-based metrics, each targeting important subsets of the data distribution. We analyze the theoretical properties of our proposed method and demonstrate its ability to rigorously control loss quantiles on several real-world datasets.

Submitted 27 December, 2022; originally announced December 2022.

Comments: 24 pages, 4 figures. Code is available at https://github.com/jakesnell/quantile-risk-control

arXiv:2210.15439 [pdf, ps, other]

Learning versus Refutation in Noninteractive Local Differential Privacy

Authors: Alexander Edmonds, Aleksandar Nikolov, Toniann Pitassi

Abstract: We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation. Learning requires finding a concept that best fits an unknown target function (from labelled samples drawn from a distribution), whereas refutation requires distinguishing between data distributions that are well-correlated with some concept in the class, versus distributions where the labels are random. Our main result is a complete characterization of the sample complexity of agnostic PAC learning for non-interactive LDP protocols. We show that the optimal sample complexity for any concept class is captured by the approximate $γ_2$ norm of a natural matrix associated with the class. Combined with previous work [Edmonds, Nikolov and Ullman, 2019] this gives an equivalence between learning and refutation in the agnostic setting.

Submitted 25 October, 2022; originally announced October 2022.

arXiv:2201.08430 [pdf, ps, other]

Reproducibility in Learning

Authors: Russell Impagliazzo, Rex Lei, Toniann Pitassi, Jessica Sorrell

Abstract: We introduce the notion of a reproducible algorithm in the context of learning. A reproducible learning algorithm is resilient to variations in its samples -- with high probability, it returns the exact same output when run on two samples from the same underlying distribution. We begin by unpacking the definition, clarifying how randomness is instrumental in balancing accuracy and reproducibility. We initiate a theory of reproducible algorithms, showing how reproducibility implies desirable properties such as data reuse and efficient testability. Despite the exceedingly strong demand of reproducibility, there are efficient reproducible algorithms for several fundamental problems in statistics and learning. First, we show that any statistical query algorithm can be made reproducible with a modest increase in sample complexity, and we use this to construct reproducible algorithms for finding approximate heavy-hitters and medians. Using these ideas, we give the first reproducible algorithm for learning halfspaces via a reproducible weak learner and a reproducible boosting algorithm. Finally, we initiate the study of lower bounds and inherent tradeoffs for reproducible algorithms, giving nearly tight sample complexity upper and lower bounds for reproducible versus nonreproducible SQ algorithms.

Submitted 14 April, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2111.07483 [pdf, ps, other]

Tradeoffs for small-depth Frege proofs

Authors: Toniann Pitassi, Prasanna Ramakrishnan, Li-Yang Tan

Abstract: We study the complexity of small-depth Frege proofs and give the first tradeoffs between the size of each line and the number of lines. Existing lower bounds apply to the overall proof size -- the sum of sizes of all lines -- and do not distinguish between these notions of complexity. For depth-$d$ Frege proofs of the Tseitin principle on the $n \times n$ grid where each line is a size-$s$ formula, we prove that $\exp(n/2^{Ω(d\sqrt{\log s})})$ many lines are necessary. This yields new lower bounds on line complexity that are not implied by Håstad's recent $\exp(n^{Ω(1/d)})$ lower bound on the overall proof size. For $s = \mathrm{poly}(n)$, for example, our lower bound remains $\exp(n^{1-o(1)})$ for all $d = o(\sqrt{\log n})$, whereas Håstad's lower bound is $\exp(n^{o(1)})$ once $d = ω_n(1)$. Our main conceptual contribution is the simple observation that techniques for establishing correlation bounds in circuit complexity can be leveraged to establish such tradeoffs in proof complexity.

Submitted 7 April, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

Comments: FOCS 2021. Fixed typo in Theorem 1.1

arXiv:2102.05019 [pdf, ps, other]

On the Power and Limitations of Branch and Cut

Authors: Noah Fleming, Mika Göös, Russell Impagliazzo, Toniann Pitassi, Robert Robere, Li-Yang Tan, Avi Wigderson

Abstract: The Stabbing Planes proof system was introduced to model the reasoning carried out in practical mixed integer programming solvers. As a proof system, it is powerful enough to simulate Cutting Planes and to refute the Tseitin formulas -- certain unsatisfiable systems of linear equations mod 2 -- which are canonical hard examples for many algebraic proof systems. In a recent (and surprising) result, Dadush and Tiwari showed that these short refutations of the Tseitin formulas could be translated into quasi-polynomial size and depth Cutting Planes proofs, refuting a long-standing conjecture. This translation raises several interesting questions. First, whether all Stabbing Planes proofs can be efficiently simulated by Cutting Planes. This would allow for the substantial analysis done on the Cutting Planes system to be lifted to practical mixed integer programming solvers. Second, whether the quasi-polynomial depth of these proofs is inherent to Cutting Planes. In this paper we make progress towards answering both of these questions. First, we show that any Stabbing Planes proof with bounded coefficients SP* can be translated into Cutting Planes. As a consequence of the known lower bounds for Cutting Planes, this establishes the first exponential lower bounds on SP*. Using this translation, we extend the result of Dadush and Tiwari to show that Cutting Planes has short refutations of any unsatisfiable system of linear equations over a finite field. Like the Cutting Planes proofs of Dadush and Tiwari, our refutations also incur a quasi-polynomial blow-up in depth, and we conjecture that this is inherent. As a step towards this conjecture, we develop a new geometric technique for proving lower bounds on the depth of Cutting Planes proofs. This allows us to establish the first lower bounds on the depth of Semantic Cutting Planes proofs of the Tseitin formulas.

Submitted 21 May, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

arXiv:2102.00314 [pdf, ps, other]

Size and Depth Separation in Approximating Benign Functions with Neural Networks

Authors: Gal Vardi, Daniel Reichman, Toniann Pitassi, Ohad Shamir

Abstract: When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially-bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions "benign", and explore the benefits of size and depth for approximation of benign functions with ReLU networks. As we show, this problem is more challenging than the corresponding problem for non-benign functions. We give barriers to showing depth-lower-bounds: Proving existence of a benign function that cannot be approximated by polynomial-size networks of depth $4$ would settle longstanding open problems in computational complexity. It implies that beyond depth $4$ there is a barrier to showing depth-separation for benign functions, even between networks of constant depth and networks of nonconstant depth. We also study size-separation, namely, whether there are benign functions that can be approximated with networks of size $O(s(d))$, but not with networks of size $O(s'(d))$. We show a complexity-theoretic barrier to proving such results beyond size $O(d\log^2(d))$, but also show an explicit benign function, that can be approximated with networks of size $O(d)$ and not with networks of size $o(d/\log d)$. For approximation in $L_\infty$ we achieve such separation already between size $O(d)$ and size $o(d)$. Moreover, we show superpolynomial size lower bounds and barriers to such lower bounds, depending on the assumptions on the function. Our size-separation results rely on an analysis of size lower bounds for Boolean functions, which is of independent interest: We show linear size lower bounds for computing explicit Boolean functions with neural networks and threshold circuits.

Submitted 28 June, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

Comments: Edits after review + changing the terminology from "natural functions" to "benign functions"

arXiv:2010.07140 [pdf, other]

Theoretical bounds on estimation error for meta-learning

Authors: James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

Abstract: Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent successes in few-shot learning and related problems are encouraging signs that these models can be adapted to more realistic settings where train and test distributions differ. Unfortunately, there is severely limited theoretical support for these algorithms and little is known about the difficulty of these problems. In this work, we provide novel information-theoretic lower-bounds on minimax rates of convergence for algorithms that are trained on data from multiple sources and tested on novel data. Our bounds depend intuitively on the information shared between sources of data, and characterize the difficulty of learning in this setting for arbitrary algorithms. We demonstrate these bounds on a hierarchical Bayesian model of meta-learning, computing both upper and lower bounds on parameter estimation via maximum-a-posteriori inference.

Submitted 14 October, 2020; originally announced October 2020.

Comments: 12 pages in main paper, 22 pages in appendix, 4 figures total

arXiv:2007.02740 [pdf, ps, other]

KRW Composition Theorems via Lifting

Authors: Susanna F. de Rezende, Or Meir, Jakob Nordström, Toniann Pitassi, Robert Robere

Abstract: One of the major open problems in complexity theory is proving super-logarithmic lower bounds on the depth of circuits (i.e., $\mathbf{P}\not\subseteq\mathbf{NC}^1$). Karchmer, Raz, and Wigderson (Computational Complexity 5(3/4), 1995) suggested approaching this problem by proving that depth complexity behaves "as expected" with respect to the composition of functions $f\diamond g$. They showed that the validity of this conjecture would imply that $\mathbf{P}\not\subseteq\mathbf{NC}^1$. Several works have made progress toward resolving this conjecture by proving special cases. In particular, these works proved the KRW conjecture for every outer function $f$, but only for few inner functions $g$. Thus, it is an important challenge to prove the KRW conjecture for a wider range of inner functions. In this work, we extend significantly the range of inner functions that can be handled. First, we consider the $\textit{monotone}$ version of the KRW conjecture. We prove it for every monotone inner function $g$ whose depth complexity can be lower bounded via a query-to-communication lifting theorem. This allows us to handle several new and well-studied functions such as the $s\textbf{-}t$-connectivity, clique, and generation functions. In order to carry this progress back to the $\textit{non-monotone}$ setting, we introduce a new notion of $\textit{semi-monotone}$ composition, which combines the non-monotone complexity of the outer function $f$ with the monotone complexity of the inner function $g$. In this setting, we prove the KRW conjecture for a similar selection of inner functions $g$, but only for a specific choice of the outer function $f$.

Submitted 27 January, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2004.08037 [pdf, other]

Automating Cutting Planes is NP-Hard

Authors: Mika Göös, Sajin Koroth, Ian Mertz, Toniann Pitassi

Abstract: We show that Cutting Planes (CP) proofs are hard to find: Given an unsatisfiable formula $F$, 1) it is NP-hard to find a CP refutation of $F$ in time polynomial in the length of the shortest such refutation; and 2) unless Gap-Hitting-Set admits a nontrivial algorithm, one cannot find a tree-like CP refutation of $F$ in time polynomial in the length of the shortest such refutation. The first result extends the recent breakthrough of Atserias and Müller (FOCS 2019) that established an analogous result for Resolution. Our proofs rely on two new lifting theorems: (1) Dag-like lifting for gadgets with many output bits. (2) Tree-like lifting that simulates an $r$-round protocol with gadgets of query complexity $O(\log r)$ independent of input length.

Submitted 16 April, 2020; originally announced April 2020.

Comments: Full version of the conference version at STOC 2020 by the same title

arXiv:2003.02323 [pdf, ps, other]

Towards a Complexity-theoretic Understanding of Restarts in SAT solvers

Authors: Chunxiao Li, Noah Fleming, Marc Vinyals, Toniann Pitassi, Vijay Ganesh

Abstract: Restarts are a widely-used class of techniques integral to the efficiency of Conflict-Driven Clause Learning (CDCL) Boolean SAT solvers. While the utility of such policies has been well-established empirically, a theoretical explanation of whether restarts are indeed crucial to the power of CDCL solvers is lacking. In this paper, we prove a series of theoretical results that characterize the power of restarts for various models of SAT solvers. More precisely, we make the following contributions. First, we prove an exponential separation between a {\it drunk} randomized CDCL solver model with restarts and the same model without restarts using a family of satisfiable instances. Second, we show that the configuration of CDCL solver with VSIDS branching and restarts (with activities erased after restarts) is exponentially more powerful than the same configuration without restarts for a family of unsatisfiable instances. To the best of our knowledge, these are the first separation results involving restarts in the context of SAT solvers. Third, we show that restarts do not add any proof complexity-theoretic power vis-a-vis a number of models of CDCL and DPLL solvers with non-deterministic static variable and value selection.

Submitted 11 May, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

arXiv:2001.02144 [pdf, ps, other]

Lifting with Simple Gadgets and Applications to Circuit and Proof Complexity

Authors: Susanna F. de Rezende, Or Meir, Jakob Nordström, Toniann Pitassi, Robert Robere, Marc Vinyals

Abstract: We significantly strengthen and generalize the theorem lifting Nullstellensatz degree to monotone span program size by Pitassi and Robere (2018) so that it works for any gadget with high enough rank, in particular, for useful gadgets such as equality and greater-than. We apply our generalized theorem to solve two open problems:
* We present the first result that demonstrates a separation in proof power for cutting planes with unbounded versus polynomially bounded coefficients. Specifically, we exhibit CNF formulas that can be refuted in quadratic length and constant line space in cutting planes with unbounded coefficients, but for which there are no refutations in subexponential length and subpolynomial line space if coefficients are restricted to be of polynomial magnitude.
* We give the first explicit separation between monotone Boolean formulas and monotone real formulas. Specifically, we give an explicit family of functions that can be computed with monotone real formulas of nearly linear size but require monotone Boolean formulas of exponential size. Previously only a non-explicit separation was known.
An important technical ingredient, which may be of independent interest, is that we show that the Nullstellensatz degree of refuting the pebbling formula over a DAG G over any field coincides exactly with the reversible pebbling price of G. In particular, this implies that the standard decision tree complexity and the parity decision tree complexity of the corresponding falsified clause search problem are equal.

Submitted 7 January, 2020; originally announced January 2020.

ACM Class: F.2.2; F.2.3; F.4.1

arXiv:1909.09141 [pdf, other]

Causal Modeling for Fairness in Dynamical Systems

Authors: Elliot Creager, David Madras, Toniann Pitassi, Richard Zemel

Abstract: In many application areas---lending, education, and online recommenders, for example---fairness and equity concerns emerge when a machine learning system interacts with a dynamically changing environment to produce both immediate and long-term effects for individuals and demographic groups. We discuss causal directed acyclic graphs (DAGs) as a unifying framework for the recent literature on fairness in such dynamical systems. We show that this formulation affords several new directions of inquiry to the modeler, where causal assumptions can be expressed and manipulated. We emphasize the importance of computing interventional quantities in the dynamical fairness setting, and show how causal assumptions enable simulation (when environment dynamics are known) and off-policy estimation (when dynamics are unknown) of intervention on short- and long-term outcomes, at both the group and individual levels.

Submitted 6 July, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

arXiv:1906.02589 [pdf, other]

Flexibly Fair Representation Learning by Disentanglement

Authors: Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

Abstract: We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also \emph{flexibly fair}, meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. We show empirically that the resulting encoder---which does not require the sensitive attributes for inference---enables the adaptation of a single representation to a variety of fair classification tasks with new target labels and subgroup definitions.

Submitted 6 June, 2019; originally announced June 2019.

Journal ref: Proceedings of the International Conference on Machine Learning (ICML), 2019

arXiv:1904.13056 [pdf, ps, other]

Query-to-Communication Lifting Using Low-Discrepancy Gadgets

Authors: Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, Toniann Pitassi

Abstract: Lifting theorems are theorems that relate the query complexity of a function $f:\{0,1\}^{n}\to\{0,1\}$ to the communication complexity of the composed function $f \circ g^{n}$, for some "gadget" $g:\{0,1\}^{b}\times\{0,1\}^{b}\to\{0,1\}$. Such theorems allow transferring lower bounds from query complexity to communication complexity, and have seen numerous applications in recent years. In addition, such theorems can be viewed as a strong generalization of a direct-sum theorem for the gadget $g$. We prove a new lifting theorem that works for all gadgets $g$ that have logarithmic length and exponentially-small discrepancy, for both deterministic and randomized communication complexity. Thus, we significantly increase the range of gadgets for which such lifting theorems hold. Our result has two main motivations: First, allowing a larger variety of gadgets may support more applications. In particular, our work is the first to prove a randomized lifting theorem for logarithmic-size gadgets, thus improving some applications of the theorem. Second, our result can be seen as a strong generalization of a direct-sum theorem for functions with low discrepancy.

Submitted 5 October, 2021; v1 submitted 30 April, 2019; originally announced April 2019.

Comments: This work subsumes an earlier work that appears in ICALP 2019

arXiv:1809.02519 [pdf, ps, other]

Fairness Through Causal Awareness: Learning Latent-Variable Models for Biased Data

Authors: David Madras, Elliot Creager, Toniann Pitassi, Richard Zemel

Abstract: How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data, exploring the relationship between fair classification and intervention. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects between the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help learn policies that are both more accurate and fair, when presented with a historically biased dataset.

Submitted 2 December, 2018; v1 submitted 7 September, 2018; originally announced September 2018.

Comments: Accepted as a conference paper at ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) 2019

arXiv:1802.06309 [pdf, other]

Learning Adversarially Fair and Transferable Representations

Authors: David Madras, Elliot Creager, Toniann Pitassi, Richard Zemel

Abstract: In this paper, we advocate for representation learning as the key to mitigating unfair prediction outcomes downstream. Motivated by a scenario where learned representations are used by third parties with unknown objectives, we propose and explore adversarial representation learning as a natural method of ensuring those parties act fairly. We connect group fairness (demographic parity, equalized odds, and equal opportunity) to different adversarial objectives. Through worst-case theoretical guarantees and experimental validation, we show that the choice of this objective is crucial to fair prediction. Furthermore, we present the first in-depth experimental demonstration of fair transfer learning and demonstrate empirically that our learned representations admit fair predictions on new tasks while maintaining utility, an essential goal of fair representation learning.

Submitted 22 October, 2018; v1 submitted 17 February, 2018; originally announced February 2018.

arXiv:1711.06664 [pdf, other]

Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer

Authors: David Madras, Toniann Pitassi, Richard Zemel

Abstract: In many machine learning applications, there are multiple decision-makers involved, both automated and human. The interaction between these agents often goes unaddressed in algorithmic development. In this work, we explore a simple version of this interaction with a two-stage framework containing an automated model and an external decision-maker. The model can choose to say "Pass", and pass the decision downstream, as explored in rejection learning. We extend this concept by proposing "learning to defer", which generalizes rejection learning by considering the effect of other agents in the decision-making process. We propose a learning algorithm which accounts for potential biases held by external decision-makers in a system. Experiments demonstrate that learning to defer can make systems not only more accurate but also less biased. Even when working with inconsistent or biased users, we show that deferring models still greatly improve the accuracy and/or fairness of the entire system.

Submitted 6 September, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: Accepted as a conference paper at Neural Information Processing Systems 2018

arXiv:1710.03219 [pdf, ps, other]

Stabbing Planes

Authors: Paul Beame, Noah Fleming, Russell Impagliazzo, Antonina Kolokolova, Denis Pankratov, Toniann Pitassi, Robert Robere

Abstract: We develop a new semi-algebraic proof system called Stabbing Planes which formalizes modern branch-and-cut algorithms for integer programming and is in the style of DPLL-based modern SAT solvers. As with DPLL there is only a single rule: the current polytope can be subdivided by branching on an inequality and its "integer negation." That is, we can (nondeterministically) choose a hyperplane $ax \geq b$ with integer coefficients, which partitions the polytope into three pieces: the points in the polytope satisfying $ax \geq b$, the points satisfying $ax \leq b-1$, and the middle slab $b - 1 < ax < b$. Since the middle slab contains no integer points it can be safely discarded, and the algorithm proceeds recursively on the other two branches. Each path terminates when the current polytope is empty, which is polynomial-time checkable. Among our results, we show that Stabbing Planes can efficiently simulate the Cutting Planes proof system, and is equivalent to a tree-like variant of the RCP system of [Krajicek98]. As well, we show that it possesses short proofs of the canonical family of systems of $\mathbb{F}_2$-linear equations known as the Tseitin formulas. Finally, we prove linear lower bounds on the rank of Stabbing Planes refutations by adapting lower bounds in communication complexity and use these bounds in order to show that Stabbing Planes proofs cannot be balanced. In doing so, we show that real communication protocols cannot be balanced and establish the first lower bound on the real communication complexity of the set disjointness function.

Submitted 17 March, 2023; v1 submitted 9 October, 2017; originally announced October 2017.
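
The branching rule described in the Stabbing Planes abstract can be illustrated with a small worked example. The polytope $P$ and the query hyperplane below are assumptions chosen for illustration, not taken from the paper; the example only shows why discarding the middle slab is sound and how a single query can refute an integer-infeasible system.

```latex
% Illustrative example (assumed, not from the paper).
% Polytope with no integer points:
\[
  P = \{\, (x_1, x_2) \in [0,1]^2 : 2x_1 + 2x_2 = 1 \,\}.
\]
% Query the hyperplane a x >= b with a = (1, 1), b = 1:
\begin{align*}
  \text{branch 1:}\quad & x_1 + x_2 \geq 1
      && \text{empty, since } x_1 + x_2 = \tfrac{1}{2} \text{ on } P, \\
  \text{branch 2:}\quad & x_1 + x_2 \leq 0
      && \text{empty, for the same reason,} \\
  \text{middle slab:}\quad & 0 < x_1 + x_2 < 1
      && \text{discarded: } x_1 + x_2 \in \mathbb{Z} \text{ whenever } x \in \mathbb{Z}^2 .
\end{align*}
% Both surviving branches are empty polytopes, so this one query
% certifies that P contains no integer point.
```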

arXiv:1706.02207 [pdf, ps, other]

On The Communication Complexity of High-Dimensional Permutations

Authors: Nati Linial, Toniann Pitassi, Adi Shraibman

Abstract: We study the multiparty communication complexity of high dimensional permutations, in the Number On the Forehead (NOF) model. This model is due to Chandra, Furst and Lipton (CFL) who also gave a nontrivial protocol for the Exactly-n problem where three players receive integer inputs and need to decide if their inputs sum to a given integer $n$. There is a considerable body of literature dealing with the same problem, where $(\mathbb{N},+)$ is replaced by some other abelian group. Our work can be viewed as a far-reaching extension of this line of work. We show that the known lower bounds for that group-theoretic problem apply to all high dimensional permutations. We introduce new proof techniques that appeal to recent advances in Additive Combinatorics and Ramsey theory. We reveal new and unexpected connections between the NOF communication complexity of high dimensional permutations and a variety of well known and thoroughly studied problems in combinatorics. Previous protocols for Exactly-n all rely on the construction of large sets of integers without a 3-term arithmetic progression. No direct algorithmic protocol was previously known for the problem, and we provide the first such algorithm. This suggests new ways to significantly improve the CFL protocol. Many new open questions are presented throughout.

Submitted 27 November, 2018; v1 submitted 7 June, 2017; originally announced June 2017.

arXiv:1703.07666 [pdf, other]

Query-to-Communication Lifting for BPP

Authors: Mika Göös, Toniann Pitassi, Thomas Watson

Abstract: For any $n$-bit boolean function $f$, we show that the randomized communication complexity of the composed function $f\circ g^n$, where $g$ is an index gadget, is characterized by the randomized decision tree complexity of $f$. In particular, this means that many query complexity separations involving randomized models (e.g., classical vs. quantum) automatically imply analogous separations in communication complexity.

Submitted 22 March, 2017; originally announced March 2017.

Comments: 21 pages
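
To make the composed function $f\circ g^n$ from the abstract above concrete, here is a minimal Python sketch. It assumes the usual reading of the index gadget (Alice holds a pointer, Bob holds a table, and the gadget outputs the table entry at the pointer). The function names, the table size of 4, and the choice of parity as $f$ are illustrative assumptions, not details from the paper.

```python
from typing import Callable, Sequence, Tuple


def index_gadget(pointer: int, table: Sequence[int]) -> int:
    """Index gadget g(x, y) = y[x]: Alice's pointer selects one of Bob's bits."""
    return table[pointer]


def compose(f: Callable[[Tuple[int, ...]], int], n: int):
    """Return the two-party function (f o g^n)(alice, bob)."""

    def lifted(alice: Sequence[int], bob: Sequence[Sequence[int]]) -> int:
        if len(alice) != n or len(bob) != n:
            raise ValueError("each player must hold exactly n gadget inputs")
        # Evaluate one gadget per input bit of f, then apply f to the results.
        z = tuple(index_gadget(p, t) for p, t in zip(alice, bob))
        return f(z)

    return lifted


def parity(bits: Tuple[int, ...]) -> int:
    """Example outer function f (an assumption for illustration)."""
    return sum(bits) % 2


lifted_parity = compose(parity, 3)
bob_tables = ((1, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0))
print(lifted_parity((0, 2, 1), bob_tables))  # gadget outputs (1, 1, 0)
print(lifted_parity((1, 2, 1), bob_tables))  # gadget outputs (0, 1, 0)
```

Neither player can evaluate any gadget alone, which is why the communication cost of the composed function can be related to the number of queries a decision tree for $f$ has to make.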

arXiv:1703.02469 [pdf, ps, other]

Random CNFs are Hard for Cutting Planes

Authors: Noah Fleming, Denis Pankratov, Toniann Pitassi, Robert Robere

Abstract: The random k-SAT model is the most important and well-studied distribution over k-SAT instances. It is closely connected to statistical physics; it is used as a testbench for satisfiability algorithms, and average-case hardness over this distribution has also been linked to hardness of approximation via Feige's hypothesis. We prove that any Cutting Planes refutation for random k-SAT requires exponential size, for k that is logarithmic in the number of variables, in the (interesting) regime where the number of clauses guarantees that the formula is unsatisfiable with high probability.

Submitted 7 March, 2017; originally announced March 2017.

arXiv:1607.00443 [pdf, ps, other]

Algebraic Proof Complexity: Progress, Frontiers and Challenges

Authors: Toniann Pitassi, Iddo Tzameret

Abstract: We survey recent progress in the proof complexity of strong proof systems and its connection to algebraic circuit complexity, showing how the synergy between the two gives rise to new approaches to fundamental open questions, solutions to old problems, and new directions of research. In particular, we focus on tight connections between proof complexity lower bounds (namely, lower bounds on the size of proofs of certain tautologies), algebraic circuit lower bounds, and the Polynomial Identity Testing problem from derandomization theory.

Submitted 1 July, 2016; originally announced July 2016.

Comments: Complexity Column of the ACM SIGLOG News, ACM New York, NY, USA, July 2016

MSC Class: 03F20; 68Q25

arXiv:1506.02629 [pdf, other]

Generalization in Adaptive Data Analysis and Holdout Reuse

Authors: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

Abstract: Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. An investigation of this gap has recently been initiated by the authors in (Dwork et al., 2014), where we focused on the problem of estimating expectations of adaptively chosen functions. In this paper, we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set. Reusing a holdout set adaptively multiple times can easily lead to overfitting to the holdout set itself. We give an algorithm that enables the validation of a large number of adaptively chosen hypotheses, while provably avoiding overfitting. We illustrate the advantages of our algorithm over the standard use of the holdout set via a simple synthetic experiment. We also formalize and address the general problem of data reuse in adaptive data analysis. We show how the differential-privacy based approach given in (Dwork et al., 2014) is applicable much more broadly to adaptive data analysis.
We then show that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings. Finally, we demonstrate that these incomparable approaches can be unified via the notion of approximate max-information that we introduce.

Submitted 25 September, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

arXiv:1411.2664 [pdf, ps, other]

Preserving Statistical Validity in Adaptive Data Analysis

Authors: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

Abstract: A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of $m$ adaptively chosen functions on an unknown distribution given $n$ random samples. We show that, surprisingly, there is a way to estimate an exponential in $n$ number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question.

Submitted 2 March, 2016; v1 submitted 10 November, 2014; originally announced November 2014.

Comments: Updated related work with recent developments

arXiv:1404.3820 [pdf, ps, other]

Circuit complexity, proof complexity, and polynomial identity testing

Authors: Joshua A. Grochow, Toniann Pitassi

Abstract: We introduce a new algebraic proof system, which has tight connections to (algebraic) circuit complexity. In particular, we show that any super-polynomial lower bound on any Boolean tautology in our proof system implies that the permanent does not have polynomial-size algebraic circuits (VNP is not equal to VP). As a corollary to the proof, we also show that super-polynomial lower bounds on the number of lines in Polynomial Calculus proofs (as opposed to the usual measure of number of monomials) imply the Permanent versus Determinant Conjecture. Note that, prior to our work, there was no proof system for which lower bounds on an arbitrary tautology implied any computational lower bound. Our proof system helps clarify the relationships between previous algebraic proof systems, and begins to shed light on why proof complexity lower bounds for various proof systems have been so much harder than lower bounds on the corresponding circuit classes. In doing so, we highlight the importance of polynomial identity testing (PIT) for understanding proof complexity. More specifically, we introduce certain propositional axioms satisfied by any Boolean circuit computing PIT. We use these PIT axioms to shed light on AC^0[p]-Frege lower bounds, which have been open for nearly 30 years, with no satisfactory explanation as to their apparent difficulty.
We show that either: a) Proving super-polynomial lower bounds on AC^0[p]-Frege implies VNP does not have polynomial-size circuits of depth d - a notoriously open question for d at least 4 - thus explaining the difficulty of lower bounds on AC^0[p]-Frege, or b) AC^0[p]-Frege cannot efficiently prove the depth d PIT axioms, and hence we have a lower bound on AC^0[p]-Frege. Using the algebraic structure of our proof system, we propose a novel way to extend techniques from algebraic circuit complexity to prove lower bounds in proof complexity.

Submitted 15 April, 2014; originally announced April 2014.

ACM Class: F.2.2; F.4.1; F.1.3

arXiv:1401.3458 [pdf]

doi 10.1613/jair.2648

Solving #SAT and Bayesian Inference with Backtracking Search

Authors: Fahiem Bacchus, Shannon Dalmao, Toniann Pitassi

Abstract: Inference in Bayes Nets (BAYES) is an important problem with numerous applications in probabilistic reasoning. Counting the number of satisfying assignments of a propositional formula (#SAT) is a closely related problem of fundamental theoretical importance. Both these problems, and others, are members of the class of sum-of-products (SUMPROD) problems. In this paper we show that standard backtracking search when augmented with a simple memoization scheme (caching) can solve any sum-of-products problem with time complexity that is at least as good as any other state-of-the-art exact algorithm, and that it can also achieve the best known time-space tradeoff. Furthermore, backtracking's ability to utilize more flexible variable orderings allows us to prove that it can achieve an exponential speedup over other standard algorithms for SUMPROD on some instances. The ideas presented here have been utilized in a number of solvers that have been applied to various types of sum-of-product problems. These systems have exploited the fact that backtracking can naturally exploit more of the problem's structure to achieve improved performance on a range of problem instances. Empirical evidence of this performance gain has appeared in published works describing these solvers, and we provide references to these works.

Submitted 15 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 34, pages 391-442, 2009
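
As a rough illustration of "backtracking search augmented with caching" for #SAT, the sketch below counts the models of a CNF formula by branching on variables and memoizing residual formulas. It is a simplified toy under stated assumptions: it uses a fixed variable order and whole-formula caching, whereas the abstract stresses flexible variable orderings, so it should not be read as the paper's algorithm. The name `count_models` and the clause encoding (lists of signed integers, DIMACS-style) are illustrative choices.

```python
from functools import lru_cache
from typing import Iterable


def count_models(clauses: Iterable[Iterable[int]], num_vars: int) -> int:
    """Count satisfying assignments of a CNF over variables 1..num_vars.

    A literal is a nonzero int: v means variable v is true, -v means false.
    """
    start = frozenset(frozenset(c) for c in clauses)

    @lru_cache(maxsize=None)  # the cache: identical residual formulas are solved once
    def count(formula: frozenset, var: int) -> int:
        if frozenset() in formula:
            return 0  # an empty clause is falsified: dead branch
        if not formula:
            return 2 ** (num_vars - var + 1)  # every remaining variable is free
        total = 0
        for lit in (var, -var):  # branch: var = True, then var = False
            # Drop clauses satisfied by lit; delete the falsified literal elsewhere.
            reduced = frozenset(c - {-lit} for c in formula if lit not in c)
            total += count(reduced, var + 1)
        return total

    return count(start, 1)


# (x1 or x2) and (not x1 or x3) over three variables
print(count_models([[1, 2], [-1, 3]], 3))
```

Different partial assignments often leave the same residual formula, and the cache lets the search reuse the count for that subproblem instead of recomputing it.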

arXiv:1311.2355 [pdf, other]

Communication Lower Bounds via Critical Block Sensitivity

Authors: Mika Göös, Toniann Pitassi

Abstract: We use critical block sensitivity, a new complexity measure introduced by Huynh and Nordström (STOC 2012), to study the communication complexity of search problems. To begin, we give a simple new proof of the following central result of Huynh and Nordström: if $S$ is a search problem with critical block sensitivity $b$, then every randomised two-party protocol solving a certain two-party lift of $S$ requires $Ω(b)$ bits of communication. Besides simplicity, our proof has the advantage of generalising to the multi-party setting. We combine these results with new critical block sensitivity lower bounds for Tseitin and Pebbling search problems to obtain the following applications: (1) Monotone Circuit Depth: We exhibit a monotone $n$-variable function in NP whose monotone circuits require depth $Ω(n/\log n)$; previously, a bound of $Ω(\sqrt{n})$ was known (Raz and Wigderson, JACM 1992). Moreover, we prove a $Θ(\sqrt{n})$ monotone depth bound for a function in monotone P. (2) Proof Complexity: We prove new rank lower bounds as well as obtain the first length--space lower bounds for semi-algebraic proof systems, including Lovász--Schrijver and Lasserre (SOS) systems. In particular, these results extend and simplify the works of Beame et al. (SICOMP 2007) and Huynh and Nordström.

Submitted 11 July, 2016; v1 submitted 11 November, 2013; originally announced November 2013.

Comments: 33 pages, 6 figures

arXiv:1305.4696 [pdf, ps, other]

Tight Bounds for Set Disjointness in the Message Passing Model

Authors: Mark Braverman, Faith Ellen, Rotem Oshman, Toniann Pitassi, Vinod Vaikuntanathan

Abstract: In a multiparty message-passing model of communication, there are $k$ players. Each player has a private input, and they communicate by sending messages to one another over private channels. While this model has been used extensively in distributed computing and in multiparty computation, lower bounds on communication complexity in this model and related models have been somewhat scarce. In recent work \cite{phillips12,woodruff12,woodruff13}, strong lower bounds of the form $Ω(n \cdot k)$ were obtained for several functions in the message-passing model; however, a lower bound on the classical Set Disjointness problem remained elusive. In this paper, we prove tight lower bounds of the form $Ω(n \cdot k)$ for the Set Disjointness problem in the message-passing model. Our bounds are obtained by developing information complexity tools in the message-passing model, and then proving an information complexity lower bound for Set Disjointness. As a corollary, we show a tight lower bound for the task allocation problem \cite{DruckerKuhnOshman} via a reduction from Set Disjointness.

Submitted 20 May, 2013; originally announced May 2013.

arXiv:1212.2452 [pdf]

Value Elimination: Bayesian Inference via Backtracking Search

Authors: Fahiem Bacchus, Shannon Dalmao, Toniann Pitassi

Abstract: Backtracking search is a powerful algorithmic paradigm that can be used to solve many problems. It is in a certain sense the dual of variable elimination; but on many problems, e.g., SAT, it is vastly superior to variable elimination in practice. Motivated by this we investigate the application of backtracking search to the problem of Bayesian inference (Bayes). We show that natural generalizations of known techniques allow backtracking search to achieve performance guarantees similar to standard algorithms for Bayes, and that there exist problems on which backtracking can in fact do much better. We also demonstrate that these ideas can be applied to implement a Bayesian inference engine whose performance is competitive with standard algorithms. Since backtracking search can very naturally take advantage of context specific structure, the potential exists for performance superior to standard algorithms on many problems.

Submitted 19 October, 2012; originally announced December 2012.

Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

Report number: UAI-P-2003-PG-20-28
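
Illustrative sketch (not taken from the paper): the abstract above describes applying backtracking search to Bayesian inference. The Python below shows only the plain idea, namely branching on one variable at a time, multiplying in the conditional probability of each choice, and pruning zero-probability branches. It is not the Value Elimination algorithm itself and has none of its caching or performance guarantees. The function name `prob_of_evidence`, the binary-variable encoding, and the two-node Rain/WetGrass network are all assumptions made for this example.

```python
def prob_of_evidence(variables, parents, cpts, evidence, assignment=None, idx=0):
    """Probability of `evidence`, computed by naive backtracking.

    variables: variable names in topological order (parents come first).
    parents:   maps each variable to a tuple of its parent names.
    cpts:      maps each variable to {parent_values: P(var = 1 | parents)}.
    evidence:  maps observed variables to their value (0 or 1).
    """
    if assignment is None:
        assignment = {}
    if idx == len(variables):
        return 1.0  # every variable is assigned; nothing left to multiply in
    var = variables[idx]
    # Observed variables have one allowed value; others branch on both.
    values = [evidence[var]] if var in evidence else [0, 1]
    total = 0.0
    for val in values:
        assignment[var] = val
        key = tuple(assignment[p] for p in parents[var])
        p_one = cpts[var][key]
        p = p_one if val == 1 else 1.0 - p_one
        if p > 0.0:  # prune branches that cannot contribute
            total += p * prob_of_evidence(
                variables, parents, cpts, evidence, assignment, idx + 1
            )
    del assignment[var]  # undo the choice before returning to the caller
    return total


# Hypothetical two-node network: Rain -> WetGrass.
variables = ["Rain", "WetGrass"]
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpts = {
    "Rain": {(): 0.2},                  # P(Rain = 1)
    "WetGrass": {(1,): 0.9, (0,): 0.1}  # P(WetGrass = 1 | Rain)
}

p_wet = prob_of_evidence(variables, parents, cpts, {"WetGrass": 1})
p_wet_and_rain = prob_of_evidence(variables, parents, cpts, {"WetGrass": 1, "Rain": 1})
posterior_rain = p_wet_and_rain / p_wet  # P(Rain = 1 | WetGrass = 1)
```

A posterior is obtained here as a ratio of two evidence probabilities. The search tree is exponential in the number of unobserved variables, which is the cost that more refined backtracking methods aim to reduce.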

arXiv:1109.4910 [pdf, other]

Inapproximability of Treewidth, One-Shot Pebbling, and Related Layout Problems

Authors: Per Austrin, Toniann Pitassi, Yu Wu

Abstract: We study the approximability of a number of graph problems: treewidth and pathwidth of graphs, one-shot black (and black-white) pebbling costs of directed acyclic graphs, and a variety of different graph layout problems such as minimum cut linear arrangement and interval graph completion. We show that, assuming the recently introduced Small Set Expansion Conjecture, all of these problems are hard to approximate within any constant factor.

Submitted 22 September, 2011; originally announced September 2011.

arXiv:1104.3913 [pdf, other]

Fairness Through Awareness

Authors: Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Rich Zemel

Abstract: We study fairness in classification, where individuals are classified, e.g., admitted to a university, and the goal is to prevent discrimination against individuals based on their membership in some group, while maintaining utility for the classifier (the university). The main conceptual contribution of this paper is a framework for fair classification comprising (1) a (hypothetical) task-specific metric for determining the degree to which individuals are similar with respect to the classification task at hand; (2) an algorithm for maximizing utility subject to the fairness constraint, that similar individuals are treated similarly. We also present an adaptation of our approach to achieve the complementary goal of "fair affirmative action," which guarantees statistical parity (i.e., the demographics of the set of individuals receiving any classification are the same as the demographics of the underlying population), while treating similar individuals as similarly as possible. Finally, we discuss the relationship of fairness to privacy: when fairness implies privacy, and how tools developed in the context of differential privacy may be applied to fairness.

Submitted 28 November, 2011; v1 submitted 19 April, 2011; originally announced April 2011.
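
Illustrative sketch (not taken from the paper): the abstract above names two conditions, "similar individuals are treated similarly" and statistical parity. One way to make them concrete is to let a classifier map each individual to a distribution over outcomes, require that the distance between two individuals' output distributions not exceed their task-specific distance, and measure parity as the distance between the average output distributions of two groups. The use of total variation distance, the function names, and the toy scores below are assumptions chosen for this example.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two distributions over the same outcomes."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def lipschitz_violations(outputs, metric, tol=1e-12):
    """Pairs (x, y) whose output distributions differ by more than metric(x, y)."""
    return [
        (x, y)
        for x, y in combinations(sorted(outputs), 2)
        if total_variation(outputs[x], outputs[y]) > metric(x, y) + tol
    ]


def parity_gap(outputs, group_a, group_b):
    """Distance between the average output distributions of two groups.

    A gap of 0 means both groups receive each outcome at the same rate.
    """
    n_outcomes = len(next(iter(outputs.values())))

    def mean_distribution(group):
        return [
            sum(outputs[x][i] for x in group) / len(group)
            for i in range(n_outcomes)
        ]

    return total_variation(mean_distribution(group_a), mean_distribution(group_b))


# Toy data: each individual maps to a distribution over (reject, admit).
outputs = {"a": [0.5, 0.5], "b": [0.4, 0.6], "c": [0.1, 0.9]}

# Hypothetical task-specific metric: difference of made-up qualification scores.
scores = {"a": 0.50, "b": 0.55, "c": 0.95}


def metric(x, y):
    return abs(scores[x] - scores[y])


violations = lipschitz_violations(outputs, metric)
gap = parity_gap(outputs, ["a", "b"], ["c"])
```

In this toy data, individuals "a" and "b" are close under the metric (distance 0.05) but their outcome distributions differ by 0.1, so that pair is flagged. The parity gap between the groups {a, b} and {c} is 0.35.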

arXiv:0912.0568 [pdf, ps, other]

Hardness Amplification in Proof Complexity

Authors: Paul Beame, Trinh Huynh, Toniann Pitassi

Abstract: We present a general method for converting any family of unsatisfiable CNF formulas that is hard for one of the simplest proof systems, tree resolution, into formulas that require large rank in any proof system that manipulates polynomials or polynomial threshold functions of degree at most k (known as Th(k) proofs). Such systems include Lovász-Schrijver and Cutting Planes proof systems as well as their high degree analogues. These are based on analyzing two new proof systems, denoted by T^cc(k) and R^cc(k). The proof lines of T^cc(k) are arbitrary Boolean functions, each of which can be evaluated by an efficient k-party randomized communication protocol. They include Th(k-1) proofs as a special case. R^cc(k) proofs are stronger and only require that each inference be locally checkable by an efficient k-party randomized communication protocol. Our main results are the following: (1) When k is O(loglogn), for any unsatisfiable CNF formula F requiring resolution rank r, there is a related CNF formula G=Lift_k(F) requiring refutation rank r^Omega(1/k) log^O(1) n in all R^cc(k) systems. (2) There are strict hierarchies for T^cc(k) and R^cc(k) systems with respect to k when k is O(loglogn), in that there are unsatisfiable CNF formulas requiring large rank R^cc(k) refutations but having log^O(1) n rank Th(k) refutations. (3) When k is O(loglogn) there are 2^(n^Omega(1/k)) lower bounds on the size of tree-like T^cc(k) refutations for large classes of lifted CNF formulas. (4) A general method for producing integrality gaps for low rank R^cc(2) inference (and hence Cutting Planes and Th(1) inference) based on related gaps for low rank resolution. These gaps are optimal for MAX-2t-SAT.

Submitted 2 December, 2009; originally announced December 2009.

Comments: 28 pages

arXiv:0802.3860 [pdf, ps, other]

Separating NOF communication complexity classes RP and NP

Authors: Matei David, Toniann Pitassi

Abstract: We provide a non-explicit separation of the number-on-forehead communication complexity classes RP and NP when the number of players is up to δlog(n) for any δ<1. Recent lower bounds on Set-Disjointness [LS08,CA08] provide an explicit separation between these classes when the number of players is only up to o(loglog(n)).

Submitted 26 February, 2008; originally announced February 2008.

ACM Class: F.1.3

Showing 1–39 of 39 results for author: Pitassi, T