-
On Computationally Efficient Multi-Class Calibration
Authors:
Parikshit Gopalan,
Lunjia Hu,
Guy N. Rothblum
Abstract:
Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in $k$, or needing to solve computationally intractable problems, or give rather weak guarantees.
Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form: does the label belong to a subset $T \subseteq [k]$: e.g. is this an image of an animal? It ensures that the probabilities predicted by summing the probabilities assigned to labels in $T$ are close to some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
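The projection step described above, summing the probabilities assigned to labels in $T$ to obtain a binary prediction, can be sketched as follows. This is only an illustration of the projection itself, not of the paper's recalibration algorithm; the function name `project_to_subset`, the four-class label set, and the probability values are hypothetical.

```python
from typing import Iterable, Sequence


def project_to_subset(probs: Sequence[float], subset: Iterable[int]) -> float:
    """Return the predicted probability that the label lies in `subset`.

    `probs` is a distribution over k labels (indices 0..k-1); the binary
    prediction is the sum of the probabilities of the labels in `subset`.
    """
    return sum(probs[i] for i in set(subset))


# Hypothetical 4-class prediction over (cat, dog, car, tree).
prediction = [0.5, 0.25, 0.125, 0.125]
animal_labels = {0, 1}  # T = {cat, dog}: "is this an image of an animal?"

p_animal = project_to_subset(prediction, animal_labels)
print(p_animal)
```

A downstream decision maker interested in the set $T$ would then treat `p_animal` as a binary prediction; the guarantee in the abstract concerns how close such projected predictions are to a perfectly calibrated binary predictor.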
Submitted 8 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Omnipredictors for Regression and the Approximate Rank of Convex Functions
Authors:
Parikshit Gopalan,
Princewill Okoroafor,
Prasad Raghavendra,
Abhishek Shetty,
Mihir Singhal
Abstract:
Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of [GKR+21] that introduced the notion, there has been a large body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$, but much less is known about the regression setting where $\mathbf y \in [0,1]$ can be continuous. Our main conceptual contribution is the notion of \textit{sufficient statistics} for loss minimization over a family of loss functions: these are a set of statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions.
Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the $\varepsilon$-approximate rank of convex, Lipschitz functions on the interval $[0,1]$, which we show is tight up to a factor of $\mathrm{polylog} (1/\varepsilon)$. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the class $\mathcal C$. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations, or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability introduced by [GKH+23] for Boolean labels to the regression setting.
Submitted 25 January, 2024;
originally announced January 2024.
-
Agnostically Learning Single-Index Models using Omnipredictors
Authors:
Aravind Gollakota,
Parikshit Gopalan,
Adam R. Klivans,
Konstantinos Stavropoulos
Abstract:
We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boundedness). Our algorithm is based on recent work by [GHK$^+$23] on omniprediction using predictors satisfying calibrated multiaccuracy. Our analysis is simple and relies on the relationship between Bregman divergences (or matching losses) and $\ell_p$ distances. We also provide new guarantees for standard algorithms like GLMtron and logistic regression in the agnostic setting.
Submitted 18 June, 2023;
originally announced June 2023.
-
When Does Optimizing a Proper Loss Yield Calibration?
Authors:
Jarosław Błasiok,
Parikshit Gopalan,
Lunjia Hu,
Preetum Nakkiran
Abstract:
Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors, that are unlikely to contain the ground truth. Under what circumstances does optimizing proper loss over a restricted family yield calibrated models? What precise calibration guarantees does it give? In this work, we provide a rigorous answer to these questions. We replace the global optimality with a local optimality condition stipulating that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor with this local optimality satisfies smooth calibration as defined in Kakade-Foster (2008), Błasiok et al. (2023). Local optimality is plausibly satisfied by well-trained DNNs, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal.
Submitted 8 December, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Loss Minimization Yields Multicalibration for Large Neural Networks
Authors:
Jarosław Błasiok,
Parikshit Gopalan,
Lunjia Hu,
Adam Tauman Kalai,
Preetum Nakkiran
Abstract:
Multicalibration is a notion of fairness for predictors that requires them to provide calibrated predictions across a large set of protected groups. Multicalibration is known to be a goal distinct from loss minimization, even for simple predictors such as linear functions.
In this work, we consider the setting where the protected groups can be represented by neural networks of size $k$, and the predictors are neural networks of size $n > k$. We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$. We also give evidence that our bound on the number of unlucky values is tight, given our proof technique. Previously, results of the flavor that loss minimization yields multicalibration were known only for predictors that were near the ground truth, hence were rather limited in applicability. Unlike these, our results rely on the expressivity of neural nets and utilize the representation of the predictor.
Submitted 7 December, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration
Authors:
Parikshit Gopalan,
Michael P. Kim,
Omer Reingold
Abstract:
We introduce and study Swap Agnostic Learning. The problem can be phrased as a game between a predictor and an adversary: first, the predictor selects a hypothesis $h$; then, the adversary plays in response, and for each level set of the predictor $\{x \in \mathcal{X} : h(x) = v\}$ selects a (different) loss-minimizing hypothesis $c_v \in \mathcal{C}$; the predictor wins if $h$ competes with the adaptive adversary's loss. Despite the strength of the adversary, we demonstrate the feasibility of Swap Agnostic Learning for any convex loss.
Somewhat surprisingly, the result follows through an investigation into the connections between Omniprediction and Multicalibration. Omniprediction is a new notion of optimality for predictors that strengthens classical notions such as agnostic learning. It asks for loss minimization guarantees (relative to a hypothesis class) that apply not just for a specific loss function, but for any loss belonging to a rich family of losses. A recent line of work shows that omniprediction is implied by multicalibration and related multi-group fairness notions. This unexpected connection raises the question: is multi-group fairness necessary for omniprediction?
Our work gives the first affirmative answer to this question. We establish an equivalence between swap variants of omniprediction and multicalibration and swap agnostic learning. Further, swap multicalibration is essentially equivalent to the standard notion of multicalibration, so existing learning algorithms can be used to achieve any of the three notions. Building on this characterization, we paint a complete picture of the relationship between different variants of multi-group fairness, omniprediction, and Outcome Indistinguishability. This inquiry reveals a unified notion of OI that captures all existing notions of omniprediction and multicalibration.
Submitted 21 January, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
A Unifying Theory of Distance from Calibration
Authors:
Jarosław Błasiok,
Parikshit Gopalan,
Lunjia Hu,
Preetum Nakkiran
Abstract:
We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well-understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity.
We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is polynomially related to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground truth distance, which we show is information-theoretically optimal in a natural model for measuring calibration which we term the prediction-only access model. Our work thus establishes fundamental lower and upper bounds on measuring the distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.
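The claim that ECE fails continuity can be seen on a two-point example. The sketch below is an illustration under stated assumptions: `exact_ece` is a hypothetical helper that computes the un-binned form $\mathbb{E}\,|\mathbb{E}[y \mid f] - f|$ by grouping on exact prediction values, and the two-point dataset and the value of `eps` are chosen only for demonstration.

```python
from collections import defaultdict


def exact_ece(preds, labels):
    """Un-binned ECE: average of |E[y | f = p] - p| over the data.

    Points are grouped by their exact predicted value p, so no binning
    scheme is involved.
    """
    groups = defaultdict(list)
    for p, y in zip(preds, labels):
        groups[p].append(y)
    n = len(preds)
    return sum(
        len(ys) / n * abs(sum(ys) / len(ys) - p) for p, ys in groups.items()
    )


# Two equally likely points, one with label 0 and one with label 1.
labels = [0, 1]

# Predicting 1/2 everywhere is perfectly calibrated.
ece_constant = exact_ece([0.5, 0.5], labels)

# An arbitrarily small perturbation separates the two level sets,
# and the ECE jumps to roughly 1/2.
eps = 0.01
ece_perturbed = exact_ece([0.5 - eps, 0.5 + eps], labels)

print(ece_constant, ece_perturbed)
```

The two predictors differ by only `eps` at every point, yet their ECE values differ by almost 1/2, which is the kind of discontinuity the abstract refers to.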
Submitted 31 March, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Loss Minimization through the Lens of Outcome Indistinguishability
Authors:
Parikshit Gopalan,
Lunjia Hu,
Michael P. Kim,
Omer Reingold,
Udi Wieder
Abstract:
We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistinguishability. For a collection of losses and hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic template to learn predictors satisfying a guarantee we call Loss Outcome Indistinguishability. For a set of statistical tests--based on a collection of losses and hypothesis class--a predictor is Loss OI if it is indistinguishable (according to the tests) from Nature's true probabilities over outcomes. By design, Loss OI implies omniprediction in a direct and intuitive manner. We simplify Loss OI further, decomposing it into a calibration condition plus multiaccuracy for a class of functions derived from the loss and hypothesis classes. By careful analysis of this class, we give efficient constructions of omnipredictors for interesting classes of loss functions, including non-convex losses.
This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies in between multiaccuracy and multicalibration. We show that calibrated multiaccuracy implies Loss OI for the important set of convex losses arising from Generalized Linear Models, without requiring full multicalibration. For such losses, we show an equivalence between our computational notion of Loss OI and a geometric notion of indistinguishability, formulated as Pythagorean theorems in the associated Bregman divergence. We give an efficient algorithm for calibrated multiaccuracy with computational complexity comparable to that of multiaccuracy. In all, calibrated multiaccuracy offers an interesting tradeoff point between efficiency and generality in the omniprediction landscape.
Submitted 8 December, 2022; v1 submitted 16 October, 2022;
originally announced October 2022.
-
Low-Degree Multicalibration
Authors:
Parikshit Gopalan,
Michael P. Kim,
Mihir Singhal,
Shengjia Zhao
Abstract:
Introduced as a notion of algorithmic fairness, multicalibration has proved to be a powerful and versatile concept with implications far beyond its original intent. This stringent notion -- that predictions be well-calibrated across a rich class of intersecting subpopulations -- provides its strong guarantees at a cost: the computational and sample complexity of learning multicalibrated predictors are high, and grow exponentially with the number of class labels. In contrast, the relaxed notion of multiaccuracy can be achieved more efficiently, yet many of the most desirable properties of multicalibration cannot be guaranteed assuming multiaccuracy alone. This tension raises a key question: Can we learn predictors with multicalibration-style guarantees at a cost commensurate with multiaccuracy?
In this work, we define and initiate the study of Low-Degree Multicalibration. Low-Degree Multicalibration defines a hierarchy of increasingly-powerful multi-group fairness notions that spans multiaccuracy and the original formulation of multicalibration at the extremes. Our main technical contribution demonstrates that key properties of multicalibration, related to fairness and accuracy, actually manifest as low-degree properties. Importantly, we show that low-degree multicalibration can be significantly more efficient than full multicalibration. In the multi-class setting, the sample complexity to achieve low-degree multicalibration improves exponentially (in the number of classes) over full multicalibration. Our work presents compelling evidence that low-degree multicalibration represents a sweet spot, pairing computational and sample efficiency with strong fairness and accuracy guarantees.
Submitted 16 June, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
KL Divergence Estimation with Multi-group Attribution
Authors:
Parikshit Gopalan,
Nina Narodytska,
Omer Reingold,
Vatsal Sharan,
Udi Wieder
Abstract:
Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) family $\mathcal{C}$ of overlapping subsets of the domain. We propose the notion of multi-group attribution for $\mathcal{C}$, which requires that the estimated divergence conditioned on every sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness desiderata, such as ensuring that sub-populations where the model predicts significant divergence do diverge significantly in the two distributions. Our main technical contribution is to show that multi-group attribution can be derived from the recently introduced notion of multi-calibration for importance weights [HKRR18, GRSW21]. We provide experimental evidence to support our theoretical results, and show that multi-group attribution provides better KL divergence estimates when conditioned on sub-populations than other popular algorithms.
Submitted 28 February, 2022;
originally announced February 2022.
-
Omnipredictors
Authors:
Parikshit Gopalan,
Adam Tauman Kalai,
Omer Reingold,
Vatsal Sharan,
Udi Wieder
Abstract:
Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?''). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be known at the time of learning, requiring the algorithm to use a best-guess loss function. We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only be taken into account when deciding an action.
We introduce the notion of an (${\mathcal{L}},\mathcal{C}$)-omnipredictor, which could be used to optimize any loss in a family ${\mathcal{L}}$. Once the loss function is set, the outputs of the predictor can be post-processed (a simple univariate data-independent transformation of individual predictions) to do well compared with any hypothesis from the class $\mathcal{C}$. The post processing is essentially what one would perform if the outputs of the predictor were true probabilities of the uncertain events. In a sense, omnipredictors extract all the predictive power from the class $\mathcal{C}$, irrespective of the loss function in $\mathcal{L}$.
We show that such "loss-oblivious'' learning is feasible through a connection to multicalibration, a notion introduced in the context of algorithmic fairness. In addition, we show how multicalibration can be viewed as a solution concept for agnostic boosting, shedding new light on past results. Finally, we transfer our insights back to the context of algorithmic fairness by providing omnipredictors for multi-group loss minimization.
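The post-processing step described above, acting as if the predictor's output were the true probability of the event, can be sketched for a binary outcome as follows. This is a minimal illustration, not the paper's construction: the helper `best_action`, the action grid, the two example losses, and the value `p = 0.7` are all assumptions made for the example.

```python
def best_action(p, loss, actions):
    """Pick the action minimizing expected loss if y ~ Bernoulli(p).

    `loss(y, t)` is the loss of action t when the outcome is y in {0, 1}.
    The transformation depends only on the prediction p and the loss,
    not on the training data.
    """
    return min(actions, key=lambda t: p * loss(1, t) + (1 - p) * loss(0, t))


def squared_loss(y, t):
    return (y - t) ** 2


def absolute_loss(y, t):
    return abs(y - t)


# Hypothetical action space: a grid on [0, 1].
actions = [i / 100 for i in range(101)]

# One prediction from a (hypothetical) predictor, reused for both losses.
p = 0.7
action_squared = best_action(p, squared_loss, actions)
action_absolute = best_action(p, absolute_loss, actions)

print(action_squared, action_absolute)
```

The same prediction leads to different actions under different losses (the mean for squared loss, the more likely outcome for absolute loss), which is why the loss only needs to be fixed at decision time.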
Submitted 11 September, 2021;
originally announced September 2021.
-
Multicalibrated Partitions for Importance Weights
Authors:
Parikshit Gopalan,
Omer Reingold,
Vatsal Sharan,
Udi Wieder
Abstract:
The ratios between the probabilities that two distributions $R$ and $P$ give to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably, statistics and machine learning. Among their applications, importance weights are central to domain adaptation, anomaly detection, and estimations of various divergences such as the KL divergence. We consider the common setting where $R$ and $P$ are only given through samples from each distribution. The vast literature on estimating importance weights is either heuristic, or makes strong assumptions about $R$ and $P$ or on the importance weights themselves.
In this paper, we explore a computational perspective to the estimation of importance weights, which factors in the limitations and possibilities obtainable with bounded computational resources. We significantly strengthen previous work that uses the MaxEntropy approach, which defines the importance weights based on a distribution $Q$ closest to $P$ that looks the same as $R$ on every set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets. We show that the MaxEntropy approach may fail to assign high average scores to sets $C \in \mathcal{C}$, even when the average of ground truth weights for the set is evidently large. We similarly show that it may overestimate the average scores to sets $C \in \mathcal{C}$. We therefore formulate Sandwiching bounds as a notion of set-wise accuracy for importance weights. We study these bounds to show that they capture natural completeness and soundness requirements from the weights. We present an efficient algorithm that under standard learnability assumptions computes weights which satisfy these bounds. Our techniques rely on a new notion of multicalibrated partitions of the domain of the distributions, which appear to be useful objects in their own right.
Submitted 9 March, 2021;
originally announced March 2021.
-
The anisotropic quasi-static permittivity of single-crystal beta-Ga2O3
Authors:
Prashanth Gopalan,
Sean Knight,
Ashish Chanana,
Megan Stokey,
Praneeth Ranga,
Michael A. Scarpulla,
Sriram Krishnamoorthy,
V. Darakchieva,
Zbigniew Galazka,
Klaus Irmscher,
Andreas Fiedler,
Steve Blair,
Mathias Schubert,
Berardi Sensale-Rodriguez
Abstract:
The quasi-static anisotropic permittivity parameters of electrically insulating gallium oxide (beta-Ga2O3) were determined by terahertz spectroscopy. Polarization-resolved frequency domain spectroscopy in the spectral range from 200 GHz to 1 THz was carried out on bulk crystals along different orientations. Principal directions for permittivity were determined along crystallographic axes c and b, and reciprocal lattice direction a*. No significant frequency dispersion in the real part of dielectric permittivity was observed in the measured spectral range. Our results are in excellent agreement with recent radio-frequency capacitance measurements as well as with extrapolations from recent infrared measurements of phonon mode and high frequency contributions, and close the knowledge gap for these parameters in the terahertz spectral range. Our results are important for applications of beta-Ga2O3 in high-frequency electronic devices.
Submitted 6 October, 2020;
originally announced October 2020.
-
Overlook: Differentially Private Exploratory Visualization for Big Data
Authors:
Pratiksha Thaker,
Mihai Budiu,
Parikshit Gopalan,
Udi Wieder,
Matei Zaharia
Abstract:
Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store.
We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. The key idea in Overlook is a virtual synopsis that can be evaluated incrementally, without extra space storage or expensive precomputation. Overlook simply executes queries using an existing engine, such as a SQL DBMS, and adds noise to their results. Because Overlook's synopses do not require costly precomputation or storage, data curators can also use Overlook to explore the impact of privacy parameters interactively. Overlook offers a rich visual query interface based on the open source Hillview system. Overlook achieves accuracy comparable to existing synopsis-based systems, while offering better performance and removing the need for extra storage.
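The general pattern of running a query on an existing engine and adding noise to the result can be sketched with the standard Laplace mechanism for a counting query. This is not Overlook's implementation, and the abstract does not specify its noise mechanism; `noisy_count`, the in-memory row list standing in for a query engine, and the parameter values are all hypothetical.

```python
import random


def noisy_count(rows, predicate, epsilon, rng):
    """Answer a counting query with Laplace noise of scale 1/epsilon.

    A counting query changes by at most 1 when one row is added or
    removed (sensitivity 1), so the Laplace mechanism uses scale 1/epsilon.
    """
    true_count = sum(1 for row in rows if predicate(row))
    # The difference of two independent Exponential(rate=epsilon) draws
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Stand-in for a table in an existing engine: 100 hypothetical ages.
ages = [20 + (i % 50) for i in range(100)]
rng = random.Random(0)  # seeded only to make the sketch reproducible

answer = noisy_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(answer)
```

Each answered query of this kind spends part of the privacy budget, which is the cost that the synopsis-based strategy in the abstract is meant to manage.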
Submitted 22 June, 2020;
originally announced June 2020.
-
Scalable near-infrared graphene plasmonic resonators exhibiting strong non-local and electron quantization effects
Authors:
Joel F. Siegel,
Jonathan H. Dwyer,
Anjali Suresh,
Nathaniel S. Safron,
Margaret Fortman,
Chenghao Wan,
Jonathan W. Choi,
Wei Wei,
Vivek Saraswat,
Wyatt A. Behn,
Mikhail A. Kats,
Michael S. Arnold,
Padma Gopalan,
Victor W. Brar
Abstract:
Graphene plasmonic resonators have been broadly studied in the terahertz and mid-infrared ranges because of their electrical tunability and large confinement factors which can enable dramatic enhancement of light-matter coupling. In this work, we demonstrate that the characteristic scaling laws of graphene plasmons change for smaller (< 40 nm) plasmonic wavelengths, expanding the operational frequencies of graphene plasmonic resonators into the near-infrared (NIR) and modifying their optical confinement properties. We utilize a novel bottom-up block copolymer lithography method that substantially improves upon top-down methods to create resonators as narrow as 12 nm over centimeter-scale areas. Measurements of these structures reveal that their plasmonic resonances are strongly influenced by non-local and quantum effects, which push their resonant frequency into the NIR (2.2 um), almost double the frequency of previous experimental works. The confinement factors of these resonators, meanwhile, reach 137 +/- 25, amongst the largest reported in literature for an optical cavity. While our findings indicate that the enhancement of some 'forbidden' transitions is an order of magnitude weaker than predicted, the combined NIR response and large confinement of these structures make them an attractive platform to explore ultra-strongly enhanced spontaneous emission.
Submitted 18 May, 2020;
originally announced May 2020.
-
PIDForest: Anomaly Detection via Partial Identification
Authors:
Parikshit Gopalan,
Vatsal Sharan,
Udi Wieder
Abstract:
We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest: a random forest based algorithm that finds anomalies based on this definition. We show that it performs favorably in comparison to several popular anomaly detection methods, across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by providing a set of features and ranges for them which are relatively uncommon in the dataset.
Submitted 7 December, 2019;
originally announced December 2019.
-
Finding Skewed Subcubes Under a Distribution
Authors:
Parikshit Gopalan,
Roie Levin,
Udi Wieder
Abstract:
Say that we are given samples from a distribution $ψ$ over an $n$-dimensional space. We expect or desire $ψ$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $ψ$ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in $ψ$. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection.
In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of minimal skewed subcubes, which are subcubes whose skew cannot be attributed to a larger subcube that contains them. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality.
On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to $ψ$ where finding skewed subcubes might be easier.
Submitted 12 November, 2020; v1 submitted 17 November, 2019;
originally announced November 2019.
-
Hillview: A trillion-cell spreadsheet for big data
Authors:
Mihai Budiu,
Parikshit Gopalan,
Lalith Suresh,
Udi Wieder,
Han Kruiger,
Marcos K. Aguilera
Abstract:
Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, much beyond the published capabilities of competing systems.
Submitted 10 July, 2019;
originally announced July 2019.
-
Manifestation of kinetic-inductance in spectrally-narrow terahertz plasmon resonances in thin-film Cd3As2
Authors:
Ashish Chanana,
Neda Loftizadeh,
Hugo O. Condori Quispe,
Prashanth Gopalan,
Joshua R. Winger,
Steve Blair,
Ajay Nahata,
Vikram Deshpande,
Michael A. Scarpulla,
Berardi Sensale-Rodriguez
Abstract:
Three-dimensional (3D) semimetals have been predicted and demonstrated to have a wide variety of interesting properties associated with their linear energy dispersion. In analogy to two-dimensional (2D) Dirac semimetals, such as graphene, Cd3As2, a 3D semimetal, has shown ultra-high mobility, large Fermi velocity, and has been hypothesized to support plasmons at terahertz frequencies. In this work, we demonstrate synthesis of high-quality large-area Cd3As2 thin-films through thermal evaporation as well as the experimental realization of plasmonic structures consisting of periodic arrays of Cd3As2 stripes. These arrays exhibit sharp resonances at terahertz frequencies with associated quality-factors (Q) as high as ~ 3.7. Such spectrally-narrow resonances can be understood on the basis of a large kinetic-inductance, resulting from a long momentum scattering time, which in our films can approach ~1 ps at room-temperature. Moreover, we demonstrate an ultrafast tunable response through excitation of photo-induced carriers in optical pump / terahertz probe experiments. Our results evidence that the intrinsic 3D nature of Cd3As2 provides for a very robust platform for terahertz plasmonic applications. Overall, our observations pave the way for the development of myriad terahertz (opto) electronic devices based on Cd3As2 and other 3D Dirac semimetals, benefiting from strong coupling of terahertz radiation, ultrafast transient response, magneto-plasmon properties, etc. Moreover, the long momentum scattering time, and thus large kinetic inductance, in Cd3As2 also holds enormous potential for the re-design of passive elements such as inductors and hence can have a profound impact in the field of RF integrated circuits.
Submitted 10 November, 2018;
originally announced November 2018.
-
Efficient Anomaly Detection via Matrix Sketching
Authors:
Vatsal Sharan,
Parikshit Gopalan,
Udi Wieder
Abstract:
We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results showing that \emph{any} sketch of a matrix that satisfies a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate over real-world data sets. Our main technical contribution is to prove matrix perturbation inequalities for operators arising in the computation of these measures.
Submitted 27 November, 2018; v1 submitted 9 April, 2018;
originally announced April 2018.
-
Stable and Consistent Membership at Scale with Rapid
Authors:
Lalith Suresh,
Dahlia Malkhi,
Parikshit Gopalan,
Ivan Porto Carreiro,
Zeeshan Lokhandwala
Abstract:
We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experiences problems, it detects failures of the entire group, rather than concluding about each process individually. Implementing these insights translates into a simple membership algorithm with low communication overhead.
We present evidence that our strategy suffices to drive unanimous detection almost-everywhere, even when complex network conditions arise, such as one-way reachability problems, firewall misconfigurations, and high packet loss. Furthermore, we present both empirical evidence and analyses that prove that the almost-everywhere detection happens with high probability. To complete the design, Rapid contains a leaderless consensus protocol that converts multi-process cut detections into a view-change decision. The resulting membership service works both in fully decentralized as well as logically centralized modes.
We present an evaluation of Rapid in moderately scalable cloud settings. Rapid bootstraps 2000 node clusters 2-5.8x faster than prevailing tools such as Memberlist and ZooKeeper, remains stable in the face of complex failure scenarios, and provides strong consistency guarantees. It is easy to integrate Rapid into existing distributed applications, of which we demonstrate two.
Submitted 9 March, 2018;
originally announced March 2018.
-
Tuning Ising superconductivity with layer and spin-orbit coupling in two-dimensional transition-metal dichalcogenides
Authors:
Sergio C. de la Barrera,
Michael R. Sinko,
Devashish P. Gopalan,
Nikhil Sivadas,
Kyle L. Seyler,
Kenji Watanabe,
Takashi Taniguchi,
Adam W. Tsen,
Xiaodong Xu,
Di Xiao,
Benjamin M. Hunt
Abstract:
Systems that simultaneously exhibit superconductivity and spin-orbit coupling are predicted to provide a route toward topological superconductivity and unconventional electron pairing, driving significant contemporary interest in these materials. Monolayer transition-metal dichalcogenide (TMD) superconductors in particular lack inversion symmetry, enforcing a spin-triplet component of the superconducting wavefunction that increases with the strength of spin-orbit coupling. In this work, we present an experimental and theoretical study of two intrinsic TMD superconductors with large spin-orbit coupling in the atomic layer limit, metallic 2H-TaS$_2$ and 2H-NbSe$_2$. For the first time in TaS$_2$, we investigate the superconducting properties as the material is reduced to a monolayer and show that high-field measurements point to the largest upper critical field thus far reported for an intrinsic TMD superconductor. In few-layer samples, we find that the enhancement of the upper critical field is sustained by the dominance of spin-orbit coupling over weak interlayer coupling, providing additional platforms for unconventional superconducting states in two dimensions.
Submitted 1 November, 2017;
originally announced November 2017.
-
The Implications of Grain Size Variation in Magnetic Field Alignment of Block Copolymer Blends
Authors:
Yekaterina Rokhlenko,
Paweł W. Majewski,
Steven R. Larson,
Padma Gopalan,
Kevin G. Yager,
Chinedum O. Osuji
Abstract:
Recent experiments have highlighted the intrinsic magnetic anisotropy in coil-coil diblock copolymers, specifically in poly(styrene-b-4-vinylpyridine) (PS-b-P4VP), that enables magnetic field alignment at field strengths of a few tesla. We consider here the alignment response of two low molecular weight (MW) lamellae-forming PS-b-P4VP systems. Cooling across the disorder-order transition temperature ($\mathrm{T_{odt}}$) results in strong alignment for the higher MW sample (5.5K), whereas little alignment is discernible for the lower MW system (3.6K). This disparity under otherwise identical conditions of field strength and cooling rate suggests that different average grain sizes are produced during slow cooling of these materials, with larger grains formed in the higher MW material. Blending the block copolymers results in homogeneous samples which display $\mathrm{T_{odt}}$, d-spacings and grain sizes that are intermediate between the two neat diblocks. Similarly, the alignment quality displays a smooth variation with the concentration of the higher MW diblock in the blends and the size of grains likewise interpolates between limits set by the neat diblocks, with a factor of 3.5X difference in the grain size observed in high vs low MW neat diblocks. These results highlight the importance of grain growth kinetics in dictating the field response in block copolymers and suggest an unconventional route for the manipulation of such kinetics.
Submitted 21 March, 2017;
originally announced March 2017.
-
Maximally Recoverable Codes for Grid-like Topologies
Authors:
Parikshit Gopalan,
Guangda Hu,
Swastik Kopparty,
Shubhangi Saraf,
Carol Wang,
Sergey Yekhanin
Abstract:
The explosion in the volumes of data being stored online has resulted in distributed storage systems transitioning to erasure coding based schemes. Yet, the codes being deployed in practice are fairly short. In this work, we address what we view as the main coding theoretic barrier to deploying longer codes in storage: at large lengths, failures are not independent and correlated failures are inevitable. This motivates designing codes that allow quick data recovery even after large correlated failures, and which have efficient encoding and decoding. We propose that code design for distributed storage be viewed as a two-step process. The first step is to choose a topology of the code, which incorporates knowledge about the correlated failures that need to be handled, and ensures local recovery from such failures. In the second step one specifies a code with the chosen topology by choosing coefficients from a finite field. In this step, one tries to balance reliability (which is better over larger fields) with encoding and decoding efficiency (which is better over smaller fields). This work initiates an in-depth study of this reliability/efficiency tradeoff. We consider the field-size needed for achieving maximal recoverability: the strongest reliability possible with a given topology. We propose a family of topologies called grid-like topologies which unify a number of topologies considered both in theory and practice, and prove a collection of results about maximally recoverable codes in such topologies including the first super-polynomial lower bound on the field size.
Submitted 20 September, 2016; v1 submitted 17 May, 2016;
originally announced May 2016.
-
Degree and Sensitivity: tails of two distributions
Authors:
Parikshit Gopalan,
Rocco Servedio,
Avishay Tal,
Avi Wigderson
Abstract:
The sensitivity of a Boolean function f is the maximum over all inputs x, of the number of sensitive coordinates of x. The well-known sensitivity conjecture of Nisan (see also Nisan and Szegedy) states that every sensitivity-s Boolean function can be computed by a polynomial over the reals of degree poly(s). The best known upper bounds on degree, however, are exponential rather than polynomial in s.
Our main result is an approximate version of the conjecture: every Boolean function with sensitivity s can be epsilon-approximated (in L_2) by a polynomial whose degree is O(s log(1/epsilon)). This is the first improvement on the folklore bound of s/epsilon. Further, we show that improving the bound to O(s^c log(1/epsilon)^d) for any d < 1 and any c > 0 will imply the sensitivity conjecture. Thus our result is essentially the best one can hope for without proving the conjecture.
We postulate a robust analogue of the sensitivity conjecture: if most inputs to a Boolean function f have low sensitivity, then most of the Fourier mass of f is concentrated on small subsets, and present an approach towards proving this conjecture.
Submitted 25 April, 2016;
originally announced April 2016.
-
Formation of hexagonal Boron Nitride on Graphene-covered Copper Surfaces
Authors:
Devashish P. Gopalan,
Patrick C. Mende,
Sergio C. de la Barrera,
Shonali Dhingra,
Jun Li,
Kehao Zhang,
Nicholas A. Simonson,
Joshua A. Robinson,
Ning Lu,
Qingxiao Wang,
Moon J. Kim,
Brian D'Urso,
Randall M. Feenstra
Abstract:
Graphene-covered copper surfaces have been exposed to borazine, (BH)3(NH)3, with the resulting surfaces characterized by low-energy electron microscopy. Although the intent of the experiment was to form hexagonal boron nitride (h-BN) on top of the graphene, such layers were not obtained. Rather, in isolated surface areas, h-BN is found to form micrometer-size islands that substitute for the graphene. Additionally, over nearly the entire surface, the properties of the layer that was originally graphene are observed to change in a manner that is consistent with the formation of a mixed h-BN/graphene alloy, i.e. h-BNC alloy. Furthermore, following the deposition of the borazine, a small fraction of the surface is found to consist of bare copper, indicating etching of the overlying graphene. The inability to form h-BN layers on top of graphene is discussed in terms of the catalytic behavior of the underlying copper surface and the decomposition of the borazine on top of the graphene.
Submitted 10 February, 2016; v1 submitted 15 September, 2015;
originally announced September 2015.
-
Magnetic alignment of block copolymer microdomains by intrinsic chain anisotropy
Authors:
Yekaterina Rokhlenko,
Kai Zhang,
Manesh Gopinadhan,
Steve R. Larson,
Pawel W. Majewski,
Kevin G. Yager,
Padma Gopalan,
Corey S. O'Hern,
Chinedum O. Osuji
Abstract:
We examine the role of intrinsic chain susceptibility anisotropy in magnetic field directed self-assembly of a block copolymer using \textit{in situ} X-ray scattering. Alignment of a lamellar mesophase is observed on cooling across the disorder-order transition with the resulting orientational order inversely proportional to the cooling rate. We discuss the origin of the susceptibility anisotropy, $Δχ$, that drives alignment, and calculate its magnitude using coarse-grained molecular dynamics to sample conformations of surface-tethered chains, finding $Δχ\approx 2\times10^{-8}$. From field-dependent scattering data we estimate grains of $\approx1.2$ $μ$m are present during alignment. These results demonstrate that intrinsic anisotropy is sufficient to support strong field-induced mesophase alignment and suggest a versatile strategy for field control of orientational order in block copolymers.
Submitted 3 September, 2015;
originally announced September 2015.
-
Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions
Authors:
Parikshit Gopalan,
Noam Nisan,
Rocco A. Servedio,
Kunal Talwar,
Avi Wigderson
Abstract:
A natural measure of smoothness of a Boolean function is its sensitivity (the largest number of Hamming neighbors of a point which differ from it in function value). The structure of smooth or equivalently low-sensitivity functions is still a mystery. A well-known conjecture states that every such Boolean function can be computed by a shallow decision tree. While this conjecture implies that smooth functions are easy to compute in the simplest computational model, to date no non-trivial upper bounds were known for such functions in any computational model, including unrestricted Boolean circuits. Even a bound on the description length of such functions better than the trivial $2^n$ does not seem to have been known.
In this work, we establish the first computational upper bounds on smooth Boolean functions:
1) We show that every sensitivity s function is uniquely specified by its values on a Hamming ball of radius 2s. We use this to show that such functions can be computed by circuits of size $n^{O(s)}$.
2) We show that sensitivity s functions satisfy a strong pointwise noise-stability guarantee for random noise of rate O(1/s). We use this to show that these functions have formulas of depth O(s log n).
3) We show that sensitivity s functions can be (locally) self-corrected from worst-case noise of rate $\exp(-O(s))$.
All our results are simple, and follow rather directly from (variants of) the basic fact that the function values at a few points in small neighborhoods of a given point determine its function value via a majority vote. Our results confirm various consequences of the conjecture. They may be viewed as providing a new form of evidence towards its validity, as well as new directions towards attacking it.
Submitted 10 August, 2015;
originally announced August 2015.
-
Pseudorandomness via the discrete Fourier transform
Authors:
Parikshit Gopalan,
Daniel Kane,
Raghu Meka
Abstract:
We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in logspace that have been considered in the literature, including halfspaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL - a central question in complexity theory.
Our construction combines ideas from a large body of prior work, ranging from a classical construction of [NN93] to the recent gradually increasing independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some novel analytic machinery which might find other applications.
Submitted 18 November, 2015; v1 submitted 14 June, 2015;
originally announced June 2015.
-
Public projects, Boolean functions and the borders of Border's theorem
Authors:
Parikshit Gopalan,
Noam Nisan,
Tim Roughgarden
Abstract:
Border's theorem gives an intuitive linear characterization of the feasible interim allocation rules of a Bayesian single-item environment, and it has several applications in economic and algorithmic mechanism design. All known generalizations of Border's theorem either restrict attention to relatively simple settings, or resort to approximation. This paper identifies a complexity-theoretic barrier that indicates, assuming standard complexity class separations, that Border's theorem cannot be extended significantly beyond the state-of-the-art. We also identify a surprisingly tight connection between Myerson's optimal auction theory, when applied to public project settings, and some fundamental results in the analysis of Boolean functions.
Submitted 28 April, 2015;
originally announced April 2015.
-
Pseudorandomness for concentration bounds and signed majorities
Authors:
Parikshit Gopalan,
Daniel Kane,
Raghu Meka
Abstract:
The problem of constructing pseudorandom generators that fool halfspaces has been studied intensively in recent times. For fooling halfspaces over the hypercube with polynomially small error, the best construction known requires seed-length O(log^2 n) (MekaZ13). Getting the seed-length down to O(log(n)) is a natural challenge in its own right, which needs to be overcome in order to derandomize RL. In this work we make progress towards this goal by obtaining near-optimal generators for two important special cases:
1) We give a near optimal derandomization of the Chernoff bound for independent, uniformly random bits. Specifically, we show how to generate an x in {1,-1}^n using $\tilde{O}(\log (n/ε))$ random bits such that for any unit vector u, <u,x> matches the sub-Gaussian tail behaviour predicted by the Chernoff bound up to error $ε$.
2) We construct a generator which fools halfspaces with {0,1,-1} coefficients with error $ε$ with a seed-length of $\tilde{O}(\log(n/ε))$. This includes the important special case of majorities.
In both cases, the best previous results required seed-length of $O(\log n + \log^2(1/ε))$.
Technically, our work combines new Fourier-analytic tools with the iterative dimension reduction techniques and the gradually increasing independence paradigm of previous works (KaneMN11, CelisRSW13, GopalanMRTV12).
Submitted 17 November, 2014;
originally announced November 2014.
-
Inequalities and tail bounds for elementary symmetric polynomial with applications
Authors:
Parikshit Gopalan,
Amir Yehudayoff
Abstract:
We study the extent of independence needed to approximate the product of bounded random variables in expectation, a natural question that has applications in pseudorandomness and min-wise independent hashing.
For random variables whose absolute value is bounded by $1$, we give an error bound of the form $σ^{Ω(k)}$ where $k$ is the amount of independence and $σ^2$ is the total variance of the sum. Previously known bounds only applied in more restricted settings, and were quantitatively weaker. We use this to give a simpler and more modular analysis of a construction of min-wise independent hash functions and pseudorandom generators for combinatorial rectangles due to Gopalan et al., which also slightly improves their seed-length.
Our proof relies on a new analytic inequality for the elementary symmetric polynomials $S_k(x)$ for $x \in \mathbb{R}^n$ which we believe to be of independent interest. We show that if $|S_k(x)|,|S_{k+1}(x)|$ are small relative to $|S_{k-1}(x)|$ for some $k>0$ then $|S_\ell(x)|$ is also small for all $\ell > k$. From these, we derive tail bounds for the elementary symmetric polynomials when the inputs are only $k$-wise independent.
Submitted 10 August, 2015; v1 submitted 14 February, 2014;
originally announced February 2014.
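To make the objects in the abstract above concrete, the elementary symmetric polynomials $S_k(x)$ can be evaluated with the standard one-pass recurrence. This is a minimal sketch for illustration only; the function name `elementary_symmetric` is hypothetical and nothing here is taken from the paper's proofs.

```python
def elementary_symmetric(x, k_max):
    """Return [S_0(x), ..., S_{k_max}(x)] for a sequence of numbers x.

    S_k(x) is the sum, over all size-k subsets of coordinates, of the
    product of the chosen coordinates; S_0(x) = 1 by convention.
    """
    s = [1] + [0] * k_max
    for xi in x:
        # Update higher degrees first so that each coordinate contributes
        # at most once to any monomial.
        for k in range(k_max, 0, -1):
            s[k] += xi * s[k - 1]
    return s


values = elementary_symmetric([1, 2, 3], 3)
print(values)
```

For `x = [1, 2, 3]` the entries are $S_0 = 1$, $S_1 = 1+2+3$, $S_2 = 1\cdot2 + 1\cdot3 + 2\cdot3$ and $S_3 = 1\cdot2\cdot3$.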
-
Spectroscopic Properties of Nanotube-Chromophore Hybrids
Authors:
Changshui Huang,
Randy K. Wang,
Bryan M. Wong,
David J. McGee,
François Léonard,
Yun Jun Kim,
Kirsten F. Johnson,
Michael S. Arnold,
Mark A. Eriksson,
Padma Gopalan
Abstract:
Recently, individual single-walled carbon nanotubes (SWNTs) functionalized with azo-benzene chromophores were shown to form a new class of hybrid nanomaterials for optoelectronics applications. Here we use a number of experimental techniques and theory to understand the binding, orientation, and nature of coupling between chromophores and the nanotubes, all of which are of relevance to future optimization of these hybrid materials. We find that the binding energy between chromophores and nanotubes depends strongly on the type of tether that is used to bind the chromophores to the nanotubes, with pyrene tethers resulting in more than 90% of the bound chromophores during processing. DFT calculations show that the binding energy of the chromophores to the nanotubes is maximized for chromophores parallel to the nanotube sidewall, even with the use of tethers; second harmonic generation shows that there is nonetheless a partial radial orientation of the chromophores on the nanotubes. We find weak electronic coupling between the chromophores and the SWNTs, consistent with non-covalent binding. The chromophore-nanotube coupling, while weak, is sufficient to quench the chromophore fluorescence. Stern-Volmer plots are non-linear, which supports a combination of static and dynamic quenching processes. The chromophore orientation is an important variable for chromophore-nanotube phototransistors, and our experiments suggest the possibility for further optimizing this orientational degree of freedom.
Submitted 21 December, 2013;
originally announced December 2013.
-
Functionalization of Single-Wall Carbon Nanotubes with Chromophores of Opposite Internal Dipole Orientation
Authors:
Yuanchun Zhao,
Changshui Huang,
Myungwoong Kim,
Bryan M. Wong,
François Léonard,
Padma Gopalan,
Mark A. Eriksson
Abstract:
We report the functionalization of carbon nanotubes with two azobenzene-based chromophores with large internal dipole moments and opposite dipole orientations. The molecules are attached to the nanotubes non-covalently via a pyrene tether. A combination of characterization techniques shows uniform molecular coverage on the nanotubes, with minimal aggregation of excess chromophores on the substrate. The large on/off ratios and the sub-threshold swings of the nanotube-based field-effect transistors (FETs) are preserved after functionalization, and different shifts in threshold voltage are observed for each chromophore. Ab initio calculations verify the properties of the synthesized chromophores and indicate very small charge transfer, confirming a strong, non-covalent functionalization.
Submitted 21 December, 2013;
originally announced December 2013.
-
Scalable Recommendation with Poisson Factorization
Authors:
Prem Gopalan,
Jake M. Hofman,
David M. Blei
Abstract:
We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user's limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs only to explicitly consider the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This is an efficient algorithm that iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large real-world user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms state-of-the-art matrix factorization methods.
Submitted 20 May, 2014; v1 submitted 7 November, 2013;
originally announced November 2013.
-
Locally Testable Codes and Cayley Graphs
Authors:
Parikshit Gopalan,
Salil Vadhan,
Yuan Zhou
Abstract:
We give two new characterizations of ($\mathbb{F}_2$-linear) locally testable error-correcting codes in terms of Cayley graphs over $\mathbb{F}_2^h$:
1. A locally testable code is equivalent to a Cayley graph over $\mathbb{F}_2^h$ whose set of generators is significantly larger than $h$ and has no short linear dependencies, but yields a shortest-path metric that embeds into $\ell_1$ with constant distortion. This extends and gives a converse to a result of Khot and Naor (2006), which showed that codes with large dual distance imply Cayley graphs that have no low-distortion embeddings into $\ell_1$.
2. A locally testable code is equivalent to a Cayley graph over $\mathbb{F}_2^h$ that has significantly more than $h$ eigenvalues near 1, which have no short linear dependencies among them and which "explain" all of the large eigenvalues. This extends and gives a converse to a recent construction of Barak et al. (2012), which showed that locally testable codes imply Cayley graphs that are small-set expanders but have many large eigenvalues.
Submitted 23 August, 2013;
originally announced August 2013.
-
Explicit Maximally Recoverable Codes with Locality
Authors:
Parikshit Gopalan,
Cheng Huang,
Bob Jenkins,
Sergey Yekhanin
Abstract:
Consider a systematic linear code where some (local) parity symbols depend on few prescribed symbols, while other (heavy) parity symbols may depend on all data symbols. Local parities allow any single erased symbol to be recovered quickly, while heavy parities provide tolerance to a large number of simultaneous erasures. A code as above is maximally-recoverable if it corrects all erasure patterns which are information-theoretically recoverable given the code topology. In this paper we present explicit families of maximally-recoverable codes with locality. We also initiate the study of the trade-off between maximal recoverability and alphabet size.
Submitted 19 July, 2013; v1 submitted 15 July, 2013;
originally announced July 2013.
-
Better Pseudorandom Generators from Milder Pseudorandom Restrictions
Authors:
Parikshit Gopalan,
Raghu Meka,
Omer Reingold,
Luca Trevisan,
Salil Vadhan
Abstract:
We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs, all of which achieve near-optimal seed-length even in the low-error regime: We get seed-length O(log (n/epsilon)) for error epsilon. Previously, only constructions with seed-length O(\log^{3/2} n) or O(\log^2 n) were known for these classes with polynomially small error.
The (pseudo)random restrictions we use are milder than those typically used for proving circuit lower bounds in that we only set a constant fraction of the bits at a time. While such restrictions do not simplify the functions drastically, we show that they can be derandomized using small-bias spaces.
Submitted 28 September, 2012;
originally announced October 2012.
-
Making the long code shorter, with applications to the Unique Games Conjecture
Authors:
Boaz Barak,
Parikshit Gopalan,
Johan Hastad,
Raghu Meka,
Prasad Raghavendra,
David Steurer
Abstract:
The long code is a central tool in hardness of approximation, especially in questions related to the unique games conjecture. We construct a new code that is exponentially more efficient, but can still be used in many of these applications. Using the new code we obtain exponential improvements over several known results, including the following:
1. For any eps > 0, we show the existence of an n vertex graph G where every set of o(n) vertices has expansion 1 - eps, but G's adjacency matrix has more than exp(log^delta n) eigenvalues larger than 1 - eps, where delta depends only on eps. This answers an open question of Arora, Barak and Steurer (FOCS 2010) who asked whether one can improve over the noise graph on the Boolean hypercube that has poly(log n) such eigenvalues.
2. A gadget that reduces unique games instances with linear constraints modulo K into instances with alphabet k with a blowup of K^polylog(K), improving over the previously known gadget with blowup of 2^K.
3. An n variable integrality gap for Unique Games that survives exp(poly(log log n)) rounds of the SDP + Sherali Adams hierarchy, improving on the previously known bound of poly(log log n).
We show a connection between the local testability of linear codes and small set expansion in certain related Cayley graphs, and use this connection to derandomize the noise graph on the Boolean hypercube.
Submitted 2 November, 2011;
originally announced November 2011.
-
On the Locality of Codeword Symbols
Authors:
Parikshit Gopalan,
Cheng Huang,
Huseyin Simitci,
Sergey Yekhanin
Abstract:
Consider a linear [n,k,d]_q code C. We say that the i-th coordinate of C has locality r if the value at this coordinate can be recovered from accessing some other r coordinates of C. Data storage applications require codes with small redundancy, low locality for information coordinates, large distance, and low locality for parity coordinates. In this paper we carry out an in-depth study of the relations between these parameters.
We establish a tight bound for the redundancy n-k in terms of the message length, the distance, and the locality of information coordinates. We refer to codes attaining the bound as optimal. We prove some structure theorems about optimal codes, which are particularly strong for small distances. This gives a fairly complete picture of the tradeoffs between codeword length, worst-case distance and locality of information symbols.
We then consider the locality of parity check symbols and erasure correction beyond worst case distance for optimal codes. Using our structure theorem, we obtain a tight bound for the locality of parity symbols possible in such codes for a broad class of parameter settings. We prove that there is a tradeoff between having good locality for parity checks and the ability to correct erasures beyond the minimum distance.
Submitted 18 June, 2011;
originally announced June 2011.
-
Polynomial-Time Approximation Schemes for Knapsack and Related Counting Problems using Branching Programs
Authors:
Parikshit Gopalan,
Adam Klivans,
Raghu Meka
Abstract:
We give a deterministic, polynomial-time algorithm for approximately counting the number of {0,1}-solutions to any instance of the knapsack problem. On an instance of length n with total weight W and accuracy parameter eps, our algorithm produces a (1 + eps)-multiplicative approximation in time poly(n,log W,1/eps). We also give algorithms with identical guarantees for general integer knapsack, the multidimensional knapsack problem (with a constant number of constraints) and for contingency tables (with a constant number of rows). Previously, only randomized approximation schemes were known for these problems due to work by Morris and Sinclair and work by Dyer.
Our algorithms work by constructing small-width, read-once branching programs for approximating the underlying solution space under a carefully chosen distribution. As a byproduct of this approach, we obtain new query algorithms for learning functions of k halfspaces with respect to the uniform distribution on {0,1}^n. The running time of our algorithm is polynomial in the accuracy parameter eps. Previously even for the case of k=2, only algorithms with an exponential dependence on eps were known.
Submitted 18 August, 2010;
originally announced August 2010.
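For orientation on the counting problem in the abstract above, the number of {0,1}-solutions to a knapsack instance can be counted exactly by a textbook dynamic program. This is only a baseline sketch and not the paper's branching-program algorithm: it assumes non-negative integer weights and runs in time proportional to n·W, which is exponential in log W, whereas the abstract states a running time of poly(n, log W, 1/eps) for a (1 + eps)-approximation. The function name `count_knapsack_solutions` is hypothetical.

```python
def count_knapsack_solutions(weights, capacity):
    """Count x in {0,1}^n with sum(weights[i] * x[i]) <= capacity.

    Exact count via dynamic programming over total weight; assumes
    non-negative integer weights and a non-negative integer capacity.
    """
    # ways[c] = number of subsets of the items seen so far whose total
    # weight is exactly c (totals above capacity are never stored).
    ways = [0] * (capacity + 1)
    ways[0] = 1
    for w in weights:
        # Iterate totals downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            ways[c] += ways[c - w]
    return sum(ways)


print(count_knapsack_solutions([1, 2, 3], 3))
```

With weights 1, 2, 3 and capacity 3, the feasible subsets are the empty set, each single item, and the pair of weights 1 and 2.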
-
Fooling functions of halfspaces under product distributions
Authors:
P. Gopalan,
R. O'Donnell,
Y. Wu,
D. Zuckerman
Abstract:
We construct pseudorandom generators that fool functions of halfspaces (threshold functions) under a very broad class of product distributions. This class includes not only familiar cases such as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and the multivariate Gaussian distribution, but also includes any product of discrete distributions with probabilities bounded away from 0.
Our first main result shows that a recent pseudorandom generator construction of Meka and Zuckerman [MZ09], when suitably modified, can fool arbitrary functions of d halfspaces under product distributions where each coordinate has bounded fourth moment. To eps-fool any size-s, depth-d decision tree of halfspaces, our pseudorandom generator uses seed length O((d log(ds/eps)+log n) log(ds/eps)). For monotone functions of d halfspaces, the seed length can be improved to O((d log(d/eps)+log n) log(d/eps)). We get better bounds for larger eps; for example, to 1/polylog(n)-fool all monotone functions of (log n)/log log n halfspaces, our generator requires a seed of length just O(log n). Our second main result generalizes the work of Diakonikolas et al. [DGJ+09] to show that bounded independence suffices to fool functions of halfspaces under product distributions. Assuming each coordinate satisfies a certain stronger moment condition, we show that any function computable by a size-s, depth-d decision tree of halfspaces is eps-fooled by O(d^4s^2/eps^2)-wise independence.
Submitted 11 January, 2010;
originally announced January 2010.
-
Bounded Independence Fools Halfspaces
Authors:
Ilias Diakonikolas,
Parikshit Gopalan,
Ragesh Jaiswal,
Rocco Servedio,
Emanuele Viola
Abstract:
We show that any distribution on {-1,1}^n that is k-wise independent fools any halfspace h with error \eps for k = O(\log^2(1/\eps) /\eps^2). Up to logarithmic factors, our result matches a lower bound by Benjamini, Gurel-Gurevich, and Peled (2007) showing that k = Ω(1/(\eps^2 \cdot \log(1/\eps))). Using standard constructions of k-wise independent distributions, we obtain the first explicit pseudorandom generators G: {-1,1}^s --> {-1,1}^n that fool halfspaces. Specifically, we fool halfspaces with error eps and seed length s = k \log n = O(\log n \cdot \log^2(1/\eps) /\eps^2).
Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Computational Complexity 2007).
Submitted 21 February, 2009;
originally announced February 2009.
-
List Decoding Tensor Products and Interleaved Codes
Authors:
Parikshit Gopalan,
Venkatesan Guruswami,
Prasad Raghavendra
Abstract:
We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes. We show that for every code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rather than squaring, as one might expect). This gives the first efficient list decoders and new combinatorial bounds for some natural codes including multivariate polynomials where the degree in each variable is bounded. We show that for every code, its list decoding radius remains unchanged under $m$-wise interleaving for an integer $m$. This generalizes a recent result of Dinur et al. [DGKS], who proved such a result for interleaved Hadamard codes (equivalently, linear transformations). Using the notion of generalized Hamming weights, we give better list size bounds for both tensoring and interleaving of binary linear codes. By analyzing the weight distribution of these codes, we reduce the task of bounding the list size to bounding the number of close-by low-rank codewords. For decoding linear transformations, using rank-reduction together with other ideas, we obtain list size bounds that are tight over small fields.
Submitted 26 November, 2008;
originally announced November 2008.
-
Optically Modulated Conduction in Chromophore-Functionalized Single-Wall Carbon Nanotubes
Authors:
J. M. Simmons,
I. In,
V. E. Campbell,
T. J. Mark,
F. Leonard,
P. Gopalan,
M. A. Eriksson
Abstract:
We demonstrate an optically active nanotube-hybrid material by functionalizing single-wall nanotubes with an azo-based chromophore. Upon UV illumination, the conjugated chromophore undergoes a cis-trans isomerization leading to a charge redistribution near the nanotube. This charge redistribution changes the local electrostatic environment, shifting the threshold voltage and increasing the conductivity of the nanotube transistor. For a ~1-2% coverage, we measure a shift in the threshold voltage of up to 1.2 V. Further, the conductance change is reversible and repeatable over long periods of time, indicating that the chromophore functionalized nanotubes are useful for integrated nano-photodetectors.
Submitted 2 March, 2007;
originally announced March 2007.
-
Polynomials that Sign Represent Parity and Descartes' Rule of Signs
Authors:
Saugata Basu,
Nayantara Bhatnagar,
Parikshit Gopalan,
Richard J. Lipton
Abstract:
A real polynomial $P(X_1,..., X_n)$ sign represents $f: A^n \to \{0,1\}$ if for every $(a_1, ..., a_n) \in A^n$, the sign of $P(a_1,...,a_n)$ equals $(-1)^{f(a_1,...,a_n)}$. Such sign representations are well-studied in computer science and have applications to computational complexity and computational learning theory. In this work, we present a systematic study of tradeoffs between degree and sparsity of sign representations through the lens of the parity function. We attempt to prove bounds that hold for any choice of set $A$. We show that sign representing parity over $\{0,...,m-1\}^n$ with the degree in each variable at most $m-1$ requires sparsity at least $m^n$. We show that a tradeoff exists between sparsity and degree, by exhibiting a sign representation that has higher degree but lower sparsity. We show a lower bound of $n(m -2) + 1$ on the sparsity of polynomials of any degree representing parity over $\{0,..., m-1\}^n$. We prove exact bounds on the sparsity of such polynomials for any two element subset $A$. The main tool used is Descartes' Rule of Signs, a classical result in algebra, relating the sparsity of a polynomial to its number of real roots. As an application, we use bounds on sparsity to derive circuit lower bounds for depth-two AND-OR-NOT circuits with a Threshold Gate at the top. We use this to give a simple proof that such circuits need size $1.5^n$ to compute parity, which improves the previous bound of $(4/3)^{n/2}$ due to Goldmann (1997). We show a tight lower bound of $2^n$ for the inner product function over $\{0,1\}^n \times \{0, 1\}^n$.
Submitted 26 February, 2007;
originally announced February 2007.
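The definition of a sign representation in the abstract above can be checked by brute force on a small case. The sketch below is an illustration, not a construction from the paper: it takes $A = \{0,1\}$ (so $m = 2$) and the polynomial $P(x) = \prod_i (1 - 2x_i)$, which has degree 1 in each variable and expands into $2^n$ monomials, matching the $m^n$ sparsity figure quoted in the abstract for degree at most $m-1$ per variable. The helper names `parity`, `p` and `sign_represents_parity` are hypothetical.

```python
from itertools import product


def parity(bits):
    """Parity of a 0/1 tuple: 1 if the number of ones is odd, else 0."""
    return sum(bits) % 2


def p(bits):
    """Evaluate P(x) = prod_i (1 - 2 * x_i) at a 0/1 point."""
    value = 1
    for b in bits:
        value *= 1 - 2 * b
    return value


def sign_represents_parity(n):
    """Check that P(a) = (-1)^parity(a) at every point a of {0,1}^n.

    P only takes the values +1 and -1 on {0,1}^n, so equality of values
    is the same as equality of signs.
    """
    return all(p(x) == (-1) ** parity(x) for x in product((0, 1), repeat=n))


print(all(sign_represents_parity(n) for n in range(1, 8)))
```

The check is exhaustive over $2^n$ points, so it is only practical for small $n$.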
-
The Connectivity of Boolean Satisfiability: Computational and Structural Dichotomies
Authors:
Parikshit Gopalan,
Phokion G. Kolaitis,
Elitza Maneva,
Christos H. Papadimitriou
Abstract:
Boolean satisfiability problems are an important benchmark for questions about complexity, algorithms, heuristics and threshold phenomena. Recent work on heuristics and the satisfiability threshold has centered on the structure and connectivity of the solution space. Motivated by this work, we study structural and connectivity-related properties of the space of solutions of Boolean satisfiability problems and establish various dichotomies in Schaefer's framework.
On the structural side, we obtain dichotomies for the kinds of subgraphs of the hypercube that can be induced by the solutions of Boolean formulas, as well as for the diameter of the connected components of the solution space. On the computational side, we establish dichotomy theorems for the complexity of the connectivity and st-connectivity questions for the graph of solutions of Boolean formulas. Our results assert that the intractable side of the computational dichotomies is PSPACE-complete, while the tractable side - which includes but is not limited to all problems with polynomial time algorithms for satisfiability - is in P for the st-connectivity question, and in coNP for the connectivity question. The diameter of components can be exponential for the PSPACE-complete cases, whereas in all other cases it is linear; thus, small diameter and tractability of the connectivity problems are remarkably aligned. The crux of our results is an expressibility theorem showing that in the tractable cases, the subgraphs induced by the solution space possess certain good structural properties, whereas in the intractable cases, the subgraphs can be arbitrary.
Submitted 3 October, 2007; v1 submitted 13 September, 2006;
originally announced September 2006.
-
A Seamless Integration of Association Rule Mining with Database Systems
Authors:
Raj P. Gopalan,
Tariq Nuruddin,
Yudho Giri Sucahyo
Abstract:
The need for Knowledge and Data Discovery Management Systems (KDDMS) that support ad hoc data mining queries has been long recognized. A significant amount of research has gone into building tightly coupled systems that integrate association rule mining with database systems. In this paper, we describe a seamless integration scheme for database queries and association rule discovery using a common query optimizer for both. Query trees of expressions in an extended algebra are used for internal representation in the optimizer. The algebraic representation is flexible enough to deal with constrained association rule queries and other variations of association rule specifications. We propose modularization to simplify the query tree for complex tasks in data mining. It paves the way for making use of existing algorithms for constructing query plans in the optimization process. How the integration scheme we present will facilitate greater user control over the data mining process is also discussed. The work described in this paper forms part of a larger project for fully integrating data mining with database management.
Submitted 28 June, 2001;
originally announced June 2001.