Skip to main content

Showing 1–29 of 29 results for author: Blanc, G

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.16340  [pdf, ps, other

    cs.CC

    A Strong Direct Sum Theorem for Distributional Query Complexity

    Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

    Abstract: Consider the expected query complexity of computing the $k$-fold direct product $f^{\otimes k}$ of a function $f$ to error $\varepsilon$ with respect to a distribution $μ^k$. One strategy is to sequentially compute each of the $k$ copies to error $\varepsilon/k$ with respect to $μ$ and apply the union bound. We prove a strong direct sum theorem showing that this naive strategy is essentially optim… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 34 pages, 4 figures, CCC 2024

  2. arXiv:2310.01551  [pdf, other

    cs.LG cs.AI cs.DS

    Harnessing the Power of Choices in Decision Tree Learning

    Authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari

    Abstract: We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just… ▽ More

    Submitted 25 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

    ACM Class: I.2.0; I.2.m

  3. arXiv:2307.04039  [pdf, ps, other

    cs.CC cs.DS

    A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers

    Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

    Abstract: We prove a strong composition theorem for junta complexity and show how such theorems can be used to generically boost the performance of property testers. The $\varepsilon$-approximate junta complexity of a function $f$ is the smallest integer $r$ such that $f$ is $\varepsilon$-close to a function that depends only on $r$ variables. A strong composition theorem states that if $f$ has large… ▽ More

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: 44 pages, 1 figure, FOCS 2023

  4. arXiv:2303.16208  [pdf, ps, other

    stat.ML cs.CC cs.DS cs.LG

    Lifting uniform learners via distributional decomposition

    Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

    Abstract: We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by… ▽ More

    Submitted 29 March, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: To appear in STOC 2023

  5. arXiv:2302.08661  [pdf, other

    cs.LG cs.DS cs.IT

    Subsampling Suffices for Adaptive Data Analysis

    Authors: Guy Blanc

    Abstract: Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of \emph{adaptive data analysis} was formalized in the sem… ▽ More

    Submitted 20 September, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Improves upon a prior version which appeared at STOC 2023. See section 3 for details of these improvements

  6. arXiv:2211.02257  [pdf, ps, other

    cs.CC cs.DS

    Certification with an NP Oracle

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

    Abstract: In the certification problem, the algorithm is given a function $f$ with certificate complexity $k$ and an input $x^\star$, and the goal is to find a certificate of size $\le \text{poly}(k)$ for $f$'s value at $x^\star$. This problem is in $\mathsf{NP}^{\mathsf{NP}}$, and assuming $\mathsf{P} \ne \mathsf{NP}$, is not in $\mathsf{P}$. Prior works, dating back to Valiant in 1984, have therefore soug… ▽ More

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 25 pages, 2 figures, ITCS 2023

  7. arXiv:2209.03112  [pdf, ps, other

    cs.LG cs.DS

    Multitask Learning via Shared Features: Algorithms and Hardness

    Authors: Konstantina Bairaktari, Guy Blanc, Li-Yang Tan, Jonathan Ullman, Lydia Zakynthinou

    Abstract: We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube, that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a polynomial time multitask learning algorithm for the concept class of halfspaces with margin $γ$, which is based on a simultaneous boosting technique and requires only… ▽ More

    Submitted 7 September, 2022; originally announced September 2022.

  8. arXiv:2207.07072  [pdf, ps, other

    cs.DS cs.LG

    A Query-Optimal Algorithm for Finding Counterfactuals

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Li-Yang Tan

    Abstract: We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model $f : X^d \to \{0,1\}$ and instance $x^\star$, our algorithm makes \[ {S(f)^{O(Δ_f(x^\star))}\cdot \log d}\] queries to $f$ and returns {an {\sl optimal}} counterfactual for $x^\star$: a nearest instance $x'$ to $x^\star$ for which $f(x')\ne f(x^\star)$. Here $S(f)$ is th… ▽ More

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 22 pages, ICML 2022

  9. arXiv:2206.14431  [pdf, other

    cs.DS cs.LG stat.ML

    Open Problem: Properly learning decision trees in polynomial time?

    Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

    Abstract: The authors recently gave an $n^{O(\log\log n)}$ time membership query algorithm for properly learning decision trees under the uniform distribution (Blanc et al., 2021). The previous fastest algorithm for this problem ran in $n^{O(\log n)}$ time, a consequence of Ehrenfeucht and Haussler (1989)'s classic algorithm for the distribution-free setting. In this article we highlight the natural open pr… ▽ More

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 5 pages, to appear at the Open Problem sessions at COLT 2022

  10. arXiv:2206.08899  [pdf, ps, other

    cs.LG cs.DS

    Popular decision tree algorithms are provably noise tolerant

    Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

    Abstract: Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise, and we provide near-matching upper and lower bounds on the allowable noise rate. We further show that these algorithms, which are simple and have long been centr… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 39 th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  11. arXiv:2201.07736  [pdf, ps, other

    cs.DS cs.CC

    The Query Complexity of Certification

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Li-Yang Tan

    Abstract: We study the problem of {\sl certification}: given queries to a function $f : \{0,1\}^n \to \{0,1\}$ with certificate complexity $\le k$ and an input $x^\star$, output a size-$k$ certificate for $f$'s value on $x^\star$. This abstractly models a central problem in explainable machine learning, where we think of $f$ as a blackbox model that we seek to explain the predictions of. For monotone func… ▽ More

    Submitted 6 April, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 30 pages, to appear in STOC'22. Edit: fixed typos and added references

  12. arXiv:2111.10352  [pdf, ps, other

    cs.LG cs.CC cs.DS

    On the power of adaptivity in statistical adversaries

    Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

    Abstract: We study a fundamental question concerning adversarial noise models in statistical problems where the algorithm receives i.i.d. draws from a distribution $\mathcal{D}$. The definitions of these adversaries specify the type of allowable corruptions (noise model) as well as when these corruptions can be made (adaptivity); the latter differentiates between oblivious adversaries that can only corrupt… ▽ More

    Submitted 29 June, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

    Comments: To appear in COLT 2022. Updated with reviewer feedback

  13. arXiv:2111.01576  [pdf, ps, other

    cs.LG cs.CC cs.DS

    Provably efficient, succinct, and precise explanations

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We consider the problem of explaining the predictions of an arbitrary blackbox model $f$: given query access to $f$ and an instance $x$, output a small set of $x$'s features that in conjunction essentially determines $f(x)$. We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lac… ▽ More

    Submitted 1 November, 2021; originally announced November 2021.

  14. arXiv:2109.00637  [pdf, ps, other

    cs.DS cs.CC cs.LG

    Properly learning decision trees in almost polynomial time

    Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

    Abstract: We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over $\{\pm 1\}^n$. Even in the realizable setting, the previous fastest runtime was $n^{O(\log n)}$, a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision tr… ▽ More

    Submitted 1 November, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: 21 pages, to appear in FOCS 2021

  15. arXiv:2107.00819  [pdf, other

    cs.LG cs.DS stat.ML

    Decision tree heuristics can fail, even in the smoothed setting

    Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

    Abstract: Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possibl… ▽ More

    Submitted 2 July, 2021; originally announced July 2021.

    Comments: To appear in RANDOM 2021

  16. arXiv:2106.05579  [pdf, ps, other

    cs.DS

    Multiway Online Correlated Selection

    Authors: Guy Blanc, Moses Charikar

    Abstract: We give a $0.5368$-competitive algorithm for edge-weighted online bipartite matching. Prior to our work, the best competitive ratio was $0.5086$ due to Fahrbach, Huang, Tao, and Zadimoghaddam (FOCS 2020). They achieved their breakthrough result by develo** a subroutine called \emph{online correlated selection} (OCS) which takes as input a sequence of pairs and selects one item from each pair. Im… ▽ More

    Submitted 20 September, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Added motivation and a more thorough overview of our paper in the first few sections, fixed a few typos

  17. arXiv:2105.03594  [pdf, ps, other

    cs.LG cs.DS stat.ML

    Learning stochastic decision trees

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $η$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm runs in time $n^{O(\log(s/\varepsilon)/\varepsilon^2)}$ and returns a hypothesis with error within an additive $2η+ \varepsilon$ of the Bayes optimal. An a… ▽ More

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: To appear in ICALP 2021

  18. arXiv:2012.08735  [pdf, ps, other

    cs.DS cs.CC cs.LG

    Reconstructing decision trees

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give the first {\sl reconstruction algorithm} for decision trees: given queries to a function $f$ that is $\mathrm{opt}$-close to a size-$s$ decision tree, our algorithm provides query access to a decision tree $T$ where: $\circ$ $T$ has size $S = s^{O((\log s)^2/\varepsilon^3)}$; $\circ$ $\mathrm{dist}(f,T)\le O(\mathrm{opt})+\varepsilon$; $\circ$ Every query to $T$ is answered with… ▽ More

    Submitted 22 May, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: To appear in ICALP 2022

  19. arXiv:2011.01584  [pdf, other

    cs.LG cs.DS

    Estimating decision tree learnability with polylogarithmic sample complexity

    Authors: Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

    Abstract: We show that top-down decision tree learning heuristics are amenable to highly efficient learnability estimation: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with polylogarithmically many labeled examples, exponentially smaller than the number necessary to run these heuristics, and indeed, exponentially smaller than info… ▽ More

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 25 pages, to appear in NeurIPS 2020

  20. arXiv:2010.11381  [pdf, ps, other

    cs.DS cs.CC

    Query strategies for priced information, revisited

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We consider the problem of designing query strategies for priced information, introduced by Charikar et al. In this problem the algorithm designer is given a function $f : \{0,1\}^n \to \{-1,1\}$ and a price associated with each of the $n$ coordinates. The goal is to design a query strategy for determining $f$'s value on unknown inputs for minimum cost. Prior works on this problem have focused o… ▽ More

    Submitted 21 October, 2020; originally announced October 2020.

  21. arXiv:2010.08633  [pdf, ps, other

    cs.LG cs.DS stat.ML

    Universal guarantees for decision tree induction via a higher-order splitting criterion

    Authors: Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

    Abstract: We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple target functions. The crux of our extension is a new splitt… ▽ More

    Submitted 16 October, 2020; originally announced October 2020.

  22. arXiv:2008.06431  [pdf, other

    stat.ML cs.LG

    Efficient hyperparameter optimization by way of PAC-Bayes bound minimization

    Authors: John J. Cherian, Andrew G. Taube, Robert T. McGibbon, Panagiotis Angelikopoulos, Guy Blanc, Michael Snarski, Daniel D. Richman, John L. Klepeis, David E. Shaw

    Abstract: Identifying optimal values for a high-dimensional set of hyperparameters is a problem that has received growing attention given its importance to large-scale machine learning applications such as neural architecture search. Recently developed optimization methods can be used to select thousands or even millions of hyperparameters. Such methods often yield overfit models, however, leading to poor p… ▽ More

    Submitted 14 August, 2020; originally announced August 2020.

  23. arXiv:2006.00743  [pdf, ps, other

    cs.DS cs.CC cs.LG

    Provable guarantees for decision tree induction: the agnostic setting

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give strengthened provable guarantees on the performance of widely employed and empirically successful {\sl top-down decision tree learning heuristics}. While prior works have focused on the realizable setting, we consider the more realistic and challenging {\sl agnostic} setting. We show that for all monotone functions~$f$ and parameters $s\in \mathbb{N}$, these heuristics construct a decision… ▽ More

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: 20 pages, to appear in ICML 2020

  24. arXiv:1912.11831  [pdf, other

    cs.CR cs.LG cs.NE eess.SP

    Anomalous Communications Detection in IoT Networks Using Sparse Autoencoders

    Authors: Mustafizur Rahman Shahid, Gregory Blanc, Zonghua Zhang, Hervé Debar

    Abstract: Nowadays, IoT devices have been widely deployed for enabling various smart services, such as, smart home or e-healthcare. However, security remains as one of the paramount concern as many IoT devices are vulnerable. Moreover, IoT malware are constantly evolving and getting more sophisticated. IoT devices are intended to perform very specific tasks, so their networking behavior is expected to be re… ▽ More

    Submitted 26 December, 2019; originally announced December 2019.

    Journal ref: 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), Sep 2019, Cambridge, United States. pp.1-5

  25. arXiv:1912.03042  [pdf, ps, other

    cs.CC cs.DS

    Constructive derandomization of query algorithms

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give efficient deterministic algorithms for converting randomized query algorithms into deterministic ones. We first give an algorithm that takes as input a randomized $q$-query algorithm $R$ with description length $N$ and a parameter $\varepsilon$, runs in time $\mathrm{poly}(N) \cdot 2^{O(q/\varepsilon)}$, and returns a deterministic $O(q/\varepsilon)$-query algorithm $D$ that $\varepsilon$-… ▽ More

    Submitted 6 December, 2019; originally announced December 2019.

  26. arXiv:1911.07375  [pdf, other

    cs.DS cs.CC cs.LG

    Top-down induction of decision trees: rigorous guarantees and inherent limitations

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-ma… ▽ More

    Submitted 17 November, 2019; originally announced November 2019.

  27. arXiv:1904.09080  [pdf, other

    cs.LG stat.ML

    Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

    Authors: Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant

    Abstract: We consider networks, trained via stochastic gradient descent to minimize $\ell_2$ loss, with the training labels perturbed by independent noise at each iteration. We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum over the data points, of the squared $\ell_2$ norm o… ▽ More

    Submitted 22 July, 2020; v1 submitted 19 April, 2019; originally announced April 2019.

  28. arXiv:1712.00527  [pdf, other

    cs.LG

    Adaptive Sampled Softmax with Kernel Based Sampling

    Authors: Guy Blanc, Steffen Rendle

    Abstract: Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. A softmax model has linear costs in the number of classes which makes it too expensive for many real-world problems. A common approach to speed up training involves sampling only some of the classes at each training step. It is known… ▽ More

    Submitted 1 August, 2018; v1 submitted 1 December, 2017; originally announced December 2017.

  29. Combining Technical and Financial Impacts for Countermeasure Selection

    Authors: Gustavo Gonzalez-Granadillo, Christophe Ponchel, Gregory Blanc, Hervé Debar

    Abstract: Research in information security has generally focused on providing a comprehensive interpretation of threats, vulnerabilities, and attacks, in particular to evaluate their danger and prioritize responses accordingly. Most of the current approaches propose advanced techniques to detect intrusions and complex attacks but few of these approaches propose well defined methodologies to react against a… ▽ More

    Submitted 16 October, 2014; originally announced November 2014.

    Comments: In Proceedings AIDP 2014, arXiv:1410.3226

    Journal ref: EPTCS 165, 2014, pp. 1-14