Showing 1–17 of 17 results for author: Lee, J C H

  1. arXiv:2312.11769  [pdf, other]

    cs.LG cs.DS cs.IT math.ST stat.ML

    Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

    Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas

    Abstract: We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i \ge α$ for some known parameter $α$, and each $P_i$ has unknown covariance $Σ_i \preceq σ^2_i \cdot I_d$ for some unknown $σ_i$, the goal is to cluster the sam…

    Submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2311.12784  [pdf, ps, other]

    math.ST cs.IT cs.LG stat.ML

    Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+α$ Moments

    Authors: Trung Dang, Jasper C. H. Lee, Maoyuan Song, Paul Valiant

    Abstract: There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for all distribution…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 27 pages, to appear in NeurIPS 2023. Abstract shortened to fit arXiv limit

  3. arXiv:2311.08022  [pdf, ps, other]

    cs.AI cs.LG

    Two-Stage Predict+Optimize for Mixed Integer Linear Programs with Unknown Parameters in Constraints

    Authors: Xinyi Hu, Jasper C. H. Lee, Jimmy H. M. Lee

    Abstract: Consider the setting of constrained optimization, with some parameters unknown at solving time and requiring prediction from relevant features. Predict+Optimize is a recent framework for end-to-end training supervised learning models for such predictions, incorporating information about the optimization problem in the training process in order to yield better predictions in terms of the quality of…

    Submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2306.16573  [pdf, other]

    math.ST cs.IT cs.LG math.PR stat.ML

    Finite-Sample Symmetric Mean Estimation with Fisher Information Rate

    Authors: Shivam Gupta, Jasper C. H. Lee, Eric Price

    Abstract: The mean of an unknown variance-$σ^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{σ^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: COLT 2023

  5. arXiv:2305.00966  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm

    Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Ankit Pensia, Thanasis Pittas

    Abstract: We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $α<1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(μ, Σ)$, the goal is to output a list of $O(1/α)$ hypotheses at least one of which is close to $Σ$ in relative Frobenius norm. Our main result is a…

    Submitted 1 May, 2023; originally announced May 2023.

  6. arXiv:2303.06698  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Branch & Learn with Post-hoc Correction for Predict+Optimize with Unknown Parameters in Constraints

    Authors: Xinyi Hu, Jasper C. H. Lee, Jimmy H. M. Lee

    Abstract: Combining machine learning and constrained optimization, Predict+Optimize tackles optimization problems containing parameters that are unknown at the time of solving. Prior works focus on cases with unknowns only in the objectives. A new framework was recently proposed to cater for unknowns also in constraints by introducing a loss function, called Post-hoc Regret, that takes into account the cost…

    Submitted 12 March, 2023; originally announced March 2023.

  7. arXiv:2302.02497  [pdf, other]

    math.ST cs.IT cs.LG math.PR stat.ML

    High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors

    Authors: Shivam Gupta, Jasper C. H. Lee, Eric Price

    Abstract: In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $λ$, and want to estimate $λ$ as precisely as possible. Asymptotically, the maximum likelihood estimate achieves the Cramér-Rao bound of error $\mathcal N(0, \frac{1}{n\mathcal I})$, where $\mathcal I$ is the Fisher information of $f$. However, the $n$ required for convergence depends o…

    Submitted 5 February, 2023; originally announced February 2023.

  8. arXiv:2211.16333  [pdf, ps, other]

    cs.DS cs.LG math.ST stat.ML

    Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions

    Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Ankit Pensia

    Abstract: We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $μ$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $μ$ with high probability. Prior work had obtained…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: To appear in NeurIPS 2022

  9. arXiv:2209.03668  [pdf, other]

    cs.AI cs.LG math.OC

    Predict+Optimize for Packing and Covering LPs with Unknown Parameters in Constraints

    Authors: Xinyi Hu, Jasper C. H. Lee, Jimmy H. M. Lee

    Abstract: Predict+Optimize is a recently proposed framework which combines machine learning and constrained optimization, tackling optimization problems that contain parameters that are unknown at solving time. The goal is to predict the unknown parameters and use the estimates to solve for an estimated optimal solution to the optimization problem. However, all prior works have focused on the case where unk…

    Submitted 8 September, 2022; originally announced September 2022.

  10. arXiv:2206.02348  [pdf, other]

    math.ST cs.DS cs.IT cs.LG stat.ML

    Finite-Sample Maximum Likelihood Estimation of Location

    Authors: Shivam Gupta, Jasper C. H. Lee, Eric Price, Paul Valiant

    Abstract: We consider 1-dimensional location estimation, where we estimate a parameter $λ$ from $n$ samples $λ+ η_i$, with each $η_i$ drawn i.i.d. from a known distribution $f$. For fixed $f$ the maximum-likelihood estimate (MLE) is well-known to be optimal in the limit as $n \to \infty$: it is asymptotically normal with variance matching the Cramér-Rao lower bound of $\frac{1}{n\mathcal{I}}$, where…

    Submitted 18 July, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Corrected an inaccuracy in the description of the experimental setup. Also updated funding acknowledgements

  11. arXiv:2205.01672  [pdf, other]

    cs.LG cs.AI math.OC

    Branch & Learn for Recursively and Iteratively Solvable Problems in Predict+Optimize

    Authors: Xinyi Hu, Jasper C. H. Lee, Jimmy H. M. Lee, Allen Z. Zhong

    Abstract: This paper proposes Branch & Learn, a framework for Predict+Optimize to tackle optimization problems containing parameters that are unknown at the time of solving. Given an optimization problem solvable by a recursive algorithm satisfying simple conditions, we show how a corresponding learning algorithm can be constructed directly and methodically from the recursive algorithm. Our framework applie…

    Submitted 1 May, 2022; originally announced May 2022.

  12. arXiv:2011.08384  [pdf, ps, other]

    math.ST cs.DS cs.IT cs.LG stat.ML

    Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$

    Authors: Jasper C. H. Lee, Paul Valiant

    Abstract: We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, "our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance." Crucially, in contrast to prior works, our estimator does not require prior knowledge of the variance, and works across the…

    Submitted 16 November, 2020; originally announced November 2020.

  13. arXiv:2007.07878  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies

    Authors: Uthsav Chitra, Kimberly Ding, Jasper C. H. Lee, Benjamin J. Raphael

    Abstract: Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an $\textit{anomaly family}$. For example, in temporal data the anomaly family may be time intervals, while…

    Submitted 11 June, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: Accepted to ICML 2021

  14. arXiv:1912.07673  [pdf, ps, other]

    cs.DS cs.CG

    Finding the Mode of a Kernel Density Estimate

    Authors: Jasper C. H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, Wai Ming Tai

    Abstract: Given points $p_1, \dots, p_n$ in $\mathbb{R}^d$, how do we find a point $x$ which maximizes $\frac{1}{n} \sum_{i=1}^n e^{-\|p_i - x\|^2}$? In other words, how do we find the maximizing point, or mode of a Gaussian kernel density estimation (KDE) centered at $p_1, \dots, p_n$? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode finding p…

    Submitted 16 December, 2019; originally announced December 2019.

  15. arXiv:1904.09228  [pdf, other]

    cs.LG cs.DS stat.ML

    Uncertainty about Uncertainty: Optimal Adaptive Algorithms for Estimating Mixtures of Unknown Coins

    Authors: Jasper C. H. Lee, Paul Valiant

    Abstract: Given a mixture between two populations of coins, "positive" coins that each have -- unknown and potentially different -- bias $\geq\frac{1}{2}+Δ$ and "negative" coins with bias $\leq\frac{1}{2}-Δ$, we consider the task of estimating the fraction $ρ$ of positive coins to within additive error $ε$. We achieve an upper and lower bound of $Θ(\fracρ{ε^2Δ^2}\log\frac{1}δ)$ samples for a $1-δ$ probabili…

    Submitted 5 February, 2021; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: Full paper updated to reflect the new result in our SODA 2021 proceedings version: our new sample complexity lower bound includes dependence on the failure probability, and hence is simultaneously tight in all of the problem parameters up to a constant multiplicative factor

  16. arXiv:1806.04325  [pdf, ps, other]

    cs.AI

    Augmenting Stream Constraint Programming with Eventuality Conditions

    Authors: Jasper C. H. Lee, Jimmy H. M. Lee, Allen Z. Zhong

    Abstract: Stream constraint programming is a recent addition to the family of constraint programming frameworks, where variable domains are sets of infinite streams over finite alphabets. Previous works showed promising results for its applicability to real-world planning and control problems. In this paper, motivated by the modelling of planning applications, we improve the expressiveness of the framework…

    Submitted 6 August, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

    Comments: Added proofs and an appendix containing a constraint model that was not included in the previous version

  17. arXiv:1511.04466  [pdf, other]

    cs.DS

    Optimizing Star-Convex Functions

    Authors: Jasper C. H. Lee, Paul Valiant

    Abstract: We introduce a polynomial time algorithm for optimizing the class of star-convex functions, under no restrictions except boundedness on a region about the origin, and Lebesgue measurability. The algorithm's performance is polynomial in the requested number of digits of accuracy, contrasting with the previous best known algorithm of Nesterov and Polyak that has exponential dependence, and that furt…

    Submitted 11 May, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: 30 pages (including appendices)