Showing 1–27 of 27 results for author: Nguyen, V A

  1. arXiv:2406.02317  [pdf, other]

    cs.LG cs.AI stat.ML

    Generative Conditional Distributions by Neural (Entropic) Optimal Transport

    Authors: Bao Nguyen, Binh Nguyen, Hieu Trung Nguyen, Viet Anh Nguyen

    Abstract: Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 15 pages, 8 figures

  2. arXiv:2405.20124  [pdf, other]

    stat.ML cs.LG math.OC

    A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

    Authors: Man-Chung Yue, Yves Rychener, Daniel Kuhn, Viet Anh Nguyen

    Abstract: The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically - without compelling theoretical justification - or optimally in view of restrictive distributional assumptions. In this paper, we propose a pri…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2312.09862  [pdf, other]

    math.ST stat.ME

    Wasserstein-based Minimax Estimation of Dependence in Multivariate Regularly Varying Extremes

    Authors: Xuhui Zhang, Jose Blanchet, Youssef Marzouk, Viet Anh Nguyen, Sven Wang

    Abstract: We study minimax risk bounds for estimators of the spectral measure in multivariate linear factor models, where observations are linear combinations of regularly varying latent factors. Non-asymptotic convergence rates are derived for the multivariate Peak-over-Threshold estimator in terms of the $p$-th order Wasserstein distance, and information-theoretic lower bounds for the minimax risks are es…

    Submitted 15 December, 2023; originally announced December 2023.

  4. arXiv:2210.01413  [pdf, other]

    math.OC cs.LG stat.ML

    Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints

    Authors: Jiajin Li, Sirui Lin, Jose Blanchet, Viet Anh Nguyen

    Abstract: Distributionally robust optimization has been shown to offer a principled way to regularize learning models. In this paper, we find that Tikhonov regularization is distributionally robust in an optimal transport sense (i.e., if an adversary chooses distributions in a suitable optimal transport neighborhood of the empirical measure), provided that suitable martingale constraints are also imposed. F…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2022

  5. arXiv:2202.10723  [pdf, other]

    cs.LG cs.AI stat.ML

    Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics

    Authors: Tam Le, Truyen Nguyen, Dinh Phung, Viet Anh Nguyen

    Abstract: Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers from a few drawbacks, such as (i) a high computational complexity and (ii) indefiniteness, which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport me…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: AISTATS 2022

  6. arXiv:2202.03071  [pdf, other]

    cs.LG math.OC stat.ML

    Distributionally Robust Fair Principal Components via Geodesic Descents

    Authors: Hieu Vu, Toan Tran, Man-Chung Yue, Viet Anh Nguyen

    Abstract: Principal component analysis is a simple yet useful dimensionality reduction technique in modern machine learning pipelines. In consequential domains such as college admission, healthcare and credit approval, it is imperative to take into account emerging criteria such as the fairness and the robustness of the learned projection. In this paper, we propose a distributionally robust optimization pro…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: International Conference on Learning Representations (ICLR) 2022

  7. arXiv:2202.00871  [pdf, other]

    stat.ME q-fin.MF

    Bayesian Imputation with Optimal Look-Ahead-Bias and Variance Tradeoff

    Authors: Jose Blanchet, Fernando Hernandez, Viet Anh Nguyen, Markus Pelger, Xuhui Zhang

    Abstract: Missing time-series data is a prevalent problem in many prescriptive analytics models in operations management, healthcare and finance. Imputation methods for time-series data are usually applied to the full panel data with the purpose of training a prescriptive model for a downstream out-of-sample task. For example, the imputation of missing asset returns may be applied before estimating an optim…

    Submitted 11 April, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: This work merges and supersedes arXiv:2102.12736

  8. arXiv:2109.14875  [pdf, other]

    stat.ML cs.LG math.OC

    Adversarial Regression with Doubly Non-negative Weighting Matrices

    Authors: Tam Le, Truyen Nguyen, Makoto Yamada, Jose Blanchet, Viet Anh Nguyen

    Abstract: Many machine learning tasks that involve predicting an output response can be solved by training a weighted regression model. Unfortunately, the predictive power of this type of model may severely deteriorate under low sample sizes or under covariate perturbations. Reweighting the training samples has emerged as an effective mitigation strategy for these problems. In this paper, we propose a novel…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: Accepted to the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  9. arXiv:2108.02120  [pdf, other]

    math.ST cs.LG math.OC stat.ML

    Statistical Analysis of Wasserstein Distributionally Robust Estimators

    Authors: Jose Blanchet, Karthyek Murthy, Viet Anh Nguyen

    Abstract: We consider statistical methods which invoke a min-max distributionally robust formulation to extract good out-of-sample performance in data-driven optimization and learning problems. Acknowledging the distributional uncertainty in learning from limited samples, the min-max formulations introduce an adversarial inner player to explore unseen covariate data. The resulting Distributionally Robust Op…

    Submitted 4 August, 2021; originally announced August 2021.

  10. arXiv:2106.01070  [pdf, ps, other]

    stat.ML cs.CY cs.LG math.ST

    Testing Group Fairness via Optimal Transport Projections

    Authors: Nian Si, Karthyek Murthy, Jose Blanchet, Viet Anh Nguyen

    Abstract: We present a statistical testing framework to detect if a given machine learning classifier fails to satisfy a wide range of group fairness notions. The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data. The statistical challenges, which may arise from multiple impact…

    Submitted 2 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Machine Learning 2021

  11. arXiv:2106.00322  [pdf, other]

    cs.LG math.OC stat.ML

    Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts

    Authors: Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

    Abstract: Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are…

    Submitted 1 June, 2021; originally announced June 2021.

  12. arXiv:2105.12022  [pdf, other]

    math.OC cs.LG stat.ML

    Principal Component Hierarchy for Sparse Quadratic Programs

    Authors: Robbie Vreugdenhil, Viet Anh Nguyen, Armin Eftekhari, Peyman Mohajerin Esfahani

    Abstract: We propose a novel approximation hierarchy for cardinality-constrained, convex quadratic programs that exploits the rank-dominating eigenvectors of the quadratic matrix. Each level of approximation admits a min-max characterization whose objective function can be optimized over the binary variables analytically, while preserving convexity in the continuous variables. Exploiting this property, we p…

    Submitted 25 May, 2021; originally announced May 2021.

    Journal ref: ICML 2021

  13. arXiv:2103.16451  [pdf, other]

    q-fin.PM math.OC stat.ML

    Robustifying Conditional Portfolio Decisions via Optimal Transport

    Authors: Viet Anh Nguyen, Fan Zhang, Shanshan Wang, Jose Blanchet, Erick Delage, Yinyu Ye

    Abstract: We propose a data-driven portfolio selection model that integrates side information, conditional estimation and robustness using the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariat…

    Submitted 9 April, 2024; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: 1 figure

  14. arXiv:2103.06828  [pdf, other]

    cs.LG math.OC stat.ML

    Wasserstein Robust Classification with Fairness Constraints

    Authors: Yijie Wang, Viet Anh Nguyen, Grani A. Hanasusanto

    Abstract: We propose a distributionally robust classification model with a fairness constraint that encourages the classifier to be fair in view of the equality of opportunity criterion. We use a type-$\infty$ Wasserstein ambiguity set centered at the empirical distribution to model distributional uncertainty and derive a conservative reformulation for the worst-case equal opportunity unfairness measure. We…

    Submitted 11 July, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

  15. arXiv:2102.12736   

    stat.ML cs.LG

    Time-Series Imputation with Wasserstein Interpolation for Optimal Look-Ahead-Bias and Variance Tradeoff

    Authors: Jose Blanchet, Fernando Hernandez, Viet Anh Nguyen, Markus Pelger, Xuhui Zhang

    Abstract: Missing time-series data is a prevalent practical problem. Imputation methods in time-series data often are applied to the full panel data with the purpose of training a model for a downstream out-of-sample task. For example, in finance, imputation of missing returns may be applied prior to training a portfolio optimization model. Unfortunately, this practice may result in a look-ahead-bias in the…

    Submitted 11 April, 2023; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: This paper has been superseded by arXiv:2202.00871

  16. arXiv:2012.04800  [pdf, other]

    cs.LG cs.CY stat.ML

    A Statistical Test for Probabilistic Fairness

    Authors: Bahar Taskesen, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

    Abstract: Algorithms are now routinely used to make consequential decisions that affect human lives. Examples include college admissions, medical interventions or law enforcement. While algorithms empower us to harness all information hidden in vast amounts of data, they may inadvertently amplify existing biases in the available datasets. This concern has sparked increasing interest in fair machine learning…

    Submitted 8 December, 2020; originally announced December 2020.

  17. arXiv:2010.05373  [pdf, other]

    stat.ML cs.LG math.ST

    Distributionally Robust Local Non-parametric Conditional Estimation

    Authors: Viet Anh Nguyen, Fan Zhang, Jose Blanchet, Erick Delage, Yinyu Ye

    Abstract: Conditional estimation given specific covariate values (i.e., local conditional estimation or functional estimation) is ubiquitously useful with applications in engineering, social and natural sciences. Existing data-driven non-parametric estimators mostly focus on structured homogeneous data (e.g., weakly independent and stationary data), thus they are sensitive to adversarial noise and may perfo…

    Submitted 11 October, 2020; originally announced October 2020.

  18. arXiv:2010.05321  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Distributionally Robust Parametric Maximum Likelihood Estimation

    Authors: Viet Anh Nguyen, Xuhui Zhang, Jose Blanchet, Angelos Georghiou

    Abstract: We consider the parameter estimation problem of a probabilistic generative model prescribed using a natural exponential family of distributions. For this problem, the typical maximum likelihood estimator usually overfits under limited training sample size, is sensitive to noise and may perform poorly on downstream predictive tasks. To mitigate these issues, we propose a distributionally robust max…

    Submitted 11 October, 2020; originally announced October 2020.

  19. arXiv:2009.06111  [pdf, other]

    stat.ML cs.LG

    Machine Learning's Dropout Training is Distributionally Robust Optimal

    Authors: Jose Blanchet, Yang Kang, Jose Luis Montiel Olea, Viet Anh Nguyen, Xuhui Zhang

    Abstract: This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician's covariates using a multiplicative nonparametric errors-in-variables model. In this game, nature's least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some f…

    Submitted 14 April, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

  20. arXiv:2007.09530  [pdf, other]

    cs.LG stat.ML

    A Distributionally Robust Approach to Fair Classification

    Authors: Bahar Taskesen, Viet Anh Nguyen, Daniel Kuhn, Jose Blanchet

    Abstract: We propose a distributionally robust logistic regression model with an unfairness penalty that prevents discrimination with respect to sensitive attributes such as gender or ethnicity. This model is equivalent to a tractable convex optimization problem if a Wasserstein ball centered at the empirical distribution on the training data is used to model distributional uncertainty and if a new convex u…

    Submitted 18 July, 2020; originally announced July 2020.

  21. arXiv:2007.04458  [pdf, other]

    cs.LG stat.ML

    Robust Bayesian Classification Using an Optimistic Score Ratio

    Authors: Viet Anh Nguyen, Nian Si, Jose Blanchet

    Abstract: We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that is most plausible to explain the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set…

    Submitted 8 July, 2020; originally announced July 2020.

  22. arXiv:1911.03539  [pdf, other]

    math.OC cs.LG math.ST stat.ML

    Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

    Authors: Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Peyman Mohajerin Esfahani

    Abstract: We introduce a distributionally robust minimum mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator -- that is, a measurable function of the observation -- and a fictitious adversary choosing a prior -- that is, a pair of signal and…

    Submitted 27 January, 2021; v1 submitted 8 November, 2019; originally announced November 2019.

  23. arXiv:1910.10583  [pdf, other]

    cs.LG math.OC stat.ML

    Optimistic Distributionally Robust Optimization for Nonparametric Likelihood Approximation

    Authors: Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann

    Abstract: The likelihood function is a fundamental component in Bayesian statistics. However, evaluating the likelihood of an observation is computationally intractable in many applications. In this paper, we propose a non-parametric approximation of the likelihood that identifies a probability measure which lies in the neighborhood of the nominal measure and that maximizes the probability of observing the…

    Submitted 23 October, 2019; originally announced October 2019.

  24. arXiv:1910.07817  [pdf, other]

    math.OC cs.LG stat.ML

    Calculating Optimistic Likelihoods Using (Geodesically) Convex Optimization

    Authors: Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann

    Abstract: A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its…

    Submitted 17 October, 2019; originally announced October 2019.

  25. arXiv:1908.08729  [pdf, other]

    stat.ML cs.LG math.OC

    Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning

    Authors: Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh

    Abstract: Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the s…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: 36 pages

  26. arXiv:1809.08830  [pdf, other]

    math.OC cs.LG stat.ML

    Wasserstein Distributionally Robust Kalman Filtering

    Authors: Soroosh Shafieezadeh-Abadeh, Viet Anh Nguyen, Daniel Kuhn, Peyman Mohajerin Esfahani

    Abstract: We study a distributionally robust mean square error estimation problem over a nonconvex Wasserstein ambiguity set containing only normal distributions. We show that the optimal estimator and the least favorable distribution form a Nash equilibrium. Despite the non-convex nature of the ambiguity set, we prove that the estimation problem is equivalent to a tractable convex program. We further devis…

    Submitted 1 October, 2018; v1 submitted 24 September, 2018; originally announced September 2018.

  27. arXiv:1805.07194  [pdf, other]

    math.OC q-fin.PM stat.ML

    Distributionally Robust Inverse Covariance Estimation: The Wasserstein Shrinkage Estimator

    Authors: Viet Anh Nguyen, Daniel Kuhn, Peyman Mohajerin Esfahani

    Abstract: We introduce a distributionally robust maximum likelihood estimation model with a Wasserstein ambiguity set to infer the inverse covariance matrix of a $p$-dimensional Gaussian random vector from $n$ independent samples. The proposed model minimizes the worst case (maximum) of Stein's loss across all normal reference distributions within a prescribed Wasserstein distance from the normal distributi…

    Submitted 18 May, 2018; originally announced May 2018.

    Comments: 30 pages, 6 figures, 2 tables