Showing 1–24 of 24 results for author: Mehta, N

Searching in archive stat.
  1. arXiv:2406.18630  [pdf, other]

    cs.LG cs.AI stat.ML

    Improving Hyperparameter Optimization with Checkpointed Model Weights

    Authors: Nikhil Mehta, Jonathan Lorraine, Steve Masson, Ramanathan Arunachalam, Zaid Pervaiz Bhat, James Lucas, Arun George Zachariah

    Abstract: When training deep learning models, the performance depends largely on the selected hyperparameters. However, hyperparameter optimization (HPO) is often one of the most expensive parts of model design. Classical HPO methods treat this as a black-box optimization problem. However, gray-box HPO methods, which incorporate more information about the setup, have emerged as a promising direction for mor…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: See the project website at https://research.nvidia.com/labs/toronto-ai/FMS/

    MSC Class: 68T05 ACM Class: I.2.6; G.1.6; D.2.8

  2. arXiv:2404.05155  [pdf, other]

    cs.LG cs.GT stat.ML

    On the price of exact truthfulness in incentive-compatible online learning with bandit feedback: A regret lower bound for WSU-UX

    Authors: Ali Mortazavi, Junhao Lin, Nishant A. Mehta

    Abstract: In one view of the classical game of prediction with expert advice with binary outcomes, in each round, each expert maintains an adversarially chosen belief and honestly reports this belief. We consider a recently introduced, strategic variant of this problem with selfish (reputation-seeking) experts, where each expert strategically reports in order to maximize their expected future reputation bas…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted to AISTATS 2024

  3. arXiv:2403.01315  [pdf, ps, other]

    cs.LG stat.ML

    Near-optimal Per-Action Regret Bounds for Sleeping Bandits

    Authors: Quan Nguyen, Nishant A. Mehta

    Abstract: We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the min…

    Submitted 29 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: V2: corrected Theorem 8 (FTARL's high probability bound) from log(1/delta) to log(K/delta)

  4. arXiv:2312.01167  [pdf, other]

    cs.CV cs.LG stat.ML

    Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning

    Authors: Vinay K Verma, Nikhil Mehta, Kevin J Liang, Aakansha Mishra, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) is a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed state of the art, but these generative models can be slow or computationally expensive to train. Also, these generative models as…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024. arXiv admin note: substantial text overlap with arXiv:2102.11856

  5. arXiv:2301.04268  [pdf, other]

    cs.LG cs.AI stat.ML

    Adversarial Online Multi-Task Reinforcement Learning

    Authors: Quan Nguyen, Nishant A. Mehta

    Abstract: We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $λ$-separability…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: To appear at the 34th International Conference on Algorithmic Learning Theory (ALT 2023)

  6. arXiv:2111.05299  [pdf, other]

    cs.IT cs.AI cs.LG q-bio.NC stat.ML

    Can Information Flows Suggest Targets for Interventions in Neural Circuits?

    Authors: Praveen Venkatesh, Sanghamitra Dutta, Neil Mehta, Pulkit Grover

    Abstract: Motivated by neuroscientific and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framew…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: Accepted to the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). (29 pages; 61 figures)

    Journal ref: Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

  7. arXiv:2102.11856  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning

    Authors: Vinay Kumar Verma, Kevin Liang, Nikhil Mehta, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) has been shown to be a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges still remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed the state of the art of ZSL, but these generative models can be slow or computationally expensive to trai…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Under Review

  8. arXiv:2010.12618  [pdf, other]

    stat.ML cs.LG

    Counterfactual Representation Learning with Balancing Weights

    Authors: Serge Assaad, Shuxi Zeng, Chenyang Tao, Shounak Datta, Nikhil Mehta, Ricardo Henao, Fan Li, Lawrence Carin

    Abstract: A key to causal inference with observational data is achieving balance in predictive features associated with each treatment type. Recent literature has explored representation learning to achieve this goal. In this work, we discuss the pitfalls of these strategies - such as a steep trade-off between achieving balance and predictive power - and present a remedy via the integration of balancing wei…

    Submitted 23 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

  9. arXiv:2008.05687  [pdf, other]

    cs.LG stat.ML

    WAFFLe: Weight Anonymized Factorization for Federated Learning

    Authors: Weituo Hao, Nikhil Mehta, Kevin J Liang, Pengyu Cheng, Mostafa El-Khamy, Lawrence Carin

    Abstract: In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore,…

    Submitted 13 August, 2020; originally announced August 2020.

  10. arXiv:2004.10098  [pdf, other]

    cs.LG stat.ML

    Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors

    Authors: Nikhil Mehta, Kevin J Liang, Vinay K Verma, Lawrence Carin

    Abstract: Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity,…

    Submitted 27 April, 2021; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021. Post-conference updates: fixed typo in equation (11) and updated references

  11. arXiv:2003.03456  [pdf, other]

    cs.LG cs.AI stat.ML

    A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

    Authors: P Sharoff, Nishant A. Mehta, Ravi Ganti

    Abstract: We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the chosen action yields a stochastic reward. The agent seeks to maximize its cumulative reward over a finite time budget, with the option of "giving up" on a current a…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: 16 pages, AISTATS 2020

  12. arXiv:2003.00355  [pdf, other]

    stat.ML cs.LG

    Survival Cluster Analysis

    Authors: Paidamoyo Chapfuwa, Chunyuan Li, Nikhil Mehta, Lawrence Carin, Ricardo Henao

    Abstract: Conventional survival analysis approaches estimate risk scores or individualized time-to-event distributions conditioned on covariates. In practice, there is often great population-level phenotypic heterogeneity, resulting from (unknown) subpopulations with diverse risk profiles or survival distributions. As a result, there is an unmet need in survival analysis for identifying subpopulations with…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted at ACM CHIL 2020. Code: https://github.com/paidamoyo/survival_cluster_analysis

  13. arXiv:1910.13521  [pdf, other]

    cs.LG stat.ML

    Dying Experts: Efficient Algorithms with Optimal Regret Bounds

    Authors: Hamid Shayestehmanesh, Sajjad Azami, Nishant A. Mehta

    Abstract: We study a variant of decision-theoretic online learning in which the set of experts that are available to Learner can shrink over time. This is a restricted version of the well-studied sleeping experts problem, itself a generalization of the fundamental game of prediction with expert advice. Similar to many works in this direction, our benchmark is the ranking regret. Various results suggest that…

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: 18 Pages, NeurIPS 2019

  14. arXiv:1910.09227  [pdf, other]

    math.ST cs.LG stat.ME

    Safe-Bayesian Generalized Linear Regression

    Authors: Rianne de Heide, Alisa Kirichenko, Nishant Mehta, Peter Grünwald

    Abstract: We study generalized Bayesian inference under misspecification, i.e. when the model is 'wrong but useful'. Generalized Bayes equips the likelihood with a learning rate $η$. We show that for generalized linear models (GLMs), $η$-generalized Bayes concentrates around the best approximation of the truth within the model for specific $η\neq 1$, even under severely misspecified noise, as long as the ta…

    Submitted 29 May, 2021; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Final version. Accepted to AISTATS 2020

  15. arXiv:1905.05738  [pdf, other]

    cs.LG cs.SI stat.ML

    Stochastic Blockmodels meet Graph Neural Networks

    Authors: Nikhil Mehta, Lawrence Carin, Piyush Rai

    Abstract: Stochastic blockmodels (SBM) and their variants, e.g., mixed-membership and overlapping stochastic blockmodels, are latent variable based generative models for graphs. They have proven to be successful for various tasks, such as discovering the community structure and link prediction on graph-structured data. Recently, graph neural networks, e.g., graph convolutional networks, have also emerge…

    Submitted 14 May, 2019; originally announced May 2019.

  16. arXiv:1905.04655  [pdf, other]

    cs.CL cs.AI cs.RO stat.ML

    Improving Natural Language Interaction with Robots Using Advice

    Authors: Nikhil Mehta, Dan Goldwasser

    Abstract: Over the last few years, there has been growing interest in learning models for physically grounded language understanding tasks, such as the popular blocks world domain. These works typically view this problem as a single-step process, in which a human operator gives an instruction and an automated agent is evaluated on its ability to execute it. In this paper we take the first step towards incre…

    Submitted 12 May, 2019; originally announced May 2019.

    Comments: Accepted as a short paper at NAACL 2019 (8 pages)

  17. arXiv:1710.07732  [pdf, other]

    cs.LG stat.ML

    A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

    Authors: Peter D. Grünwald, Nishant A. Mehta

    Abstract: We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexit…

    Submitted 20 October, 2017; originally announced October 2017.

    Comments: 38 pages

  18. arXiv:1609.03319  [pdf, other]

    cs.LG stat.ML

    CompAdaGrad: A Compressed, Complementary, Computationally-Efficient Adaptive Gradient Method

    Authors: Nishant A. Mehta, Alistair Rendell, Anish Varghese, Christfried Webers

    Abstract: The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems and more recently in deep learning methods. The method's full-matrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, this version is compu…

    Submitted 4 October, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

    Comments: only updated acknowledgements

  19. arXiv:1605.00252  [pdf, other]

    cs.LG stat.ML

    Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes

    Authors: Peter D. Grünwald, Nishant A. Mehta

    Abstract: We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $η$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Baye…

    Submitted 5 November, 2019; v1 submitted 1 May, 2016; originally announced May 2016.

    Comments: accepted to JMLR pending minor final modifications

  20. arXiv:1507.02592  [pdf, other]

    cs.LG stat.ML

    Fast rates in statistical and online learning

    Authors: Tim van Erven, Peter D. Grünwald, Nishant A. Mehta, Mark D. Reid, Robert C. Williamson

    Abstract: The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most…

    Submitted 1 September, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

    Comments: 69 pages, 3 figures

    Journal ref: Journal of Machine Learning Research 16(54):1793-1861, 2015

  21. arXiv:1406.3781  [pdf, other]

    cs.LG stat.ML

    From Stochastic Mixability to Fast Rates

    Authors: Nishant A. Mehta, Robert C. Williamson

    Abstract: Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution $\mathsf{P}$ and returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F},\mathsf{P})$ ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rate…

    Submitted 22 November, 2014; v1 submitted 14 June, 2014; originally announced June 2014.

    Comments: 21 pages, accepted to NIPS 2014

  22. arXiv:1209.2784  [pdf, other]

    cs.LG stat.ML

    Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

    Authors: Nishant A. Mehta, Dongryeol Lee, Alexander G. Gray

    Abstract: Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxat…

    Submitted 13 September, 2012; originally announced September 2012.

    Comments: appearing at NIPS 2012

  23. arXiv:1202.4050  [pdf, other]

    cs.LG stat.ML

    On the Sample Complexity of Predictive Sparse Coding

    Authors: Nishant A. Mehta, Alexander G. Gray

    Abstract: The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms recently have demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have…

    Submitted 7 October, 2012; v1 submitted 17 February, 2012; originally announced February 2012.

    Comments: Sparse Coding Stability Theorem from version 1 has been relaxed considerably using a new notion of coding margin. Old Sparse Coding Stability Theorem still in new version, now as Theorem 2. Presentation of all proofs simplified/improved considerably. Paper reorganized. Empirical analysis showing new coding margin is non-trivial on real datasets

  24. arXiv:1005.0188  [pdf, other]

    cs.LG stat.ML

    Generative and Latent Mean Map Kernels

    Authors: Nishant A. Mehta, Alexander G. Gray

    Abstract: We introduce two kernels that extend the mean map, which embeds probability measures in Hilbert spaces. The generative mean map kernel (GMMK) is a smooth similarity measure between probabilistic models. The latent mean map kernel (LMMK) generalizes the non-iid formulation of Hilbert space embeddings of empirical distributions in order to incorporate latent variable models. When comparing certain c…

    Submitted 3 May, 2010; originally announced May 2010.

    Comments: 16 pages, 1 figure, 1 table