Showing 1–50 of 112 results for author: Ma, T

  1. arXiv:2407.01837  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning

    Authors: Tao Ma, Xuzhi Yang, Zoltan Szabo

    Abstract: Reinforcement learning (RL) -- finding the optimal behaviour (also referred to as policy) maximizing the collected long-term cumulative reward -- is among the most influential approaches in machine learning with a large number of successful applications. In several decision problems, however, one faces the possibility of policy switching -- changing from the current policy to a new one -- which in…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.13447  [pdf, other]

    math.ST cs.IT cs.LG stat.ML

    High-probability minimax lower bounds

    Authors: Tianyi Ma, Kabir A. Verchand, Richard J. Samworth

    Abstract: The minimax risk is often considered as a gold standard against which we can compare specific statistical procedures. Nevertheless, as has been observed recently in robust and heavy-tailed estimation problems, the inherent reduction of the (random) loss to its expectation may entail a significant loss of information regarding its tail behaviour. In an attempt to avoid such a loss, we introduce the…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 37 pages, 3 figures

    MSC Class: 62C20; 62B10

  3. arXiv:2406.06802  [pdf, other]

    stat.ML cs.LG

    Satisficing Exploration in Bandit Optimization

    Authors: Qing Feng, Tianyi Ma, Ruihao Zhu

    Abstract: Motivated by the concept of satisficing in decision-making, we consider the problem of satisficing exploration in bandit optimization. In this setting, the learner aims at selecting satisficing arms (arms with mean reward exceeding a certain threshold value) as frequently as possible. The performance is measured by satisficing regret, which is the cumulative deficit of the chosen arm's mean reward…

    Submitted 10 June, 2024; originally announced June 2024.

  4. arXiv:2404.00474  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Linguistic Calibration of Long-Form Generations

    Authors: Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto

    Abstract: Language models (LMs) may lead their users to make suboptimal downstream decisions when they confidently hallucinate. This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form gen…

    Submitted 4 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: ICML 2024. Code available at https://github.com/tatsu-lab/linguistic_calibration

  5. arXiv:2402.12875  [pdf, other]

    cs.LG cs.CC stat.ML

    Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

    Authors: Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma

    Abstract: Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of…

    Submitted 23 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 38 pages, 10 figures. Accepted by ICLR 2024

  6. arXiv:2401.07111  [pdf, other]

    stat.AP stat.CO

    Bayesian Signal Matching for Transfer Learning in ERP-Based Brain Computer Interface

    Authors: Tianwen Ma, Jane E. Huggins, Jian Kang

    Abstract: An Event-Related Potential (ERP)-based Brain-Computer Interface (BCI) Speller System helps people with disabilities communicate by decoding electroencephalogram (EEG) signals. A P300-ERP embedded in EEG signals arises in response to a rare, but relevant event (target) among a series of irrelevant events (non-target). Different machine learning methods have constructed binary classifiers to detec…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 34 pages, 6 figures, 4 tables

  7. arXiv:2401.00624  [pdf, other]

    stat.ME

    Semi-Confirmatory Factor Analysis for High-Dimensional Data with Interconnected Community Structures

    Authors: Yifan Yang, Tianzhou Ma, Chuan Bi, Shuo Chen

    Abstract: Confirmatory factor analysis (CFA) is a statistical method for identifying and confirming the presence of latent factors among observed variables through the analysis of their covariance structure. Compared to alternative factor models, CFA offers interpretable common factors with enhanced specificity and a more adaptable approach to modeling covariance structures. However, the application of CFA…

    Submitted 27 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  8. arXiv:2307.12461  [pdf, ps, other]

    cs.LG stat.ML

    Rates of Approximation by ReLU Shallow Neural Networks

    Authors: Tong Mao, Ding-Xuan Zhou

    Abstract: Neural networks activated by the rectified linear unit (ReLU) play a central role in the recent development of deep learning. The topic of approximating functions from Hölder spaces by these networks is crucial for understanding the efficiency of the induced learning algorithms. Although the topic has been well investigated in the setting of deep neural networks with many layers of hidden neurons,…

    Submitted 23 July, 2023; originally announced July 2023.

  9. arXiv:2307.11007  [pdf, other]

    cs.LG math.OC stat.ML

    Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

    Authors: Kaiyue Wen, Zhiyuan Li, Tengyu Ma

    Abstract: Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investi…

    Submitted 22 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 34 pages, 11 figures

  10. arXiv:2305.17126  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models as Tool Makers

    Authors: Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou

    Abstract: Recent research has highlighted the potential of large language models (LLMs) to improve their problem-solving capabilities with the aid of suitable external tools. In our work, we further advance this concept by introducing a closed-loop framework, referred to as LLMs As Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving. Our approach consists of two phases: 1) to…

    Submitted 10 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Code available at https://github.com/ctlllll/LLM-ToolMaker

  11. arXiv:2305.05722  [pdf]

    cs.LG stat.AP

    Enhancing Clinical Predictive Modeling through Model Complexity-Driven Class Proportion Tuning for Class Imbalanced Data: An Empirical Study on Opioid Overdose Prediction

    Authors: Yinan Liu, Xinyu Dong, Weimin Lyu, Richard N. Rosenthal, Rachel Wong, Tengfei Ma, Fusheng Wang

    Abstract: Class imbalance problems widely exist in the medical field and heavily deteriorate the performance of clinical predictive models. Most techniques to alleviate the problem rebalance class proportions and they predominantly assume the rebalanced proportions should be a function of the original data and oblivious to the model one uses. This work challenges this prevailing assumption and proposes that li…

    Submitted 9 May, 2023; originally announced May 2023.

  12. arXiv:2303.14281  [pdf, other]

    stat.ML cs.LG

    Sequential Knockoffs for Variable Selection in Reinforcement Learning

    Authors: Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber

    Abstract: In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the sta…

    Submitted 24 March, 2023; originally announced March 2023.

  13. arXiv:2303.03520  [pdf, other]

    stat.ME

    The Effect of Alcohol Consumption on Brain Ageing: A New Causal Inference Framework for Incomplete and Massive Phenomic Data

    Authors: Chixiang Chen, Shuo Chen, Zhenyao Ye, Xu Shi, Tianzhou Ma

    Abstract: Although substance use, such as alcohol consumption, is known to be associated with cognitive decline during ageing, its direct influence on the central nervous system remains unclear. In this study, we aim to investigate the potential influence of alcohol intake frequency on accelerated brain ageing by estimating the mean potential brain-age gap (BAG) index, the difference between brain age and a…

    Submitted 4 March, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Contact: [email protected]

  14. arXiv:2301.00363  [pdf]

    cs.CV cs.LG stat.AP

    Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

    Authors: Leikun Yin, Rahul Ghosh, Chenxi Lin, David Hale, Christoph Weigl, James Obarowski, Junxiong Zhou, Jessica Till, Xiaowei Jia, Troy Mao, Vipin Kumar, Zhenong Jin

    Abstract: Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could su…

    Submitted 15 January, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

    Journal ref: Remote Sensing of Environment, 295, p.113695 (2023)

  15. arXiv:2212.13294  [pdf, other]

    stat.ME

    Multivariate Bayesian variable selection with application to multi-trait genetic fine mapping

    Authors: Travis Canida, Hongjie Ke, Shuo Chen, Zhenayo Ye, Tianzhou Ma

    Abstract: Variable selection has played a critical role in modern statistical learning and scientific discoveries. Numerous regularization and Bayesian variable selection methods have been developed in the past two decades for variable selection, but most of these methods consider selecting variables for only one response. As more data is being collected nowadays, it is common to analyze multiple related re…

    Submitted 1 March, 2024; v1 submitted 26 December, 2022; originally announced December 2022.

    Comments: 46 pages, 4 figures

  16. arXiv:2212.13292  [pdf, other]

    stat.ME

    Robust distance correlation for variable screening

    Authors: Tianzhou Ma, Hongjie Ke, Zhao Ren

    Abstract: High-dimensional data are commonly seen in modern statistical applications, and variable selection methods play indispensable roles in identifying the critical features for scientific discoveries. Traditional best subset selection methods are computationally intractable with a large number of features, while regularization methods such as Lasso, SCAD and their variants perform poorly in ultrahigh-dime…

    Submitted 26 December, 2022; originally announced December 2022.

  17. arXiv:2211.14699  [pdf, other]

    cs.LG stat.ML

    A Theoretical Study of Inductive Biases in Contrastive Learning

    Authors: Jeff Z. HaoChen, Tengyu Ma

    Abstract: Understanding self-supervised learning is important but challenging. Previous theoretical works study the role of pretraining losses, and view neural networks as general black boxes. However, the recent work of Saunshi et al. argues that the model architecture -- a component largely ignored by previous works -- also has significant influences on the downstream performance of self-supervised learni…

    Submitted 8 April, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: ICLR 2023

  18. arXiv:2211.11719  [pdf, other]

    cs.LG stat.ML

    First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains

    Authors: Kefan Dong, Tengyu Ma

    Abstract: Real-world machine learning applications often involve deploying neural networks to domains that are not seen in the training time. Hence, we need to understand the extrapolation of nonlinear models -- under what conditions on the distributions and function class, models can be guaranteed to extrapolate to new test distributions. The question is very challenging because even two-layer neural netwo…

    Submitted 1 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: added citations and fixed typos

  19. arXiv:2211.05729  [pdf, other]

    cs.LG math.OC stat.ML

    How Does Sharpness-Aware Minimization Minimize Sharpness?

    Authors: Kaiyue Wen, Tengyu Ma, Zhiyuan Li

    Abstract: Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks for various settings. However, the underlying working of SAM remains elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient…

    Submitted 5 January, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: 94 pages, 1 figure

  20. arXiv:2208.08472  [pdf, other]

    stat.ME

    Bayesian response adaptive randomization design with a composite endpoint of mortality and morbidity

    Authors: Zhongying Xu, Andriy I. Bandos, Tianzhou Ma, Lu Tang, Victor B. Talisa, Chung-Chou H. Chang

    Abstract: Allocating patients to treatment arms during a trial based on the observed responses accumulated prior to the decision point, and sequential adaptation of this allocation, could minimize the expected number of failures or maximize total benefit to patients. In this study, we developed a Bayesian response adaptive randomization (RAR) design targeting the endpoint of organ support-free days (OSFD)…

    Submitted 31 August, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  21. arXiv:2207.08977  [pdf, other]

    cs.LG stat.ML

    Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift

    Authors: Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan

    Abstract: We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM. In this paper, we find that ID-calibrated ensembles -- where we s…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to UAI 2022

  22. arXiv:2207.04369  [pdf, other]

    stat.AP

    Balancing Producer Fairness and Efficiency via Prior-Weighted Rating System Design

    Authors: Thomas Ma, Michael S. Bernstein, Ramesh Johari, Nikhil Garg

    Abstract: Online marketplaces use rating systems to promote the discovery of high-quality products. However, these systems also lead to high variance in producers' economic outcomes: a new producer who sells high-quality items may unluckily receive one low rating early on, negatively impacting their future popularity. We investigate the design of rating systems that balance the goals of identifying high-qu…

    Submitted 25 November, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

    Comments: 12 pages, 8 figures, submitted to TheWebConf 2024

  23. arXiv:2111.15086  [pdf, other]

    stat.ME

    Scalable Semiparametric Spatio-temporal Regression for Large Data Analysis

    Authors: Ting Fung Ma, Fangfang Wang, Jun Zhu, Anthony R. Ives, Katarzyna E. Lewińska

    Abstract: With the rapid advances of data acquisition techniques, spatio-temporal data are becoming increasingly abundant in a diverse array of disciplines. Here we develop spatio-temporal regression methodology for analyzing large amounts of spatially referenced data collected over time, motivated by environmental studies utilizing remotely sensed satellite data. In particular, we specify a semiparametric…

    Submitted 29 November, 2021; originally announced November 2021.

  24. arXiv:2111.03741  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

    Authors: Margalit Glasgow, Honglin Yuan, Tengyu Ma

    Abstract: Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best-known upper and lower bounds do not match, and it is not clear whether the ex…

    Submitted 11 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted to AISTATS 2022. The first two authors contributed equally

  25. arXiv:2110.05025  [pdf, other]

    cs.LG cs.CV stat.ML

    Self-supervised Learning is More Robust to Dataset Imbalance

    Authors: Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma

    Abstract: Self-supervised learning (SSL) is a scalable way to learn general visual representations since it learns without labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, where we know little about the behavior of SSL. In this work, we systematically investigate self-supervised learning under dataset imbalance. First, we find out via extensive experime…

    Submitted 22 May, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

  26. arXiv:2107.13163  [pdf, ps, other]

    cs.LG stat.ML

    Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

    Authors: Colin Wei, Yining Chen, Tengyu Ma

    Abstract: A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, constructions from approximation theory may be unrealistic and therefore less meaningful. For example, a common unrealistic trick is to encode target function values using infinite precision. To address these issues, this work proposes a formal definition of statistically meanin…

    Submitted 30 March, 2023; v1 submitted 28 July, 2021; originally announced July 2021.

  27. arXiv:2107.06650  [pdf, other]

    cs.GT cs.LG stat.ML

    An Efficient Deep Distribution Network for Bid Shading in First-Price Auctions

    Authors: Tian Zhou, Hao He, Shengjun Pan, Niklas Karlsson, Bharatbhushan Shetty, Brendan Kitts, Djordje Gligorijevic, San Gultekin, Tingyu Mao, Junwei Pan, Jianlong Zhang, Aaron Flores

    Abstract: Since 2019, most ad exchanges and sell-side platforms (SSPs), in the online advertising industry, shifted from second to first price auctions. Due to the fundamental difference between these auctions, demand-side platforms (DSPs) have had to update their bidding strategies to avoid bidding unnecessarily high and hence overpaying. Bid shading was proposed to adjust the bid price intended for second…

    Submitted 15 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'21), August 14-18, 2021, Singapore

  28. arXiv:2107.05719  [pdf, other]

    stat.ML cs.LG stat.ME

    Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration

    Authors: Shengjia Zhao, Michael P. Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon

    Abstract: When facing uncertainty, decision-makers want predictions they can trust. A machine learning provider can convey confidence to decision-makers by guaranteeing their predictions are distribution calibrated -- amongst the inputs that receive a predicted class probabilities vector $q$, the actual distribution over classes is $q$. For multi-class prediction problems, however, achieving distribution ca…

    Submitted 12 July, 2021; originally announced July 2021.

  29. arXiv:2106.09913  [pdf, other]

    cs.LG stat.ML

    Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments

    Authors: Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski

    Abstract: Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments. Despite a proliferation of proposed algorithms for this task, assessing their performance both theoretically and empirically is still very challenging. Distributional matching algorithms such as (Conditional) Domain Adversarial Networks [Ganin et al., 2016, Long et al…

    Submitted 22 November, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: We acknowledge that the previous version of this paper (v1) contained an error - Theorem 3.2 was incorrect. We removed this theorem and updated the rest of the paper in v2

  30. arXiv:2106.09226  [pdf, other]

    cs.LG stat.ML

    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

    Authors: Colin Wei, Sang Michael Xie, Tengyu Ma

    Abstract: Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downs…

    Submitted 20 April, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

  31. arXiv:2106.06530  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    Label Noise SGD Provably Prefers Flat Global Minimizers

    Authors: Alex Damian, Tengyu Ma, Jason D. Lee

    Abstract: In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies that demonstrate that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to…

    Submitted 4 December, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: 57 pages, 5 figures, NeurIPS 2021

  32. arXiv:2106.04156  [pdf, other]

    cs.LG stat.ML

    Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

    Authors: Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma

    Abstract: Recent works in self-supervised learning have advanced the state-of-the-art by relying on the contrastive learning paradigm, which learns representations by pushing positive pairs, or similar examples from the same class, closer together while keeping negative pairs far apart. Despite the empirical successes, theoretical foundations are limited -- prior analyses assume conditional independence of…

    Submitted 23 June, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted as an oral to NeurIPS 2021

  33. arXiv:2105.13975  [pdf, other]

    cs.LG cs.AI stat.ML

    Relation Matters in Sampling: A Scalable Multi-Relational Graph Neural Network for Drug-Drug Interaction Prediction

    Authors: Arthur Feeney, Rishabh Gupta, Veronika Thost, Rico Angell, Gayathri Chandu, Yash Adhikari, Tengfei Ma

    Abstract: Sampling is an established technique to scale graph neural networks to large graphs. Current approaches however assume the graphs to be homogeneous in terms of relations and ignore relation types, critically important in biomedical graphs. Multi-relational graphs contain various types of relations that usually come with variable frequency and have different importance for the problem at hand. We p…

    Submitted 28 May, 2021; originally announced May 2021.

  34. arXiv:2103.13462  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Why Do Local Methods Solve Nonconvex Problems?

    Authors: Tengyu Ma

    Abstract: Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: This is Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms"

  35. arXiv:2102.11297  [pdf, other]

    cs.LG stat.CO stat.ML

    You Only Compress Once: Optimal Data Compression for Estimating Linear Models

    Authors: Jeffrey Wong, Eskil Forsell, Randall Lewis, Tobias Mao, Matthew Wardrop

    Abstract: Linear models are used in online decision making, such as in machine learning, policy algorithms, and experimentation platforms. Many engineering systems that use linear models achieve computational efficiency through distributed systems and expert configuration. While there are strengths to this approach, it is still difficult to have an environment that enables researchers to interactively itera…

    Submitted 3 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: v2: Further reduce matrix algebra and fix typo in Section 5.3.3. Improve the relationships across Section 5.3.1, 5.3.2, and 5.3.3. v3: Change citation styles and update Section 5.3.2

  36. arXiv:2102.03450  [pdf, other]

    cs.LG stat.ML

    Wasserstein Graph Neural Networks for Graphs with Missing Attributes

    Authors: Zhixian Chen, Tengfei Ma, Yangqiu Song, Yang Wang

    Abstract: Missing node attributes is a common problem in real-world graphs. Graph neural networks have demonstrated power in graph representation learning while their performance is affected by the completeness of graph information. Most of them are not specified for missing-attribute graphs and fail to leverage incomplete attribute information effectively. In this paper, we propose an innovative node…

    Submitted 16 February, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

  37. arXiv:2012.04550  [pdf, other]

    cs.LG stat.ML

    In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

    Authors: Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang

    Abstract: Consider a prediction setting with few in-distribution labeled examples and many unlabeled examples both in- and out-of-distribution (OOD). The goal is to learn a model which performs well both in-distribution and OOD. In these settings, auxiliary information is often cheaply available for every input. How should we best leverage this auxiliary information for the prediction task? Empirically acro…

    Submitted 7 April, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: ICLR 2021

  38. arXiv:2010.11356  [pdf, ps, other]

    stat.ML cs.LG

    Beyond Lazy Training for Over-parameterized Tensor Decomposition

    Authors: Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

    Abstract: Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{\otimes l}$ of rank $r$ (where $r\ll d$), can variants of gradient descent find a rank…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020; the first two authors contribute equally

  39. arXiv:2010.04591  [pdf, other]

    stat.ML cs.LG eess.SY

    Physics-Informed Gaussian Process Regression for Probabilistic States Estimation and Forecasting in Power Grids

    Authors: Tong Ma, David Alonso Barajas-Solano, Ramakrishna Tipireddy, Alexandre M. Tartakovsky

    Abstract: Real-time state estimation and forecasting is critical for efficient operation of power grids. In this paper, a physics-informed Gaussian process regression (PhI-GPR) method is presented and used for probabilistic forecasting and estimating the phase angle, angular speed, and wind mechanical power of a three-generator power grid system using sparse measurements. In standard data-driven Gaussian pr…

    Submitted 9 October, 2020; originally announced October 2020.

    MSC Class: 62G99

  40. arXiv:2010.03622  [pdf, other]

    cs.LG stat.ML

    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

    Authors: Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma

    Abstract: Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised lear…

    Submitted 20 April, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Published at ICLR 2021

  41. arXiv:2009.09259  [pdf, other]

    cs.GT cs.IR cs.LG stat.ML

    Bid Shading by Win-Rate Estimation and Surplus Maximization

    Authors: Shengjun Pan, Brendan Kitts, Tian Zhou, Hao He, Bharatbhushan Shetty, Aaron Flores, Djordje Gligorijevic, Junwei Pan, Tingyu Mao, San Gultekin, Jianlong Zhang

    Abstract: This paper describes a new win-rate based bid shading algorithm (WR) that does not rely on the minimum-bid-to-win feedback from a Sell-Side Platform (SSP). The method uses a modified logistic regression to predict the profit from each possible shaded bid price. The function form allows fast maximization at run-time, a key requirement for Real-Time Bidding (RTB) systems. We report production result…

    Submitted 19 September, 2020; originally announced September 2020.

    Comments: AdKDD 2020

  42. arXiv:2007.04596  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

    Authors: Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang

    Abstract: We consider the dynamic of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: Conference on Learning Theory (COLT) 2020

  43. arXiv:2007.04395  [pdf, other]

    cs.LG cs.AI stat.ML

    Multilevel Graph Matching Networks for Deep Graph Similarity Learning

    Authors: Xiang Ling, Lingfei Wu, Saizhuo Wang, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji

    Abstract: While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, however ignoring the rich cross-level interactions (e.g., be…

    Submitted 7 August, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS)

  44. arXiv:2006.16205  [pdf, other]

    cs.LG stat.ML

    Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

    Authors: Sang Michael Xie, Tengyu Ma, Percy Liang

    Abstract: We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g. pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available (e.g. code on GitHub) and provide information about output validity. We can capture the…

    Submitted 24 October, 2023; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML 2021 Long talk

  45. arXiv:2006.15766  [pdf, other]

    cs.LG stat.ML

    Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

    Authors: Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma

    Abstract: Real-world large-scale datasets are heteroskedastic and imbalanced -- labels have varying levels of uncertainty and label distributions are long-tailed. Heteroskedasticity and imbalance challenge deep learning algorithms due to the difficulty of distinguishing among mislabeled, ambiguous, and rare examples. Addressing heteroskedasticity and imbalance simultaneously is under-explored. We propose a…

    Submitted 18 March, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

    Comments: to appear in ICLR 2021

  46. arXiv:2006.14481  [pdf, other]

    cs.LG stat.ML

    Active Online Learning with Hidden Shifting Domains

    Authors: Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang

    Abstract: Online machine learning systems need to adapt to domain shifts. Meanwhile, acquiring label at every timestep is expensive. We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains. For online linear regression with oblivious adversaries, we provide a tight tradeoff that dep…

    Submitted 25 February, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

  47. arXiv:2006.10288  [pdf, other]

    stat.ML cs.LG

    Individual Calibration with Randomized Forecasting

    Authors: Shengjia Zhao, Tengyu Ma, Stefano Ermon

    Abstract: Machine learning applications often require calibrated predictions, e.g. a 90\% credible interval should contain the true outcome 90\% of the times. However, typical definitions of calibration only require this to hold on average, and offer no guarantees on predictions made on individual samples. Thus, predictions can be systematically over or under confident on certain subgroups, leading to issue…

    Submitted 9 September, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  48. arXiv:2006.10032  [pdf, other]

    cs.LG stat.ML

    Self-training Avoids Using Spurious Features Under Domain Shift

    Authors: Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma

    Abstract: In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close. In practice, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory. We identify and analyze one particular setting where the domain shift can be large, but these algorithms provably work: certa…

    Submitted 7 December, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  49. arXiv:2006.08950  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Federated Accelerated Stochastic Gradient Descent

    Authors: Honglin Yuan, Tengyu Ma

    Abstract: We propose Federated Accelerated Stochastic Gradient Descent (FedAc), a principled acceleration of Federated Averaging (FedAvg, also known as Local SGD) for distributed optimization. FedAc is the first provable acceleration of FedAvg that improves convergence speed and communication efficiency on various types of convex functions. For example, for strongly convex and smooth functions, when using…

    Submitted 5 June, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020. Best paper in International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2020 (FL-ICML'20). Code repository see https://github.com/hongliny/FedAc-NeurIPS20

  50. arXiv:2006.08875  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-based Adversarial Meta-Reinforcement Learning

    Authors: Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

    Abstract: Meta-reinforcement learning (meta-RL) aims to learn from multiple training tasks the ability to adapt efficiently to unseen test tasks. Despite the success, existing meta-RL algorithms are known to be sensitive to the task distribution shift. When the test task distribution is different from the training task distribution, the performance may degrade significantly. To address this issue, this pape…

    Submitted 27 February, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted by NeurIPS 2020. Code at https://github.com/LinZichuan/AdMRL