Showing 1–50 of 111 results for author: Cao, Y

Searching in archive stat.
  1. arXiv:2406.15439  [pdf]

    physics.soc-ph stat.AP

    Heterogeneous peer effects of college roommates on academic performance

    Authors: Yi Cao, Tao Zhou, Jian Gao

    Abstract: Understanding how student peers influence learning outcomes is crucial for effective education management in complex social systems. The complexities of peer selection and evolving peer relationships, however, pose challenges for identifying peer effects using static observational data. Here we use both null-model and regression approaches to examine peer effects using longitudinal data from 5,272…

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 56 pages, 4 figures, 2 tables, with Supplementary Information

    Journal ref: Nature Communications, 15(1), 4785 (2024)

  2. arXiv:2406.10650  [pdf, other]

    stat.ML cs.LG

    The Implicit Bias of Adam on Separable Data

    Authors: Chenyang Zhang, Difan Zou, Yuan Cao

    Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maxim…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  3. arXiv:2402.03295  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks

    Authors: Yongchang Hao, Yanshuai Cao, Lili Mou

    Abstract: Second-order optimization approaches like the generalized Gauss-Newton method are considered more powerful as they utilize the curvature information of the objective function with preconditioning matrices. Albeit offering tempting theoretical benefits, they are not easily applicable to modern deep learning. The major reason is due to the quadratic memory and cubic time complexity to compute the in…

    Submitted 5 February, 2024; originally announced February 2024.

  4. arXiv:2402.03293  [pdf, other]

    cs.LG cs.AI stat.ML

    Flora: Low-Rank Adapters Are Secretly Gradient Compressors

    Authors: Yongchang Hao, Yanshuai Cao, Lili Mou

    Abstract: Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, the low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts overall weight update matrices to be low-rank, limiting the model perform…

    Submitted 12 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted @ ICML 2024

  5. arXiv:2401.13624  [pdf, other]

    stat.ML cs.LG

    Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint

    Authors: Zhongjie Shi, Fanghui Liu, Yuan Cao, Johan A. K. Suykens

    Abstract: Adversarial training is a widely used method to improve the robustness of deep neural networks (DNNs) over adversarial perturbations. However, it is empirically observed that adversarial training on over-parameterized networks often suffers from robust overfitting: it can achieve almost zero adversarial training error while the robust generalization performance is not promising. In th…

    Submitted 24 January, 2024; originally announced January 2024.

  6. arXiv:2311.13958  [pdf, other]

    stat.ML cs.CV cs.LG

    Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework

    Authors: Jingjing Zheng, Wanglong Lu, Wenzhe Wang, Yankai Cao, Xiaoqin Zhang, Xianta Jiang

    Abstract: Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes. It has been commonly observed in real-world scenarios but ignored by the traditional t-SVD-b…

    Submitted 31 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  7. arXiv:2311.04550  [pdf, other]

    cs.LG stat.ML

    Regression with Cost-based Rejection

    Authors: Xin Cheng, Yuzhou Cao, Haobo Wang, Hongxin Wei, Bo An, Lei Feng

    Abstract: Learning with rejection is an important framework that can refrain from making predictions to avoid critical mispredictions by balancing between prediction and rejection. Previous studies on cost-based rejection only focused on the classification setting, which cannot handle the continuous and infinite target space in the regression setting. In this paper, we investigate a novel regression problem…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted by NeurIPS 2023

  8. Split Knockoffs for Multiple Comparisons: Controlling the Directional False Discovery Rate

    Authors: Yang Cao, Xinwei Sun, Yuan Yao

    Abstract: Multiple comparisons in hypothesis testing often encounter structural constraints in various applications. For instance, in structural Magnetic Resonance Imaging for Alzheimer's Disease, the focus extends beyond examining atrophic brain regions to include comparisons of anatomically adjacent regions. These constraints can be modeled as linear transformations of parameters, where the sign patterns…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Journal of the American Statistical Association, 2023

  9. arXiv:2306.11680  [pdf, other]

    cs.LG math.OC stat.ML

    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

    Authors: Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu

    Abstract: We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in t…

    Submitted 11 July, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 53 pages, 2 figures

  10. arXiv:2303.17940  [pdf, other]

    stat.ML cs.LG

    Per-Example Gradient Regularization Improves Learning Signals from Noisy Data

    Authors: Xuran Meng, Yuan Cao, Difan Zou

    Abstract: Gradient regularization, as described in Barrett and Dherin (2021), is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that this regularization technique can significantly enhance the robustness of deep learning models against noisy perturbations, while also reducing test error. In this paper, we explore the per-example gradient regular…

    Submitted 31 March, 2023; originally announced March 2023.

  11. arXiv:2303.08433  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Benefits of Mixup for Feature Learning

    Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

    Abstract: Mixup, a simple data augmentation method that randomly mixes two data points via linear interpolation, has been extensively applied in various deep learning applications to gain better generalization. However, the theoretical underpinnings of its efficacy are not yet fully understood. In this paper, we aim to seek a fundamental understanding of the benefits of Mixup. We first show that Mixup using…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 72 pages, 4 figures

  12. arXiv:2303.05793  [pdf, other]

    stat.ME

    Analyzing covariate clustering effects in healthcare cost subgroups: insights and applications for prediction

    Authors: Zhengxiao Li, Yifan Huang, Yang Cao

    Abstract: Healthcare cost prediction is a challenging task due to the high-dimensionality and high correlation among covariates. Additionally, the skewed, heavy-tailed, and often multi-modal nature of cost data can complicate matters further due to unobserved heterogeneity. In this study, we propose a novel framework for finite mixture regression models that incorporates covariate clustering methods to bett…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 36 pages; 7 figures

  13. arXiv:2302.02334  [pdf, other]

    cs.LG cs.AI stat.ML

    Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

    Authors: Chenyu Zheng, Guoqiang Wu, Fan Bao, Yue Cao, Chongxuan Li, Jun Zhu

    Abstract: A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes parameters in the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation except for the default logistic regression. Inspired by the sta…

    Submitted 29 May, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted by ICML 2023, 58 pages

  14. arXiv:2210.10479  [pdf, other]

    physics.ao-ph astro-ph.EP stat.AP

    Inferring changes to the global carbon cycle with WOMBAT v2.0, a hierarchical flux-inversion framework

    Authors: Michael Bertolacci, Andrew Zammit-Mangion, Andrew Schuh, Beata Bukosa, Jenny Fisher, Yi Cao, Aleya Kaushik, Noel Cressie

    Abstract: The natural cycles of the surface-to-atmosphere fluxes of carbon dioxide (CO$_2$) and other important greenhouse gases are changing in response to human influences. These changes need to be quantified to understand climate change and its impacts, but this is difficult to do because natural fluxes occur over large spatial and temporal scales. To infer trends in fluxes and identify phase shifts and…

    Submitted 19 October, 2022; originally announced October 2022.

  15. arXiv:2210.10003  [pdf, other]

    stat.AP math.OC stat.ML

    $k$-Means Clustering for Persistent Homology

    Authors: Yueqi Cao, Prudence Leung, Anthea Monod

    Abstract: Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram; it has recently gained much popularity from its myriad successful applications to many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we…

    Submitted 25 November, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 21 pages, 6 figures

  16. arXiv:2208.09897  [pdf, other]

    math.ST cs.LG stat.ML

    Multiple Descent in the Multiple Random Feature Model

    Authors: Xuran Meng, Jianfeng Yao, Yuan Cao

    Abstract: Recent works have demonstrated a double descent phenomenon in over-parameterized learning. Although this phenomenon has been investigated by recent works, it has not been fully understood in theory. In this paper, we investigate the multiple descent phenomenon in a class of multi-component prediction models. We first consider a "double random feature model" (DRFM) concatenating two types of rand…

    Submitted 10 October, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

    Comments: 89 pages, 9 figures. Version 3 adds new description of triple descent in certain double random feature model, deletes the discussion of NTK regimes, and adds more literature references

    MSC Class: 62R07

  17. arXiv:2207.03943  [pdf, ps, other]

    math.MG stat.ME

    A Geometric Condition for Uniqueness of Fréchet Means of Persistence Diagrams

    Authors: Yueqi Cao, Anthea Monod

    Abstract: The Fréchet mean is an important statistical summary and measure of centrality of data; it has been defined and studied for persistent homology captured by persistence diagrams. However, the complicated geometry of the space of persistence diagrams implies that the Fréchet mean for a given set of persistence diagrams is not necessarily unique, which prohibits theoretical guarantees for empirical m…

    Submitted 4 May, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 15 pages, 3 figures

  18. arXiv:2206.09908  [pdf, other]

    math.ST stat.ML

    Learning Optimal Flows for Non-Equilibrium Importance Sampling

    Authors: Yu Cao, Eric Vanden-Eijnden

    Abstract: Many applications in computational sciences and statistical inference require the computation of expectations with respect to complex high-dimensional distributions with unknown normalization constants, as well as the estimation of these constants. Here we develop a method to perform these calculations based on generating samples from a simple base distribution, transporting them by the flow gener…

    Submitted 24 October, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

  19. arXiv:2205.10833  [pdf, other]

    stat.AP

    Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS

    Authors: Yixiao Cao, Jingchen Hu

    Abstract: The large number of publicly available survey datasets of wide variety, albeit useful, raise respondent-level privacy concerns. The synthetic data approach to data privacy and confidentiality has been shown useful in terms of privacy protection and utility preservation. This paper aims at illustrating how synthetic data can facilitate the dissemination of highly sensitive information about youth r…

    Submitted 22 May, 2022; originally announced May 2022.

  20. arXiv:2204.09155  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Approximating Persistent Homology for Large Datasets

    Authors: Yueqi Cao, Anthea Monod

    Abstract: Persistent homology is an important methodology from topological data analysis which adapts theory from algebraic topology to data settings and has been successfully implemented in many applications. It produces a statistical summary in the form of a persistence diagram, which captures the shape and size of the data. Despite its widespread use, persistent homology is simply impossible to implement…

    Submitted 18 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: 24 pages, 9 figures

  21. arXiv:2202.06526  [pdf, other]

    cs.LG math.OC stat.ML

    Benign Overfitting in Two-layer Convolutional Neural Networks

    Authors: Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

    Abstract: Modern neural networks often have great expressive power and can be trained to overfit the training data, while still achieving a good test performance. This phenomenon is referred to as "benign overfitting". Recently, there emerges a line of works studying "benign overfitting" from the theoretical perspective. However, they are limited to linear models or kernel/random feature models, and there i…

    Submitted 14 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: 42 pages, 1 figure. Version 3 improves the presentation and adds a comparison with a concurrent work

  22. arXiv:2112.15250  [pdf, other]

    cs.LG math.OC stat.ML

    Benign Overfitting in Adversarially Robust Linear Classification

    Authors: Jinghui Chen, Yuan Cao, Quanquan Gu

    Abstract: "Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community. To explain this surprising phenomenon, a series of works have provided theoretical justification in over-parameterized linear regression, classification, and kernel methods. However, it is not clear if benign overfitt…

    Submitted 30 December, 2021; originally announced December 2021.

    Comments: 24 pages, 5 figures

  23. arXiv:2110.15253  [pdf, other]

    cs.LG stat.ML

    Understanding How Encoder-Decoder Architectures Attend

    Authors: Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

    Abstract: Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: 10+14 pages, 16 figures. NeurIPS 2021

  24. arXiv:2108.11371  [pdf, other]

    cs.LG math.OC stat.ML

    Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

    Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

    Abstract: Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a significantly worse test error in many deep learning applications such as image classification, even with a fine-tuned regularization. In this paper, we provide a theo…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 42 pages, 2 figures and 1 table

  25. arXiv:2106.08864  [pdf, other]

    cs.LG stat.ML

    Multi-Class Classification from Single-Class Data with Confidences

    Authors: Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

    Abstract: Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 23 pages, 1 figure

  26. arXiv:2106.08360  [pdf, other]

    stat.ME

    Multi-sample estimation of centered log-ratio matrix in microbiome studies

    Authors: Yezheng Li, Hongzhe Li, Yuanpei Cao

    Abstract: In microbiome studies, one of the ways of studying bacterial abundances is to estimate bacterial composition based on the sequencing read counts. Various transformations are then applied to such compositional data for downstream statistical analysis, among which the centered log-ratio (clr) transformation is most commonly used. Due to limited sequencing depth and DNA dropouts, many rare bacteria…

    Submitted 15 June, 2021; originally announced June 2021.

  27. arXiv:2104.13628  [pdf, other]

    cs.LG math.ST stat.ML

    Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

    Authors: Yuan Cao, Quanquan Gu, Mikhail Belkin

    Abstract: Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this "benign overfitting" phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mi…

    Submitted 2 January, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: 27 pages, 3 figures. In NeurIPS 2021

  28. arXiv:2104.01672  [pdf, other]

    stat.ML cs.LG math.AT

    Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures

    Authors: Yueqi Cao, Athanasios Vlontzos, Luca Schmidtke, Bernhard Kainz, Anthea Monod

    Abstract: Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously c…

    Submitted 6 July, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

    Comments: 29 pages, 10 figures, 4 tables

    MSC Class: 68P15; 68P20; 55N31

    Journal ref: Information and Inference: A Journal of the IMA, Volume 12, Issue 3 (2023)

  29. Controlling the False Discovery Rate in Transformational Sparsity: Split Knockoffs

    Authors: Yang Cao, Xinwei Sun, Yuan Yao

    Abstract: Controlling the False Discovery Rate (FDR) in a variable selection procedure is critical for reproducible discoveries, and it has been extensively studied in sparse linear models. However, it remains largely open in scenarios where the sparsity constraint is not directly imposed on the parameters but on a linear transformation of the parameters to be estimated. Examples of such scenarios include t…

    Submitted 16 October, 2023; v1 submitted 30 March, 2021; originally announced March 2021.

    Journal ref: Journal of the Royal Statistical Society Series B: Statistical Methodology, 2023

  30. arXiv:2102.06879  [pdf, other]

    stat.ML cs.LG

    Learning from Similarity-Confidence Data

    Authors: Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

    Abstract: Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data. In this paper, we investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data, where we aim to learn an effective binary classifier from only unlabeled data pairs equipped with confidence that illustrates th…

    Submitted 13 February, 2021; originally announced February 2021.

    Comments: 33 pages, 5 figures

  31. arXiv:2102.04004  [pdf, other]

    stat.AP astro-ph.IM

    WOMBAT: A fully Bayesian global flux-inversion framework

    Authors: Andrew Zammit-Mangion, Michael Bertolacci, Jenny Fisher, Ann Stavert, Matthew L. Rigby, Yi Cao, Noel Cressie

    Abstract: WOMBAT (the WOllongong Methodology for Bayesian Assimilation of Trace-gases) is a fully Bayesian hierarchical statistical framework for flux inversion of trace gases from flask, in situ, and remotely sensed data. WOMBAT extends the conventional Bayesian-synthesis framework through the consideration of a correlated error term, the capacity for online bias correction, and the provision of uncertaint…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: 46 pages, 13 figures

  32. arXiv:2101.05635  [pdf, other]

    stat.ME

    Bayesian inference with tmbstan for a state-space model with VAR(1) state equation

    Authors: Yihan Cao, Jarle Tufto

    Abstract: When using R package tmbstan for Bayesian inference, the built-in feature Laplace approximation to the marginal likelihood with random effects integrated out can be switched on and off. There exists no guideline on whether Laplace approximation should be used to achieve better efficiency especially when the statistical model for estimating selection is complicated. To answer this question, we cond…

    Submitted 14 January, 2021; originally announced January 2021.

  33. arXiv:2101.01152  [pdf, other]

    cs.LG math.OC stat.ML

    Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

    Authors: Spencer Frei, Yuan Cao, Quanquan Gu

    Abstract: We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Eq…

    Submitted 15 February, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: 30 pages, 10 figures

  34. arXiv:2011.14145  [pdf, other]

    cs.LG math.OC stat.ML

    A Backward SDE Method for Uncertainty Quantification in Deep Learning

    Authors: Richard Archibald, Feng Bao, Yanzhao Cao, He Zhang

    Abstract: We develop a probabilistic machine learning method, which formulates a class of stochastic neural networks by a stochastic optimal control problem. An efficient stochastic gradient descent algorithm is introduced under the stochastic maximum principle framework. Numerical experiments for applications of stochastic neural networks are carried out to validate the effectiveness of our methodology.

    Submitted 3 April, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

  35. arXiv:2010.15114  [pdf, other]

    cs.LG cs.CL stat.ML

    The geometry of integration in text classification RNNs

    Authors: Kyle Aitken, Vinay V. Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

    Abstract: Despite the widespread application of recurrent neural networks (RNNs) across a variety of tasks, a unified understanding of how RNNs solve these tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of a specific natural language processing…

    Submitted 3 June, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 9+19 pages, 30 figures; v2: smaller file size

  36. arXiv:2010.05250  [pdf, other]

    stat.ML cs.CV cs.LG

    Domain Agnostic Learning for Unbiased Authentication

    Authors: Jian Liang, Yuren Cao, Shuang Li, Bing Bai, Hao Li, Fei Wang, Kun Bai

    Abstract: Authentication is the task of confirming the matching relationship between a data instance and a given identity. Typical examples of authentication problems include face recognition and person re-identification. Data-driven authentication could be affected by undesired biases, i.e., the models are often trained in one domain (e.g., for people wearing spring outfits) while applied in other domains…

    Submitted 23 November, 2020; v1 submitted 11 October, 2020; originally announced October 2020.

  37. arXiv:2010.00539  [pdf, other]

    cs.LG math.OC stat.ML

    Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

    Authors: Spencer Frei, Yuan Cao, Quanquan Gu

    Abstract: We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in…

    Submitted 13 February, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: 25 pages, 1 table

  38. arXiv:2009.09925  [pdf, other]

    cs.LG stat.ML

    Graph Based Multi-layer K-means++ (G-MLKM) for Sensory Pattern Analysis in Constrained Spaces

    Authors: Feng Tao, Rengan Suresh, Johnathan Votion, Yongcan Cao

    Abstract: In this paper, we focus on developing a novel unsupervised machine learning algorithm, named graph based multi-layer k-means++ (G-MLKM), to solve data-target association problem when targets move on a constrained space and minimal information of the targets can be obtained by sensors. Instead of employing the traditional data-target association methods that are based on statistical probabilities,…

    Submitted 21 September, 2020; originally announced September 2020.

  39. arXiv:2009.09577  [pdf, other]

    cs.LG cs.RO stat.ML

    Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization

    Authors: Feng Tao, Yongcan Cao

    Abstract: In this paper, we study the problem of obtaining a control policy that can mimic and then outperform expert demonstrations in Markov decision processes where the reward function is unknown to the learning agent. One main relevant approach is the inverse reinforcement learning (IRL), which mainly focuses on inferring a reward function from expert demonstrations. The obtained control policy by IRL a…

    Submitted 22 September, 2020; v1 submitted 20 September, 2020; originally announced September 2020.

    Comments: 12 pages, 5 figures

  40. arXiv:2009.08063  [pdf, other]

    cs.LG cs.CR stat.ML

    FLAME: Differentially Private Federated Learning in the Shuffle Model

    Authors: Ruixuan Liu, Yang Cao, Hong Chen, Ruoyang Guo, Masatoshi Yoshikawa

    Abstract: Federated Learning (FL) is a promising machine learning paradigm that enables the analyzer to train a model without collecting users' raw data. To ensure users' privacy, differentially private federated learning has been intensively studied. The existing works are mainly based on the curator model or local model of differential privacy. However, both of them have pros and cons. T…

    Submitted 20 March, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: accepted by AAAI-21

  41. arXiv:2008.06653  [pdf, other]

    cs.LG stat.ML

    Evaluating Lossy Compression Rates of Deep Generative Models

    Authors: Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

    Abstract: The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge. Log-likelihood is an appealing metric due to its grounding in statistics and information theory, but it can be challenging to estimate for implicit generative models, and scalar-valued metrics give an incomplete picture of a model's qua…

    Submitted 15 August, 2020; originally announced August 2020.

  42. arXiv:2006.12101  [pdf, other]

    cs.LG cs.CR cs.DB stat.ML

    P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

    Authors: Shun Takagi, Tsubasa Takahashi, Yang Cao, Masatoshi Yoshikawa

    Abstract: How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach for this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, the existing method cannot adequately h…

    Submitted 7 March, 2022; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: The version accepted at ICDE 2021 includes wrong proof in the Wishart mechanism. The current version fixes the problem

  43. Adversarial Infidelity Learning for Model Interpretation

    Authors: Jian Liang, Bing Bai, Yuren Cao, Kun Bai, Fei Wang

    Abstract: Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check if the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score of each feature representing the data samples to explain how the model generates the…

    Submitted 2 August, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), August 23--27, 2020, Virtual Event, USA

  44. arXiv:2005.14426  [pdf, other]

    cs.LG math.OC stat.ML

    Agnostic Learning of a Single Neuron with Gradient Descent

    Authors: Spencer Frei, Yuan Cao, Quanquan Gu

    Abstract: We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(σ(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d. samples $S\sim \mathcal{D}^n$. The activation function $σ$ is an arbitrary Lipschitz and non-decreasin…

    Submitted 31 August, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: 31 pages, 3 tables. This version improves the risk bound from O(OPT^1/2) to O(OPT) for strictly increasing activation functions

  45. arXiv:2003.12060  [pdf, other]

    cs.CV cs.LG stat.ML

    Negative Margin Matters: Understanding Margin in Few-shot Classification

    Authors: Bin Liu, Yue Cao, Yutong Lin, Qi Li, Zheng Zhang, Mingsheng Long, Han Hu

    Abstract: This paper introduces a negative margin loss to metric learning based few-shot learning methods. The negative margin loss significantly outperforms regular softmax loss, and achieves state-of-the-art accuracy on three standard few-shot classification benchmarks with few bells and whistles. These results are contrary to the common practice in the metric learning field, that the margin is zero or po…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: Code is available at https://github.com/bl0/negative-margin.few-shot

  46. arXiv:2003.10637  [pdf, other]

    cs.LG cs.CR stat.ML

    FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection

    Authors: Ruixuan Liu, Yang Cao, Masatoshi Yoshikawa, Hong Chen

    Abstract: As massive data are produced from small gadgets, federated learning on mobile devices has become an emerging trend. In the federated setting, Stochastic Gradient Descent (SGD) has been widely used in federated learning for various machine learning models. To prevent privacy leakages from gradients that are calculated on users' sensitive data, local differential privacy (LDP) has been considered as…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: 18 pages, to be published in DASFAA 2020

  47. arXiv:2003.08793  [pdf]

    cs.CV cs.LG stat.ML

    Deep Active Learning for Remote Sensing Object Detection

    Authors: Zhenshen Qu, Jingda Du, Yong Cao, Qiuyu Guan, Pengbo Zhao

    Abstract: Recently, CNN object detectors have achieved high accuracy on remote sensing images but require huge labor and time costs on annotation. In this paper, we propose a new uncertainty-based active learning which can select images with more information for annotation and detector can still reach high performance with a fraction of the training images. Our method not only analyzes objects' classificati…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: 6 pages, 3 figures

  48. arXiv:2003.06060  [pdf, other]

    cs.LG cs.AI stat.ML

    Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

    Authors: Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

    Abstract: We show that the sum of the implicit generator log-density $\log p_g$ of a GAN with the logit score of the discriminator defines an energy function which yields the true data density when the generator is imperfect but the discriminator is optimal, thus making it possible to improve on the typical generator (with implicit density $p_g$). To make that practical, we show that sampling from this modi…

    Submitted 7 July, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

  49. arXiv:2002.10501  [pdf, other]

    cs.LG stat.ML

    Variational Hyper RNN for Sequence Modeling

    Authors: Ruizhi Deng, Yanshuai Cao, Bo Chang, Leonid Sigal, Greg Mori, Marcus A. Brubaker

    Abstract: In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. T…

    Submitted 24 February, 2020; originally announced February 2020.

  50. arXiv:2002.05712  [pdf, other]

    cs.LG cs.CV stat.ML

    Cross-Iteration Batch Normalization

    Authors: Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin

    Abstract: A well-known issue of Batch Normalization is its significantly reduced effectiveness in the case of small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, we present Cross-Iteration Batch Normalization (CBN), in which examples from multiple rec…

    Submitted 25 March, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to CVPR 2021