Showing 1–50 of 59 results for author: Shen, Z

Searching in archive stat.
  1. arXiv:2406.09084  [pdf, other]

    stat.ML cs.LG

    Operator-informed score matching for Markov diffusion models

    Authors: Zheyang Shen, Chris J. Oates

    Abstract: Diffusion models are typically trained using score matching, yet score matching is agnostic to the particular forward process that defines the model. This paper argues that Markov diffusion models enjoy an advantage over other types of diffusion model, as their associated operators can be exploited to improve the training process. In particular, (i) there exists an explicit formal solution to the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Preprint; 19 pages, 5 figures

  2. arXiv:2405.18373  [pdf, other]

    stat.ML cs.LG math.OC

    A Hessian-Aware Stochastic Differential Equation for Modelling SGD

    Authors: Xiang Li, Zebang Shen, Liang Zhang, Niao He

    Abstract: Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool to study its escaping behaviors from stationary points. However, existing stochastic differential equation (SDE) models fail to fully capture these behaviors, even for simple quadratic objectives. Built on a novel stochastic backward error analysis framework, we derive the Hessian-Aware Stochastic Modified Equatio…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.14778  [pdf, ps, other]

    stat.ML cs.LG

    Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms

    Authors: Dimitri Meunier, Zikai Shen, Mattes Mollenhauer, Arthur Gretton, Zhu Li

    Abstract: We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression, various implementations of gradient descent and many more. Our contributions are twofold. First, we rigorously confirm the so-called saturation effect for ridge regression with vector-valued output by…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2405.07552  [pdf, other]

    stat.ML cs.LG stat.ME

    Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery

    Authors: Caixing Wang, Ziliang Shen

    Abstract: In this paper, we focus on distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative tool to the least squares regression for robustness against outliers and data heterogeneity. However, the non-smoothness of the check loss function poses big challenges to both computation and theory in the distributed setting. To tack…

    Submitted 1 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Forty-first International Conference on Machine Learning (ICML 2024), 27 pages, 4 figures, 14 tables

  5. arXiv:2405.03329  [pdf, other]

    cs.LG stat.ML

    Policy Learning for Balancing Short-Term and Long-Term Rewards

    Authors: Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng

    Abstract: Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both lon…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2309.12600  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Multiply Robust Federated Estimation of Targeted Average Treatment Effects

    Authors: Larry Han, Zhu Shen, Jose Zubizarreta

    Abstract: Federated or multi-site studies have distinct advantages over single-site studies, including increased generalizability, the ability to study underrepresented populations, and the opportunity to study rare exposures and outcomes. However, these studies are challenging due to the need to preserve the privacy of each individual's data and the heterogeneity in their covariate distributions. We propos…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted at NeurIPS 2023

  7. arXiv:2309.05505  [pdf, other]

    cs.LG stat.ML

    Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning

    Authors: Zebang Shen, Jiayuan Ye, Anmin Kang, Hamed Hassani, Reza Shokri

    Abstract: Repeated parameter sharing in federated learning causes significant information leakage about private data, thus defeating its main purpose: data privacy. Mitigating the risk of this information leakage, using state of the art differentially private algorithms, also does not come for free. Randomized mechanisms can prevent convergence of models on learning even the useful representation functions,…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: ICLR 2023 revised

  8. arXiv:2308.08025  [pdf, other]

    quant-ph cs.AI cs.ET cs.LG stat.ML

    Potential Energy Advantage of Quantum Economy

    Authors: Junyu Liu, Hansheng Jiang, Zuo-Jun Max Shen

    Abstract: Energy cost is increasingly crucial in the modern computing industry with the wide deployment of large-scale machine learning models and language models. For the firms that provide computing services, low energy consumption is important both from the perspective of their own market growth and the government's regulations. In this paper, we study the energy benefits of quantum computing vis-a-vis c…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 23 pages, many figures

  9. arXiv:2308.06717  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards

    Authors: Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani

    Abstract: In practice, incentive providers (i.e., principals) often cannot observe the reward realizations of incentivized agents, which is in contrast to many principal-agent models that have been previously studied. This information asymmetry challenges the principal to consistently estimate the agent's unknown rewards by solely watching the agent's decisions, which becomes even more challenging when the…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: 72 pages, 6 figures. arXiv admin note: text overlap with arXiv:2304.07407

  10. arXiv:2305.06584  [pdf, other]

    cs.LG math.OC stat.ML

    Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

    Authors: Mo Liu, Paul Grigas, Heyuan Liu, Zuo-Jun Max Shen

    Abstract: We develop the first active learning method in the predict-then-optimize framework. Specifically, we develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream, where the labels correspond to the parameters of an optimization model for decision-making. Our active learning method is the first to be directly informed by the de…

    Submitted 11 May, 2023; originally announced May 2023.

  11. arXiv:2304.07407  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents

    Authors: Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani

    Abstract: Motivated by a number of real-world applications from domains like healthcare and sustainable transportation, in this paper we study a scenario of repeated principal-agent games within a multi-armed bandit (MAB) framework, where: the principal gives a different incentive for each bandit arm, the agent picks a bandit arm to maximize its own expected reward plus incentive, and the principal observes…

    Submitted 7 May, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: 50 pages, 4 figures

  12. arXiv:2304.01303  [pdf, ps, other]

    cs.LG stat.ML

    Improved Bound for Mixing Time of Parallel Tempering

    Authors: Holden Lee, Zeyu Shen

    Abstract: In the field of sampling algorithms, MCMC (Markov Chain Monte Carlo) methods are widely used when direct sampling is not possible. However, multimodality of target distributions often leads to slow convergence and mixing. One common solution is parallel tempering. Though highly effective in practice, theoretical guarantees on its performance are limited. In this paper, we present a new lower bound…

    Submitted 3 April, 2023; originally announced April 2023.

  13. arXiv:2212.00992  [pdf, other]

    cs.LG stat.ML

    Stable Learning via Sparse Variable Independence

    Authors: Han Yu, Peng Cui, Yue He, Zheyan Shen, Yong Lin, Renzhe Xu, Xingxuan Zhang

    Abstract: The problem of covariate-shift generalization has attracted intensive research attention. Previous stable learning algorithms employ sample reweighting schemes to decorrelate the covariates when there is no explicit domain information about training data. However, with finite samples, it is difficult to achieve the desirable weights that ensure perfect independence to get rid of the unstable varia…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: Accepted by AAAI 2023

  14. arXiv:2205.09459  [pdf, other]

    cs.LG stat.ML

    Neural Network Architecture Beyond Width and Depth

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth. Neural network architectures with height, width, and depth as hyper-parameters are called three-dimensional architectures. It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional archite…

    Submitted 14 January, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

    Journal ref: Advances in Neural Information Processing Systems, 35:5669--5681, 2022

  15. arXiv:2111.07964  [pdf, other]

    cs.LG stat.ML

    Deep Network Approximation in Terms of Intrinsic Parameters

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: One of the arguments to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity is generally accompanied by the explosive growth of the number of parameters, which, in turn, leads to high computational costs. It is of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting…

    Submitted 14 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:19909-19934, 2022

  16. arXiv:2111.02355  [pdf, other]

    cs.LG stat.ML

    A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization

    Authors: Renzhe Xu, Xingxuan Zhang, Zheyan Shen, Tong Zhang, Peng Cui

    Abstract: Covariate-shift generalization, a typical case in out-of-distribution (OOD) generalization, requires a good performance on the unknown test distribution, which varies from the accessible training distribution in the form of covariate shift. Recently, independence-driven importance weighting algorithms in stable learning literature have shown empirical effectiveness to deal with covariate-shift gen…

    Submitted 17 October, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: ICML 2022

  17. arXiv:2110.12351  [pdf, other]

    stat.ML cs.LG

    Integrated Conditional Estimation-Optimization

    Authors: Meng Qi, Paul Grigas, Zuo-Jun Max Shen

    Abstract: Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of uncertain parameters and then optimizing the objective based on the estimation, we propose an integrated conditional estimation-optimization (ICEO) framework that es…

    Submitted 1 August, 2023; v1 submitted 24 October, 2021; originally announced October 2021.

  18. arXiv:2110.03768  [pdf, other]

    stat.ML cs.LG

    De-randomizing MCMC dynamics with the diffusion Stein operator

    Authors: Zheyang Shen, Markus Heinonen, Samuel Kaski

    Abstract: Approximate Bayesian inference estimates descriptors of an intractable target distribution - in essence, an optimization problem within a family of distributions. For example, Langevin dynamics (LD) extracts asymptotically exact samples from a diffusion process because the time evolution of its marginal distributions constitutes a curve that minimizes the KL-divergence via steepest descent in the…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 22 pages, 6 figures. NeurIPS 2021

  19. arXiv:2107.11732  [pdf, other]

    cs.LG econ.EM q-bio.QM stat.ME

    Federated Causal Inference in Heterogeneous Observational Data

    Authors: Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

    Abstract: We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the…

    Submitted 2 April, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

  20. arXiv:2107.02397  [pdf, other]

    cs.LG stat.ML

    Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: This paper develops simple feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons. These neural networks are simple because they are designed with a simple, computable, and continuous activation function $σ$ leveraging a triangular-wave function and the softsign function. We first prove that $σ$-activated ne…

    Submitted 26 September, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

    Journal ref: Journal of Machine Learning Research, Volume 23, Issue 276, September 2022, Pages 1--60

  21. Optimal Approximation Rate of ReLU Networks in terms of Width and Depth

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate…

    Submitted 24 July, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Journal ref: Journal de Mathématiques Pures et Appliquées, Volume 157, January 2022, Pages 101-135

  22. arXiv:2012.13892  [pdf, other]

    cs.LG stat.ML

    Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection

    Authors: Yanyong Huang, Zongxin Shen, Fuxu Cai, Tianrui Li, Fengmao Lv

    Abstract: Unsupervised feature selection is an important method to reduce dimensions of high dimensional data without labels, which is benefit to avoid ``curse of dimensionality'' and improve the performance of subsequent machine learning tasks, like clustering and retrieval. How to select the uncorrelated and discriminative features is the key problem of unsupervised feature selection. Many proposed method…

    Submitted 27 December, 2020; originally announced December 2020.

  23. arXiv:2011.04162  [pdf, other]

    stat.ML cs.LG

    Sinkhorn Natural Gradient for Generative Models

    Authors: Zebang Shen, Zhenfu Wang, Alejandro Ribeiro, Hamed Hassani

    Abstract: We consider the problem of minimizing a functional over a parametric family of probability measures, where the parameterization is characterized via a push-forward structure. An important application of this problem is in training generative adversarial networks. In this regard, we propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: accepted to NeurIPS 2020

  24. Neural Network Approximation: Three Hidden Layers Are Enough

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron and hence we call such networks as Floor-Exponential-Step (FLES) networks. For any width hyper-parameter…

    Submitted 19 April, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Journal ref: Neural Networks, Volume 141, September 2021, Pages 160-173

  25. arXiv:2010.01762  [pdf, other]

    cs.LG cs.CV stat.ML

    OLALA: Object-Level Active Learning for Efficient Document Layout Annotation

    Authors: Zejiang Shen, Jian Zhao, Melissa Dell, Yaoliang Yu, Weining Li

    Abstract: Document images often have intricate layout structures, with numerous content regions (e.g. texts, figures, tables) densely arranged on each page. This makes the manual annotation of layout datasets expensive and inefficient. These characteristics also challenge existing active learning methods, as image-level scoring and selection suffer from the overexposure of common objects. Inspired by recent…

    Submitted 29 March, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

    Comments: 12 pages, 7 figures, 5 tables

  26. arXiv:2007.10449  [pdf, other]

    cs.LG stat.ML

    Sinkhorn Barycenter via Functional Gradient Descent

    Authors: Zebang Shen, Zhenfu Wang, Alejandro Ribeiro, Hamed Hassani

    Abstract: In this paper, we consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence. This problem has recently found applications across various domains, including graphics, learning, and vision, as it provides a meaningful mechanism to aggregate knowledge. Unlike previous approaches which directly operate in the space of probability measures,…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: submitted to NIPS 2020

  27. arXiv:2006.13326  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Safe Learning under Uncertain Objectives and Constraints

    Authors: Mohammad Fereydounian, Zebang Shen, Aryan Mokhtari, Amin Karbasi, Hamed Hassani

    Abstract: In this paper, we consider non-convex optimization problems under \textit{unknown} yet safety-critical constraints. Such problems naturally arise in a variety of domains including robotics, manufacturing, and medical procedures, where it is infeasible to know or identify all the constraints. Therefore, the parameter space should be explored in a conservative way to ensure that none of the constrai…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 42 pages, 2 figures

  28. arXiv:2006.12231  [pdf, other]

    cs.LG math.NA stat.ML

    Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: A new network with super approximation power is introduced. This network is built with Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron and hence we call such networks Floor-ReLU networks. For any hyper-parameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, it is shown that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can uniformly appr…

    Submitted 26 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Journal ref: Neural Computation, Volume 33, Issue 4, April 2021, Pages 1005-1036

  29. Algorithmic Decision Making with Conditional Fairness

    Authors: Renzhe Xu, Peng Cui, Kun Kuang, Bo Li, Linjun Zhou, Zheyan Shen, Wei Cui

    Abstract: Nowadays fairness issues have raised great concerns in decision-making systems. Various fairness notions have been proposed to measure the degree to which an algorithm is unfair. In practice, there frequently exist a certain set of variables we term as fair variables, which are pre-decision covariates such as users' choices. The effects of fair variables are irrelevant in assessing the fairness of…

    Submitted 18 July, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: KDD 2020

  30. arXiv:2006.05371  [pdf, other]

    stat.ME cs.LG math.NA stat.ML

    Bayesian Probabilistic Numerical Integration with Tree-Based Models

    Authors: Harrison Zhu, Xing Liu, Ruya Kang, Zhichao Shen, Seth Flaxman, François-Xavier Briol

    Abstract: Bayesian quadrature (BQ) is a method for solving numerical integration problems in a Bayesian manner, which allows users to quantify their uncertainty about the solution. The standard approach to BQ is based on a Gaussian process (GP) approximation of the integrand. As a result, BQ is inherently limited to cases where GP approximations can be done in an efficient manner, thus often prohibiting ver…

    Submitted 2 December, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  31. arXiv:2006.04414  [pdf, other]

    cs.LG stat.ML

    Stable Adversarial Learning under Distributional Shifts

    Authors: Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, Bo Li, Yishi Lin

    Abstract: Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in training data. Recently, there are robust learning methods aiming at this problem by minimizing the worst-case risk over an uncertainty set. However, they equally treat all covariates to form the decision sets regardless of the stabilit…

    Submitted 10 May, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 11 pages

    Journal ref: Association for the Advancement of Artificial Intelligence, 2021

  32. arXiv:2005.09856  [pdf, other]

    cs.LG stat.ML

    A Novel Meta Learning Framework for Feature Selection using Data Synthesis and Fuzzy Similarity

    Authors: Zixiao Shen, Xin Chen, Jonathan M. Garibaldi

    Abstract: This paper presents a novel meta learning framework for feature selection (FS) based on fuzzy similarity. The proposed method aims to recommend the best FS method from four candidate FS methods for any given dataset. This is achieved by firstly constructing a large training data repository using data synthesis. Six meta features that represent the characteristics of the training dataset are then e…

    Submitted 20 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

  33. arXiv:2005.05003  [pdf, other]

    cs.LG stat.ML

    A Novel Weighted Combination Method for Feature Selection using Fuzzy Sets

    Authors: Zixiao Shen, Xin Chen, Jonathan M. Garibaldi

    Abstract: In this paper, we propose a novel weighted combination feature selection method using bootstrap and fuzzy sets. The proposed method mainly consists of three processes, including fuzzy sets generation using bootstrap, weighted combination of fuzzy sets and feature ranking based on defuzzification. We implemented the proposed method by combining four state-of-the-art feature selection methods and ev…

    Submitted 21 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  34. arXiv:2005.04888  [pdf, other]

    cs.LG stat.ML

    Performance Optimization of a Fuzzy Entropy based Feature Selection and Classification Framework

    Authors: Zixiao Shen, Xin Chen, Jonathan M. Garibaldi

    Abstract: In this paper, based on a fuzzy entropy feature selection framework, different methods have been implemented and compared to improve the key components of the framework. Those methods include the combinations of three ideal vector calculations, three maximal similarity classifiers and three fuzzy entropy functions. Different feature removal orders based on the fuzzy entropy values were also compar…

    Submitted 21 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  35. arXiv:2004.08620  [pdf, other]

    cs.LG stat.ML

    Optimization in Machine Learning: A Distribution Space Approach

    Authors: Yongqiang Cai, Qianxiao Li, Zuowei Shen

    Abstract: We present the viewpoint that optimization problems encountered in machine learning can often be interpreted as minimizing a convex functional over a function space, but with a non-convex constraint set introduced by model parameterization. This observation allows us to repose such problems via a suitable relaxation as convex optimization problems in the space of distributions over the training pa…

    Submitted 18 April, 2020; originally announced April 2020.

    Comments: 26 pages, 12 figures

  36. arXiv:2003.09821  [pdf, other]

    cs.LG stat.ML

    BS-NAS: Broadening-and-Shrinking One-Shot NAS with Searchable Numbers of Channels

    Authors: Zan Shen, Jiang Qian, Bojin Zhuang, Shaojun Wang, Jing Xiao

    Abstract: One-Shot methods have evolved into one of the most popular methods in Neural Architecture Search (NAS) due to weight sharing and single training of a supernet. However, existing methods generally suffer from two issues: predetermined number of channels in each layer which is suboptimal; and model averaging effects and poor ranking correlation caused by weight coupling and continuously expanding se…

    Submitted 22 March, 2020; originally announced March 2020.

    Comments: 14 pages

  37. arXiv:2003.03080  [pdf, other]

    stat.ML cs.LG

    Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations

    Authors: Simone Rossi, Markus Heinonen, Edwin V. Bonilla, Zheyang Shen, Maurizio Filippone

    Abstract: Variational inference techniques based on inducing variables provide an elegant framework for scalable posterior estimation in Gaussian process (GP) models. Besides enabling scalability, one of their main advantages over sparse approximations using direct marginal likelihood maximization is that they provide a robust alternative for point estimation of the inducing inputs, i.e. the location of the…

    Submitted 23 February, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

  38. arXiv:2001.11359  [pdf, other]

    cs.LG stat.ML

    FOCUS: Dealing with Label Quality Disparity in Federated Learning

    Authors: Yiqiang Chen, Xiaodong Yang, Xin Qin, Han Yu, Biao Chen, Zhiqi Shen

    Abstract: Ubiquitous systems with End-Edge-Cloud architecture are increasingly being used in healthcare applications. Federated Learning (FL) is highly useful for such applications, due to silo effect and privacy preserving. Existing FL approaches generally do not account for disparities in the quality of local data labels. However, the clients in ubiquitous systems tend to suffer from label noise due to va…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: 7 pages

  39. arXiv:2001.03040  [pdf, other]

    cs.LG math.NA stat.ML

    Deep Network Approximation for Smooth Functions

    Authors: Jianfeng Lu, Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: This paper establishes the (nearly) optimal approximation error characterization of deep rectified linear unit (ReLU) networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$.…

    Submitted 24 September, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

    Journal ref: SIAM Journal on Mathematical Analysis, Volume 53, Issue 5, September 2021, Pages 5465-5506

  40. arXiv:1912.10382  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Deep Learning via Dynamical Systems: An Approximation Perspective

    Authors: Qianxiao Li, Ting Lin, Zuowei Shen

    Abstract: We build on the dynamical systems approach to deep learning, where deep residual networks are idealized as continuous-time dynamical systems, from the approximation perspective. In particular, we establish general sufficient conditions for universal approximation using continuous-time deep residual networks, which can also be understood as approximation theories in $L^p$ using flow maps of dynamic…

    Submitted 7 June, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

    Comments: Revision 1

  41. arXiv:1911.12580  [pdf, other]

    cs.LG stat.ML

    Stable Learning via Sample Reweighting

    Authors: Zheyan Shen, Peng Cui, Tong Zhang, Kun Kuang

    Abstract: We consider the problem of learning linear prediction models with model misspecification bias. In such case, the collinearity among input variables may inflate the error of parameter estimation, resulting in instability of prediction results when training and test distributions do not match. In this paper we theoretically analyze this fundamental problem and propose a sample reweighting method tha…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted as poster paper at AAAI2020

  42. arXiv:1910.14380  [pdf, other]

    math.OC cs.LG stat.ML

    A Decentralized Proximal Point-type Method for Saddle Point Problems

    Authors: Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng

    Abstract: In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point metho…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: 18 pages

  43. arXiv:1910.09396  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

    Authors: Jiahao Xie, Zebang Shen, Chao Zhang, Boyu Wang, Hui Qian

    Abstract: This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By e…

    Submitted 23 October, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: 15 pages, 3 figures

  44. arXiv:1910.09223  [pdf, ps, other]

    cs.LG stat.ML

    Aggregated Gradient Langevin Dynamics

    Authors: Chao Zhang, Jiahao Xie, Zebang Shen, Peilin Zhao, Tengfei Zhou, Hui Qian

    Abstract: In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e.g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the fi…

    Submitted 21 October, 2019; originally announced October 2019.

  45. arXiv:1910.04322  [pdf, other

    math.OC cs.LG stat.ML

    One Sample Stochastic Frank-Wolfe

    Authors: Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its wide-spread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit.…

    Submitted 9 October, 2019; originally announced October 2019.

  46. arXiv:1905.09917  [pdf, other

    stat.ML cs.LG

    Learning spectrograms with convolutional spectral kernels

    Authors: Zheyang Shen, Markus Heinonen, Samuel Kaski

    Abstract: We introduce the convolutional spectral kernel (CSK), a novel family of non-stationary, nonparametric covariance kernels for Gaussian process (GP) models, derived from the convolution between two imaginary radial basis functions. We present a principled framework to interpret CSK, as well as other deep probabilistic models, using approximated Fourier transform, yielding a concise representation of…

    Submitted 14 October, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: 15 pages, 7 figures

  47. arXiv:1903.01540  [pdf, other

    math.OC stat.ML

    A Stochastic Trust Region Method for Non-convex Minimization

    Authors: Zebang Shen, Pan Zhou, Cong Fang, Alejandro Ribeiro

    Abstract: We target the problem of finding a local minimum in non-convex finite-sum minimization. Towards this goal, we first prove that the trust region method with inexact gradient and Hessian estimation can achieve a convergence rate of order $\mathcal{O}(1/{k^{2/3}})$ as long as those differential estimations are sufficiently accurate. Combining such result with a novel Hessian estimator, we propose the…

    Submitted 4 March, 2019; originally announced March 2019.

  48. Nonlinear Approximation via Compositions

    Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

    Abstract: Given a function dictionary $\cal D$ and an approximation budget $N\in\mathbb{N}^+$, nonlinear approximation seeks the linear combination of the best $N$ terms $\{T_n\}_{1\le n\le N}\subseteq{\cal D}$ to approximate a given function $f$ with the minimum approximation error\[\varepsilon_{L,f}:=\min_{\{g_n\}\subseteq{\mathbb{R}},\{T_n\}\subseteq{\cal D}}\|f(x)-\sum_{n=1}^N g_n T_n(x)\|.\]Motivated b…

    Submitted 5 November, 2020; v1 submitted 26 February, 2019; originally announced February 2019.

    Journal ref: Neural Networks, Volume 119, November 2019, Pages 74-84

  49. arXiv:1812.03643  [pdf, other

    physics.soc-ph stat.AP

    Increasing trend of scientists to switch between topics

    Authors: An Zeng, Zhesi Shen, Jianlin Zhou, Ying Fan, Zengru Di, Yougui Wang, H. Eugene Stanley, Shlomo Havlin

    Abstract: We analyze the publication records of individual scientists, aiming to quantify the topic switching dynamics of scientists and its influence. For each scientist, the relations among her publications are characterized via shared references. We find that the co-citing network of the papers of a scientist exhibits a clear community structure where each major community represents a research topic. Our…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: 37 pages, 21 figures

  50. arXiv:1810.13306  [pdf, other

    cs.AI cs.LG stat.ML

    Automated Machine Learning: From Principles to Practices

    Authors: Zhenqian Shen, Yongqi Zhang, Lanning Wei, Huan Zhao, Quanming Yao

    Abstract: Machine learning (ML) methods have been developing rapidly, but configuring and selecting proper methods to achieve a desired performance is increasingly difficult and tedious. To address this challenge, automated machine learning (AutoML) has emerged, which aims to generate satisfactory ML configurations for given tasks in a data-driven way. In this paper, we provide a comprehensive survey on thi…

    Submitted 27 February, 2024; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: This is a preliminary and will be kept updated