Showing 1–50 of 59 results for author: Zheng, S

Searching in archive stat.
  1. arXiv:2404.00551  [pdf, other]

    stat.ML cs.LG

    Convergence of Continuous Normalizing Flows for Learning Probability Distributions

    Authors: Yuan Gao, Jian Huang, Yuling Jiao, Shurong Zheng

    Abstract: Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear inter…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 60 pages, 3 tables, and 3 figures

    MSC Class: 62G05; 68T07

  2. arXiv:2402.14090  [pdf, other]

    cs.AI econ.GN stat.ML

    Social Environment Design

    Authors: Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen

    Abstract: Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The fra…

    Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024 Position Paper. Website at https://sed.eddie.win

  3. arXiv:2312.00186  [pdf, other]

    stat.AP cs.AI

    Planning Reliability Assurance Tests for Autonomous Vehicles

    Authors: Simin Zheng, Lu Lu, Yili Hong, Jian Liu

    Abstract: Artificial intelligence (AI) technology has become increasingly prevalent and transforms our everyday life. One important application of AI technology is the development of autonomous vehicles (AV). However, the reliability of an AV needs to be carefully demonstrated via an assurance test so that the product can be used with confidence in the field. To plan for an assurance test, one needs to dete…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 29 pages, 5 figures

  4. arXiv:2311.07972  [pdf, other]

    stat.ME

    Residual Importance Weighted Transfer Learning For High-dimensional Linear Regression

    Authors: Junlong Zhao, Shengbin Zheng, Chenlei Leng

    Abstract: Transfer learning is an emerging paradigm for leveraging multiple sources to improve the statistical inference on a single target. In this paper, we propose a novel approach named residual importance weighted transfer learning (RIW-TL) for high-dimensional linear models built on penalized likelihood. Compared to existing methods such as Trans-Lasso that selects sources in an all-in-all-out manner,…

    Submitted 3 January, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  5. arXiv:2310.13911  [pdf, other]

    stat.ME stat.AP

    Multilevel Matrix Factor Model

    Authors: Yuteng Zhang, Yongchang Hui, Junrong Song, Shurong Zheng

    Abstract: Large-scale matrix data has been widely discovered and continuously studied in various fields recently. Considering the multi-level factor structure and utilizing the matrix structure, we propose a multilevel matrix factor model with both global and local factors. The global factors can affect all matrix time series, whereas the local factors are only allowed to affect within each specific matrix t…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 47 pages, 22 figures

  6. arXiv:2309.16578  [pdf, other]

    stat.ML cs.LG physics.chem-ph

    Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

    Authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

    Abstract: Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT…

    Submitted 9 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published in Nature Computational Science, March 2024. Full paper with supplementary information

  7. arXiv:2309.04072  [pdf, ps, other]

    math.NA cs.LG stat.ML

    Riemannian Langevin Monte Carlo schemes for sampling PSD matrices with fixed rank

    Authors: Tianmin Yu, Shixin Zheng, Jianfeng Lu, Govind Menon, Xiangxiong Zhang

    Abstract: This paper introduces two explicit schemes to sample matrices from Gibbs distributions on $\mathcal S^{n,p}_+$, the manifold of real positive semi-definite (PSD) matrices of size $n\times n$ and rank $p$. Given an energy function $\mathcal E:\mathcal S^{n,p}_+\to \mathbb{R}$ and certain Riemannian metrics $g$ on $\mathcal S^{n,p}_+$, these schemes rely on an Euler-Maruyama discretization of the Ri…

    Submitted 7 September, 2023; originally announced September 2023.

  8. arXiv:2308.08364  [pdf, other]

    stat.AP stat.ME

    False Discovery Rate Control for Lesion-Symptom Mapping with Heterogeneous data via Weighted P-values

    Authors: Siyu Zheng, Alexander C. McLain, Joshua Habiger, Christopher Rorden, Julius Fridriksson

    Abstract: Lesion-symptom mapping studies provide insight into what areas of the brain are involved in different aspects of cognition. This is commonly done via behavioral testing in patients with a naturally occurring brain injury or lesions (e.g., strokes or brain tumors). This results in high-dimensional observational data where lesion status (present/absent) is non-uniformly distributed with some voxels…

    Submitted 16 August, 2023; originally announced August 2023.

    MSC Class: 62J15

  9. arXiv:2305.18258  [pdf, other]

    cs.LG cs.AI cs.GT math.OC stat.ML

    Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

    Authors: Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

    Abstract: In online reinforcement learning (online RL), balancing exploration and exploitation is crucial for finding an optimal policy in a sample-efficient way. To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration. However, in order to cope with general function approximators, most of them involve impractical algorithm…

    Submitted 25 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  10. arXiv:2211.01962  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

    Authors: Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

    Abstract: We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generali…

    Submitted 30 June, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: We changed the title from the first version. We fixed a technical issue in the first version regarding the $\ell_2$ eluder technique (Lemma D.2)

  11. arXiv:2210.16350  [pdf, other]

    stat.OT

    A Comparison of Reproducibility Guidelines and Its Implications on Undergraduate Statistical Education

    Authors: Siqi Zheng

    Abstract: In this paper, we replicated a Bayesian educational research project, which explores the association between broadband access and online course enrollment in the US. We summarized key findings from our replication and compared them with the original project. Based on my replication experience, we aim to demonstrate the challenges of research reproduction, even when codes and data are shared openly…

    Submitted 28 October, 2022; originally announced October 2022.

  12. arXiv:2210.01765  [pdf, other]

    cs.LG q-bio.BM stat.ML

    One Transformer Can Understand Both 2D & 3D Molecular Data

    Authors: Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

    Abstract: Unlike vision and language data which usually has a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in a 3D space. For molecular representation learning, most previous works designed neural networks only for a particular data format, making the learned models likely to…

    Submitted 27 March, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: 20 pages; ICLR 2023, Camera Ready Version; Code: https://github.com/lsj2408/Transformer-M

  13. arXiv:2207.10772  [pdf, other]

    stat.ML cs.LG

    Deep Sufficient Representation Learning via Mutual Information

    Authors: Siming Zheng, Yuanyuan Lin, Jian Huang

    Abstract: We propose a mutual information-based sufficient representation learning (MSRL) approach, which uses the variational formulation of the mutual information and leverages the approximation power of deep neural networks. MSRL learns a sufficient representation with the maximum mutual information with the response and a user-selected distribution. It can easily handle multi-dimensional continuous or c…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: 43 pages, 6 figures and 5 tables

    MSC Class: 62G05; 68T07

  14. arXiv:2205.13401  [pdf, other]

    cs.LG cs.CL stat.ML

    Your Transformer May Not be as Powerful as You Expect

    Authors: Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

    Abstract: Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based Transformers is largely unexplored. In this work, we mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximati…

    Submitted 28 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 22 pages; NeurIPS 2022, Camera Ready Version

  15. arXiv:2204.11155  [pdf, ps, other]

    stat.ME

    Adaptive Tests for Bandedness of High-dimensional Covariance Matrices

    Authors: Xiaoyi Wang, Gongjun Xu, Shurong Zheng

    Abstract: Estimation of the high-dimensional banded covariance matrix is widely used in multivariate statistical analysis. To ensure the validity of estimation, we aim to test the hypothesis that the covariance matrix is banded with a certain bandwidth under the high-dimensional framework. Though several testing methods have been proposed in the literature, the existing tests are only powerful for some alte…

    Submitted 23 April, 2022; originally announced April 2022.

  16. arXiv:2203.12003  [pdf, ps, other]

    stat.ME

    On block-wise and reference panel-based estimators for genetic data prediction in high dimensions

    Authors: Bingxin Zhao, Shurong Zheng, Hongtu Zhu

    Abstract: Genetic prediction of complex traits and diseases has attracted enormous attention in precision medicine, mainly because it has the potential to translate discoveries from genome-wide association studies (GWAS) into medical advances. As the high dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants has a block-diagonal structure, many existing methods attem…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 27 pages, 5 figures

  17. arXiv:2203.07681  [pdf, other]

    cs.LG cs.AI stat.ML

    DEPTS: Deep Expansion Learning for Periodic Time Series Forecasting

    Authors: Wei Fan, Shun Zheng, Xiaohan Yi, Wei Cao, Yanjie Fu, Jiang Bian, Tie-Yan Liu

    Abstract: Periodic time series (PTS) forecasting plays a crucial role in a variety of industries to foster critical tasks, such as early warning, pre-planning, resource scheduling, etc. However, the complicated dependencies of the PTS signal on its inherent periodicity as well as the sophisticated composition of various periods hinder the performance of PTS forecasting. In this paper, we introduce a deep ex…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: ICLR22 Spotlight

  18. arXiv:2202.09784  [pdf, other]

    cs.LG cs.AI cs.CV stat.ME

    Clustering by the Probability Distributions from Extreme Value Theory

    Authors: Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, Yanwei Fu

    Abstract: Clustering is an essential task to unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, while it does not utilize the information of sample distribution or density. Comparably, it would potentially be more beneficial to consider the probabili…

    Submitted 20 February, 2022; originally announced February 2022.

    Comments: IEEE Transactions on Artificial Intelligence

  19. arXiv:2106.12566  [pdf, other]

    cs.LG cs.CL stat.ML

    Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

    Authors: Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

    Abstract: The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful atte…

    Submitted 2 November, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021, camera ready version

  20. arXiv:2105.07829  [pdf, other]

    cs.DC cs.LG stat.ML

    Compressed Communication for Distributed Training: Adaptive Methods and System

    Authors: Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

    Abstract: Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training. However, there is little understanding of applying gradient compression to adaptive gradient methods. Moreover, its performance benefits are often limited by the n…

    Submitted 17 May, 2021; originally announced May 2021.

  21. arXiv:2103.07756  [pdf, other]

    cs.LG cs.CV stat.AP stat.ML

    Learning with Feature-Dependent Label Noise: A Progressive Approach

    Authors: Yikai Zhang, Songzhu Zheng, Pengxiang Wu, Mayank Goswami, Chao Chen

    Abstract: Label noise is frequently observed in real-world large-scale datasets. The noise is introduced due to a variety of reasons; it is heterogeneous and feature-dependent. Most existing approaches to handling noisy labels fall into two categories: they either assume an ideal feature-independent noise, or remain heuristic without theoretical guarantees. In this paper, we propose to target a new family o…

    Submitted 27 March, 2021; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: ICLR 2021 (Spotlight)

  22. arXiv:2012.11100  [pdf, other]

    stat.ME

    Two-directional simultaneous inference for high-dimensional models

    Authors: Wei Liu, Huazhen Lin, ** Liu, Shurong Zheng

    Abstract: This paper proposes a general two directional simultaneous inference (TOSI) framework for high-dimensional models with a manifest variable or latent variable structure, for example, high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factors models. TOSI performs simultaneous inference on a set of parameters from two directions, one to test whether…

    Submitted 6 February, 2023; v1 submitted 20 December, 2020; originally announced December 2020.

  23. arXiv:2007.13221  [pdf, other]

    cs.LG cs.DC stat.ML

    CSER: Communication-efficient SGD with Error Reset

    Authors: Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

    Abstract: The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is first a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of resulting local residual errors. S…

    Submitted 4 December, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

  24. arXiv:2006.13484  [pdf, other]

    cs.LG cs.CL cs.DC stat.ML

    Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

    Authors: Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

    Abstract: BERT has recently attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art results in various NLU tasks. However, its success requires large deep neural networks and huge amount of data, which result in long training time and impede development progress. Using stochastic gradient methods with large mini-batch has been advocated as an efficient tool to redu…

    Submitted 18 September, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: Technical Report (not under review at any venue)

  25. arXiv:2004.13332  [pdf, other]

    econ.GN cs.LG stat.ML

    The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies

    Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher

    Abstract: Tackling real-world socio-economic challenges requires designing and testing economic policies. However, this is hard in practice, due to a lack of appropriate (micro-level) economic data and limited opportunity to experiment. In this work, we train social planners that discover tax policies in dynamic economies that can effectively trade-off economic equality and productivity. We propose a two-le…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 46 pages, 21 figures

  26. arXiv:2002.05712  [pdf, other]

    cs.LG cs.CV stat.ML

    Cross-Iteration Batch Normalization

    Authors: Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin

    Abstract: A well-known issue of Batch Normalization is its significantly reduced effectiveness in the case of small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, we present Cross-Iteration Batch Normalization (CBN), in which examples from multiple rec…

    Submitted 25 March, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to CVPR 2021

  27. arXiv:2002.05578  [pdf, other]

    cs.LG stat.ML

    Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis

    Authors: Jung Yeon Park, Kenneth Theo Carr, Stephan Zheng, Yisong Yue, Rose Yu

    Abstract: Efficient and interpretable spatial analysis is crucial in many fields such as geology, sports, and climate science. Tensor latent factor models can describe higher-order correlations for spatial data. However, they are computationally expensive to train and are sensitive to initialization, leading to spatially incoherent, uninterpretable results. We develop a novel Multiresolution Tensor Learning…

    Submitted 14 August, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  28. arXiv:2002.04745  [pdf, other]

    cs.LG cs.CL stat.ML

    On Layer Normalization in the Transformer Architecture

    Authors: Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu

    Abstract: The Transformer is widely used in natural language processing tasks. To train a Transformer however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but will slow down the optimization and bring more hyper-parameter tunings. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show…

    Submitted 29 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Journal ref: Published on ICML 2020

  29. arXiv:1912.01899  [pdf, other]

    cs.LG stat.ML

    Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning

    Authors: Shuai Zheng, Zhenfeng Zhu, Xingxing Zhang, Zhizhe Liu, Jian Cheng, Yao Zhao

    Abstract: Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that will serve as input of many computer vision tasks. However, most existing algorithms ignore the existence of inherent data distribution and even noises. This may significantly increase the phenomenon of over-fitting and deteriorate the testing accuracy. In this paper, we propose a Distribution-induce…

    Submitted 2 August, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted to CVPR 2020. 10 pages, 5 figures, 4 tables; fixed an error in Figure 1

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7224-7233, 2020

  30. arXiv:1910.02035  [pdf, other]

    cs.LG cs.AI stat.ML

    Manufacturing Dispatching using Reinforcement and Transfer Learning

    Authors: Shuai Zheng, Chetan Gupta, Susumu Serita

    Abstract: Efficient dispatching rule in manufacturing industry is key to ensure product on-time delivery and minimum past-due and inventory cost. Manufacturing, especially in the developed world, is moving towards on-demand manufacturing meaning a high mix, low volume product mix. This requires efficient dispatching that can work in dynamic and stochastic environments, meaning it allows for quick response t…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: ECML PKDD 2019 (The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019)

  31. arXiv:1910.02034  [pdf, other]

    cs.LG cs.CV stat.ML

    Generative Adversarial Networks for Failure Prediction

    Authors: Shuai Zheng, Ahmed Farahat, Chetan Gupta

    Abstract: Prognostics and Health Management (PHM) is an emerging engineering discipline which is concerned with the analysis and prediction of equipment health and performance. One of the key challenges in PHM is to accurately predict impending failures in the equipment. In recent years, solutions for failure prediction have evolved from building complex physical models to the use of machine learning algori…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: ECML PKDD 2019 (The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019)

  32. arXiv:1909.10710  [pdf, other]

    stat.ME

    Estimating Number of Factors by Adjusted Eigenvalues Thresholding

    Authors: Jianqing Fan, Jianhua Guo, Shurong Zheng

    Abstract: Determining the number of common factors is an important and practical topic in high dimensional factor models. The existing literatures are mainly based on the eigenvalues of the covariance matrix. Due to the incomparability of the eigenvalues of the covariance matrix caused by heterogeneous scales of observed variables, it is very difficult to give an accurate relationship between these eigenval…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: 35 pages; 4 figures

  33. arXiv:1907.04433  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

    Authors: Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

    Abstract: We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customiza…

    Submitted 12 February, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-7

  34. arXiv:1907.00664  [pdf, other]

    cs.LG stat.ML

    Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

    Authors: Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train…

    Submitted 1 July, 2019; originally announced July 2019.

  35. arXiv:1906.06713  [pdf, ps, other]

    math.ST stat.AP stat.ME

    Community Detection Based on the $L_\infty$ convergence of eigenvectors in DCBM

    Authors: Yan Liu, Zhiqiang Hou, Zhigang Yao, Zhidong Bai, Jiang Hu, Shurong Zheng

    Abstract: Spectral clustering is one of the most popular algorithms for community detection in network analysis. Based on this rationale, in this paper we give the convergence rate of eigenvectors for the adjacency matrix in the $l_\infty$ norm, under the stochastic block model (BM) and degree corrected stochastic block model (DCBM), adding some mild and rational conditions. We also extend this result to a…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: 28 pages, 2 figures

  36. arXiv:1906.00562  [pdf, ps, other]

    cs.CV cs.LG stat.ML

    Learning to Self-Train for Semi-Supervised Few-Shot Classification

    Authors: Xinzhe Li, Qianru Sun, Yaoyao Liu, Shibao Zheng, Qin Zhou, Tat-Seng Chua, Bernt Schiele

    Abstract: Few-shot classification (FSC) is challenging due to the scarcity of labeled training data (e.g. only one labeled data point per class). Meta-learning has shown to achieve promising results by learning to initialize a classification model for FSC. In this paper we propose a novel semi-supervised meta-learning method called learning to self-train (LST) that leverages unlabeled data and specifically…

    Submitted 29 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  37. arXiv:1905.12654  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Generalization Gap in Reparameterizable Reinforcement Learning

    Authors: Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: Understanding generalization in reinforcement learning (RL) is a significant challenge, as many common assumptions of traditional supervised learning theory do not apply. We focus on the special class of reparameterizable RL problems, where the trajectory distribution can be decomposed using the reparametrization trick. For this problem class, estimating the expected return is efficient and the tr…

    Submitted 29 May, 2019; originally announced May 2019.

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019

  38. arXiv:1905.10936  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

    Authors: Shuai Zheng, Ziyue Huang, James T. Kwok

    Abstract: Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction on communication cost. However, its convergence is base…

    Submitted 28 October, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019

  39. arXiv:1905.09899  [pdf, other]

    cs.LG math.OC stat.ML

    Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning

    Authors: Shuai Zheng, James T. Kwok

    Abstract: Stochastic methods with coordinate-wise adaptive stepsize (such as RMSprop and Adam) have been widely used in training deep neural networks. Despite their fast convergence, they can generalize worse than stochastic gradient descent. In this paper, by revisiting the design of Adagrad, we propose to split the network parameters into blocks, and use a blockwise adaptive stepsize. Intuitively, blockwi…

    Submitted 23 May, 2019; originally announced May 2019.

  40. arXiv:1904.06442  [pdf, other]

    cs.LG stat.ML

    Remaining Useful Life Estimation Using Functional Data Analysis

    Authors: Qiyao Wang, Shuai Zheng, Ahmed Farahat, Susumu Serita, Chetan Gupta

    Abstract: Remaining Useful Life (RUL) of an equipment or one of its components is defined as the time left until the equipment or component reaches its end of useful life. Accurate RUL estimation is exceptionally beneficial to Predictive Maintenance, and Prognostics and Health Management (PHM). Data driven approaches which leverage the power of algorithms for RUL estimation using sensor and operational time…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: Accepted by IEEE International Conference on Prognostics and Health Management 2019

  41. arXiv:1901.10946  [pdf, other]

    cs.LG stat.ML

    NAOMI: Non-Autoregressive Multiresolution Sequence Imputation

    Authors: Yukai Liu, Rose Yu, Stephan Zheng, Eric Zhan, Yisong Yue

    Abstract: Missing value imputation is a fundamental problem in spatiotemporal modeling, from motion tracking to the dynamics of physical systems. Deep autoregressive models suffer from error propagation which becomes catastrophic for imputing long-range sequences. In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NA…

    Submitted 29 October, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

  42. arXiv:1809.07122  [pdf, other]

    cs.LG stat.ML

    Capacity Control of ReLU Neural Networks by Basis-path Norm

    Authors: Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

    Abstract: Recently, path norm was proposed as a new capacity measure for neural networks with Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account. It has been shown that the generalization error bound in terms of the path norm explains the empirical generalization behaviors of the ReLU neural networks better than that of other capacity measures…

    Submitted 19 September, 2018; originally announced September 2018.

    Journal ref: AAAI 2019

  43. arXiv:1807.00366  [pdf, other]

    cs.LG cs.AI stat.ML

    Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning

    Authors: Baoxiang Wang, Tongfang Sun, Xianjun Sam Zheng

    Abstract: In recent years, reinforcement learning (RL) methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go, and Poker. However, those studies mostly focus on winning the game and have largely ignored the rich and complex human motivations, which are essential for understanding different players' diverse behaviors. In th…

    Submitted 5 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

  44. arXiv:1806.02927  [pdf, other]

    cs.LG math.OC stat.ML

    Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data

    Authors: Shuai Zheng, James T. Kwok

    Abstract: Variance reduction has been commonly used in stochastic optimization. It relies crucially on the assumption that the data set is finite. However, when the data are imputed with random noise as in data augmentation, the perturbed data set becomes essentially infinite. Recently, the stochastic MISO (S-MISO) algorithm is introduced to address this expected risk minimization problem. Though it conve…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: To appear in ICML 2018

  45. arXiv:1804.05090  [pdf, ps, other]

    cs.LG cs.IR stat.ML

    Regularized Singular Value Decomposition and Application to Recommender System

    Authors: Shuai Zheng, Chris Ding, Feiping Nie

    Abstract: Singular value decomposition (SVD) is the mathematical basis of principal component analysis (PCA). Together, SVD and PCA are one of the most widely used mathematical formalism/decomposition in machine learning, data mining, pattern recognition, artificial intelligence, computer vision, signal processing, etc. In recent applications, regularization becomes an increasing trend. In this paper, we pr…

    Submitted 13 April, 2018; originally announced April 2018.

  46. arXiv:1804.02370  [pdf, other]

    cs.LG stat.ML

    Minimal Support Vector Machine

    Authors: Shuai Zheng, Chris Ding

    Abstract: Support Vector Machine (SVM) is an efficient classification approach, which finds a hyperplane to separate data from different classes. This hyperplane is determined by support vectors. In existing SVM formulations, the objective function uses L2 norm or L1 norm on slack variables. The number of support vectors is a measure of generalization errors. In this work, we propose a Minimal SVM, which us…

    Submitted 6 April, 2018; originally announced April 2018.

  47. arXiv:1803.07612  [pdf, other]

    cs.LG stat.ML

    Generating Multi-Agent Trajectories using Programmatic Weak Supervision

    Authors: Eric Zhan, Stephan Zheng, Yisong Yue, Long Sha, Patrick Lucey

    Abstract: We study the problem of training sequential generative models for capturing coordinated multi-agent trajectory behavior, such as offensive basketball gameplay. When modeling such settings, it is often beneficial to design hierarchical models that can capture long-term coordination using intermediate variables. Furthermore, these intermediate variables should capture interesting high-level behavior…

    Submitted 22 February, 2019; v1 submitted 20 March, 2018; originally announced March 2018.

  48. arXiv:1802.03713  [pdf, other]

    stat.ML cs.LG

    $\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

    Authors: Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

    Abstract: It is well known that neural networks with rectified linear units (ReLU) activation functions are positively scale-invariant. Conventional algorithms like stochastic gradient descent optimize the neural networks in the vector space of weights, which is, however, not positively scale-invariant. This mismatch may lead to problems during the optimization process. Then, a natural question is: \emph{ca…

    Submitted 23 March, 2021; v1 submitted 11 February, 2018; originally announced February 2018.

    Journal ref: ICLR 2019

  49. arXiv:1801.09150  [pdf, other]

    stat.ML

    Bayesian Nonparametric Modeling of Driver Behavior using HDP Split-Merge Sampling Algorithm

    Authors: Vadim Smolyakov, Julian Straub, Sue Zheng, John W. Fisher III

    Abstract: Modern vehicles are equipped with increasingly complex sensors. These sensors generate large volumes of data that provide opportunities for modeling and analysis. Here, we are interested in exploiting this data to learn aspects of behaviors and the road network associated with individual drivers. Our dataset is collected on a standard vehicle used to commute to work and for personal trips. A Hidde…

    Submitted 27 January, 2018; originally announced January 2018.

  50. A New Test of Multivariate Nonlinear Causality

    Authors: Zhidong Bai, Yongchang Hui, Zhihui Lv, Wing-Keung Wong, Shurong Zheng, Zhenzhen Zhu

    Abstract: The multivariate nonlinear Granger causality developed by Bai et al. (2010) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994), they attempt to establish a central limit theorem (CLT) of their test statistic by applying the asymptotical property of multivariate…

    Submitted 3 March, 2017; originally announced March 2017.

    Comments: 20 pages. arXiv admin note: substantial text overlap with arXiv:1701.03992