Showing 1–50 of 66 results for author: Lin, Q

Searching in archive stat.
  1. arXiv:2405.17032  [pdf, other]

    q-bio.QM math.PR q-bio.PE stat.AP

    Exact phylodynamic likelihood via structured Markov genealogy processes

    Authors: Aaron A. King, Qianying Lin, Edward L. Ionides

    Abstract: We consider genealogies arising from a Markov population process in which individuals are categorized into a discrete collection of compartments, with the requirement that individuals within the same compartment are statistically exchangeable. When equipped with a sampling process, each such population process induces a time-evolving tree-valued process defined as the genealogy of all sampled indi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.09362  [pdf, other]

    stat.ML cs.LG

    On the Saturation Effect of Kernel Ridge Regression

    Authors: Yicheng Li, Haobo Zhang, Qian Lin

    Abstract: The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-sta… ▽ More

    Submitted 28 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: ICLR 2023; Minor errors are corrected in this version

  3. arXiv:2404.12597  [pdf, other]

    cs.LG math.ST stat.ML

    The phase diagram of kernel interpolation in large dimensions

    Authors: Haobo Zhang, Weihao Lu, Qian Lin

    Abstract: The generalization ability of kernel interpolation in large dimensions (i.e., $n \asymp d^γ$ for some $γ>0$) might be one of the most interesting problems in the recent renaissance of kernel regression, since it may help us understand the 'benign overfitting phenomenon' reported in the neural networks literature. Focusing on the inner product kernel on the sphere, we fully characterized the exact… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 18 pages, 1 figure

  4. arXiv:2402.01148  [pdf, other]

    math.ST cs.LG stat.ML

    The Optimality of Kernel Classifiers in Sobolev Space

    Authors: Jianfa Lai, Zhifan Li, Dongming Huang, Qian Lin

    Abstract: Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performances of kernel classifiers. With some mild assumptions on the conditional probability $η(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a k… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 21 pages, 2 figures

    MSC Class: 62G08 (Primary); 68T07; 46E22 (secondary) ACM Class: G.3

  5. arXiv:2309.04268  [pdf, other]

    stat.ML cs.LG math.ST

    Optimal Rate of Kernel Regression in Large Dimensions

    Authors: Weihao Lu, Haobo Zhang, Yicheng Li, Manyun Xu, Qian Lin

    Abstract: We perform a study on kernel regression for large-dimensional data (where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n\asymp d^γ$ for some $γ>0$). We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metr… ▽ More

    Submitted 28 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    MSC Class: 62G08; 46E22; 68T07

  6. arXiv:2305.18506  [pdf, other]

    stat.ML cs.LG

    Generalization Ability of Wide Residual Networks

    Authors: Jianfa Lai, Zixiong Yu, Songtao Tian, Qian Lin

    Abstract: In this paper, we study the generalization ability of the wide residual network on $\mathbb{S}^{d-1}$ with the ReLU activation function. We first show that as the width $m\rightarrow\infty$, the residual network kernel (RNK) uniformly converges to the residual neural tangent kernel (RNTK). This uniform convergence further guarantees that the generalization error of the residual network converges t… ▽ More

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 28 pages, 3 figures

    MSC Class: 62G08 (Primary); 68T07; 46E22 (secondary) ACM Class: G.3

  7. arXiv:2305.02657  [pdf, other]

    stat.ML cs.LG

    On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

    Authors: Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin

    Abstract: In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks with different depths and various activation functions. After proving that the dynamics of training the… ▽ More

    Submitted 8 January, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

  8. arXiv:2302.05933  [pdf, other]

    stat.ML cs.LG

    Generalization Ability of Wide Neural Networks on $\mathbb{R}$

    Authors: Jianfa Lai, Manyun Xu, Rui Chen, Qian Lin

    Abstract: We perform a study on the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): $a)$ $K_{d}$, the NTK defined on $\mathbb{R}^{d}$, is positive definite; $b)$ $λ_{i}(K_{1})$, the $i$-th largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that: $i)$ when the width… ▽ More

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: 47 pages, 4 figures

    MSC Class: 62G08 (Primary); 68T07 (secondary); 46E22 ACM Class: G.3

  9. Tunable robustness in power-law inference

    Authors: Qianying Lin, Mitchell Newberry

    Abstract: Power-law probability distributions arise often in the social and natural sciences. Statistics have been developed for estimating the exponent parameter as well as gauging goodness-of-fit to a power law. Yet paradoxically, many famous power laws such as the distribution of wealth and earthquake magnitudes have not found good statistical support in data by modern methods. We show that measurement e… ▽ More

    Submitted 13 January, 2023; originally announced January 2023.

  10. arXiv:2212.12603  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints

    Authors: Yao Yao, Qihang Lin, Tianbao Yang

    Abstract: As machine learning is used increasingly in making high-stakes decisions, an arising challenge is to avoid unfair AI systems that lead to discriminatory decisions for protected populations. A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints, which achieves Pareto efficiency when trading off p… ▽ More

    Submitted 22 February, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: Published in AISTATS 2023

  11. arXiv:2203.01505  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Large-scale Optimization of Partial AUC in a Range of False Positive Rates

    Authors: Yao Yao, Qihang Lin, Tianbao Yang

    Abstract: The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning. However, it summarizes the true positive rates (TPRs) over all false positive rates (FPRs) in the ROC space, which may include the FPRs with no practical relevance in some applications. The partial AUC, as a generalization of the AUC, summarizes only the TPRs over a… ▽ More

    Submitted 27 October, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  12. arXiv:2202.01163  [pdf, other]

    stat.ME

    A Recommender System Based on a Double Feature Allocation Model

    Authors: Qiaohui Lin, Peter Mueller

    Abstract: A collaborative filtering recommender system predicts user preferences by discovering common features among users and items. We implement such inference using a Bayesian double feature allocation model, that is, a model for random pairs of subsets. We use an Indian buffet process (IBP) to link users and items to features. Here a feature is a subset of users and a matching subset of items. By train… ▽ More

    Submitted 2 February, 2022; originally announced February 2022.

  13. arXiv:2112.07755  [pdf, other]

    stat.ME math.ST

    Separate Exchangeability as Modeling Principle in Bayesian Nonparametrics

    Authors: Giovanni Rebaudo, Qiaohui Lin, Peter Mueller

    Abstract: We argue for the use of separate exchangeability as a modeling principle in Bayesian nonparametric (BNP) inference. Separate exchangeability is \emph{de facto} widely applied in the Bayesian parametric case, e.g., it naturally arises in simple mixed models. However, while in some areas, such as random graphs, separate and (closely related) joint exchangeability are widely used, it is curiously und… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 December, 2021; originally announced December 2021.

  14. arXiv:2106.15400  [pdf, other]

    cs.LG stat.ML

    Online Interaction Detection for Click-Through Rate Prediction

    Authors: Qiuqiang Lin, Chuanhou Gao

    Abstract: Click-Through Rate prediction aims to predict the ratio of clicks to impressions of a specific link. This is a challenging task since (1) there are usually categorical features, and the inputs will be extremely high-dimensional if one-hot encoding is applied, (2) not only the original features but also their interactions are important, (3) an effective prediction may rely on different features and… ▽ More

    Submitted 27 June, 2021; originally announced June 2021.

    Comments: 11 pages, 4 figures, 1 supplement

  15. arXiv:2105.12730  [pdf, other]

    math.PR q-bio.PE q-bio.QM stat.AP stat.ME

    Markov Genealogy Processes

    Authors: Aaron A. King, Qianying Lin, Edward L. Ionides

    Abstract: We construct a family of genealogy-valued Markov processes that are induced by a continuous-time Markov population process. We derive exact expressions for the likelihood of a given genealogy conditional on the history of the underlying population process. These lead to a nonlinear filtering equation which can be used to design efficient Monte Carlo inference algorithms. We demonstrate these calcu… ▽ More

    Submitted 24 January, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

    MSC Class: 60J99

    Journal ref: Theoretical Population Biology 143:77-91 (2022)

  16. arXiv:2104.04714  [pdf, other]

    stat.ML cs.LG

    Random Intersection Chains

    Authors: Qiuqiang Lin, Chuanhou Gao

    Abstract: Interactions between several features sometimes play an important role in prediction tasks. But taking all the interactions into consideration will lead to an extremely heavy computational burden. For categorical features, the situation is more complicated since the input will be extremely high-dimensional and sparse if one-hot encoding is applied. Inspired by association rule mining, we propose a… ▽ More

    Submitted 10 April, 2021; originally announced April 2021.

  17. arXiv:2009.06170  [pdf, other]

    stat.ME math.ST stat.ML

    Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts

    Authors: Qiaohui Lin, Robert Lunde, Purnamrita Sarkar

    Abstract: We propose a new class of multiplier bootstraps for count functionals, ranging from a fast, approximate linear bootstrap tailored to sparse, massive graphs to a quadratic bootstrap procedure that offers refined accuracy for smaller, denser graphs. For the fast, approximate linear bootstrap, we show that $\sqrt{n}$-consistent inference of the count functional is attainable in certain computational… ▽ More

    Submitted 7 April, 2022; v1 submitted 13 September, 2020; originally announced September 2020.

  18. arXiv:2004.08935  [pdf, other]

    math.ST stat.ME stat.ML

    On the Theoretical Properties of the Network Jackknife

    Authors: Qiaohui Lin, Robert Lunde, Purnamrita Sarkar

    Abstract: We study the properties of a leave-node-out jackknife procedure for network data. Under the sparse graphon model, we prove an Efron-Stein-type inequality, showing that the network jackknife leads to conservative estimates of the variance (in expectation) for any network functional that is invariant to node permutation. For a general class of count functionals, we also establish consistency of the… ▽ More

    Submitted 21 April, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

  19. arXiv:2002.12761  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

    Authors: Qingjian Lin, Weicheng Cai, Lin Yang, Junjie Wang, Jun Zhang, Ming Li

    Abstract: In this paper, we present the submitted system for the second DIHARD Speech Diarization Challenge from the DKU-LENOVO team. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. O… ▽ More

    Submitted 4 May, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

    Comments: Submitted to Odyssey 2020

  20. arXiv:2002.11184  [pdf, other]

    q-bio.PE math.PR stat.AP

    The Sampled Moran Genealogy Process

    Authors: Aaron A. King, Qianying Lin, Edward L. Ionides

    Abstract: We define the Sampled Moran Genealogy Process, a continuous-time Markov process on the space of genealogies with the demography of the classical Moran process, sampled through time. To do so, we begin by defining the Moran Genealogy Process using a novel representation. We then extend this process to include sampling through time. We derive exact conditional and marginal probability distributions… ▽ More

    Submitted 19 October, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    MSC Class: 60J99

  21. arXiv:2002.05309  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

    Authors: Yan Yan, Yi Xu, Qihang Lin, Wei Liu, Tianbao Yang

    Abstract: Epoch gradient descent method (a.k.a. Epoch-GD) proposed by Hazan and Kale (2011) was deemed a breakthrough for stochastic strongly convex minimization, which achieves the optimal convergence rate of $O(1/T)$ with $T$ iterative updates for the {\it objective gap}. However, its extension to solving stochastic min-max problems with strong convexity and strong concavity still remains open, and it is… ▽ More

    Submitted 17 June, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

  22. arXiv:2002.04180  [pdf, other]

    cs.SI cs.LG stat.ML

    LoCEC: Local Community-based Edge Classification in Large Online Social Networks

    Authors: Chonggang Song, Qian Lin, Guohui Ling, Zongyi Zhang, Hongzhao Chen, Jun Liao, Chuan Chen

    Abstract: Relationships in online social networks often imply social connections in the real world. An accurate understanding of relationship types benefits many applications, e.g. social advertising and recommendation. Some recent attempts have been proposed to classify user relationships into predefined types with the help of pre-labeled relationships or abundant interaction features on relationships. Unf… ▽ More

    Submitted 20 March, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  23. arXiv:2001.02798  [pdf, other]

    cs.LG math.OC stat.ML

    Self-guided Approximate Linear Programs

    Authors: Parshan Pakiman, Selvaprabu Nadarajah, Negar Soheili, Qihang Lin

    Abstract: Approximate linear programs (ALPs) are well-known models based on value function approximations (VFAs) to obtain policies and lower bounds on the optimal policy cost of discounted-cost Markov decision processes (MDPs). Formulating an ALP requires (i) basis functions, the linear combination of which defines the VFA, and (ii) a state-relevance distribution, which determines the relative importance o… ▽ More

    Submitted 12 October, 2021; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: 52 pages

    MSC Class: 90C39; 90C40; 90C05; 90C06; 90C15; 90C22; 90C90; 46C07; 93E20; 93E35; 68T99; 65K99 ACM Class: I.2.8; G.1.2; G.3

  24. arXiv:2001.01006  [pdf, other]

    q-bio.GN cs.LG q-bio.QM stat.ML

    Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

    Authors: Shixiong Zhang, Xiangtao Li, Qiuzhen Lin, Ka-Chun Wong

    Abstract: In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning such as data clustering has become the central component to identify and characterize novel cell types and gene expression patterns. In this study, we review the existing single-cell RNA-seq data… ▽ More

    Submitted 3 January, 2020; originally announced January 2020.

  25. Unified model selection approach based on minimum description length principle in Granger causality analysis

    Authors: Fei Li, Xuewei Wang, Qiang Lin, Zhenghui Hu

    Abstract: Granger causality analysis (GCA) provides a powerful tool for uncovering the patterns of brain connectivity mechanism using neuroimaging techniques. Conventional GCA applies two different mathematical theories in a two-stage scheme: (1) the Bayesian information criterion (BIC) or Akaike information criterion (AIC) for the regression model orders associated with endogenous and exogenous information… ▽ More

    Submitted 19 March, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

  26. arXiv:1910.07099  [pdf, other]

    cs.LG cs.IR stat.ML

    Entire Space Multi-Task Modeling via Post-Click Behavior Decomposition for Conversion Rate Prediction

    Authors: Hong Wen, Jing Zhang, Yuan Wang, Fuyu Lv, Wentian Bao, Quan Lin, Keping Yang

    Abstract: A recommender system, as an essential part of modern e-commerce, consists of two fundamental modules, namely Click-Through Rate (CTR) and Conversion Rate (CVR) prediction. While CVR has a direct impact on the purchasing volume, its prediction is well known to be challenging due to the Sample Selection Bias (SSB) and Data Sparsity (DS) issues. Although existing methods, typically built on the user sequenti… ▽ More

    Submitted 9 June, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: 10 pages, 7 figures. Accepted by SIGIR 2020. The source code will be released at https://github.com/chaimi2013/ESM2

  27. arXiv:1909.10467  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-Agnostic Linear Competitors -- When Interpretable Models Compete and Collaborate with Black-Box Models

    Authors: Hassan Rafique, Tong Wang, Qihang Lin

    Abstract: Driven by an increasing need for model interpretability, interpretable models have become strong competitors for black-box models in many real applications. In this paper, we propose a novel type of model where interpretable models compete and collaborate with black-box models. We present the Model-Agnostic Linear Competitors (MALC) for partially interpretable classification. MALC is a hybrid mode… ▽ More

    Submitted 23 September, 2019; originally announced September 2019.

  28. arXiv:1908.03077  [pdf, ps, other]

    math.OC cs.LG stat.ML

    A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints

    Authors: Qihang Lin, Selvaprabu Nadarajah, Negar Soheili, Tianbao Yang

    Abstract: Stochastic convex optimization problems with expectation constraints (SOECs) are encountered in statistics and machine learning, business, and engineering. In data-rich environments, the SOEC objective and constraints contain expectations defined with respect to large datasets. Therefore, efficient algorithms for solving such SOECs need to limit the fraction of data points that they use, which we… ▽ More

    Submitted 1 January, 2020; v1 submitted 7 August, 2019; originally announced August 2019.

  29. arXiv:1907.10393  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

    Authors: Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras

    Abstract: More and more neural network approaches have achieved considerable improvement upon submodules of speaker diarization system, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms like probabilistic linear discriminant analysis (PLDA) are widely used for scoring the similarity between two speech segments. In this pa… ▽ More

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: Accepted for INTERSPEECH 2019

  30. arXiv:1905.07835  [pdf, other]

    cs.LG stat.ML

    Label Mapping Neural Networks with Response Consolidation for Class Incremental Learning

    Authors: Xu Zhang, Yang Yao, Baile Xu, Lekun Mao, Furao Shen, Jian Zhao, Qingwei Lin

    Abstract: Class incremental learning refers to a special multi-class classification task, in which the number of classes is not fixed but increases with the continual arrival of new data. Existing research has mainly focused on solving the catastrophic forgetting problem in class incremental learning. To this end, however, these models still require the old classes cached in the auxiliary data structure or mo… ▽ More

    Submitted 19 May, 2019; originally announced May 2019.

  31. arXiv:1905.04241  [pdf, other]

    cs.LG cs.AI stat.ML

    Hybrid Predictive Model: When an Interpretable Model Collaborates with a Black-box Model

    Authors: Tong Wang, Qihang Lin

    Abstract: Interpretable machine learning has become a strong competitor for traditional black-box models. However, the possible loss of the predictive performance for gaining interpretability is often inevitable, putting practitioners in a dilemma of choosing between high accuracy (black-box models) and interpretability (interpretable models). In this work, we propose a novel framework for building a Hybrid… ▽ More

    Submitted 10 May, 2019; originally announced May 2019.

  32. arXiv:1904.10112  [pdf, other]

    cs.LG math.OC stat.ML

    Stochastic Primal-Dual Algorithms with Faster Convergence than $O(1/\sqrt{T})$ for Problems without Bilinear Structure

    Authors: Yan Yan, Yi Xu, Qihang Lin, Lijun Zhang, Tianbao Yang

    Abstract: Previous studies on stochastic primal-dual algorithms for solving min-max problems with faster convergence heavily rely on the bilinear structure of the problem, which restricts their applicability to a narrowed range of problems. The main contribution of this paper is the design and analysis of new stochastic primal-dual algorithms that use a mixture of stochastic gradient updates and a logarithm… ▽ More

    Submitted 18 December, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

  33. arXiv:1903.00070  [pdf, other]

    cs.LG cs.RO stat.ML

    Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

    Authors: Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song

    Abstract: We propose a meta path planning algorithm named \emph{Neural Exploration-Exploitation Trees~(NEXT)} for learning from prior experience for solving new path planning problems in high dimensional continuous state and action spaces. Compared to more classical sampling-based methods like RRT, our approach achieves much better sample efficiency in high-dimensions and can benefit from prior experience o… ▽ More

    Submitted 23 February, 2020; v1 submitted 28 February, 2019; originally announced March 2019.

    Comments: 26 pages, 74 figures, ICLR 2020 spotlight

  34. arXiv:1811.11829  [pdf, other]

    math.OC stat.ML

    Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence

    Authors: Yi Xu, Qi Qi, Qihang Lin, Rong Jin, Tianbao Yang

    Abstract: Difference of convex (DC) functions cover a broad family of non-convex and possibly non-smooth and non-differentiable functions, and have wide applications in machine learning and statistics. Although deterministic algorithms for DC functions have been extensively studied, stochastic optimization that is more suitable for learning with big data remains under-explored. In this paper, we propose new… ▽ More

    Submitted 4 February, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: In the revised version, we present some improved complexity results for non-smooth and non-convex regularizers and for functions with known Hölder continuity parameter $ν\in(0,1]$ by a simple change of an algorithmic parameter

  35. arXiv:1810.10207  [pdf, other]

    math.OC stat.ML

    First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems

    Authors: Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang

    Abstract: In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems, whose objective function is weakly convex in the variables of minimization and weakly concave in the variables of maximization. It has many important applications in machine learning including training Generative Adversarial Nets (GANs). We propose a… ▽ More

    Submitted 7 July, 2021; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: Accepted by Journal of Machine Learning Research (JMLR)

  36. arXiv:1810.08559  [pdf, other]

    eess.AS cs.LG cs.NE cs.SD eess.SP stat.ML

    EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge

    Authors: Zhong Qiu Lin, Audrey G. Chung, Alexander Wong

    Abstract: Despite showing state-of-the-art performance, deep learning for speech recognition remains challenging to deploy in on-device edge scenarios such as mobile and other consumer devices. Recently, there have been greater efforts in the design of small, low-footprint deep neural networks (DNNs) that are more appropriate for edge devices, with much of the focus on design principles for hand-crafting ef… ▽ More

    Submitted 13 November, 2018; v1 submitted 17 October, 2018; originally announced October 2018.

    Comments: 4 pages

  37. arXiv:1810.04472   

    cs.LG cs.AI stat.ML

    Domain Confusion with Self Ensembling for Unsupervised Adaptation

    Authors: Jiawei Wang, Zhaoshui He, Chengjian Feng, Zhouping Zhu, Qinzhuang Lin, Jun Lv, Shengli Xie

    Abstract: Data collection and annotation are time-consuming in machine learning, especially for large-scale problems. A common approach for this problem is to transfer knowledge from a related labeled domain to a target one. There are two popular ways to achieve this goal: adversarial learning and self training. In this article, we first analyze the training instability problem and the mistaken confusion issu… ▽ More

    Submitted 8 July, 2020; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: The expression is ambiguous, which is not convenient for readers to understand, and in today's view, the conclusion of the paper is of little significance, so it is no longer open

  38. arXiv:1808.10396  [pdf, other]

    cs.LG stat.ML

    A Unified Analysis of Stochastic Momentum Methods for Deep Learning

    Authors: Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang

    Abstract: Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method, and the stochastic momentum methods including two famou… ▽ More

    Submitted 30 August, 2018; originally announced August 2018.

    Comments: Previous Technical Report: arXiv:1604.03257

    Journal ref: In IJCAI, pp. 2955-2961. 2018

  39. arXiv:1807.01635  [pdf, other]

    stat.ME stat.AP

    Randomization Inference for Peer Effects

    Authors: Xinran Li, Peng Ding, Qian Lin, Dawei Yang, Jun S. Liu

    Abstract: Many previous causal inference studies require no interference, that is, the potential outcomes of a unit do not depend on the treatments of other units. However, this no-interference assumption becomes unreasonable when a unit interacts with other units in the same group or cluster. In a motivating application, a university in China admits students through two channels: the college entrance exam… ▽ More

    Submitted 20 December, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  40. arXiv:1805.09484  [pdf, other]

    cs.LG stat.ML

    Multi-Level Deep Cascade Trees for Conversion Rate Prediction in Recommendation System

    Authors: Hong Wen, Jing Zhang, Quan Lin, Keping Yang, Pipei Huang

    Abstract: Developing effective and efficient recommendation methods is very challenging for modern e-commerce platforms. Generally speaking, two essential modules named "Click-Through Rate Prediction" (\textit{CTR}) and "Conversion Rate Prediction" (\textit{CVR}) are included, where \textit{CVR} module is a crucial factor that affects the final purchasing volume directly. However, it is indeed very challeng… ▽ More

    Submitted 18 November, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: 8 pages, 5 figures, To appear in AAAI'2019

  41. arXiv:1802.04918  [pdf, other]

    cs.LG stat.ML

    Prophit: Causal inverse classification for multiple continuously valued treatment policies

    Authors: Michael T. Lash, Qihang Lin, W. Nick Street

    Abstract: Inverse classification uses an induced classifier as a queryable oracle to guide test instances towards a preferred posterior class label. The result produced from the process is a set of instance-specific feature perturbations, or recommendations, that optimally improve the probability of the class label. In this work, we adopt a causal approach to inverse classification, eliciting treatment poli… ▽ More

    Submitted 13 February, 2018; originally announced February 2018.

  42. arXiv:1710.05080  [pdf, other]

    math.OC stat.ML

    DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization

    Authors: Lin Xiao, Adams Wei Yu, Qihang Lin, Weizhu Chen

    Abstract: Machine learning with big data often involves large optimization models. For distributed optimization over a cluster of machines, frequent communication and synchronization of all model parameters (optimization variables) can be very costly. A promising solution is to use parameter servers to store different subsets of the model parameters, and update them asynchronously at different machines usin… ▽ More

    Submitted 13 October, 2017; originally announced October 2017.

  43. Double Coupled Canonical Polyadic Decomposition for Joint Blind Source Separation

    Authors: Xiao-Feng Gong, Qiu-Hua Lin, Feng-Yu Cong, Lieven De Lathauwer

    Abstract: Joint blind source separation (J-BSS) is an emerging data-driven technique for multi-set data-fusion. In this paper, J-BSS is addressed from a tensorial perspective. We show how, by using second-order multi-set statistics in J-BSS, a specific double coupled canonical polyadic decomposition (DC-CPD) problem can be formulated. We propose an algebraic DC-CPD algorithm based on a coupled rank-1 detect… ▽ More

    Submitted 27 April, 2018; v1 submitted 30 December, 2016; originally announced December 2016.

    Comments: Accepted by IEEE Transactions on Signal Processing

  44. arXiv:1612.07222  [pdf, other]

    stat.ML cs.LG stat.ME

    Bayesian Decision Process for Cost-Efficient Dynamic Ranking via Crowdsourcing

    Authors: Xi Chen, Kevin Jiao, Qihang Lin

    Abstract: Rank aggregation based on pairwise comparisons over a set of items has a wide range of applications. Although considerable research has been devoted to the development of rank aggregation algorithms, one basic question is how to efficiently collect a large amount of high-quality pairwise comparisons for the ranking purpose. Because of the advent of many crowdsourcing services, a crowd of workers a…

    Submitted 21 December, 2016; originally announced December 2016.

    Journal ref: Journal of Machine Learning Research 17 (2016) 1-40

  45. arXiv:1611.07100  [pdf, other]

    stat.ML cs.AI

    Interpreting Finite Automata for Sequential Data

    Authors: Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State

    Abstract: Automaton models are often seen as interpretable models. Interpretability itself is not well defined: it remains unclear what interpretability means without first explicitly specifying objectives or desired attributes. In this paper, we identify the key properties used to interpret automata and propose a modification of a state-merging approach to learn variants of finite state automata. We apply…

    Submitted 24 November, 2016; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

    ACM Class: I.2.6

  46. arXiv:1611.06655  [pdf, ps, other]

    math.ST stat.ME

    Sparse Sliced Inverse Regression Via Lasso

    Authors: Qian Lin, Zhigen Zhao, Jun S. Liu

    Abstract: For multiple index models, it has recently been shown that the sliced inverse regression (SIR) is consistent for estimating the sufficient dimension reduction (SDR) space if and only if $ρ=\lim\frac{p}{n}=0$, where $p$ is the dimension and $n$ is the sample size. Thus, when $p$ is of the same or a higher order of $n$, additional assumptions such as sparsity must be imposed in order to ensure consi…

    Submitted 17 June, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: 41 pages, 2 figures

    MSC Class: 62J02 (Primary); 62H25 (Secondary)

  47. Generalized Inverse Classification

    Authors: Michael T. Lash, Qihang Lin, W. Nick Street, Jennifer G. Robinson, Jeffrey Ohlmann

    Abstract: Inverse classification is the process of perturbing an instance in a meaningful way such that it is more likely to conform to a specific class. Historical methods that address such a problem are often framed to leverage only a single classifier, or specific set of classifiers. These works are often accompanied by naive assumptions. In this work we propose generalized inverse classification (GIC),…

    Submitted 12 January, 2017; v1 submitted 5 October, 2016; originally announced October 2016.

    Comments: Accepted to SDM 2017. Full paper + supplemental material

  48. arXiv:1608.03487  [pdf, ps, other]

    math.OC stat.ML

    A Richer Theory of Convex Constrained Optimization with Reduced Projections and Improved Rates

    Authors: Tianbao Yang, Qihang Lin, Lijun Zhang

    Abstract: This paper focuses on convex constrained optimization problems, where the solution is subject to a convex inequality constraint. In particular, we aim at challenging problems for which both projection into the constrained domain and a linear optimization under the inequality constraint are time-consuming, which render both projected gradient methods and conditional gradient methods (a.k.a. the Fra…

    Submitted 12 June, 2017; v1 submitted 11 August, 2016; originally announced August 2016.

    Comments: This is the long version of our ICML 2017 paper

  49. arXiv:1607.03815  [pdf, ps, other]

    math.OC stat.ML

    Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/ε)$

    Authors: Yi Xu, Yan Yan, Qihang Lin, Tianbao Yang

    Abstract: In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/ε)$ without any…

    Submitted 3 November, 2016; v1 submitted 13 July, 2016; originally announced July 2016.

    Comments: This is a long version of the paper accepted by NIPS 2016

  50. arXiv:1607.01027  [pdf, ps, other]

    math.OC cs.LG math.NA stat.ML

    Accelerate Stochastic Subgradient Method by Leveraging Local Growth Condition

    Authors: Yi Xu, Qihang Lin, Tianbao Yang

    Abstract: In this paper, a new theory is developed for first-order stochastic convex optimization, showing that the global convergence rate is sufficiently quantified by a local growth rate of the objective function in a neighborhood of the optimal solutions. In particular, if the objective function $F(\mathbf w)$ in the $ε$-sublevel set grows as fast as $\|\mathbf w - \mathbf w_*\|_2^{1/θ}$, where…

    Submitted 5 May, 2020; v1 submitted 4 July, 2016; originally announced July 2016.