Search | arXiv e-print repository

Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting

Authors: Serina Chang, Frederic Koehler, Zhaonan Qu, Jure Leskovec, Johan Ugander

Abstract: A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.

Submitted 28 February, 2024; originally announced February 2024.
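
The abstract of the entry above refers to the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm. The following is a minimal sketch of plain IPF only, assuming a non-negative matrix and target row and column sums with equal totals. It is not the paper's generative model or its convergence-repair algorithm, and the function name `ipf` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def ipf(X, row_sums, col_sums, num_iters=1000, tol=1e-9):
    """Plain IPF sketch: rescale rows and columns of X to match target marginals.

    Assumes X is non-negative and sum(row_sums) == sum(col_sums).
    Zero entries of X stay zero, so convergence is not guaranteed on sparse X.
    """
    X = np.asarray(X, dtype=float)
    p = np.asarray(row_sums, dtype=float)
    q = np.asarray(col_sums, dtype=float)
    d_col = np.ones(X.shape[1])  # column scaling factors
    M = X.copy()
    for _ in range(num_iters):
        # Row step: choose row factors so that row sums equal p.
        row_tot = X @ d_col
        d_row = np.divide(p, row_tot, out=np.zeros_like(p), where=row_tot > 0)
        # Column step: choose column factors so that column sums equal q.
        col_tot = X.T @ d_row
        d_col = np.divide(q, col_tot, out=np.zeros_like(q), where=col_tot > 0)
        M = d_row[:, None] * X * d_col[None, :]
        # Columns match exactly after the column step, so check the rows.
        if np.abs(M.sum(axis=1) - p).max() < tol:
            break
    return M

X = np.array([[1.0, 2.0], [3.0, 4.0]])
M = ipf(X, row_sums=[4.0, 6.0], col_sums=[5.0, 5.0])
print(M.sum(axis=1), M.sum(axis=0))
```

On a strictly positive matrix such as the one in the usage lines, the alternating rescaling converges. On sparse matrices it may not, which is the failure case the abstract mentions.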

arXiv:2211.15125 [pdf, other]

Global Depths for Irregularly Observed Multivariate Functional Data

Authors: Zhuo Qu, Wenlin Dai, Marc G. Genton

Abstract: Two frameworks for multivariate functional depth based on multivariate depths are introduced in this paper. The first framework is multivariate functional integrated depth, and the second framework involves multivariate functional extremal depth, which is an extension of the extremal depth for univariate functional data. In each framework, global and local multivariate functional depths are proposed. The properties of population multivariate functional depths and consistency of finite sample depths to their population versions are established. In addition, finite sample depths under irregularly observed time grids are estimated. As a by-product, the simplified sparse functional boxplot and simplified intensity sparse functional boxplot are proposed for visualization without data reconstruction. A simulation study demonstrates the advantages of global multivariate functional depths over local multivariate functional depths in outlier detection and running time for big functional data. An application of our frameworks to cyclone tracks data demonstrates the excellent performance of our global multivariate functional depths.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 29 pages, 6 figures

arXiv:2209.00809 [pdf, other]

Optimal Diagonal Preconditioning

Authors: Zhaonan Qu, Wenzhi Gao, Oliver Hinder, Yinyu Ye, Zhengyuan Zhou

Abstract: Preconditioning has long been a staple technique in optimization, often applied to reduce the condition number of a matrix and speed up the convergence of algorithms. Although there are many popular preconditioning techniques in practice, most lack guarantees on reductions in condition number. Moreover, the degree to which we can improve over existing heuristic preconditioners remains an important practical question. In this paper, we study the problem of optimal diagonal preconditioning that achieves maximal reduction in the condition number of any full-rank matrix by scaling its rows and/or columns. We first reformulate the problem as a quasi-convex problem and provide a simple algorithm based on bisection. Then we develop an interior point algorithm with $O(\log(1/ε))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems. We then develop efficient customized solvers and study the empirical performance of our optimal diagonal preconditioning procedures through extensive experiments on large matrices. Our findings suggest that optimal diagonal preconditioners can significantly improve upon existing heuristics-based diagonal preconditioners at reducing condition numbers and speeding up iterative methods. Moreover, our implementation of customized solvers, combined with a random row/column sampling step, can find near-optimal diagonal preconditioners for matrices up to size 200,000 in reasonable time, demonstrating their practical appeal.

Submitted 4 November, 2022; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2206.05891 [pdf, other]

Anchor Sampling for Federated Learning with Partial Client Participation

Authors: Feijie Wu, Song Guo, Zhihao Qu, Shiqi He, Ziming Liu, Jing Gao

Abstract: Compared with full client participation, partial client participation is a more practical scenario in federated learning, but it may amplify some challenges in federated learning, such as data heterogeneity. The lack of inactive clients' updates in partial client participation makes it more likely for the model aggregation to deviate from the aggregation based on full client participation. Training with large batches on individual clients is proposed to address data heterogeneity in general, but their effectiveness under partial client participation is not clear. Motivated by these challenges, we propose to develop a novel federated learning framework, referred to as FedAMD, for partial client participation. The core idea is anchor sampling, which separates partial participants into anchor and miner groups. Each client in the anchor group aims at the local bullseye with the gradient computation using a large batch. Guided by the bullseyes, clients in the miner group steer multiple near-optimal local updates using small batches and update the global model. By integrating the results of the two groups, FedAMD is able to accelerate the training process and improve the model performance. Measured by $ε$-approximation and compared to the state-of-the-art methods, FedAMD achieves the convergence by up to $O(1/ε)$ fewer communication rounds under non-convex objectives. Empirical studies on real-world datasets validate the effectiveness of FedAMD and demonstrate the superiority of the proposed algorithm: Not only does it considerably save computation and communication costs, but also the test accuracy significantly improves.

Submitted 28 May, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: ICML 2023

arXiv:2204.12135 [pdf, other]

Robust Two-Layer Partition Clustering of Sparse Multivariate Functional Data

Authors: Zhuo Qu, Wenlin Dai, Marc G. Genton

Abstract: A novel elastic time distance for sparse multivariate functional data is proposed and used to develop a robust distance-based two-layer partition clustering method. With this proposed distance, the new approach not only can detect correct clusters for sparse multivariate functional data under outlier settings but also can detect those outliers that do not belong to any clusters. Classical distance-based clustering methods such as density-based spatial clustering of applications with noise (DBSCAN), agglomerative hierarchical clustering, and $K$-medoids are extended to the sparse multivariate functional case based on the newly-proposed distance. Numerical experiments on simulated data highlight that the performance of the proposed algorithm is superior to the performances of existing model-based and extended distance-based methods. The effectiveness of the proposed approach is demonstrated using Northwest Pacific cyclone tracks data as an example.

Submitted 18 March, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: 31 pages, 9 figures

MSC Class: 62H30

arXiv:2107.12420 [pdf, other]

Semiparametric Estimation of Treatment Effects in Observational Studies with Heterogeneous Partial Interference

Authors: Zhaonan Qu, Ruoxuan Xiong, Jizhou Liu, Guido Imbens

Abstract: In many observational studies in social science and medicine, subjects or units are connected, and one unit's treatment and attributes may affect another's treatment and outcome, violating the stable unit treatment value assumption (SUTVA) and resulting in interference. To enable feasible estimation and inference, many previous works assume exchangeability of interfering units (neighbors). However, in many applications with distinctive units, interference is heterogeneous and needs to be modeled explicitly. In this paper, we focus on the partial interference setting, and only restrict units to be exchangeable conditional on observable characteristics. Under this framework, we propose generalized augmented inverse propensity weighted (AIPW) estimators for general causal estimands that include heterogeneous direct and spillover effects. We show that they are semiparametric efficient and robust to heterogeneous interference as well as model misspecifications. We apply our methods to the Add Health dataset to study the direct effects of alcohol consumption on academic performance and the spillover effects of parental incarceration on adolescent well-being.

Submitted 22 June, 2024; v1 submitted 26 July, 2021; originally announced July 2021.

arXiv:2103.07868 [pdf, other]

doi 10.1080/10618600.2022.2066680

Sparse Functional Boxplots for Multivariate Curves

Authors: Zhuo Qu, Marc G. Genton

Abstract: This paper introduces the sparse functional boxplot and the intensity sparse functional boxplot as practical exploratory tools. Besides being available for complete functional data, they can be used in sparse univariate and multivariate functional data. The sparse functional boxplot, based on the functional boxplot, displays sparseness proportions within the 50% central region. The intensity sparse functional boxplot indicates the relative intensity of fitted sparse point patterns in the central region. The two-stage functional boxplot, which derives from the functional boxplot to detect outliers, is furthermore extended to its sparse form. We also contribute to sparse data fitting improvement and sparse multivariate functional data depth. In a simulation study, we evaluate the goodness of data fitting, several depth proposals for sparse multivariate functional data, and compare the results of outlier detection between the sparse functional boxplot and its two-stage version. The practical applications of the sparse functional boxplot and intensity sparse functional boxplot are illustrated with two public health datasets. Supplementary materials and codes are available for readers to apply our visualization tools and replicate the analysis.

Submitted 27 May, 2022; v1 submitted 14 March, 2021; originally announced March 2021.

Comments: 33 pages, 7 figures

arXiv:2007.05690 [pdf, other]

A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg

Authors: Zhaonan Qu, Kaixiang Lin, Zhaojian Li, Jiayu Zhou, Zhengyuan Zhou

Abstract: Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data. The characteristics of non-i.i.d. data across the network, low device participation, high communication costs, and the mandate that data remain private bring challenges in understanding the convergence of FL algorithms, particularly regarding how convergence scales with the number of participating devices. In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today, as well as its Nesterov accelerated variant, and conduct a systematic study of how their convergence scale with the number of participating devices under non-i.i.d. data and partial participation in convex settings. We provide a unified analysis that establishes convergence guarantees for FedAvg under strongly convex, convex, and overparameterized strongly convex problems. We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies. For strongly convex and convex problems, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm, which are the first linear speedup guarantees for momentum variants of FedAvg in convex settings. Empirical studies of the algorithms in various settings have supported our theoretical results.

Submitted 31 December, 2023; v1 submitted 11 July, 2020; originally announced July 2020.

Journal ref: Journal of Artificial Intelligence Research 78 (2023) 1143-1200

arXiv:2007.03071 [pdf, other]

Deep Partial Updating: Towards Communication Efficient Updating for On-device Inference

Authors: Zhongnan Qu, Cong Liu, Lothar Thiele

Abstract: Emerging edge intelligence applications require the server to retrain and update deep neural networks deployed on remote edge nodes to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge communication round, while achieving a similar performance compared to full updating. Our method is established through analytically upper-bounding the loss difference between partial updating and full updating, and only updates the weights which make the largest contributions to the upper bound. Extensive experimental results demonstrate the efficacy of our partial updating methodology which achieves a high inference accuracy while updating a rather small number of weights.

Submitted 27 July, 2022; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Published in ECCV 2022

arXiv:2003.08793 [pdf]

Deep Active Learning for Remote Sensing Object Detection

Authors: Zhenshen Qu, Jingda Du, Yong Cao, Qiuyu Guan, Pengbo Zhao

Abstract: Recently, CNN object detectors have achieved high accuracy on remote sensing images but require huge labor and time costs on annotation. In this paper, we propose a new uncertainty-based active learning which can select images with more information for annotation and detector can still reach high performance with a fraction of the training images. Our method not only analyzes objects' classification uncertainty to find least confident objects but also considers their regression uncertainty to declare outliers. Besides, we bring out two extra weights to overcome two difficulties in remote sensing datasets, class-imbalance and difference in images' objects amount. We experiment our active learning algorithm on DOTA dataset with CenterNet as object detector. We achieve same-level performance as full supervision with only half images. We even override full supervision with 55% images and augmented weights on least confident images.

Submitted 17 March, 2020; originally announced March 2020.

Comments: 6 pages, 3 figures

arXiv:2003.07545 [pdf, other]

Interpretable Personalization via Policy Learning with Linear Decision Boundaries

Authors: Zhaonan Qu, Isabella Qian, Zhengyuan Zhou

Abstract: With the rise of the digital economy and an explosion of available information about consumers, effective personalization of goods and services has become a core business focus for companies to improve revenues and maintain a competitive edge. This paper studies the personalization problem through the lens of policy learning, where the goal is to learn a decision-making rule (a policy) that maps from consumer and product characteristics (features) to recommendations (actions) in order to optimize outcomes (rewards). We focus on using available historical data for offline learning with unknown data collection procedures, where a key challenge is the non-random assignment of recommendations. Moreover, in many business and medical applications, interpretability of a policy is essential. We study the class of policies with linear decision boundaries to ensure interpretability, and propose learning algorithms using tools from causal inference to address unbalanced treatments. We study several optimization schemes to solve the associated non-convex, non-smooth optimization problem, and find that a Bayesian optimization algorithm is effective. We test our algorithm with extensive simulation studies and apply it to an anonymized online marketplace customer purchase dataset, where the learned policy outputs a personalized discount recommendation based on customer and product features in order to maximize gross merchandise value (GMV) for sellers. Our learned policy improves upon the platform's baseline by 88.2% in net sales revenue, while also providing informative insights on which features are important for the decision-making process. Our findings suggest that our proposed policy learning framework using tools from causal inference and Bayesian optimization provides a promising practical approach to interpretable personalization across a wide range of applications.

Submitted 2 November, 2022; v1 submitted 17 March, 2020; originally announced March 2020.

arXiv:2001.08277 [pdf, ps, other]

Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning

Authors: Haozhao Wang, Zhihao Qu, Song Guo, Xin Gao, Ruixuan Li, Baoliu Ye

Abstract: Federated Learning is a powerful machine learning paradigm to cooperatively train a global model with highly distributed data. A major bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) algorithm for large-scale Federated Learning is the communication overhead on pushing local gradients and pulling global model. In this paper, to reduce the communication complexity of Federated Learning, a novel approach named Pulling Reduction with Local Compensation (PRLC) is proposed. Specifically, each training node intermittently pulls the global model from the server in SGD iterations, resulting in that it is sometimes unsynchronized with the server. In such a case, it will use its local update to compensate the gap between the local model and the global model. Our rigorous theoretical analysis of PRLC achieves two important findings. First, we prove that the convergence rate of PRLC preserves the same order as the classical synchronous SGD for both strongly-convex and non-convex cases with good scalability due to the linear speedup with respect to the number of training nodes. Second, we show that PRLC admits lower pulling frequency than the existing pulling reduction method without local compensation. We also conduct extensive experiments on various machine learning models to validate our theoretical results. Experimental results show that our approach achieves a significant pulling reduction over the state-of-the-art methods, e.g., PRLC requiring only half of the pulling operations of LAG.

Submitted 22 January, 2020; originally announced January 2020.

arXiv:2001.02856 [pdf, other]

D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data

Authors: Hai Shu, Zhe Qu, Hongtu Zhu

Abstract: Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). The D-GCCA rigorously defines the decomposition on the L2 space of random variables in contrast to the Euclidean dot product space used by most existing methods, thereby being able to provide the estimation consistency for the low-rank matrix recovery. Moreover, to well calibrate common latent factors, we impose a desirable orthogonality constraint on distinctive latent factors. Existing methods, however, inadequately consider such orthogonality and may thus suffer from substantial loss of undetected common-source variation. Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables, while enjoying an appealing interpretation from the perspective of principal component analysis. Furthermore, we propose to use the variable-level proportion of signal variance explained by common or distinctive latent factors for selecting the variables most influenced. Consistent estimators of our D-GCCA method are established with good finite-sample numerical performance, and have closed-form expressions leading to efficient computation especially for large-scale data. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.

Submitted 16 September, 2022; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: The publisher's version is available at https://www.jmlr.org/papers/v23/20-021.html

Journal ref: Journal of Machine Learning Research, 23(169):1-64, 2022

arXiv:1912.09989 [pdf, other]

doi 10.1214/22-EJS2008

CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets

Authors: Hai Shu, Zhe Qu

Abstract: A representative model in integrative analysis of two high-dimensional correlated datasets is to decompose each data matrix into a low-rank common matrix generated by latent factors shared across datasets, a low-rank distinctive matrix corresponding to each dataset, and an additive noise matrix. Existing decomposition methods claim that their common matrices capture the common pattern of the two datasets. However, their so-called common pattern only denotes the common latent factors but ignores the common pattern between the two coefficient matrices of these common latent factors. We propose a new unsupervised learning method, called the common and distinctive pattern analysis (CDPA), which appropriately defines the two types of data patterns by further incorporating the common and distinctive patterns of the coefficient matrices. A consistent estimation approach is developed for high-dimensional settings, and shows reasonably good finite-sample performance in simulations. Our simulation studies and real data analysis corroborate that the proposed CDPA can provide better characterization of common and distinctive patterns and thereby benefit data mining.

Submitted 5 April, 2022; v1 submitted 20 December, 2019; originally announced December 2019.

Journal ref: Electronic Journal of Statistics, 2022, 16 (1), 2475-2517

arXiv:1901.08669 [pdf, ps, other]

SAGA with Arbitrary Sampling

Authors: Xu Qian, Zheng Qu, Peter Richtárik

Abstract: We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models. One of the most celebrated methods in this context is the SAGA algorithm. Despite years of research on the topic, a general-purpose version of SAGA (one that would include arbitrary importance sampling and minibatching schemes) does not exist. We remedy this situation and propose a general and flexible variant of SAGA following the arbitrary sampling paradigm. We perform an iteration complexity analysis of the method, largely possible due to the construction of new stochastic Lyapunov functions. We establish linear convergence rates in the smooth and strongly convex regime, and under a quadratic functional growth condition (i.e., in a regime not assuming strong convexity). Our rates match those of the primal-dual method Quartz for which an arbitrary sampling analysis is available, which makes a significant step towards closing the gap in our understanding of complexity of primal and dual methods for finite sum problems.

Submitted 24 January, 2019; originally announced January 2019.

Comments: 27 pages, 8 Figures, 1 algorithm

arXiv:1512.09103 [pdf, other]

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

Authors: Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan

Abstract: Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems. Up to a primal-dual transformation, it is also the same as accelerated stochastic gradient descent that is one of the central methods used in machine learning. In this paper, we improve the best known running time of accelerated coordinate descent by a factor up to $\sqrt{n}$. Our improvement is based on a clean, novel non-uniform sampling that selects each coordinate with a probability proportional to the square root of its smoothness parameter. Our proof technique also deviates from the classical estimation sequence technique used in prior work. Our speed-up applies to important problems such as empirical risk minimization and solving linear systems, both in theory and in practice.

Submitted 27 May, 2016; v1 submitted 30 December, 2015; originally announced December 2015.

Comments: same result, but polished writing
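
The abstract of the entry above describes a non-uniform sampling that selects each coordinate with a probability proportional to the square root of its smoothness parameter. The following is a minimal sketch of that sampling distribution only, not of the accelerated coordinate descent method itself. The function names and the example smoothness values are hypothetical, chosen for illustration.

```python
import numpy as np

def sqrt_smoothness_probabilities(smoothness):
    """Return p with p[i] proportional to sqrt(smoothness[i])."""
    L = np.asarray(smoothness, dtype=float)
    if np.any(L < 0) or not np.any(L > 0):
        raise ValueError("smoothness parameters must be non-negative, not all zero")
    weights = np.sqrt(L)
    return weights / weights.sum()

def sample_coordinates(smoothness, size, seed=0):
    """Draw coordinate indices i.i.d. from the sqrt-smoothness distribution."""
    rng = np.random.default_rng(seed)
    p = sqrt_smoothness_probabilities(smoothness)
    return rng.choice(len(p), size=size, p=p)

# Hypothetical per-coordinate smoothness parameters L_1, L_2, L_3.
p = sqrt_smoothness_probabilities([1.0, 4.0, 9.0])
print(p)  # weights 1 : 2 : 3 after normalization
idx = sample_coordinates([1.0, 4.0, 9.0], size=5)
```

Compared with sampling proportional to the smoothness parameters themselves, the square root flattens the distribution, so coordinates with small smoothness parameters are still visited reasonably often.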

arXiv:1502.08053 [pdf, ps, other]

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

Authors: Dominik Csiba, Zheng Qu, Peter Richtárik

Abstract: This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods.

Submitted 27 February, 2015; originally announced February 2015.

Showing 1–17 of 17 results for author: Qu, Z