-
Solution-processed NiPS3 thin films from Liquid Exfoliated Inks with Long-Lived Spin-Entangled Excitons
Authors:
Andrii Shcherbakov,
Kevin Synnatschke,
Stanislav Bodnar,
Johnathan Zerhoch,
Lissa Eyre,
Felix Rauh,
Markus W. Heindl,
Shangpu Liu,
Jan Konecny,
Ian D. Sharp,
Zdenek Sofer,
Claudia Backes,
Felix Deschler
Abstract:
Antiferromagnets are promising materials for future opto-spintronic applications since they show spin dynamics in the THz range and no net magnetization. Recently, layered van der Waals (vdW) antiferromagnets have been reported, which combine low-dimensional excitonic properties with complex spin-structure. While various methods for the fabrication of vdW 2D crystals exist, formation of large area…
▽ More
Antiferromagnets are promising materials for future opto-spintronic applications since they show spin dynamics in the THz range and no net magnetization. Recently, layered van der Waals (vdW) antiferromagnets have been reported, which combine low-dimensional excitonic properties with complex spin-structure. While various methods for the fabrication of vdW 2D crystals exist, formation of large area and continuous thin films is challenging because of either limited scalability, synthetic complexity, or low opto-spintronic quality of the final material. Here, we fabricate centimeter-scale thin films of the van der Waals 2D antiferromagnetic material NiPS3, which we prepare using a crystal ink made from liquid phase exfoliation (LPE). We perform statistical atomic force microscopy (AFM) and scanning electron microscopy (SEM) to characterize and control the lateral size and number of layers through this ink-based fabrication. Using ultrafast optical spectroscopy at cryogenic temperatures, we resolve the dynamics of photoexcited excitons. We find antiferromagnetic spin arrangement and spin-entangled Zhang-Rice multiplet excitons with lifetimes in the nanosecond range, as well as ultranarrow emission linewidths, despite the disordered nature of our films. Thus, our findings demonstrate scalable thin-film fabrication of high-quality NiPS3, which is crucial for translating this 2D antiferromagnetic material into spintronic and nanoscale memory devices and further exploring its complex spin-light coupled states.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.
-
Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory
Authors:
Nicole Mitchell,
Johannes Ballé,
Zachary Charles,
Jakub Konečný
Abstract:
A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server. We present a comprehensive empirical study of the statistics of model updates in FL, as well as the role and benefits of various compression techniques. Motivated by these observations, we propose a novel method to reduce the average communicatio…
▽ More
A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server. We present a comprehensive empirical study of the statistics of model updates in FL, as well as the role and benefits of various compression techniques. Motivated by these observations, we propose a novel method to reduce the average communication cost, which is near-optimal in many use cases, and outperforms Top-K, DRIVE, 3LC and QSGD on Stack Overflow next-word prediction, a realistic and challenging FL benchmark. This is achieved by examining the problem using rate-distortion theory, and proposing distortion as a reliable proxy for model accuracy. Distortion can be more effectively used for optimizing the trade-off between model performance and communication cost across clients. We demonstrate empirically that in spite of the non-i.i.d. nature of federated learning, the rate-distortion frontier is consistent across datasets, optimizers, clients and training rounds.
△ Less
Submitted 19 May, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
A Field Guide to Federated Optimization
Authors:
Jianyu Wang,
Zachary Charles,
Zheng Xu,
Gauri Joshi,
H. Brendan McMahan,
Blaise Aguera y Arcas,
Maruan Al-Shedivat,
Galen Andrew,
Salman Avestimehr,
Katharine Daly,
Deepesh Data,
Suhas Diggavi,
Hubert Eichner,
Advait Gadhikar,
Zachary Garrett,
Antonious M. Girgis,
Filip Hanzely,
Andrew Hard,
Chaoyang He,
Samuel Horvath,
Zhouyuan Huo,
Alex Ingerman,
Martin Jaggi,
Tara Javidi,
Peter Kairouz
, et al. (28 additional authors not shown)
Abstract:
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and…
▽ More
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning
Authors:
Zachary Charles,
Jakub Konečný
Abstract:
We study a family of algorithms, which we refer to as local update methods, generalizing many federated and meta-learning algorithms. We prove that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss we exactly characterize. Moreover, fundamental algorithmic choices (such as learning rates) explicitly govern a trade-off between the condition nu…
▽ More
We study a family of algorithms, which we refer to as local update methods, generalizing many federated and meta-learning algorithms. We prove that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss we exactly characterize. Moreover, fundamental algorithmic choices (such as learning rates) explicitly govern a trade-off between the condition number of the surrogate loss and its alignment with the true loss. We derive novel convergence rates showcasing these trade-offs and highlight their importance in communication-limited settings. Using these insights, we are able to compare local update methods based on their convergence/accuracy trade-off, not just their convergence to critical points of the empirical loss. Our results shed new light on a broad range of phenomena, including the efficacy of server momentum in federated learning and the impact of proximal client updates.
△ Less
Submitted 8 March, 2021;
originally announced March 2021.
-
LinCbO: fast algorithm for computation of the Duquenne-Guigues basis
Authors:
Radek Janostik,
Jan Konecny,
Petr Krajča
Abstract:
We propose and evaluate a novel algorithm for computation of the Duquenne-Guigues basis which combines Close-by-One and LinClosure algorithms. This combination enables us to reuse attribute counters used in LinClosure and speed up the computation. Our experimental evaluation shows that it is the most efficient algorithm for computation of the Duquenne-Guigues basis.
We propose and evaluate a novel algorithm for computation of the Duquenne-Guigues basis which combines Close-by-One and LinClosure algorithms. This combination enables us to reuse attribute counters used in LinClosure and speed up the computation. Our experimental evaluation shows that it is the most efficient algorithm for computation of the Duquenne-Guigues basis.
△ Less
Submitted 22 January, 2021; v1 submitted 10 November, 2020;
originally announced November 2020.
-
LCM from FCA Point of View: A CbO-style Algorithm with Speed-up Features
Authors:
Radek Janostik,
Jan Konecny,
Petr Krajča
Abstract:
LCM is an algorithm for enumeration of frequent closed itemsets in transaction databases. It is well known that when we ignore the required frequency, the closed itemsets are exactly intents of formal concepts in Formal Concept Analysis (FCA). We describe LCM in terms of FCA and show that LCM is basically the Close-by-One algorithm with multiple speed-up features for processing sparse data. We ana…
▽ More
LCM is an algorithm for enumeration of frequent closed itemsets in transaction databases. It is well known that when we ignore the required frequency, the closed itemsets are exactly intents of formal concepts in Formal Concept Analysis (FCA). We describe LCM in terms of FCA and show that LCM is basically the Close-by-One algorithm with multiple speed-up features for processing sparse data. We analyze the speed-up features and compare them with those of similar FCA algorithms, like FCbO and algorithms from the In-Close family.
△ Less
Submitted 22 January, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
On the Outsized Importance of Learning Rates in Local Update Methods
Authors:
Zachary Charles,
Jakub Konečný
Abstract:
We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms. We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize. We show that the choice of client learning rate controls the condition number of that surrogate l…
▽ More
We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms. We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize. We show that the choice of client learning rate controls the condition number of that surrogate loss, as well as the distance between the minimizers of the surrogate and true loss functions. We use this theory to derive novel convergence rates for federated averaging that showcase this trade-off between the condition number of the surrogate loss and its alignment with the true loss function. We validate our results empirically, showing that in communication-limited settings, proper learning rate tuning is often sufficient to reach near-optimal behavior. We also present a practical method for automatic learning rate decay in local update methods that helps reduce the need for learning rate tuning, and highlight its empirical performance on a variety of tasks and datasets.
△ Less
Submitted 2 July, 2020;
originally announced July 2020.
-
Adaptive Federated Optimization
Authors:
Sashank Reddi,
Zachary Charles,
Manzil Zaheer,
Zachary Garrett,
Keith Rush,
Jakub Konečný,
Sanjiv Kumar,
H. Brendan McMahan
Abstract:
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have…
▽ More
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyze their convergence in the presence of heterogeneous data for general non-convex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of federated learning.
△ Less
Submitted 8 September, 2021; v1 submitted 29 February, 2020;
originally announced March 2020.
-
Advances and Open Problems in Federated Learning
Authors:
Peter Kairouz,
H. Brendan McMahan,
Brendan Avent,
Aurélien Bellet,
Mehdi Bennis,
Arjun Nitin Bhagoji,
Kallista Bonawitz,
Zachary Charles,
Graham Cormode,
Rachel Cummings,
Rafael G. L. D'Oliveira,
Hubert Eichner,
Salim El Rouayheb,
David Evans,
Josh Gardner,
Zachary Garrett,
Adrià Gascón,
Badih Ghazi,
Phillip B. Gibbons,
Marco Gruteser,
Zaid Harchaoui,
Chaoyang He,
Lie He,
Zhouyuan Huo,
Ben Hutchinson
, et al. (34 additional authors not shown)
Abstract:
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while kee** the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…
▽ More
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while kee** the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
△ Less
Submitted 8 March, 2021; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Federated Learning with Autotuned Communication-Efficient Secure Aggregation
Authors:
Keith Bonawitz,
Fariborz Salehi,
Jakub Konečný,
Brendan McMahan,
Marco Gruteser
Abstract:
Federated Learning enables mobile devices to collaboratively learn a shared inference model while kee** all the training data on a user's device, decoupling the ability to do machine learning from the need to store the data in the cloud. Existing work on federated learning with limited communication demonstrates how random rotation can enable users' model updates to be quantized much more effici…
▽ More
Federated Learning enables mobile devices to collaboratively learn a shared inference model while kee** all the training data on a user's device, decoupling the ability to do machine learning from the need to store the data in the cloud. Existing work on federated learning with limited communication demonstrates how random rotation can enable users' model updates to be quantized much more efficiently, reducing the communication cost between users and the server. Meanwhile, secure aggregation enables the server to learn an aggregate of at least a threshold number of device's model contributions without observing any individual device's contribution in unaggregated form. In this paper, we highlight some of the challenges of setting the parameters for secure aggregation to achieve communication efficiency, especially in the context of the aggressively quantized inputs enabled by random rotation. We then develop a recipe for auto-tuning communication-efficient secure aggregation, based on specific properties of random rotation and secure aggregation -- namely, the predictable distribution of vector entries post-rotation and the modular wrap** inherent in secure aggregation. We present both theoretical results and initial experiments.
△ Less
Submitted 29 November, 2019;
originally announced December 2019.
-
Improving Federated Learning Personalization via Model Agnostic Meta Learning
Authors:
Yihan Jiang,
Jakub Konečný,
Keith Rush,
Sreeram Kannan
Abstract:
Federated Learning (FL) refers to learning a high quality global model based on decentralized data storage, without ever copying the raw data. A natural scenario arises with data created on mobile phones by the activity of their users. Given the typical data heterogeneity in such situations, it is natural to ask how can the global model be personalized for every such device, individually. In this…
▽ More
Federated Learning (FL) refers to learning a high quality global model based on decentralized data storage, without ever copying the raw data. A natural scenario arises with data created on mobile phones by the activity of their users. Given the typical data heterogeneity in such situations, it is natural to ask how can the global model be personalized for every such device, individually. In this work, we point out that the setting of Model Agnostic Meta Learning (MAML), where one optimizes for a fast, gradient-based, few-shot adaptation to a heterogeneous distribution of tasks, has a number of similarities with the objective of personalization for FL. We present FL as a natural source of practical applications for MAML algorithms, and make the following observations. 1) The popular FL algorithm, Federated Averaging, can be interpreted as a meta learning algorithm. 2) Careful fine-tuning can yield a global model with higher accuracy, which is at the same time easier to personalize. However, solely optimizing for the global model accuracy yields a weaker personalization result. 3) A model trained using a standard datacenter optimization method is much harder to personalize, compared to one trained using Federated Averaging, supporting the first claim. These results raise new questions for FL, MAML, and broader ML research.
△ Less
Submitted 18 January, 2023; v1 submitted 27 September, 2019;
originally announced September 2019.
-
MLSys: The New Frontier of Machine Learning Systems
Authors:
Alexander Ratner,
Dan Alistarh,
Gustavo Alonso,
David G. Andersen,
Peter Bailis,
Sarah Bird,
Nicholas Carlini,
Bryan Catanzaro,
Jennifer Chayes,
Eric Chung,
Bill Dally,
Jeff Dean,
Inderjit S. Dhillon,
Alexandros Dimakis,
Pradeep Dubey,
Charles Elkan,
Grigori Fursin,
Gregory R. Ganger,
Lise Getoor,
Phillip B. Gibbons,
Garth A. Gibson,
Joseph E. Gonzalez,
Justin Gottschlich,
Song Han,
Kim Hazelwood
, et al. (44 additional authors not shown)
Abstract:
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…
▽ More
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.
△ Less
Submitted 1 December, 2019; v1 submitted 29 March, 2019;
originally announced April 2019.
-
Towards Federated Learning at Scale: System Design
Authors:
Keith Bonawitz,
Hubert Eichner,
Wolfgang Grieskamp,
Dzmitry Huba,
Alex Ingerman,
Vladimir Ivanov,
Chloe Kiddon,
Jakub Konečný,
Stefano Mazzocchi,
H. Brendan McMahan,
Timon Van Overveldt,
David Petrou,
Daniel Ramage,
Jason Roselander
Abstract:
Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and…
▽ More
Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.
△ Less
Submitted 22 March, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.
-
A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion
Authors:
Filip Hanzely,
Jakub Konečný,
Nicolas Loizou,
Peter Richtárik,
Dmitry Grishchenko
Abstract:
In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for the method and perform extensive numerical experiments.
In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for the method and perform extensive numerical experiments.
△ Less
Submitted 27 January, 2019;
originally announced January 2019.
-
Expanding the Reach of Federated Learning by Reducing Client Resource Requirements
Authors:
Sebastian Caldas,
Jakub Konečny,
H. Brendan McMahan,
Ameet Talwalkar
Abstract:
Communication on heterogeneous edge networks is a fundamental bottleneck in Federated Learning (FL), restricting both model capacity and user participation. To address this issue, we introduce two novel strategies to reduce communication costs: (1) the use of lossy compression on the global model sent server-to-client; and (2) Federated Dropout, which allows users to efficiently train locally on s…
▽ More
Communication on heterogeneous edge networks is a fundamental bottleneck in Federated Learning (FL), restricting both model capacity and user participation. To address this issue, we introduce two novel strategies to reduce communication costs: (1) the use of lossy compression on the global model sent server-to-client; and (2) Federated Dropout, which allows users to efficiently train locally on smaller subsets of the global model and also provides a reduction in both client-to-server communication and local computation. We empirically show that these strategies, combined with existing compression approaches for client-to-server communication, collectively provide up to a $14\times$ reduction in server-to-client communication, a $1.7\times$ reduction in local computation, and a $28\times$ reduction in upload communication, all without degrading the quality of the final model. We thus comprehensively reduce FL's impact on client device resources, allowing higher capacity models to be trained, and a more diverse set of users to be reached.
△ Less
Submitted 8 January, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
LEAF: A Benchmark for Federated Settings
Authors:
Sebastian Caldas,
Sai Meher Karthik Duddu,
Peter Wu,
Tian Li,
Jakub Konečný,
H. Brendan McMahan,
Virginia Smith,
Ameet Talwalkar
Abstract:
Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data presents new challenges in research areas such as federated learning, meta-learning, and mult…
▽ More
Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data presents new challenges in research areas such as federated learning, meta-learning, and multi-task learning. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in these areas are grounded with realistic benchmarks. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.
△ Less
Submitted 9 December, 2019; v1 submitted 3 December, 2018;
originally announced December 2018.
-
Note on Representing attribute reduction and concepts in concepts lattice using graphs
Authors:
Jan Konecny
Abstract:
Mao H. (2017, Representing attribute reduction and concepts in concept lattice using graphs. Soft Computing 21(24):7293--7311) claims to make contributions to the study of reduction of attributes in concept lattices by using graph theory. We show that her results are either trivial or already well-known and all three algorithms proposed in the paper are incorrect.
Mao H. (2017, Representing attribute reduction and concepts in concept lattice using graphs. Soft Computing 21(24):7293--7311) claims to make contributions to the study of reduction of attributes in concept lattices by using graph theory. We show that her results are either trivial or already well-known and all three algorithms proposed in the paper are incorrect.
△ Less
Submitted 30 May, 2018; v1 submitted 15 November, 2017;
originally announced November 2017.
-
Stochastic, Distributed and Federated Optimization for Machine Learning
Authors:
Jakub Konečný
Abstract:
We study optimization algorithms for the finite sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives. Second, we study distributed setting, in which the data describing the optimization problem does not fit into a single comp…
▽ More
We study optimization algorithms for the finite sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives. Second, we study distributed setting, in which the data describing the optimization problem does not fit into a single computing node. In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck. We propose a communication-efficient framework which iteratively forms local subproblems that can be solved with arbitrary local optimization algorithms. Finally, we introduce the concept of Federated Optimization/Learning, where we try to solve the machine learning problems without having data stored in any centralized manner. The main motivation comes from industry when handling user-generated data. The current prevalent practice is that companies collect vast amounts of user data and store them in datacenters. An alternative we propose is not to collect the data in first place, and instead occasionally use the computational power of users' devices to solve the very same optimization problems, while alleviating privacy concerns at the same time. In such setting, minimization of communication rounds is the primary goal, and we demonstrate that solving the optimization problems in such circumstances is conceptually tractable.
△ Less
Submitted 4 July, 2017;
originally announced July 2017.
-
Privacy Preserving Randomized Gossip Algorithms
Authors:
Filip Hanzely,
Jakub Konečný,
Nicolas Loizou,
Peter Richtárik,
Dmitry Grishchenko
Abstract:
In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.
In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.
△ Less
Submitted 23 June, 2017;
originally announced June 2017.
-
Randomized Distributed Mean Estimation: Accuracy vs Communication
Authors:
Jakub Konečný,
Peter Richtárik
Abstract:
We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algor…
▽ More
We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algorithms for distributed and federated optimization and learning. We propose a flexible family of randomized algorithms exploring the trade-off between expected communication cost and estimation error. Our family contains the full-communication and zero-error method on one extreme, and an $ε$-bit communication and ${\cal O}\left(1/(εn)\right)$ error method on the opposite extreme. In the special case where we communicate, in expectation, a single bit per coordinate of each vector, we improve upon existing results by obtaining $\mathcal{O}(r/n)$ error, where $r$ is the number of bits used to represent a floating point value.
△ Less
Submitted 22 November, 2016;
originally announced November 2016.
-
Federated Learning: Strategies for Improving Communication Efficiency
Authors:
Jakub Konečný,
H. Brendan McMahan,
Felix X. Yu,
Peter Richtárik,
Ananda Theertha Suresh,
Dave Bacon
Abstract:
Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model while training data remains distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local dat…
▽ More
Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model while training data remains distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model. The typical clients in this setting are mobile phones, and communication efficiency is of the utmost importance.
In this paper, we propose two ways to reduce the uplink communication costs: structured updates, where we directly learn an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, where we learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling before sending it to the server. Experiments on both convolutional and recurrent networks show that the proposed methods can reduce the communication cost by two orders of magnitude.
△ Less
Submitted 30 October, 2017; v1 submitted 18 October, 2016;
originally announced October 2016.
-
Federated Optimization: Distributed Machine Learning for On-Device Intelligence
Authors:
Jakub Konečný,
H. Brendan McMahan,
Daniel Ramage,
Peter Richtárik
Abstract:
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number of nodes. The goal is to train a high-quality centralized model. We refer to this setting as Federated Optimization. In this setting, communication efficiency is of the utmost importance and minimizin…
▽ More
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number of nodes. The goal is to train a high-quality centralized model. We refer to this setting as Federated Optimization. In this setting, communication efficiency is of the utmost importance and minimizing the number of rounds of communication is the principal goal.
A motivating example arises when we keep the training data locally on users' mobile devices instead of logging it to a data center for training. In federated optimziation, the devices are used as compute nodes performing computation on their local data in order to update a global model. We suppose that we have extremely large number of devices in the network --- as many as the number of users of a given service, each of which has only a tiny fraction of the total data available. In particular, we expect the number of data points available locally to be much smaller than the number of devices. Additionally, since different users generate data with different patterns, it is reasonable to assume that no device has a representative sample of the overall distribution.
We show that existing algorithms are not suitable for this setting, and propose a new algorithm which shows encouraging experimental results for sparse convex problems. This work also sets a path for future research needed in the context of \federated optimization.
△ Less
Submitted 8 October, 2016;
originally announced October 2016.
-
AIDE: Fast and Communication Efficient Distributed Optimization
Authors:
Sashank J. Reddi,
Jakub Konečný,
Peter Richtárik,
Barnabás Póczós,
Alex Smola
Abstract:
In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be…
▽ More
In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be viewed as a robustification strategy since the method is substantially better behaved than DANE on data partition arising in practice. It is well known that DANE algorithm does not match the communication complexity lower bounds. To bridge this gap, we propose an accelerated variant of the first method, called AIDE, that not only matches the communication lower bounds but can also be implemented using a purely first-order oracle. Our empirical results show that AIDE is superior to other communication efficient algorithms in settings that naturally arise in machine learning applications.
△ Less
Submitted 24 August, 2016;
originally announced August 2016.
-
Distributed Optimization with Arbitrary Local Solvers
Authors:
Chenxin Ma,
Jakub Konečný,
Martin Jaggi,
Virginia Smith,
Michael I. Jordan,
Peter Richtárik,
Martin Takáč
Abstract:
With the growth of data and necessity for distributed optimization methods, solvers that work well on a single machine must be re-designed to leverage distributed computation. Recent work in this area has been limited by focusing heavily on develo** highly specific methods for the distributed environment. These special-purpose methods are often unable to fully leverage the competitive performanc…
▽ More
With the growth of data and necessity for distributed optimization methods, solvers that work well on a single machine must be re-designed to leverage distributed computation. Recent work in this area has been limited by focusing heavily on develo** highly specific methods for the distributed environment. These special-purpose methods are often unable to fully leverage the competitive performance of their well-tuned and customized single machine counterparts. Further, they are unable to easily integrate improvements that continue to be made to single machine methods. To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods. We give strong primal-dual convergence rate guarantees for our framework that hold for arbitrary local solvers. We demonstrate the impact of local solver selection both theoretically and in an extensive experimental comparison. Finally, we provide thorough implementation details for our framework, highlighting areas for practical performance gains.
△ Less
Submitted 3 August, 2016; v1 submitted 13 December, 2015;
originally announced December 2015.
-
Federated Optimization:Distributed Optimization Beyond the Datacenter
Authors:
Jakub Konečný,
Brendan McMahan,
Daniel Ramage
Abstract:
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of \nodes, but the goal remains to train a high-quality centralized model. We refer to this setting as Federated Optimization. In this setting, communication efficiency is of utmost importance.
A…
▽ More
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of \nodes, but the goal remains to train a high-quality centralized model. We refer to this setting as Federated Optimization. In this setting, communication efficiency is of utmost importance.
A motivating example for federated optimization arises when we keep the training data locally on users' mobile devices rather than logging it to a data center for training. Instead, the mobile devices are used as nodes performing computation on their local data in order to update a global model. We suppose that we have an extremely large number of devices in our network, each of which has only a tiny fraction of data available totally; in particular, we expect the number of data points available locally to be much smaller than the number of devices. Additionally, since different users generate data with different patterns, we assume that no device has a representative sample of the overall distribution.
We show that existing algorithms are not suitable for this setting, and propose a new algorithm which shows encouraging experimental results. This work also sets a path for future research needed in the context of federated optimization.
△ Less
Submitted 11 November, 2015;
originally announced November 2015.
-
Stop Wasting My Gradients: Practical SVRG
Authors:
Reza Babanezhad,
Mohamed Osama Ahmed,
Alim Virani,
Mark Schmidt,
Jakub Konečný,
Scott Sallinen
Abstract:
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the…
▽ More
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the early iterations. We further (i) show how to exploit support vectors to reduce the number of gradient computations in the later iterations, (ii) prove that the commonly-used regularized SVRG iteration is justified and improves the convergence rate, (iii) consider alternate mini-batch selection strategies, and (iv) consider the generalization error of the method.
△ Less
Submitted 5 November, 2015;
originally announced November 2015.
-
Complete relations on fuzzy complete lattices
Authors:
Jan Konecny,
Michal Krupka
Abstract:
We generalize the notion of complete binary relation on complete lattice to residuated lattice valued ordered sets and show its properties. Then we focus on complete fuzzy tolerances on fuzzy complete lattices and prove they are in one-to-one correspondence with extensive isotone Galois connections. Finally, we prove that fuzzy complete lattice, factorized by a complete fuzzy tolerance, is again a…
▽ More
We generalize the notion of complete binary relation on complete lattice to residuated lattice valued ordered sets and show its properties. Then we focus on complete fuzzy tolerances on fuzzy complete lattices and prove they are in one-to-one correspondence with extensive isotone Galois connections. Finally, we prove that fuzzy complete lattice, factorized by a complete fuzzy tolerance, is again a fuzzy complete lattice.
△ Less
Submitted 12 June, 2015;
originally announced June 2015.
-
Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting
Authors:
Jakub Konečný,
Jie Liu,
Peter Richtárik,
Martin Takáč
Abstract:
We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions, and a simple nonsmooth convex regularizer. Our method first performs a determ…
▽ More
We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions, and a simple nonsmooth convex regularizer. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times with the last iterate becoming the new starting point. The novelty of our method is in introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample $b$ functions, compute their gradients, and compute the direction based on this. We analyze the complexity of the method and show that it benefits from two speedup effects. First, we prove that as long as $b$ is below a certain threshold, we can reach any predefined accuracy with less overall work than without mini-batching. Second, our mini-batching scheme admits a simple parallel implementation, and hence is suitable for further acceleration by parallelization.
△ Less
Submitted 16 November, 2015; v1 submitted 16 April, 2015;
originally announced April 2015.
-
Semi-Stochastic Coordinate Descent
Authors:
Jakub Konečný,
Zheng Qu,
Peter Richtárik
Abstract:
We propose a novel stochastic gradient method---semi-stochastic coordinate descent (S2CD)---for the problem of minimizing a strongly convex function represented as the average of a large number of smooth convex functions: $f(x)=\tfrac{1}{n}\sum_i f_i(x)$. Our method first performs a deterministic step (computation of the gradient of $f$ at the starting point), followed by a large number of stochas…
▽ More
We propose a novel stochastic gradient method---semi-stochastic coordinate descent (S2CD)---for the problem of minimizing a strongly convex function represented as the average of a large number of smooth convex functions: $f(x)=\tfrac{1}{n}\sum_i f_i(x)$. Our method first performs a deterministic step (computation of the gradient of $f$ at the starting point), followed by a large number of stochastic steps. The process is repeated a few times, with the last stochastic iterate becoming the new starting point where the deterministic step is taken. The novelty of our method is in how the stochastic steps are performed. In each such step, we pick a random function $f_i$ and a random coordinate $j$---both using nonuniform distributions---and update a single coordinate of the decision vector only, based on the computation of the $j^{th}$ partial derivative of $f_i$ at two different points. Each random step of the method constitutes an unbiased estimate of the gradient of $f$ and moreover, the squared norm of the steps goes to zero in expectation, meaning that the stochastic estimate of the gradient progressively improves. The complexity of the method is the sum of two terms: $O(n\log(1/ε))$ evaluations of gradients $\nabla f_i$ and $O(\hatκ\log(1/ε))$ evaluations of partial derivatives $\nabla_j f_i$, where $\hatκ$ is a novel condition number.
△ Less
Submitted 19 December, 2014;
originally announced December 2014.
-
mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting
Authors:
Jakub Konečný,
Jie Liu,
Peter Richtárik,
Martin Takáč
Abstract:
We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an average of a large number of smooth convex functions, and simple nonsmooth convex function. Our method first performs a deterministic step (computation of the g…
▽ More
We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an average of a large number of smooth convex functions, and simple nonsmooth convex function. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times with the last iterate becoming the new starting point. The novelty of our method is in introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample $b$ functions, compute their gradients, and compute the direction based on this. We analyze the complexity of the method and show that the method benefits from two speedup effects. First, we prove that as long as $b$ is below a certain threshold, we can reach predefined accuracy with less overall work than without mini-batching. Second, our mini-batching scheme admits a simple parallel implementation, and hence is suitable for further acceleration by parallelization.
△ Less
Submitted 17 October, 2014;
originally announced October 2014.
-
Simple Complexity Analysis of Simplified Direct Search
Authors:
Jakub Konečný,
Peter Richtárik
Abstract:
We consider the problem of unconstrained minimization of a smooth function in the derivative-free setting using. In particular, we propose and study a simplified variant of the direct search method (of direction type), which we call simplified direct search (SDS). Unlike standard direct search methods, which depend on a large number of parameters that need to be tuned, SDS depends on a single scal…
▽ More
We consider the problem of unconstrained minimization of a smooth function in the derivative-free setting using. In particular, we propose and study a simplified variant of the direct search method (of direction type), which we call simplified direct search (SDS). Unlike standard direct search methods, which depend on a large number of parameters that need to be tuned, SDS depends on a single scalar parameter only.
Despite relevant research activity in direct search methods spanning several decades, complexity guarantees---bounds on the number of function evaluations needed to find an approximate solution---were not established until very recently. In this paper we give a surprisingly brief and unified analysis of SDS for nonconvex, convex and strongly convex functions. We match the existing complexity results for direct search in their dependence on the problem dimension ($n$) and error tolerance ($ε$), but the overall bounds are simpler, easier to interpret, and have better dependence on other problem parameters. In particular, we show that for the set of directions formed by the standard coordinate vectors and their negatives, the number of function evaluations needed to find an $ε$-solution is $O(n^2 /ε)$ (resp. $O(n^2 \log(1/ε))$) for the problem of minimizing a convex (resp. strongly convex) smooth function. In the nonconvex smooth case, the bound is $O(n^2/ε^2)$, with the goal being the reduction of the norm of the gradient below $ε$.
△ Less
Submitted 13 November, 2014; v1 submitted 1 October, 2014;
originally announced October 2014.
-
One-Shot-Learning Gesture Recognition using HOG-HOF Features
Authors:
Jakub Konečný,
Michal Hagara
Abstract:
The purpose of this paper is to describe one-shot-learning gesture recognition systems developed on the \textit{ChaLearn Gesture Dataset}. We use RGB and depth images and combine appearance (Histograms of Oriented Gradients) and motion descriptors (Histogram of Optical Flow) for parallel temporal segmentation and recognition. The Quadratic-Chi distance family is used to measure differences between…
▽ More
The purpose of this paper is to describe one-shot-learning gesture recognition systems developed on the \textit{ChaLearn Gesture Dataset}. We use RGB and depth images and combine appearance (Histograms of Oriented Gradients) and motion descriptors (Histogram of Optical Flow) for parallel temporal segmentation and recognition. The Quadratic-Chi distance family is used to measure differences between histograms to capture cross-bin relationships. We also propose a new algorithm for trimming videos --- to remove all the unimportant frames from videos. We present two methods that use combination of HOG-HOF descriptors together with variants of Dynamic Time War** technique. Both methods outperform other published methods and help narrow down the gap between human performance and algorithms on this task. The code has been made publicly available in the MLOSS repository.
△ Less
Submitted 15 February, 2014; v1 submitted 15 December, 2013;
originally announced December 2013.
-
Semi-Stochastic Gradient Descent Methods
Authors:
Jakub Konečný,
Peter Richtárik
Abstract:
In this paper we study the problem of minimizing the average of a large number ($n$) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an…
▽ More
In this paper we study the problem of minimizing the average of a large number ($n$) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((κ/n)\log(1/\varepsilon))$, where $κ$ is the condition number. This is achieved by running the method for $O(\log(1/\varepsilon))$ epochs, with a single gradient evaluation and $O(κ)$ stochastic gradient evaluations in each. The SVRG method of Johnson and Zhang arises as a special case. If our method is limited to a single epoch only, it needs to evaluate at most $O((κ/\varepsilon)\log(1/\varepsilon))$ stochastic gradients. In contrast, SVRG requires $O(κ/\varepsilon^2)$ stochastic gradients. To illustrate our theoretical results, S2GD only needs the workload equivalent to about 2.1 full gradient evaluations to find an $10^{-6}$-accurate solution for a problem with $n=10^9$ and $κ=10^3$.
△ Less
Submitted 16 June, 2015; v1 submitted 5 December, 2013;
originally announced December 2013.