-
Distributed Continual Learning with CoCoA in High-dimensional Linear Regression
Authors:
Martin Hellkvist,
Ayça Özçelikkale,
Anders Ahlén
Abstract:
We consider estimation under scenarios where the signals of interest exhibit change of characteristics over time. In particular, we consider the continual learning problem where different tasks, e.g., data with different distributions, arrive sequentially and the aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks. In contrast to the continual learning literature focusing on the centralized setting, we investigate the problem from a distributed estimation perspective. We consider the well-established distributed learning algorithm COCOA, which distributes the model parameters and the corresponding features over the network. We provide exact analytical characterization for the generalization error of COCOA under continual learning for linear regression in a range of scenarios, where overparameterization is of particular interest. These analytical results characterize how the generalization error depends on the network structure, the task similarity and the number of tasks, and show how these dependencies are intertwined. In particular, our results show that the generalization error can be significantly reduced by adjusting the network size, where the most favorable network size depends on task similarity and the number of tasks. We present numerical results verifying the theoretical analysis and illustrate the continual learning performance of COCOA with a digit classification task.
Submitted 4 December, 2023;
originally announced December 2023.
-
Improving Generalization in Game Agents with Data Augmentation in Imitation Learning
Authors:
Derek Yadgaroff,
Alessandro Sestini,
Konrad Tollmar,
Ayca Ozcelikkale,
Linus Gisslén
Abstract:
Imitation learning is an effective approach for training game-playing agents and, consequently, for efficient game production. However, generalization - the ability to perform well in related but unseen scenarios - is an essential requirement that remains an unsolved challenge for game AI. Generalization is difficult for imitation learning agents because it requires the algorithm to take meaningful actions outside of the training distribution. In this paper we propose a solution to this challenge. Inspired by the success of data augmentation in supervised learning, we augment the training data so the distribution of states and actions in the dataset better represents the real state-action distribution. This study evaluates methods for combining and applying data augmentations to observations, to improve generalization of imitation learning agents. It also provides a performance benchmark of these augmentations across several 3D environments. These results demonstrate that data augmentation is a promising framework for improving generalization in imitation learning agents.
Submitted 7 April, 2024; v1 submitted 22 September, 2023;
originally announced September 2023.
-
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Authors:
Jason Yik,
Korneel Van den Berghe,
Douwe den Blanken,
Younes Bouhadjar,
Maxime Fabre,
Paul Hueber,
Denis Kleyko,
Noah Pacik-Nelson,
Pao-Sheng Vincent Sun,
Guangzhi Tang,
Shenqi Wang,
Biyan Zhou,
Soikat Hasan Ahmed,
George Vathakkattil Joseph,
Benedetto Leto,
Aurora Micheli,
Anurag Kumar Mishra,
Gregor Lenz,
Tao Sun,
Zergham Ahmed,
Mahmoud Akl,
Brian Anderson,
Andreas G. Andreou,
Chiara Bartolozzi,
Arindam Basu
, et al. (73 additional authors not shown)
Abstract:
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.
Submitted 17 January, 2024; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Communication Trade-offs in Federated Learning of Spiking Neural Networks
Authors:
Soumi Chaki,
David Weinberg,
Ayca Özcelikkale
Abstract:
Spiking Neural Networks (SNNs) are biologically inspired alternatives to conventional Artificial Neural Networks (ANNs). Despite promising preliminary results, the trade-offs in the training of SNNs in a distributed scheme are not well understood. Here, we consider SNNs in a federated learning setting where a high-quality global model is created by aggregating multiple local models from the clients without sharing any data. We investigate federated learning for training multiple SNNs at clients when two mechanisms reduce the uplink communication cost: i) random masking of the model updates sent from the clients to the server; and ii) client dropouts where some clients do not send their updates to the server. We evaluated the performance of the SNNs using a subset of the Spiking Heidelberg digits (SHD) dataset. The results show that a trade-off between the random masking and the client drop probabilities is crucial to obtain a satisfactory performance for a fixed number of clients.
Submitted 27 February, 2023;
originally announced March 2023.
-
Regularization Trade-offs with Fake Features
Authors:
Martin Hellkvist,
Ayça Özçelikkale,
Anders Ahlén
Abstract:
Recent successes of massively overparameterized models have inspired a new line of work investigating the underlying conditions that enable overparameterized models to generalize well. This paper considers a framework where the possibly overparameterized model includes fake features, i.e., features that are present in the model but not in the data. We present a non-asymptotic high-probability bound on the generalization error of the ridge regression problem under the model misspecification of having fake features. Our high-probability results provide insights into the interplay between the implicit regularization provided by the fake features and the explicit regularization provided by the ridge parameter. Numerical results illustrate the trade-off between the number of fake features and how the optimal ridge parameter may heavily depend on the number of fake features.
Submitted 5 December, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Continual Learning with Distributed Optimization: Does CoCoA Forget?
Authors:
Martin Hellkvist,
Ayça Özçelikkale,
Anders Ahlén
Abstract:
We focus on the continual learning problem where the tasks arrive sequentially and the aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks. In contrast to the continual learning literature focusing on the centralized setting, we investigate the distributed estimation framework. We consider the well-established distributed learning algorithm COCOA. We derive closed form expressions for the iterations for the overparametrized case. We illustrate the convergence and the error performance of the algorithm based on the over/under-parameterization of the problem. Our results show that depending on the problem dimensions and data generation assumptions, COCOA can perform continual learning over a sequence of tasks, i.e., it can learn a new task without forgetting previously learned tasks, with access only to one task at a time.
Submitted 5 December, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Random Features Model with General Convex Regularization: A Fine Grained Analysis with Precise Asymptotic Learning Curves
Authors:
David Bosch,
Ashkan Panahi,
Ayca Özcelikkale,
Devdatt Dubhashi
Abstract:
We compute precise asymptotic expressions for the learning curves of least squares random feature (RF) models with either a separable strongly convex regularization or the $\ell_1$ regularization. We propose a novel multi-level application of the convex Gaussian min max theorem (CGMT) to overcome the traditional difficulty of finding computable expressions for random features models with correlated data. Our result takes the form of a computable 4-dimensional scalar optimization. In contrast to previous results, our approach does not require solving an often intractable proximal operator, which scales with the number of model parameters. Furthermore, we extend the universality results for the training and generalization errors for RF models to $\ell_1$ regularization. In particular, we demonstrate that under mild conditions, random feature models with elastic net or $\ell_1$ regularization are asymptotically equivalent to a surrogate Gaussian model with the same first and second moments. We numerically demonstrate the predictive capacity of our results, and show experimentally that the predicted test error is accurate even in the non-asymptotic regime.
Submitted 1 March, 2023; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Estimation under Model Misspecification with Fake Features
Authors:
Martin Hellkvist,
Ayça Özçelikkale,
Anders Ahlén
Abstract:
We consider estimation under model misspecification where there is a model mismatch between the underlying system, which generates the data, and the model used during estimation. We propose a model misspecification framework which enables a joint treatment of the model misspecification types of having fake features as well as incorrect covariance assumptions on the unknowns and the noise. We present a decomposition of the output error into components that relate to different subsets of the model parameters corresponding to underlying, fake and missing features. Here, fake features are features which are included in the model but are not present in the underlying system. Under this framework, we characterize the estimation performance and reveal trade-offs between the number of samples, number of fake features, and the possibly incorrect noise level assumption. In contrast to existing work focusing on incorrect covariance assumptions or missing features, fake features are a central component of our framework. Our results show that fake features can significantly improve the estimation performance, even though they are not correlated with the features in the underlying system. In particular, we show that the estimation error can be decreased by including more fake features in the model, even to the point where the model is overparametrized, i.e., the model contains more unknowns than observations.
Submitted 30 November, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Model Mismatch Trade-offs in LMMSE Estimation
Authors:
Martin Hellkvist,
Ayça Özçelikkale
Abstract:
We consider a linear minimum mean squared error (LMMSE) estimation framework with model mismatch where the assumed model order is smaller than that of the underlying linear system which generates the data used in the estimation process. By modelling the regressors of the underlying system as random variables, we analyze the average behaviour of the mean squared error (MSE). Our results quantify how the MSE depends on the interplay between the number of samples and the number of parameters in the underlying system and in the assumed model. In particular, if the number of samples is not sufficiently large, neither increasing the number of samples nor the assumed model complexity is sufficient to guarantee a performance improvement.
Submitted 25 May, 2021;
originally announced May 2021.
-
Chance-Constrained Active Inference
Authors:
Thijs van de Laar,
Ismail Senoz,
Ayça Özçelikkale,
Henk Wymeersch
Abstract:
Active Inference (ActInf) is an emerging theory that explains perception and action in biological agents, in terms of minimizing a free energy bound on Bayesian surprise. Goal-directed behavior is elicited by introducing prior beliefs on the underlying generative model. In contrast to prior beliefs, which constrain all realizations of a random variable, we propose an alternative approach through chance constraints, which allow for a (typically small) probability of constraint violation, and demonstrate how such constraints can be used as intrinsic drivers for goal-directed behavior in ActInf. We illustrate how chance-constrained ActInf weights all imposed (prior) constraints on the generative model, allowing e.g., for a trade-off between robust control and empirical chance constraint violation. Secondly, we interpret the proposed solution within a message passing framework. Interestingly, the message passing interpretation is not only relevant to the context of ActInf, but also provides a general purpose approach that can account for chance constraints on graphical models. The chance constraint message updates can then be readily combined with other pre-derived message update rules, without the need for custom derivations. The proposed chance-constrained message passing framework thus accelerates the search for workable models in general, and can be used to complement message-passing formulations on generative neural models.
Submitted 6 May, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Linear Regression with Distributed Learning: A Generalization Error Perspective
Authors:
Martin Hellkvist,
Ayça Özçelikkale,
Anders Ahlén
Abstract:
Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear regression where the model parameters, i.e., the unknowns, are distributed over the network. We adopt a statistical learning approach. In contrast to works that focus on the performance on the training data, we focus on the generalization error, i.e., the performance on unseen data. We provide high-probability bounds on the generalization error for both isotropic and correlated Gaussian data as well as sub-gaussian data. These results reveal the dependence of the generalization performance on the partitioning of the model over the network. In particular, our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution even when the error on the training data is at the same level for both the centralized and distributed approaches. Our numerical results illustrate the performance with both real-world image data as well as synthetic data.
Submitted 18 August, 2021; v1 submitted 22 January, 2021;
originally announced January 2021.
-
Generalization Error for Linear Regression under Distributed Learning
Authors:
Martin Hellkvist,
Ayça Özçelikkale,
Anders Ahlén
Abstract:
Distributed learning facilitates the scaling-up of data processing by distributing the computational burden over several nodes. Despite the vast interest in distributed learning, generalization performance of such approaches is not well understood. We address this gap by focusing on a linear regression setting. We consider the setting where the unknowns are distributed over a network of nodes. We present an analytical characterization of the dependence of the generalization error on the partitioning of the unknowns over nodes. In particular, for the overparameterized case, our results show that while the error on training data remains in the same range as that of the centralized solution, the generalization error of the distributed solution increases dramatically compared to that of the centralized solution when the number of unknowns estimated at any node is close to the number of observations. We further provide numerical examples to verify our analytical expressions.
Submitted 4 May, 2020; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Sparse Recovery With Non-Linear Fourier Features
Authors:
Ayca Ozcelikkale
Abstract:
Random non-linear Fourier features have recently shown remarkable performance in a wide range of regression and classification applications. Motivated by this success, this article focuses on a sparse non-linear Fourier feature (NFF) model. We provide a characterization of the sufficient number of data points that guarantee perfect recovery of the unknown parameters with high probability. In particular, we show how the sufficient number of data points depends on the kernel matrix associated with the probability distribution function of the input data. We compare our results with the recoverability bounds for the bounded orthonormal systems and provide examples that illustrate sparse recovery under the NFF model.
Submitted 12 February, 2020;
originally announced February 2020.
-
Optimization vs. Reinforcement Learning for Wirelessly Powered Sensor Networks
Authors:
Ayca Ozcelikkale,
Mehmet Koseoglu,
Mani Srivastava
Abstract:
We consider a sensing application where the sensor nodes are wirelessly powered by an energy beacon. We focus on the problem of jointly optimizing the energy allocation of the energy beacon to different sensors and the data transmission powers of the sensors in order to minimize the field reconstruction error at the sink. In contrast to the standard ideal linear energy harvesting (EH) model, we consider practical non-linear EH models. We investigate this problem under two different frameworks: i) an optimization approach where the energy beacon knows the utility function of the nodes, channel state information and the energy harvesting characteristics of the devices; hence optimal power allocation strategies can be designed using an optimization problem and ii) a learning approach where the energy beacon decides on its strategies adaptively with battery level information and feedback on the utility function. Our results illustrate that the deep reinforcement learning approach can obtain the same error levels as the optimization approach and provides a promising alternative to the optimization framework.
Submitted 6 June, 2018;
originally announced June 2018.
-
Transmission Strategies for Remote Estimation with an Energy Harvesting Sensor
Authors:
Ayca Ozcelikkale,
Tomas McKelvey,
Mats Viberg
Abstract:
We consider the remote estimation of a time-correlated signal using an energy harvesting (EH) sensor. The sensor observes the unknown signal and communicates its observations to a remote fusion center using an amplify-and-forward strategy. We consider the design of optimal power allocation strategies in order to minimize the mean-square error at the fusion center. Contrary to the traditional approaches, the degree of correlation between the signal values constitutes an important aspect of our formulation. We provide the optimal power allocation strategies for a number of illustrative scenarios. We show that the most majorized power allocation strategy, i.e., the power allocation as balanced as possible, is optimal for the cases of circularly wide-sense stationary (c.w.s.s.) signals with a static correlation coefficient, and sampled low-pass c.w.s.s. signals for a static channel. We show that the optimal strategy can be characterized as a water-filling type solution for sampled low-pass c.w.s.s. signals for a fading channel. Motivated by the high complexity of the numerical solution of the optimization problem, we propose low-complexity policies for the general scenario. Numerical evaluations illustrate the close performance of these low-complexity policies to that of the optimal policies, and demonstrate the effect of the EH constraints and the degree of freedom of the signal.
Submitted 9 October, 2016;
originally announced October 2016.
-
Performance Bounds for Remote Estimation under Energy Harvesting Constraints
Authors:
Ayca Ozcelikkale,
Tomas McKelvey,
Mats Viberg
Abstract:
Remote estimation with an energy harvesting sensor with a limited data and energy buffer is considered. The sensor node observes an unknown Gaussian field and communicates its observations to a remote fusion center using the energy it harvested. The fusion center employs minimum mean-square error (MMSE) estimation to reconstruct the unknown field. The distortion minimization problem under the online scheme, where the sensor has access to only the statistical information for the future energy packets, is considered. We provide performance bounds on the achievable distortion under a slotted block transmission scheme, where at each transmission time slot, the data and the energy buffer are completely emptied. Our bounds provide insights into the trade-offs between the buffer sizes, the statistical properties of the energy harvesting process and the achievable distortion. In particular, these trade-offs illustrate the insensitivity of the performance to the buffer sizes for signals with low degree of freedom and suggest performance improvements with increasing buffer size for signals with relatively higher degree of freedom. Depending only on the mean, variance and finite support of the energy arrival process, these results provide practical insights for the battery and buffer sizes for deployment in future energy harvesting wireless sensing systems.
Submitted 8 October, 2016;
originally announced October 2016.
-
Unitary Precoding and Basis Dependency of MMSE Performance for Gaussian Erasure Channels
Authors:
Ayça Özçelikkale,
Serdar Yüksel,
Haldun M. Ozaktas
Abstract:
We consider the transmission of a Gaussian vector source over a multi-dimensional Gaussian channel where a random or a fixed subset of the channel outputs are erased. Within the setup where the only encoding operation allowed is a linear unitary transformation on the source, we investigate the MMSE performance, both on average and in terms of guarantees that hold with high probability as a function of the system parameters. Under the performance criterion of average MMSE, necessary conditions that should be satisfied by the optimal unitary encoders are established and explicit solutions for a class of settings are presented. For random sampling of signals that have a low number of degrees of freedom, we present MMSE bounds that hold with high probability. Our results illustrate how the spread of the eigenvalue distribution and the unitary transformation contribute to these performance guarantees. The performance of the discrete Fourier transform (DFT) is also investigated. As a benchmark, we investigate the equidistant sampling of circularly wide-sense stationary (c.w.s.s.) signals, and present the explicit error expression that quantifies the effects of the sampling rate and the eigenvalue distribution of the covariance matrix of the signal.
These findings may be useful in understanding the geometric dependence of signal uncertainty in a stochastic process. In particular, unlike information theoretic measures such as entropy, we highlight the basis dependence of uncertainty in a signal with another perspective. The unitary encoding space restriction exhibits the most and least favorable signal bases for estimation.
Submitted 13 September, 2014; v1 submitted 10 November, 2011;
originally announced November 2011.