
DFML: Decentralized Federated Mutual Learning

Authors: Yasser H. Khalil, Amir H. Estiri, Mahdi Beitollahi, Nader Asadi, Sobhan Hemati, Xu Li, Guojun Zhang, Xi Chen

Abstract: In the realm of real-world devices, centralized servers in Federated Learning (FL) present challenges, including communication bottlenecks and susceptibility to a single point of failure. Additionally, contemporary devices inherently exhibit model and data heterogeneity. Existing work lacks a Decentralized FL (DFL) framework capable of accommodating such heterogeneity without imposing architectural restrictions or assuming the availability of public data. To address these issues, we propose a Decentralized Federated Mutual Learning (DFML) framework that is serverless, supports nonrestrictive heterogeneous models, and avoids reliance on public data. DFML effectively handles model and data heterogeneity through mutual learning, which distills knowledge between clients, and by cyclically varying the relative amounts of supervision and distillation signals. Extensive experimental results demonstrate the consistent effectiveness of DFML in both convergence speed and global accuracy, outperforming prevalent baselines under various conditions. For example, with the CIFAR-100 dataset and 50 clients, DFML improves global accuracy by +17.20% and +19.95% under Independent and Identically Distributed (IID) and non-IID data shifts, respectively.

Submitted 2 February, 2024; originally announced February 2024.
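A minimal sketch of the kind of per-client objective the DFML abstract describes: a supervised cross-entropy term blended with a distillation term toward peer predictions, with the blend varied cyclically over training. This is not the authors' implementation. The cosine schedule in `cyclic_alpha`, the choice of KL divergence to the averaged peer distribution, and every function name are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cyclic_alpha(step: int, period: int, alpha_max: float = 1.0) -> float:
    """Hypothetical cosine schedule: 0 at the start of each cycle, alpha_max mid-cycle."""
    phase = (step % period) / period
    return alpha_max * 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))


def mutual_learning_loss(
    own_logits: Sequence[float],
    label: int,
    peer_logits: Sequence[Sequence[float]],
    alpha: float,
    temperature: float = 1.0,
) -> float:
    """(1 - alpha) * cross-entropy on the label + alpha * KL(peer average || own)."""
    supervision = -math.log(softmax(own_logits)[label])
    # Average the peers' (temperature-softened) predicted distributions.
    peer_avg = [0.0] * len(own_logits)
    for logits in peer_logits:
        for k, prob in enumerate(softmax(logits, temperature)):
            peer_avg[k] += prob / len(peer_logits)
    distillation = kl_divergence(peer_avg, softmax(own_logits, temperature))
    return (1.0 - alpha) * supervision + alpha * distillation
```

With `alpha` near 0 a client trains mostly on its own labels; near the middle of each cycle it leans mostly on what its peers predict.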

arXiv:2312.05387

Cross Domain Generative Augmentation: Domain Generalization with Latent Diffusion Models

Authors: Sobhan Hemati, Mahdi Beitollahi, Amir Hossein Estiri, Bassel Al Omari, Xi Chen, Guojun Zhang

Abstract: Despite the huge effort in developing novel regularizers for Domain Generalization (DG), simply adding data augmentation to vanilla Empirical Risk Minimization (ERM), a practical implementation of the Vicinal Risk Minimization (VRM) principle (Chapelle et al., 2000), outperforms or remains competitive with many of the proposed regularizers. VRM reduces the estimation error of ERM by replacing its point-wise kernel estimates with a more precise estimate of the true data distribution, which reduces the gap between data points within each domain. However, in the DG setting, ERM's estimation error of the true data distribution is mainly caused by the distribution shift between domains, which simple within-domain data augmentation cannot fully address. Motivated by this limitation of VRM, we propose a novel data augmentation method, Cross Domain Generative Augmentation (CDGA), which replaces the point-wise kernel estimates in ERM with new density estimates in the vicinity of domain pairs, further reducing the gap between domains. To this end, CDGA, which is built on latent diffusion models (LDMs), generates synthetic images that fill the gap between all domains and, as a result, reduces non-IIDness. We show that CDGA outperforms state-of-the-art (SOTA) DG methods on the DomainBed benchmark.
To explain the effectiveness of CDGA, we generate more than 5 million synthetic images and perform extensive ablation studies, including data scaling laws, distribution visualization, domain shift quantification, adversarial robustness, and loss landscape analysis.

Submitted 8 December, 2023; originally announced December 2023.
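To make the ERM/VRM contrast in the CDGA abstract concrete, the sketch below draws training samples three ways on toy feature vectors: from the empirical (point-mass) distribution used by ERM, from a Gaussian vicinity around one point within a single domain (classic VRM), and from a vicinity spanning a pair of domains. CDGA produces the cross-domain samples with a latent diffusion model; here, purely as a stand-in assumption, linear interpolation between one point from each domain plays that role. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def erm_sample(points: Sequence[Vector], rng: random.Random) -> Vector:
    """ERM: sample from the empirical distribution (point masses at the data)."""
    return list(rng.choice(points))


def vrm_sample(points: Sequence[Vector], sigma: float, rng: random.Random) -> Vector:
    """Classic VRM: a Gaussian vicinity around one point from a single domain."""
    x = rng.choice(points)
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def cross_domain_sample(
    domain_a: Sequence[Vector], domain_b: Sequence[Vector], rng: random.Random
) -> Vector:
    """Stand-in for a vicinity of a domain pair: interpolate between one point
    from each domain. CDGA uses a latent diffusion model here, not interpolation."""
    xa = rng.choice(domain_a)
    xb = rng.choice(domain_b)
    lam = rng.random()
    return [lam * a + (1.0 - lam) * b for a, b in zip(xa, xb)]
```

The within-domain sampler never leaves the neighborhood of a single domain however large `sigma` is tuned to be in practice, whereas the pair-wise sampler places mass between domains, which is the gap the abstract says within-domain augmentation cannot close.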

arXiv:2308.11778

Understanding Hessian Alignment for Domain Generalization

Authors: Sobhan Hemati, Guojun Zhang, Amir Estiri, Xi Chen

Abstract: Out-of-distribution (OOD) generalization is a critical ability for deep learning models in many real-world scenarios, including healthcare and autonomous vehicles. Recently, various techniques have been proposed to improve OOD generalization. Among these methods, gradient-based regularizers have shown promising performance compared with competing approaches. Despite this success, our understanding of the role of Hessian and gradient alignment in domain generalization is still limited. To address this shortcoming, we analyze the role of the classifier head's Hessian matrix and gradient in domain generalization using recent OOD theory of transferability. Theoretically, we show that the spectral norm of the difference between the classifier head's Hessian matrices across domains is an upper bound on the transfer measure, a notion of distance between target and source domains. Furthermore, we analyze all the attributes that get aligned when we encourage similarity between Hessians and gradients. Our analysis explains the success of many regularizers, such as CORAL, IRM, V-REx, Fish, IGA, and Fishr, since each regularizes part of the classifier head's Hessian and/or gradient. Finally, we propose two simple yet effective methods that efficiently match the classifier head's Hessians and gradients without directly computing Hessians, based on the Hessian Gradient Product (HGP) and Hutchinson's method (Hutchinson).
We validate the OOD generalization ability of the proposed methods in different scenarios, including transferability, severe correlation shift, label shift, and diversity shift. Our results show that Hessian alignment methods achieve promising performance on various OOD benchmarks. The code is available at https://github.com/huawei-noah/Federated-Learning/tree/main/HessianAlignment.

Submitted 22 August, 2023; originally announced August 2023.

Comments: ICCV 2023
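Hutchinson-style estimation rests on a simple identity: for a random vector v with independent ±1 (Rademacher) entries, the expectation of v ⊙ (Hv) equals the diagonal of H, so Hessian information can be recovered from Hessian-vector products alone. The sketch below assumes a generic `matvec` callable (in practice an autodiff Hessian-vector product; here a small dense matrix stands in) and a hypothetical squared-difference penalty between two domains' estimated diagonals. It illustrates the estimator only and is not the released HessianAlignment code.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
MatVec = Callable[[Vector], Vector]


def dense_matvec(matrix: Sequence[Sequence[float]]) -> MatVec:
    """Stand-in for a Hessian-vector product: multiply by an explicit matrix."""
    return lambda v: [sum(m * x for m, x in zip(row, v)) for row in matrix]


def hutchinson_diagonal(
    matvec: MatVec, dim: int, num_samples: int, rng: random.Random
) -> Vector:
    """Estimate diag(H) as the mean of v * (H v) over Rademacher vectors v."""
    estimate = [0.0] * dim
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = matvec(v)  # only matrix-vector products are needed, never H itself
        for i in range(dim):
            estimate[i] += v[i] * hv[i] / num_samples
    return estimate


def diagonal_alignment_penalty(diag_a: Sequence[float], diag_b: Sequence[float]) -> float:
    """Hypothetical penalty: squared distance between two domains' diagonal estimates."""
    return sum((a - b) ** 2 for a, b in zip(diag_a, diag_b))
```

For a diagonal matrix every sample is exact, because v_i squared is always 1; off-diagonal entries only add zero-mean noise that averages out as `num_samples` grows.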

arXiv:2111.07457

Attentive Federated Learning for Concept Drift in Distributed 5G Edge Networks

Authors: Amir Hossein Estiri, Muthucumaru Maheswaran

Abstract: Machine learning (ML) is expected to play a major role in 5G edge computing. Various studies have demonstrated that ML is well suited to optimizing edge computing systems amid the rapid mobility and application-induced changes that occur at the edge. For ML to provide the best solutions, the models must be trained continually to account for changing scenarios. Sudden changes in data distributions caused by changing scenarios (e.g., 5G base station failures) are referred to as concept drift, a major challenge for continual learning. ML models can exhibit high error rates while a drift is taking place, and the errors decrease only after the model learns the new distribution. This problem is more pronounced in a distributed setting, where multiple ML models are used for different heterogeneous datasets and the final model needs to capture all concept drifts. In this paper, we show that using Attention in Federated Learning (FL) is an efficient way of handling concept drift. We use a 5G network traffic dataset to simulate concept drift and test various scenarios. The results indicate that Attention can significantly improve the concept drift handling capability of FL.

Submitted 14 November, 2021; originally announced November 2021.

Comments: 6 pages, 7 figures, IEEE International Conference on Communications (ICCC) 2022
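One way to put attention into FL aggregation is to weight each client model by a softmax over its distance from the current global model and then move the global model a step toward the weighted clients. The sketch below implements that generic scheme on flat parameter vectors. The abstract does not specify the paper's scoring, distance, or update rule, so `attention_weights`, `attentive_aggregate`, `temperature`, and `step_size` are all assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attention_weights(
    global_w: Sequence[float], client_ws: Sequence[Sequence[float]], temperature: float = 1.0
) -> Vector:
    """Softmax over client-to-global distances (larger distance -> larger weight)."""
    scores = [l2_distance(global_w, w) / temperature for w in client_ws]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_aggregate(
    global_w: Sequence[float],
    client_ws: Sequence[Sequence[float]],
    step_size: float = 1.0,
    temperature: float = 1.0,
) -> Vector:
    """Move the global model toward the attention-weighted client models."""
    weights = attention_weights(global_w, client_ws, temperature)
    updated = []
    for i, g in enumerate(global_w):
        pull = sum(a * (g - w[i]) for a, w in zip(weights, client_ws))
        updated.append(g - step_size * pull)
    return updated
```

With `step_size=1.0` the update reduces to the attention-weighted average of the client models, since the weights sum to one. Our (unverified) intuition for the drift setting: a client whose data has just drifted tends to end up farther from the global model, so it receives more weight and the global model follows the drift sooner.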

arXiv:2010.03967

A Variational Auto-Encoder Approach for Image Transmission in Wireless Channel

Authors: Amir Hossein Estiri, Mohammad Reza Sabramooz, Ali Banaei, Amir Hossein Dehghan, Benyamin Jamialahmadi, Mahdi Jafari Siavoshani

Abstract: Recent advancements in information technology and the widespread use of the Internet have made data easier to access worldwide. As a result, transmitting data through noisy channels is inevitable. Reducing the size of data and protecting it from corruption by channel noise during transmission are two classical problems in communication and information theory. Recently, inspired by the success of deep neural networks in various tasks, much work has addressed these two problems using deep learning techniques. In this paper, we investigate the performance of variational auto-encoders and compare the results with those of standard auto-encoders. Our findings suggest that variational auto-encoders are more robust to channel degradation than auto-encoders. Furthermore, we aim to improve the human perceptual quality of reconstructed images by using perception-based error metrics as our network's loss function. To this end, we use the structural similarity index (SSIM) as a perception-based metric to optimize the proposed neural network. Our experiments demonstrate that optimizing for SSIM visually improves the quality of the reconstructed images at the receiver.

Submitted 8 October, 2020; originally announced October 2020.
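The sketch below pairs the two ingredients the abstract names: a noisy channel applied to the encoder's latent vector, and an SSIM-based loss. Both are simplifying assumptions. The channel is additive white Gaussian noise at a chosen SNR, and `global_ssim` computes SSIM once over the whole signal rather than with the local sliding windows of the standard SSIM definition; the constants C1 = (0.01·L)² and C2 = (0.03·L)² follow the usual convention. The encoder and decoder are omitted, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def awgn_channel(latent: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so that signal power / noise power = 10**(snr_db/10)."""
    signal_power = sum(z * z for z in latent) / len(latent)
    noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [z + rng.gauss(0.0, noise_std) for z in latent]


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """SSIM computed once over the whole signal (no sliding window)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def ssim_loss(reconstruction: Sequence[float], original: Sequence[float],
              data_range: float = 1.0) -> float:
    """Perception-oriented loss: 0 for a perfect reconstruction."""
    return 1.0 - global_ssim(reconstruction, original, data_range)
```

In a training loop of this shape, a batch would be encoded, passed through `awgn_channel`, decoded, and scored with `ssim_loss` in place of (or alongside) a pixel-wise mean squared error.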

Showing 1–5 of 5 results for author: Estiri, A