-
Using Decentralized Aggregation for Federated Learning with Differential Privacy
Authors:
Hadeel Abd El-Kareem,
Abd El-Moaty Saleh,
Ana Fernández-Vilas,
Manuel Fernández-Veiga,
asser El-Sonbaty
Abstract:
Nowadays, the ubiquitous usage of mobile devices and networks have raised concerns about the loss of control over personal data and research advance towards the trade-off between privacy and utility in scenarios that combine exchange communications, big databases and distributed and collaborative (P2P) Machine Learning techniques. On the other hand, although Federated Learning (FL) provides some l…
▽ More
Nowadays, the ubiquitous usage of mobile devices and networks have raised concerns about the loss of control over personal data and research advance towards the trade-off between privacy and utility in scenarios that combine exchange communications, big databases and distributed and collaborative (P2P) Machine Learning techniques. On the other hand, although Federated Learning (FL) provides some level of privacy by retaining the data at the local node, which executes a local training to enrich a global model, this scenario is still susceptible to privacy breaches as membership inference attacks. To provide a stronger level of privacy, this research deploys an experimental environment for FL with Differential Privacy (DP) using benchmark datasets. The obtained results show that the election of parameters and techniques of DP is central in the aforementioned trade-off between privacy and utility by means of a classification example.
△ Less
Submitted 27 November, 2023;
originally announced November 2023.
-
Learning from Integral Losses in Physics Informed Neural Networks
Authors:
Ehsan Saleh,
Saba Ghaffari,
Timothy Bretl,
Luke Olson,
Matthew West
Abstract:
This work proposes a solution for the problem of training physics-informed networks under partial integro-differential equations. These equations require an infinite or a large number of neural evaluations to construct a single residual for training. As a result, accurate evaluation may be impractical, and we show that naive approximations at replacing these integrals with unbiased estimates lead…
▽ More
This work proposes a solution for the problem of training physics-informed networks under partial integro-differential equations. These equations require an infinite or a large number of neural evaluations to construct a single residual for training. As a result, accurate evaluation may be impractical, and we show that naive approximations at replacing these integrals with unbiased estimates lead to biased loss functions and solutions. To overcome this bias, we investigate three types of potential solutions: the deterministic sampling approaches, the double-sampling trick, and the delayed target method. We consider three classes of PDEs for benchmarking; one defining Poisson problems with singular charges and weak solutions of up to 10 dimensions, another involving weak solutions on electro-magnetic fields and a Maxwell equation, and a third one defining a Smoluchowski coagulation problem. Our numerical results confirm the existence of the aforementioned bias in practice and also show that our proposed delayed target approach can lead to accurate solutions with comparable quality to ones estimated with a large sample size integral. Our implementation is open-source and available at https://github.com/ehsansaleh/btspinn.
△ Less
Submitted 11 June, 2024; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Robust Model-Based Optimization for Challenging Fitness Landscapes
Authors:
Saba Ghaffari,
Ehsan Saleh,
Alexander G. Schwing,
Yu-Xiong Wang,
Martin D. Burke,
Saurabh Sinha
Abstract:
Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recogni…
▽ More
Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recognized but equally important problem stems from the distribution of training samples in the design space: leading methods are not designed for scenarios where the desired optimum is in a region that is not only poorly represented in training data, but also relatively far from the highly represented low-fitness regions. We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools and propose a new approach that uses a novel VAE as its search model to overcome the problem. We demonstrate its advantage over prior methods in robustly finding improved samples, regardless of the imbalance and separation between low- and high-fitness samples. Our comprehensive benchmark on real and semi-synthetic protein datasets as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. Our implementation is available at https://github.com/sabagh1994/PGVAE.
△ Less
Submitted 27 June, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Statistical Properties of the log-cosh Loss Function Used in Machine Learning
Authors:
Resve A. Saleh,
A. K. Md. Ehsanes Saleh
Abstract:
This paper analyzes a popular loss function used in machine learning called the log-cosh loss function. A number of papers have been published using this loss function but, to date, no statistical analysis has been presented in the literature. In this paper, we present the distribution function from which the log-cosh loss arises. We compare it to a similar distribution, called the Cauchy distribu…
▽ More
This paper analyzes a popular loss function used in machine learning called the log-cosh loss function. A number of papers have been published using this loss function but, to date, no statistical analysis has been presented in the literature. In this paper, we present the distribution function from which the log-cosh loss arises. We compare it to a similar distribution, called the Cauchy distribution, and carry out various statistical procedures that characterize its properties. In particular, we examine its associated pdf, cdf, likelihood function and Fisher information. Side-by-side we consider the Cauchy and Cosh distributions as well as the MLE of the location parameter with asymptotic bias, asymptotic variance, and confidence intervals. We also provide a comparison of robust estimators from several other loss functions, including the Huber loss function and the rank dispersion function. Further, we examine the use of the log-cosh function for quantile regression. In particular, we identify a quantile distribution function from which a maximum likelihood estimator for quantile regression can be derived. Finally, we compare a quantile M-estimator based on log-cosh with robust monotonicity against another approach to quantile regression based on convolutional smoothing.
△ Less
Submitted 15 March, 2024; v1 submitted 9 August, 2022;
originally announced August 2022.
-
Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration
Authors:
Dustin Morrill,
Esra'a Saleh,
Michael Bowling,
Amy Greenwald
Abstract:
Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that of SPG, however, we show that the Monte Carlo updates differ in a substantial way: the importance correction accounting for a sampled action is nullified in th…
▽ More
Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that of SPG, however, we show that the Monte Carlo updates differ in a substantial way: the importance correction accounting for a sampled action is nullified in the SPG update, but not in the NeuRD update. Naturally, this causes the NeuRD update to have higher variance than its SPG counterpart. Building on implicit exploration algorithms in the adversarial bandit setting, we introduce capped implicit exploration (CIX) estimates that allow us to construct NeuRD-CIX, which interpolates between this aspect of NeuRD and SPG. We show how CIX estimates can be used in a black-box reduction to construct bandit algorithms with regret bounds that hold with high probability and the benefits this entails for NeuRD-CIX in sequential decision-making settings. Our analysis reveals a bias--variance tradeoff between SPG and NeuRD, and shows how theory predicts that NeuRD-CIX will perform well more consistently than NeuRD while retaining NeuRD's advantages over SPG in non-stationary environments.
△ Less
Submitted 4 June, 2022;
originally announced June 2022.
-
Truly Deterministic Policy Optimization
Authors:
Ehsan Saleh,
Saba Ghaffari,
Timothy Bretl,
Matthew West
Abstract:
In this paper, we present a policy gradient method that avoids exploratory noise injection and performs policy search over the deterministic landscape. By avoiding noise injection all sources of estimation variance can be eliminated in systems with deterministic dynamics (up to the initial state distribution). Since deterministic policy regularization is impossible using traditional non-metric mea…
▽ More
In this paper, we present a policy gradient method that avoids exploratory noise injection and performs policy search over the deterministic landscape. By avoiding noise injection all sources of estimation variance can be eliminated in systems with deterministic dynamics (up to the initial state distribution). Since deterministic policy regularization is impossible using traditional non-metric measures such as the KL divergence, we derive a Wasserstein-based quadratic model for our purposes. We state conditions on the system model under which it is possible to establish a monotonic policy improvement guarantee, propose a surrogate function for policy gradient estimation, and show that it is possible to compute exact advantage estimates if both the state transition model and the policy are deterministic. Finally, we describe two novel robotic control environments -- one with non-local rewards in the frequency domain and the other with a long horizon (8000 time-steps) -- for which our policy gradient method (TDPO) significantly outperforms existing methods (PPO, TRPO, DDPG, and TD3). Our implementation with all the experimental settings is available at https://github.com/ehsansaleh/code_tdpo
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
Should Models Be Accurate?
Authors:
Esra'a Saleh,
John D. Martin,
Anna Koop,
Arash Pourzarabi,
Michael Bowling
Abstract:
Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency by planning with model-generated experience in addition to learning with experience from the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of thes…
▽ More
Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency by planning with model-generated experience in addition to learning with experience from the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of these models should be the accurate simulation of environment dynamics at all. We focus our investigations on Dyna-style planning in a prediction setting. First, we highlight and support three motivating points: a perfectly accurate model of environment dynamics is not practically achievable, is not necessary, and is not always the most useful anyways. Second, we introduce a meta-learning algorithm for training models with a focus on their usefulness to the learner instead of their accuracy in modelling the environment. Our experiments show that in a simple non-stationary environment, our algorithm enables faster learning than even using an accurate model built with domain-specific knowledge of the non-stationarity.
△ Less
Submitted 22 May, 2022;
originally announced May 2022.
-
Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression
Authors:
Resve A. Saleh,
A. K. Md. Ehsanes Saleh
Abstract:
This paper proposes a new method to address the long-standing problem of lack of monotonicity in estimation of the conditional and structural quantile function, also known as quantile crossing problem. Quantile regression is a very powerful tool in data science in general and econometrics in particular. Unfortunately, the crossing problem has been confounding researchers and practitioners alike fo…
▽ More
This paper proposes a new method to address the long-standing problem of lack of monotonicity in estimation of the conditional and structural quantile function, also known as quantile crossing problem. Quantile regression is a very powerful tool in data science in general and econometrics in particular. Unfortunately, the crossing problem has been confounding researchers and practitioners alike for over 4 decades. Numerous attempts have been made to find a simple and general solution. This paper describes a unique and elegant solution to the problem based on a flexible check function that is easy to understand and implement in R and Python, while greatly reducing or even eliminating the crossing problem entirely. It will be very important in all areas where quantile regression is routinely used and may also find application in robust regression, especially in the context of machine learning. From this perspective, we also utilize the flexible check function to provide insights into the root causes of the crossing problem.
△ Less
Submitted 24 November, 2021; v1 submitted 8 November, 2021;
originally announced November 2021.
-
On the Importance of Firth Bias Reduction in Few-Shot Classification
Authors:
Saba Ghaffari,
Ehsan Saleh,
David Forsyth,
Yu-xiong Wang
Abstract:
Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training…
▽ More
Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training few-shot classifiers has been overlooked. In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the $O(N^{-1})$ first order term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class assignment probabilities for multinomial logistic classification, and almost has the same effect in cosine classifiers. We derive an easy-to-implement optimization objective for Firth penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with a KL-divergence between the uniform label distribution and the predictions. Then, we empirically evaluate that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Finally, we show the robustness of Firth bias reduction, in the case of imbalanced data distribution. Our implementation is available at https://github.com/ehsansaleh/firth_bias_reduction
△ Less
Submitted 14 April, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning
Authors:
Chi-Sing Ho,
Neal Jean,
Catherine A. Hogan,
Lena Blackmon,
Stefanie S. Jeffrey,
Mark Holodniy,
Niaz Banaei,
Amr A. E. Saleh,
Stefano Ermon,
Jennifer Dionne
Abstract:
Rapid identification of bacteria is essential to prevent the spread of infectious disease, help combat antimicrobial resistance, and improve patient outcomes. Raman optical spectroscopy promises to combine bacterial detection, identification, and antibiotic susceptibility testing in a single step. However, achieving clinically relevant speeds and accuracies remains challenging due to the weak Rama…
▽ More
Rapid identification of bacteria is essential to prevent the spread of infectious disease, help combat antimicrobial resistance, and improve patient outcomes. Raman optical spectroscopy promises to combine bacterial detection, identification, and antibiotic susceptibility testing in a single step. However, achieving clinically relevant speeds and accuracies remains challenging due to the weak Raman signal from bacterial cells and the large number of bacterial species and phenotypes. By amassing the largest known dataset of bacterial Raman spectra, we are able to apply state-of-the-art deep learning approaches to identify 30 of the most common bacterial pathogens from noisy Raman spectra, achieving antibiotic treatment identification accuracies of 99.0$\pm$0.1%. This novel approach distinguishes between methicillin-resistant and -susceptible isolates of Staphylococcus aureus (MRSA and MSSA) as well as a pair of isogenic MRSA and MSSA that are genetically identical apart from deletion of the mecA resistance gene, indicating the potential for culture-free detection of antibiotic resistance. Results from initial clinical validation are promising: using just 10 bacterial spectra from each of 25 isolates, we achieve 99.0$\pm$1.9% species identification accuracy. Our combined Raman-deep learning system represents an important proof-of-concept for rapid, culture-free identification of bacterial isolates and antibiotic resistance and could be readily extended for diagnostics on blood, urine, and sputum.
△ Less
Submitted 5 November, 2019; v1 submitted 22 January, 2019;
originally announced January 2019.
-
Survey on Using GIS in Evacuation Planning
Authors:
Sara Shaker Abed El-Hamied,
Ahmed Abou El-Fotouh Saleh,
Aziza Asem
Abstract:
Natural crises form a big threat on environment; these crises mean the loss of enterprises and individuals, and therefore losses in the sum total of community development. Management to these crises is required through a crisis management plan to control the crises before, during, and after the event. One of the most needed things to consider during preparing the crisis management plan is preparin…
▽ More
Natural crises form a big threat on environment; these crises mean the loss of enterprises and individuals, and therefore losses in the sum total of community development. Management to these crises is required through a crisis management plan to control the crises before, during, and after the event. One of the most needed things to consider during preparing the crisis management plan is preparing the evacuation plan in order to transfer people from the incident place to a safe place; this must be done quickly and carefully. Because of the geographic nature of the evacuation process, Geographical Information System (GIS) has been used widely and effectively for over 20 years in the field of crisis management in general and in evacuation planning in particular. This paper provides an overview about evacuation process and the basic concepts of GIS systems. The paper also demonstrates the importance of evacuation planning and how GIS systems used in other studies to assists in evacuation process.
△ Less
Submitted 29 August, 2012;
originally announced August 2012.