Search | arXiv e-print repository

On Sequential Loss Approximation for Continual Learning

Authors: Menghao Waiyan William Zhu, Ercan Engin Kuruoğlu

Abstract: We introduce for continual learning Autodiff Quadratic Consolidation (AQC), which approximates the previous loss function with a quadratic function, and Neural Consolidation (NC), which approximates the previous loss function with a neural network. Although they are not scalable to large neural networks, they can be used with a fixed pre-trained feature extractor. We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results, unless combined with replay. We find that for small datasets, quadratic approximation of the previous loss function leads to poor results, even with full Hessian computation, and NC could significantly improve the predictive performance, while for large datasets, when used with a fixed pre-trained feature extractor, AQC provides superior predictive performance. We also find that using tanh-output features can improve the predictive performance of AQC. In particular, in class-incremental Split MNIST, when a Convolutional Neural Network (CNN) with tanh-output features is pre-trained on EMNIST Letters and used as a fixed pre-trained feature extractor, AQC can achieve predictive performance comparable to joint training.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.04111 [pdf, other]

Adaptive Least Mean pth Power Graph Neural Networks

Authors: Changran Peng, Yi Yan, Ercan E. Kuruoglu

Abstract: In the presence of impulsive noise and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining adaptive filter and graph neural network for online graph signal estimation. LMP-GNN retains the advantage of adaptive filtering in handling noise and missing observations as well as the online update capability. The incorporated graph neural network within the LMP-GNN can train and update filter parameters online instead of predefined filter parameters in previous methods, outputting more accurate prediction results. The adaptive update scheme of the LMP-GNN follows the solution of an $l_p$-norm optimization, rooted in the minimum dispersion criterion, and yields robust estimation results for time-varying graph signals under impulsive noise. A special case of LMP-GNN named the Sign-GNN is also provided and analyzed. Experimental results on two real-world datasets of temperature graph and traffic graph under four different noise distributions prove the effectiveness and robustness of our proposed LMP-GNN.

Submitted 7 May, 2024; originally announced May 2024.
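
As a rough illustration of the $l_p$-norm adaptive update mentioned in the abstract above, the sketch below applies a classical least mean p-th power (LMP) step to a plain linear filter. It is not the LMP-GNN architecture; the function name `lmp_step`, the step size `mu`, and the default `p` are assumptions chosen for illustration only.

```python
import math

def lmp_step(w, x, d, mu=0.05, p=1.5):
    """One least-mean-p-th-power step for a linear filter y = w . x.

    Stochastic gradient step on the cost |e|^p with e = d - y:
        w <- w + mu * p * |e|^(p-1) * sign(e) * x
    p = 2 gives an LMS-style update, p = 1 a sign-error update.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))  # filter output
    e = d - y                                 # estimation error
    if e == 0.0:
        return list(w), e                     # nothing to correct
    g = p * abs(e) ** (p - 1) * math.copysign(1.0, e)
    return [wi + mu * g * xi for wi, xi in zip(w, x)], e
```

For `p < 2` the factor `|e|^(p-1)` grows more slowly than the error itself, so a single large outlier moves the weights less than it would under a squared-error update.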

arXiv:2405.04098 [pdf, other]

Binarized Simplicial Convolutional Neural Networks

Authors: Yi Yan, Ercan E. Kuruoglu

Abstract: Graph Neural Networks have a limitation of solely processing features on graph nodes, neglecting data on high-dimensional structures such as edges and triangles. Simplicial Convolutional Neural Networks (SCNN) represent higher-order structures using simplicial complexes to break this limitation albeit still lacking time efficiency. In this paper, we propose a novel neural network architecture on simplicial complexes named Binarized Simplicial Convolutional Neural Networks (Bi-SCNN) based on the combination of simplicial convolution with a binary-sign forward propagation strategy. The usage of the Hodge Laplacian on a binary-sign forward propagation enables Bi-SCNN to efficiently and effectively represent simplicial features that have higher-order structures than traditional graph node representations. Compared to the previous Simplicial Convolutional Neural Networks, the reduced model complexity of Bi-SCNN shortens the execution time without sacrificing the prediction performance and is less prone to the over-smoothing effect. Experimenting with real-world citation and ocean-drifter data confirmed that our proposed Bi-SCNN is efficient and accurate.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2402.16911 [pdf, other]

Trustworthy Personalized Bayesian Federated Learning via Posterior Fine-Tune

Authors: Mengen Luo, Chi Xu, Ercan Engin Kuruoglu

Abstract: Performance degradation owing to data heterogeneity and low output interpretability are the most significant challenges faced by federated learning in practical applications. Personalized federated learning diverges from traditional approaches, as it no longer seeks to train a single model, but instead tailors a unique personalized model for each client. However, previous work focused only on personalization from the perspective of neural network parameters and lacked robustness and interpretability. In this work, we establish a novel framework for personalized federated learning, incorporating Bayesian methodology which enhances the algorithm's ability to quantify uncertainty. Furthermore, we introduce normalizing flow to achieve personalization from the parameter posterior perspective and theoretically analyze the impact of normalizing flow on out-of-distribution (OOD) detection for Bayesian neural networks. Finally, we evaluated our approach on heterogeneous datasets, and the experimental results indicate that the new algorithm not only improves accuracy but also outperforms the baseline significantly in OOD detection due to the reliable output of the Bayesian approach.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.16091 [pdf, other]

Bayesian Neural Network For Personalized Federated Learning Parameter Selection

Authors: Mengen Luo, Ercan Engin Kuruoglu

Abstract: Federated learning's poor performance in the presence of heterogeneous data remains one of the most pressing issues in the field. Personalized federated learning departs from the conventional paradigm in which all clients employ the same model, instead striving to discover an individualized model for each client to address the heterogeneity in the data. One such approach involves personalizing specific layers of neural networks. However, prior endeavors have not provided a dependable rationale, and some have selected personalized layers that are entirely distinct and conflicting. In this work, we take a step further by proposing personalization at the elemental level, rather than the traditional layer-level personalization. To select personalized parameters, we introduce Bayesian neural networks and rely on the uncertainty they offer to guide our selection of personalized parameters. Finally, we validate our algorithm's efficacy on several real-world datasets, demonstrating that our proposed approach outperforms existing baselines.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2401.15304 [pdf, other]

Adaptive Least Mean Squares Graph Neural Networks and Online Graph Signal Estimation

Authors: Yi Yan, Changran Peng, Ercan Engin Kuruoglu

Abstract: The online prediction of multivariate signals, existing simultaneously in space and time, from noisy partial observations is a fundamental task in numerous applications. We propose an efficient Neural Network architecture for the online estimation of time-varying graph signals named the Adaptive Least Mean Squares Graph Neural Networks (LMS-GNN). LMS-GNN aims to capture the time variation and bridge the cross-space-time interactions under the condition that signals are corrupted by noise and missing values. The LMS-GNN is a combination of adaptive graph filters and Graph Neural Networks (GNN). At each time step, the forward propagation of LMS-GNN is similar to adaptive graph filters where the output is based on the error between the observation and the prediction similar to GNN. The filter coefficients are updated via backpropagation as in GNN. Experimenting on real-world temperature data reveals that our LMS-GNN achieves more accurate online predictions compared to graph-based methods like adaptive graph filters and graph convolutional neural networks.

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2311.11126 [pdf, other]

Bayesian Neural Networks: A Min-Max Game Framework

Authors: Junping Hong, Ercan Engin Kuruoglu

Abstract: This paper is a preliminary study of the robustness and noise analysis of deep neural networks via a game theory formulation Bayesian Neural Networks (BNN) and the maximal coding rate distortion loss. BNN has been shown to provide some robustness to deep learning, and the minimax method used to be a natural conservative way to assist the Bayesian method. Inspired by the recent closed-loop transcription neural network, we formulate the BNN via game theory between the deterministic neural network $f$ and the sampling network $f + ξ$ or $f + r*ξ$. Compared with previous BNN, BNN via game theory learns a solution space within a certain gap between the center $f$ and the sampling point $f + r*ξ$, and is a conservative choice with a meaningful prior setting compared with previous BNN. Furthermore, the minimum points between $f$ and $f + r*ξ$ become stable when the subspace dimension is large enough with a well-trained model $f$. With these, the model $f$ can have a high chance of recognizing the out-of-distribution data or noise data in the subspace rather than the prediction level, even if $f$ is in online training after a few iterations of true data. So far, our experiments are limited to MNIST and Fashion MNIST data sets; more experiments with realistic data sets and complicated neural network models should be implemented to validate the above arguments.

Submitted 29 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: 6 pages, 8 figures

arXiv:2311.10803 [pdf, other]

Robustness Enhancement in Neural Networks with Alpha-Stable Training Noise

Authors: Xueqiong Yuan, Jipeng Li, Ercan Engin Kuruoğlu

Abstract: With the increasing use of deep learning on data collected by non-perfect sensors and in non-perfect environments, the robustness of deep learning systems has become an important issue. A common approach for obtaining robustness to noise has been to train deep learning systems with data augmented with Gaussian noise. In this work, we challenge the common choice of Gaussian noise and explore the possibility of stronger robustness for non-Gaussian impulsive noise, specifically alpha-stable noise. Justified by the Generalized Central Limit Theorem and evidenced by observations in various application areas, alpha-stable noise is widely present in nature. By comparing the testing accuracy of models trained with Gaussian noise and alpha-stable noise on data corrupted by different noise, we find that training with alpha-stable noise is more effective than Gaussian noise, especially when the dataset is corrupted by impulsive noise, thus improving the robustness of the model. The generality of this conclusion is validated through experiments conducted on various deep learning models with image and time series datasets, and other benchmark corrupted datasets. Consequently, we propose a novel data augmentation method that replaces Gaussian noise, which is typically added to the training data, with alpha-stable noise.

Submitted 17 November, 2023; originally announced November 2023.
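
To make the idea of swapping Gaussian for alpha-stable training noise more concrete, here is a minimal sketch that draws symmetric alpha-stable samples with the Chambers-Mallows-Stuck method and adds them to a list of values. It is not taken from the paper; the names `symmetric_alpha_stable` and `augment`, and the default `alpha` and `scale`, are assumptions for illustration, and only the symmetric (zero-skewness) case is covered.

```python
import math
import random

def symmetric_alpha_stable(alpha, scale=1.0, rng=random):
    """Draw one symmetric alpha-stable sample (skewness beta = 0).

    Chambers-Mallows-Stuck method, valid for alpha in (0, 2].
    alpha = 2 yields a Gaussian with variance 2 * scale**2;
    alpha = 1 yields a Cauchy sample.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must be in (0, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential draw
        w = rng.expovariate(1.0)
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x

def augment(values, alpha=1.5, scale=0.1, rng=random):
    """Return a copy of `values` with additive symmetric alpha-stable noise."""
    return [v + symmetric_alpha_stable(alpha, scale, rng) for v in values]
```

Smaller `alpha` gives heavier tails, so occasional very large perturbations appear in the augmented data; passing a seeded `random.Random` instance as `rng` makes the draws reproducible.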

arXiv:2311.00656 [pdf, other]

Online Signal Estimation on the Graph Edges via Line Graph Transformation

Authors: Yi Yan, Ercan Engin Kuruoglu

Abstract: The processing of signals on graph edges is challenging considering that Graph Signal Processing techniques are defined only on the graph nodes. Leveraging the Line Graph to transform a graph edge signal onto the node of its edge-to-vertex dual, we propose the Line Graph Least Mean Square (LGLMS) algorithm for online time-varying graph edge signal prediction. By setting up an $l_2$-norm optimization problem, LGLMS forms an adaptive algorithm as the graph edge analogy of the classical adaptive LMS algorithm. Additionally, the LGLMS inherits all the GSP concepts and techniques that can previously be deployed on the graph nodes, but without the need to redefine them on the graph edges. Experimenting with transportation graphs and meteorological graphs, with the signal observations having noisy and missing values, we confirmed that LGLMS is suitable for the online prediction of time-varying edge signals.

Submitted 28 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.
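
The line graph (edge-to-vertex dual) transformation that the abstract above relies on can be sketched in a few lines: every edge of the original graph becomes a node, and two such nodes are connected when the original edges share an endpoint. The helper name `line_graph` and the edge-list input format are assumptions for illustration, and the sketch assumes a simple undirected graph without repeated edges.

```python
from itertools import combinations

def line_graph(edges):
    """Return the line graph of a simple undirected graph.

    `edges` is an iterable of 2-tuples of node labels. The result maps
    each original edge (as a sorted tuple) to the set of original edges
    that share an endpoint with it.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    adjacency = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):  # the two edges share an endpoint
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency
```

A signal defined on the edges of the original graph can then be treated as a node signal on this dual graph, which is what allows node-based processing tools to be reused.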

arXiv:2310.00642 [pdf, other]

From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

Authors: Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu

Abstract: The problem of how to take the right actions to make profits in a sequential process continues to be difficult due to the quick dynamics and a significant amount of uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimum control, has emerged as a potential technique to address this strategic decision-making issue. However, reinforcement learning also has some shortcomings, such as excessive resource consumption and an inability to quickly obtain optimal solutions, that make it unsuitable for solving many financial problems and for quantitative trading markets. In this study, we use two methods to overcome the issue with contextual information: contextual Thompson sampling and reinforcement learning under supervision which can accelerate the iterations in search of the best answer. In order to investigate strategic trading in quantitative markets, we merged the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning to obtain the optimal solution.

Submitted 1 October, 2023; originally announced October 2023.

MSC Class: 93A16 ACM Class: I.2.11; G.3
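
For readers unfamiliar with constant proportion portfolio insurance (CPPI), which the abstract above combines with DDPG, the textbook allocation rule is short enough to sketch. This shows only the standalone CPPI rule, not the paper's RL method; the function name `cppi_allocation`, the default multiplier, and the no-leverage cap are assumptions for illustration.

```python
def cppi_allocation(value, floor, multiplier=3.0):
    """Split a portfolio between a risky and a safe asset using CPPI.

    The cushion is the amount by which the portfolio value exceeds the
    protected floor. The risky exposure is multiplier * cushion, capped
    at the portfolio value (assumption: no borrowing); the remainder
    goes to the safe asset.
    """
    cushion = max(value - floor, 0.0)
    risky = min(multiplier * cushion, value)
    return risky, value - risky
```

For example, `cppi_allocation(100.0, 80.0, 3.0)` returns `(60.0, 40.0)`: the cushion of 20 is multiplied by 3 to give the risky exposure, and the rest stays in the safe asset. Once the value falls to the floor, the risky exposure drops to zero.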

arXiv:2303.11959 [pdf, other]

Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning

Authors: Hengxi Zhang, Zhendong Shi, Yuanquan Hu, Wenbo Ding, Ercan E. Kuruoglu, Xiao-Ping Zhang

Abstract: Quantitative markets are characterized by swift dynamics and abundant uncertainties, making the pursuit of profit-driven stock trading actions inherently challenging. Within this context, reinforcement learning (RL), which operates on a reward-centric mechanism for optimal control, has surfaced as a potentially effective solution to the intricate financial decision-making conundrums presented. This paper delves into the fusion of two established financial trading strategies, namely the constant proportion portfolio insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the multi-agent deep deterministic policy gradient (MADDPG) framework. As a result, we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and TIPP-MADDPG, tailored for probing strategic trading within quantitative markets. To validate these innovations, we implemented them on a diverse selection of 100 real-market shares. Our empirical findings reveal that the CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional counterparts, affirming their efficacy in the realm of quantitative trading.

Submitted 21 December, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

arXiv:2211.06533 [pdf, other]

Adaptive Joint Estimation of Temporal Vertex and Edge Signals

Authors: Yi Yan, Tian Xie, Ercan E. Kuruoglu

Abstract: The adaptive estimation of coexisting temporal vertex (node) and edge signals on graphs is a critical task when a change in edge signals influences the temporal dynamics of the vertex signals. However, the current Graph Signal Processing algorithms mostly consider only the signals existing on the graph vertices and have neglected the fact that signals can reside on the edges. We propose an Adaptive Joint Vertex-Edge Estimation (AJVEE) algorithm for jointly estimating time-varying vertex and edge signals through a time-varying regression, incorporating both vertex signal filtering and edge signal filtering. Accompanying AJVEE is a newly proposed Adaptive Least Mean Square procedure based on the Hodge Laplacian (ALMS-Hodge), which is inspired by classical adaptive filters combining simplicial filtering and simplicial regression. AJVEE is able to operate jointly on the vertices and edges by merging two ALMS-Hodge algorithms specified on the vertices and edges into a unified formulation. A more generalized case extending AJVEE beyond the vertices and edges is also discussed. Experimenting on real-world traffic networks and population mobility networks, we have confirmed that our proposed AJVEE algorithm could accurately and jointly track time-varying vertex and edge signals on graphs.

Submitted 7 May, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

arXiv:2203.10214 [pdf, other]

Thompson Sampling on Asymmetric $α$-Stable Bandits

Authors: Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei

Abstract: In algorithm optimization in reinforcement learning, how to deal with the exploration-exploitation dilemma is particularly important. The multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to realize the dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that conform to various laws. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem, in which rewards conform to unknown asymmetric $α$-stable distributions, and explore their applications in modelling financial and wireless data.

Submitted 25 March, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 8 pages, 4 figures

arXiv:2203.00320 [pdf, ps, other]

doi 10.1007/s11265-022-01802-2

Graph Normalized-LMP Algorithm for Signal Estimation Under Impulsive Noise

Authors: Yi Yan, Radwa Adel, Ercan Engin Kuruoglu

Abstract: In this paper, we introduce an adaptive graph normalized least mean pth power (GNLMP) algorithm for graph signal processing (GSP) that utilizes GSP techniques, including bandlimited filtering and node sampling, to estimate sampled graph signals under impulsive noise. Different from least-squares-based algorithms, such as the adaptive GSP Least Mean Squares (GLMS) algorithm and the normalized GLMS (GNLMS) algorithm, the GNLMP algorithm has the ability to reconstruct a graph signal that is corrupted by non-Gaussian noise with heavy-tailed characteristics. Compared to the recently introduced adaptive GSP least mean pth power (GLMP) algorithm, the GNLMP algorithm reduces the number of iterations to converge to a steady graph signal. The convergence condition of the GNLMP algorithm is derived, and the ability of the GNLMP algorithm to process multidimensional time-varying graph signals with multiple features is demonstrated as well. Simulations show the performance of the GNLMP algorithm in estimating steady-state and time-varying graph signals is faster than GLMP and more robust in comparison to GLMS and GNLMS.

Submitted 1 March, 2022; originally announced March 2022.

Journal ref: J Sign Process Syst (2022)

arXiv:2109.14509 [pdf, other]

PAC-Bayes Information Bottleneck

Authors: Zifeng Wang, Shao-Lun Huang, Ercan E. Kuruoglu, Jimeng Sun, Xi Chen, Yefeng Zheng

Abstract: Understanding the source of the superior generalization ability of NNs remains one of the most important problems in ML research. There have been a series of theoretical works trying to derive non-vacuous bounds for NNs. Recently, the compression of information stored in weights (IIW) is proved to play a key role in NNs generalization based on the PAC-Bayes theorem. However, no solution of IIW has ever been provided, which builds a barrier for further investigation of the IIW's property and its potential in practical deep learning. In this paper, we propose an algorithm for the efficient approximation of IIW. Then, we build an IIW-based information bottleneck on the trade-off between accuracy and information complexity of NNs, namely PIB. From PIB, we can empirically identify the fitting to compressing phase transition during NNs' training and the concrete connection between the IIW compression and the generalization. Besides, we verify that IIW is able to explain NNs in broad cases, e.g., varying batch sizes, over-parameterization, and noisy labels. Moreover, we propose an MCMC-based algorithm to sample from the optimal weight posterior characterized by PIB, which fulfills the potential of IIW in enhancing NNs in practice.

Submitted 4 March, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: ICLR'22 (Spotlight)

arXiv:2009.02623 [pdf, other]

Information Theoretic Counterfactual Learning from Missing-Not-At-Random Feedback

Authors: Zifeng Wang, Xi Chen, Rui Wen, Shao-Lun Huang, Ercan E. Kuruoglu, Yefeng Zheng

Abstract: Counterfactual learning for dealing with missing-not-at-random data (MNAR) is an intriguing topic in the recommendation literature since MNAR data are ubiquitous in modern recommender systems. Missing-at-random (MAR) data, namely randomized controlled trials (RCTs), are usually required by most previous counterfactual learning methods for debiasing learning. However, the execution of RCTs is extraordinarily expensive in practice. To circumvent the use of RCTs, we build an information-theoretic counterfactual variational information bottleneck (CVIB), as an alternative for debiasing learning without RCTs. By separating the task-aware mutual information term in the original information bottleneck Lagrangian into factual and counterfactual parts, we derive a contrastive information loss and an additional output confidence penalty, which facilitates balanced learning between the factual and counterfactual domains. Empirical evaluation on real-world datasets shows that our CVIB significantly enhances both shallow and deep models, which sheds light on counterfactual learning in recommendation that goes beyond RCTs.

Submitted 17 October, 2020; v1 submitted 5 September, 2020; originally announced September 2020.

arXiv:1904.05586 [pdf, other]

Black-Box Decision based Adversarial Attack with Symmetric $α$-stable Distribution

Authors: Vignesh Srinivasan, Ercan E. Kuruoglu, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima

Abstract: Developing techniques for adversarial attack and defense is an important research field for establishing reliable machine learning and its applications. Many existing methods employ Gaussian random variables for exploring the data space to find the most adversarial (for attacking) or least adversarial (for defense) point. However, the Gaussian distribution is not necessarily the optimal choice when the exploration is required to follow the complicated structure that most real-world data distributions exhibit. In this paper, we investigate how statistics of random variables affect such random walk exploration. Specifically, we generalize the Boundary Attack, a state-of-the-art black-box decision based attacking strategy, and propose the Lévy-Attack, where the random walk is driven by symmetric $α$-stable random variables. Our experiments on MNIST and CIFAR10 datasets show that the Lévy-Attack explores the image data space more efficiently, and significantly improves the performance. Our results also give an insight into the recently found fact in the whitebox attacking scenario that the choice of the norm for measuring the amplitude of the adversarial patterns is essential.

Submitted 11 April, 2019; originally announced April 2019.

arXiv:1404.4351 [pdf, ps, other]

Stable Graphical Models

Authors: Navodit Misra, Ercan E. Kuruoglu

Abstract: Stable random variables are motivated by the central limit theorem for densities with (potentially) unbounded variance and can be thought of as natural generalizations of the Gaussian distribution to skewed and heavy-tailed phenomena. In this paper, we introduce stable graphical (SG) models, a class of multivariate stable densities that can also be represented as Bayesian networks whose edges encode linear dependencies between random variables. One major hurdle to the extensive use of stable distributions is the lack of a closed-form analytical expression for their densities. This makes penalized maximum-likelihood based learning computationally demanding. We establish theoretically that the Bayesian information criterion (BIC) can asymptotically be reduced to the computationally more tractable minimum dispersion criterion (MDC) and develop StabLe, a structure learning algorithm based on MDC. We use simulated datasets for five benchmark network topologies to empirically demonstrate how StabLe improves upon ordinary least squares (OLS) regression. We also apply StabLe to microarray gene expression data for lymphoblastoid cells from 727 individuals belonging to eight global population groups. We establish that StabLe improves test set performance relative to OLS via ten-fold cross-validation. Finally, we develop SGEX, a method for quantifying differential expression of genes between different population groups.

Submitted 16 April, 2014; originally announced April 2014.

Showing 1–18 of 18 results for author: Kuruoğlu, E E