Search | arXiv e-print repository

Adversarial Training via Adaptive Knowledge Amalgamation of an Ensemble of Teachers

Authors: Shayan Mohajer Hamidi, Linfeng Ye

Abstract: Adversarial training (AT) is a popular method for training robust deep neural networks (DNNs) against adversarial attacks. Yet, AT suffers from two shortcomings: (i) the robustness of DNNs trained by AT is highly intertwined with the size of the DNNs, posing challenges in achieving robustness in smaller models; and (ii) the adversarial samples employed during the AT process exhibit poor generalization, leaving DNNs vulnerable to unforeseen attack types. To address these dual challenges, this paper introduces adversarial training via adaptive knowledge amalgamation of an ensemble of teachers (AT-AKA). In particular, we generate a diverse set of adversarial samples as the inputs to an ensemble of teachers; and then, we adaptively amalgamate the logits of these teachers to train a generalized-robust student. Through comprehensive experiments, we illustrate the superior efficacy of AT-AKA over existing AT methods and adversarial robustness distillation techniques against cutting-edge attacks, including AutoAttack.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2401.08732 [pdf, other]

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information

Authors: Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang

Abstract: It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using the maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of conditional mutual information (CMI) into the estimation of BCPD and propose a novel estimator called the maximum CMI (MCMI) method. Specifically, in MCMI estimation, both the log-likelihood and CMI of the teacher are simultaneously maximized when the teacher is trained. Through Eigen-CAM, it is further shown that maximizing the teacher's CMI value allows the teacher to capture more contextual information in an image cluster. Via conducting a thorough set of experiments, we show that by employing a teacher trained via MCMI estimation rather than one trained via MLL estimation in various state-of-the-art KD frameworks, the student's classification accuracy consistently increases, with a gain of up to 3.32%. This suggests that the teacher's BCPD estimate provided by the MCMI method is more accurate than that provided by the MLL method. In addition, we show that such improvements in the student's accuracy are more drastic in zero-shot and few-shot settings. Notably, the student's accuracy increases with a gain of up to 5.72% when 5% of the training samples are available to the student (few-shot), and increases from 0% to as high as 84% for an omitted class (zero-shot). The code is available at https://github.com/iclr2024mcmi/ICLRMCMI.

Submitted 7 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 32 pages, 19 figures, Published as a conference paper at ICLR 2024

MSC Class: 68T30 ACM Class: I.2.6

Journal ref: International Conference on Learning Representations 2024 (ICLR)

arXiv:2401.07991 [pdf, other]

Robustness Against Adversarial Attacks via Learning Confined Adversarial Polytopes

Authors: Shayan Mohajer Hamidi, Linfeng Ye

Abstract: Deep neural networks (DNNs) could be deceived by generating human-imperceptible perturbations of clean samples. Therefore, enhancing the robustness of DNNs against adversarial attacks is a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as an adversarial polytope, and each clean sample has a respective adversarial polytope. Indeed, if the respective polytopes for all the samples are compact such that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, the inner working of our algorithm is based on learning confined adversarial polytopes (CAP). By conducting a thorough set of experiments, we demonstrate the effectiveness of CAP over existing adversarial robustness methods in improving the robustness of models against state-of-the-art attacks including AutoAttack.

Submitted 20 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: The paper has been accepted in ICASSP 2024

arXiv:2401.04993 [pdf, other]

AdaFed: Fair Federated Learning via Adaptive Common Descent Direction

Authors: Shayan Mohajer Hamidi, En-Hui Yang

Abstract: Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is to find an updating direction for the server along which (i) all the clients' loss functions are decreasing; and (ii) more importantly, the loss functions for the clients with larger values decrease with a higher rate. AdaFed adaptively tunes this common direction based on the values of local gradients and loss functions. We validate the effectiveness of AdaFed on a suite of federated datasets, and demonstrate that AdaFed outperforms state-of-the-art fair FL methods.

Submitted 10 January, 2024; originally announced January 2024.

Comments: This paper has been accepted in Transactions on Machine Learning Research. This is the link to the paper: https://openreview.net/forum?id=rFecyFpFUp&referrer=%5Bthe%20profile%20of%20Shayan%20Mohajer%20Hamidi%5D(%2Fprofile%3Fid%3D~Shayan_Mohajer_Hamidi1)

arXiv:2309.09123 [pdf, other]

Conditional Mutual Information Constrained Deep Learning for Classification

Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang

Abstract: The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experiment results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of learning process through the lens of CMI and NCMI is also advocated.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2304.14974 [pdf]

Vectorial characterization of Bloch surface wave via one-dimensional photonic-atomic structure

Authors: M. Asadolah Salmanpour, M. Mosleh, S. M. Hamidi

Abstract: Use of hot atomic vapor as a new tool for tracing the complex nature of light has become a knowledge-based topic in recent years. In this paper, we examine the polarization ellipse of the Bloch surface wave (BSW) through the effect of a magnetic field on the coupling of these surface waves in a BSW-hot atomic vapor cell. For this purpose, we fabricate a one-dimensional photonic crystal-based Bloch wave atom cell, where under different configurations of magnetic field, the polarization ellipse of Bloch surface waves has been recorded experimentally. Our results indicate that by applying the magnetic field in different directions, Faraday and Voigt, the characteristics of electromagnetically induced transparency (EIT-like) of the hybrid system change. We have used these changes to redefine the geometry of Voigt and Faraday for evanescent waves, as well as to measure the ratio of the components of the elliptically polarized electric field. These characterizations can open new insight into the miniaturized atomic field in high quality and low volumetric areas.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2210.12356 [pdf]

doi 10.1364/OE.479525

Bloch Surface Wave-atom Coupling in Periodic Photonic Structure

Authors: M. Asadolah Salmanpour, M. Mosleh, S. M. Hamidi

Abstract: Considering efforts toward hot atomic vapor-nanophotonic integration as a new paradigm in quantum optics, in this paper we introduce a 1D photonic crystal-Rb vapor cell as a structure with miniaturized interaction volume. The Bloch surface wave (BSW) excited on the surface of the photonic crystal acts as the hosting electromagnetic mode and alters the optical response of Rb atoms in the vicinity of the surface. Coupling of atomic states with BSW confined modes would lead to quantum interference effects and results in nonlinearities in the resonant coupling of atoms with the BSW. We show that Bloch surface wave induced transparency is highly stable under a change of incidence angle. Our results show slight changes in transition detunings due to nonlinear interactions like the Casimir-Polder effect under a change of the localized density of optical states.

Submitted 22 October, 2022; originally announced October 2022.

Comments: 15 pages, 4 figures

arXiv:2111.15046 [pdf, other]

A Secure Key Sharing Algorithm Exploiting Phase Reciprocity in Wireless Channels

Authors: Shayan Mohajer Hamidi, Amir Keyvan Khandani, Ehsan Bateni

Abstract: This article presents a secure key exchange algorithm that exploits reciprocity in wireless channels to share a secret key between two nodes $A$ and $B$. Reciprocity implies that the channel phases in the links $A\rightarrow B$ and $B\rightarrow A$ are the same. A number of such reciprocal phase values are measured at nodes $A$ and $B$, called shared phase values hereafter. Each shared phase value is used to mask points of a Phase Shift Keying (PSK) constellation. Masking is achieved by rotating each PSK constellation with a shared phase value. Rotation of constellation is equivalent to adding phases modulo-$2π$, and as the channel phase is uniformly distributed in $[0,2π)$, the result of summation conveys zero information about summands. To enlarge the key size over a static or slow fading channel, the Radio Frequency (RF) propagation path is perturbed to create several independent realizations of multi-path fading, each used to share a new phase value. To eavesdrop a phase value shared in this manner, the Eavesdropper (Eve) will always face an under-determined system of linear equations which will not reveal any useful information about its actual solution value. This property is used to establish a secure key between two legitimate users.

Submitted 29 November, 2021; originally announced November 2021.
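
The masking step described in the abstract above (rotating a PSK point by a shared phase, i.e. adding phases modulo $2π$) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the constellation order `M = 8`, the function names, and the idealized assumption that both nodes hold exactly the same noise-free shared phase are all hypothetical choices made for illustration, and the shared phases are simply drawn at random rather than measured from a channel.

```python
import math
import random

M = 8  # PSK constellation order (assumed for illustration)
TWO_PI = 2 * math.pi


def psk_phase(symbol, m=M):
    """Phase of the constellation point that carries `symbol`."""
    return TWO_PI * symbol / m


def mask(symbol, shared_phase, m=M):
    """Rotate the PSK point by the shared phase (addition modulo 2*pi)."""
    return (psk_phase(symbol, m) + shared_phase) % TWO_PI


def unmask(masked_phase, shared_phase, m=M):
    """Undo the rotation and snap to the nearest constellation point."""
    phase = (masked_phase - shared_phase) % TWO_PI
    return round(phase * m / TWO_PI) % m


# Stand-in for reciprocal channel phases: uniform draws over [0, 2*pi).
rng = random.Random(0)
key_symbols = [rng.randrange(M) for _ in range(16)]
shared_phases = [rng.uniform(0.0, TWO_PI) for _ in key_symbols]

# Node A masks each key symbol; node B removes the same rotation.
masked = [mask(s, p) for s, p in zip(key_symbols, shared_phases)]
recovered = [unmask(x, p) for x, p in zip(masked, shared_phases)]
```

Because each masked phase is the sum of a constellation phase and an independent uniform phase taken modulo $2π$, it is itself uniform on $[0,2π)$, which is the property the abstract relies on. A real system would also have to handle measurement noise and phase mismatch between the two nodes, which this sketch ignores.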

arXiv:2111.12305 [pdf, other]

Thundernna: a white box adversarial attack

Authors: Linfeng Ye, Shayan Mohajer Hamidi

Abstract: Existing work shows that a neural network trained by a naive gradient-based optimization method is prone to adversarial attacks: adding a small malicious perturbation to an ordinary input is enough to make the neural network err. At the same time, attacking a neural network is the key to improving its robustness. Training against adversarial examples can make neural networks resist some kinds of adversarial attacks. Moreover, an adversarial attack against a neural network can also reveal some characteristics of the neural network, a complex high-dimensional non-linear function, as discussed in previous work. In this project, we develop a first-order method to attack the neural network. Compared with other first-order attacks, our method has a much higher success rate. Furthermore, it is much faster than second-order attacks and multi-step first-order attacks.

Submitted 21 January, 2024; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: 10 pages, 5 figures

MSC Class: 92B20 ACM Class: I.2.m

arXiv:2011.02770 [pdf]

Nanophotonic structures with optical surface modes for tunable spin current generation

Authors: P. V. Shilina, D. O. Ignatyeva, P. O. Kapralov, S. K. Sekatskii, M. Nur-E-Alam, M. Vasiliev, K. Alameh, V. G. Achanta, Y. Song, S. M. Hamidi, A. K. Zvezdin, V. I. Belotelov

Abstract: Heat generated by spin currents in spintronics-based devices is typically much less than that generated by charge current flows in conventional electronic devices. However, the conventional approaches for excitation of spin currents based on spin-pumping and spin Hall effect are limited in efficiency, which restricts their application for viable spintronic devices. We propose a novel type of photonic-crystal (PC) based structures for efficient and tunable optically-induced spin current generation via the Spin Seebeck and inverse spin Hall effects. It is experimentally demonstrated that optical surface modes localized at the PC surface covered by ferromagnetic layer and materials with giant spin-orbit coupling (SOC) notably increase the efficiency of the optically-induced spin current generation and provide its tunability by modifying light wavelength or angle of incidence. Up to 100% of the incident light power can be transferred to heat within the SOC layer and, therefore, to spin current. Importantly, high efficiency becomes accessible even for ultra-thin SOC layers. Moreover, surface patterning of the PC-based spintronic nanostructure allows local generation of spin currents at the pattern scales rather than diameter of the laser beam.

Submitted 7 December, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2005.10913 [pdf]

doi 10.1016/j.jmmm.2020.167387

Surface lattice resonance based magneto-plasmonic switch in NiFe patterned nano-structure

Authors: H. Mbarak, S. M. Hamidi, V. I. Belotelov, A. I. Chernov, E. Mohajerani, Y. Zaatar

Abstract: In this work, a 2D magneto-plasmonic grating structure combining materials with ferromagnetic and plasmonic properties is demonstrated. NiFe composite ferromagnetic material, as an active medium with tunable physical properties, and Au metal, as a plasmonic excitation layer, were the materials of choice. Here, we have experimentally investigated the active control of the plasmonic characteristics in Au/NiFe bilayer by the action of an external magnetic field, as well as the switching effect of the system. The active plasmonic control can be achieved by the magnetization switching of the ferromagnetic material, opening a new path in the development of active plasmonic devices. To the best of our knowledge, this is the first demonstration of such a magneto-optical plasmonic switch based on the coupling of plasmons with magneto-optical active materials, in which the response time was estimated to be in the range of microseconds.

Submitted 21 May, 2020; originally announced May 2020.

Showing 1–11 of 11 results for author: Hamidi, S M