-
Adversarial Training via Adaptive Knowledge Amalgamation of an Ensemble of Teachers
Authors:
Shayan Mohajer Hamidi,
Linfeng Ye
Abstract:
Adversarial training (AT) is a popular method for training robust deep neural networks (DNNs) against adversarial attacks. Yet, AT suffers from two shortcomings: (i) the robustness of DNNs trained by AT is highly intertwined with the size of the DNNs, posing challenges in achieving robustness in smaller models; and (ii) the adversarial samples employed during the AT process exhibit poor generalization, leaving DNNs vulnerable to unforeseen attack types. To address these dual challenges, this paper introduces adversarial training via adaptive knowledge amalgamation of an ensemble of teachers (AT-AKA). In particular, we generate a diverse set of adversarial samples as the inputs to an ensemble of teachers; and then, we adaptively amalgamate the logits of these teachers to train a generalized-robust student. Through comprehensive experiments, we illustrate the superior efficacy of AT-AKA over existing AT methods and adversarial robustness distillation techniques against cutting-edge attacks, including AutoAttack.
Submitted 21 May, 2024;
originally announced May 2024.
-
Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information
Authors:
Linfeng Ye,
Shayan Mohajer Hamidi,
Renhao Tan,
En-Hui Yang
Abstract:
It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using the maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of conditional mutual information (CMI) into the estimation of BCPD and propose a novel estimator called the maximum CMI (MCMI) method. Specifically, in MCMI estimation, both the log-likelihood and CMI of the teacher are simultaneously maximized when the teacher is trained. Through Eigen-CAM, it is further shown that maximizing the teacher's CMI value allows the teacher to capture more contextual information in an image cluster. Through a thorough set of experiments, we show that by employing a teacher trained via MCMI estimation rather than one trained via MLL estimation in various state-of-the-art KD frameworks, the student's classification accuracy consistently increases, with a gain of up to 3.32%. This suggests that the teacher's BCPD estimate provided by the MCMI method is more accurate than that provided by the MLL method. In addition, we show that such improvements in the student's accuracy are more drastic in zero-shot and few-shot settings. Notably, the student's accuracy increases with a gain of up to 5.72% when 5% of the training samples are available to the student (few-shot), and increases from 0% to as high as 84% for an omitted class (zero-shot). The code is available at https://github.com/iclr2024mcmi/ICLRMCMI.
Submitted 7 March, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Robustness Against Adversarial Attacks via Learning Confined Adversarial Polytopes
Authors:
Shayan Mohajer Hamidi,
Linfeng Ye
Abstract:
Deep neural networks (DNNs) can be deceived by human-imperceptible perturbations of clean samples. Therefore, enhancing the robustness of DNNs against adversarial attacks is a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as the adversarial polytope, and each clean sample has a respective adversarial polytope. Indeed, if the respective polytopes for all the samples are compact such that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, the inner workings of our algorithm are based on learning confined adversarial polytopes (CAP). By conducting a thorough set of experiments, we demonstrate the effectiveness of CAP over existing adversarial robustness methods in improving the robustness of models against state-of-the-art attacks including AutoAttack.
Submitted 20 January, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
AdaFed: Fair Federated Learning via Adaptive Common Descent Direction
Authors:
Shayan Mohajer Hamidi,
En-Hui Yang
Abstract:
Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is to find an updating direction for the server along which (i) all the clients' loss functions are decreasing; and (ii) more importantly, the loss functions for the clients with larger values decrease with a higher rate. AdaFed adaptively tunes this common direction based on the values of local gradients and loss functions. We validate the effectiveness of AdaFed on a suite of federated datasets, and demonstrate that AdaFed outperforms state-of-the-art fair FL methods.
Submitted 10 January, 2024;
originally announced January 2024.
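The two conditions on the server's update direction stated in the AdaFed abstract can be illustrated with a small sketch. This is not AdaFed's algorithm: it assumes, purely for illustration, that the clients' gradients are mutually orthogonal, and the helper names `loss_scaled_direction` and `decrease_rates` are hypothetical. Under that assumption, rescaling each gradient by its loss yields a direction along which every loss decreases, with a first-order rate equal to the loss value itself.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def loss_scaled_direction(grads, losses):
    """Build d = sum_i loss_i * g_i / ||g_i||^2.

    Assumption of this sketch: the gradients are mutually orthogonal,
    so that g_i . d == loss_i for every client i.
    """
    dim = len(grads[0])
    d = [0.0] * dim
    for g, f in zip(grads, losses):
        scale = f / dot(g, g)
        for j in range(dim):
            d[j] += scale * g[j]
    return d

def decrease_rates(grads, d):
    """First-order decrease rate of each client's loss for the
    server step theta <- theta - eta * d (positive means decreasing)."""
    return [dot(g, d) for g in grads]

# Two clients with orthogonal gradients; client 0 has the larger loss.
grads = [[2.0, 0.0], [0.0, 1.0]]
losses = [3.0, 0.5]
d = loss_scaled_direction(grads, losses)
rates = decrease_rates(grads, d)
print(d, rates)
```

Both rates are positive, which corresponds to condition (i), and the client with the larger loss gets the larger rate, which corresponds to condition (ii). Real client gradients are generally not orthogonal, so a practical method needs additional machinery that this sketch does not attempt.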
-
Conditional Mutual Information Constrained Deep Learning for Classification
Authors:
En-Hui Yang,
Shayan Mohajer Hamidi,
Linfeng Ye,
Renhao Tan,
Beverly Yang
Abstract:
The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over the ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experimental results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of the learning process through the lens of CMI and NCMI is also advocated.
Submitted 16 September, 2023;
originally announced September 2023.
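As a rough illustration of CMI as an intra-class concentration measure, the sketch below computes an empirical estimate from a list of output probability vectors and their ground-truth labels. It assumes the plug-in form of I(X; Ŷ | Y): the average KL divergence between each sample's output distribution and the mean output distribution of its class. The function names are hypothetical, the result is in nats, and NCMI is not computed because the abstract does not give its normalization.

```python
import math
from collections import defaultdict

def kl(p, q):
    """KL divergence D(p || q) in nats, with the convention 0 * log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def empirical_cmi(outputs, labels):
    """Average KL divergence from each output distribution to its class centroid.

    outputs: list of probability vectors (one per sample)
    labels:  list of ground-truth class labels (same length)
    """
    by_class = defaultdict(list)
    for p, y in zip(outputs, labels):
        by_class[y].append(p)
    total = 0.0
    for rows in by_class.values():
        # Class centroid: element-wise mean of the class's output distributions.
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        total += sum(kl(p, centroid) for p in rows)
    return total / len(outputs)

# Identical outputs within a class are perfectly concentrated.
tight = empirical_cmi([[0.5, 0.5], [0.5, 0.5]], [0, 0])
# Outputs that disagree within a class are less concentrated.
spread = empirical_cmi([[0.9, 0.1], [0.1, 0.9]], [0, 0])
print(tight, spread)
```

A smaller value means the outputs of same-class samples cluster more tightly around their class centroid.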
-
Vectorial characterization of Bloch surface wave via one-dimensional photonic-atomic structure
Authors:
M. Asadolah Salmanpour,
M. Mosleh,
S. M. Hamidi
Abstract:
Use of hot atomic vapor as a new tool for tracing the complex nature of light has become a knowledge-based topic in recent years. In this paper, we examine the polarization ellipse of the Bloch surface wave (BSW) through the effect of a magnetic field on the coupling of these surface waves in a BSW-hot atomic vapor cell. For this purpose, we fabricate a one-dimensional photonic crystal-based Bloch wave atom cell, in which the polarization ellipse of Bloch surface waves is recorded experimentally under different configurations of the magnetic field. Our results indicate that by applying the magnetic field in different directions, Faraday and Voigt, the characteristics of electromagnetically induced transparency (EIT-like) of the hybrid system change. We have used these changes to redefine the Voigt and Faraday geometries for evanescent waves, as well as to measure the ratio of the components of the elliptically polarized electric field. These characterizations can open new insight into the miniaturized atomic field in high quality and low volumetric areas.
Submitted 28 April, 2023;
originally announced April 2023.
-
Bloch Surface Wave-atom Coupling in Periodic Photonic Structure
Authors:
M. Asadolah Salmanpour,
M. Mosleh,
S. M. Hamidi
Abstract:
Considering efforts for hot atom vapor-nanophotonic integration as a new paradigm in quantum optics, in this paper we introduce a 1D photonic crystal-Rb vapor cell as a structure with miniaturized interaction volume. The Bloch surface wave (BSW) excited on the surface of a photonic crystal acts as the electromagnetic hosting photonic mode and alters the optical response of Rb atoms in the vicinity of the surface. Coupling of atomic states with BSW confined modes would lead to quantum interference effects and result in nonlinearities in the resonant coupling of atoms with the BSW. We show that Bloch surface wave induced transparency is highly stable under a change of incidence angle. Our results show slight changes in transition detunings due to nonlinear interactions like the Casimir-Polder effect under a change of the localized density of optical states.
Submitted 22 October, 2022;
originally announced October 2022.
-
A Secure Key Sharing Algorithm Exploiting Phase Reciprocity in Wireless Channels
Authors:
Shayan Mohajer Hamidi,
Amir Keyvan Khandani,
Ehsan Bateni
Abstract:
This article presents a secure key exchange algorithm that exploits reciprocity in wireless channels to share a secret key between two nodes $A$ and $B$. Reciprocity implies that the channel phases in the links $A\rightarrow B$ and $B\rightarrow A$ are the same. A number of such reciprocal phase values are measured at nodes $A$ and $B$, called shared phase values hereafter. Each shared phase value is used to mask points of a Phase Shift Keying (PSK) constellation. Masking is achieved by rotating each PSK constellation with a shared phase value. Rotation of constellation is equivalent to adding phases modulo-$2π$, and as the channel phase is uniformly distributed in $[0,2π)$, the result of summation conveys zero information about summands. To enlarge the key size over a static or slow fading channel, the Radio Frequency (RF) propagation path is perturbed to create several independent realizations of multi-path fading, each used to share a new phase value. To eavesdrop a phase value shared in this manner, the Eavesdropper (Eve) will always face an under-determined system of linear equations which will not reveal any useful information about its actual solution value. This property is used to establish a secure key between two legitimate users.
Submitted 29 November, 2021;
originally announced November 2021.
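The masking step described in the key-sharing abstract, rotating a PSK constellation by a shared phase value, can be sketched as follows. This is a noise-free toy under stated assumptions: both nodes are assumed to hold exactly the same phase value, the constellation size `m` and the function names `mask` and `unmask` are hypothetical, and channel measurement, path perturbation and the eavesdropper model are not represented.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def mask(symbol, shared_phase, m=4):
    """Rotate PSK symbol index `symbol` (0..m-1) by the shared phase, modulo 2*pi."""
    return (TWO_PI * symbol / m + shared_phase) % TWO_PI

def unmask(received_phase, shared_phase, m=4):
    """Undo the rotation and map back to the nearest constellation index."""
    rotated = (received_phase - shared_phase) % TWO_PI
    return round(rotated / (TWO_PI / m)) % m

# Assumption: the shared phase is uniform on [0, 2*pi), as stated in the abstract.
rng = random.Random(0)
shared_phase = rng.uniform(0.0, TWO_PI)
recovered = [unmask(mask(k, shared_phase), shared_phase) for k in range(4)]
print(recovered)
```

Because the shared phase is uniform on [0, 2π), the masked phase is also uniform on [0, 2π) whichever symbol was sent, which is the sense in which the sum conveys no information about the summands to an observer who does not know the shared phase.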
-
Thundernna: a white box adversarial attack
Authors:
Linfeng Ye,
Shayan Mohajer Hamidi
Abstract:
Existing work shows that neural networks trained by naive gradient-based optimization methods are prone to adversarial attacks: adding a small malicious perturbation to an ordinary input is enough to make the neural network err. At the same time, attacking a neural network is the key to improving its robustness, since training against adversarial examples can make neural networks resist some kinds of adversarial attacks. An adversarial attack against a neural network can also reveal some characteristics of the neural network, a complex high-dimensional non-linear function, as discussed in previous work.
In this project, we develop a first-order method to attack neural networks. Compared with other first-order attacks, our method has a much higher success rate. Furthermore, it is much faster than second-order attacks and multi-step first-order attacks.
Submitted 21 January, 2024; v1 submitted 24 November, 2021;
originally announced November 2021.
-
Nanophotonic structures with optical surface modes for tunable spin current generation
Authors:
P. V. Shilina,
D. O. Ignatyeva,
P. O. Kapralov,
S. K. Sekatskii,
M. Nur-E-Alam,
M. Vasiliev,
K. Alameh,
V. G. Achanta,
Y. Song,
S. M. Hamidi,
A. K. Zvezdin,
V. I. Belotelov
Abstract:
Heat generated by spin currents in spintronics-based devices is typically much less than that generated by charge current flows in conventional electronic devices. However, the conventional approaches for excitation of spin currents based on spin-pumping and the spin Hall effect are limited in efficiency, which restricts their application for viable spintronic devices. We propose a novel type of photonic-crystal (PC) based structure for efficient and tunable optically-induced spin current generation via the spin Seebeck and inverse spin Hall effects. It is experimentally demonstrated that optical surface modes localized at the PC surface covered by a ferromagnetic layer and materials with giant spin-orbit coupling (SOC) notably increase the efficiency of the optically-induced spin current generation and provide tunability through the light wavelength or angle of incidence. Up to 100% of the incident light power can be transferred to heat within the SOC layer and, therefore, to spin current. Importantly, high efficiency becomes accessible even for ultra-thin SOC layers. Moreover, surface patterning of the PC-based spintronic nanostructure allows local generation of spin currents at the pattern scales rather than the diameter of the laser beam.
Submitted 7 December, 2020; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Surface lattice resonance based magneto-plasmonic switch in NiFe patterned nano-structure
Authors:
H. Mbarak,
S. M. Hamidi,
V. I. Belotelov,
A. I. Chernov,
E. Mohajerani,
Y. Zaatar
Abstract:
In this work, a 2D magneto-plasmonic grating structure combining materials with ferromagnetic and plasmonic properties is demonstrated. NiFe composite ferromagnetic material, as an active medium with tunable physical properties, and Au metal, as a plasmonic excitation layer, were the materials of choice. Here, we have experimentally investigated the active control of the plasmonic characteristics in the Au/NiFe bilayer by the action of an external magnetic field, as well as the switching effect of the system. The active plasmonic control can be achieved by the magnetization switching of the ferromagnetic material, opening a new path in the development of active plasmonic devices. To the best of our knowledge, this is the first demonstration of such a magneto-optical plasmonic switch based on the coupling of plasmons with magneto-optical active materials, in which the response time was estimated to be in the range of microseconds.
△ Less
Submitted 21 May, 2020;
originally announced May 2020.