Search | arXiv e-print repository

On Inherent Adversarial Robustness of Active Vision Systems

Authors: Amitangshu Mukherjee, Timur Ibrayev, Kaushik Roy

Abstract: Current Deep Neural Networks are vulnerable to adversarial examples, which alter their predictions by adding carefully crafted noise. Since human eyes are robust to such inputs, it is possible that the vulnerability stems from the standard way of processing inputs in one shot by processing every pixel with the same importance. In contrast, neuroscience suggests that the human vision system can differentiate salient features by (1) switching between multiple fixation points (saccades) and (2) processing the surrounding with a non-uniform external resolution (foveation). In this work, we advocate that the integration of such active vision mechanisms into current deep learning systems can offer robustness benefits. Specifically, we empirically demonstrate the inherent robustness of two active vision methods - GFNet and FALcon - under a black box threat model. By learning and inferencing based on downsampled glimpses obtained from multiple distinct fixation points within an input, we show that these active methods achieve (2-3) times greater robustness compared to a standard passive convolutional network under state-of-the-art adversarial attacks. More importantly, we provide illustrative and interpretable visualization analysis that demonstrates how performing inference from distinct fixation points makes active vision methods less vulnerable to malicious inputs.

Submitted 5 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.
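
The "downsampled glimpses obtained from multiple distinct fixation points" mentioned in the abstract above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the GFNet or FALcon implementation: the function name `extract_glimpse`, the square crop, and the use of average pooling for downsampling are all hypothetical choices made here.

```python
def extract_glimpse(image, center, size, out_size):
    """Crop a size x size window around `center` (row, col) and
    average-pool it down to out_size x out_size.

    Hypothetical helper for illustration only. Assumes `image` is a 2D
    list of numbers, size <= both image dimensions, and size is a
    multiple of out_size.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    # Clamp the window so it stays fully inside the image.
    top = min(max(center[0] - half, 0), height - size)
    left = min(max(center[1] - half, 0), width - size)
    block = size // out_size  # pooling window per output pixel
    glimpse = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for di in range(block):
                for dj in range(block):
                    total += image[top + i * block + di][left + j * block + dj]
            row.append(total / (block * block))
        glimpse.append(row)
    return glimpse


# Toy 8x8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
# Two distinct fixation points give two different low-resolution views.
glimpses = [extract_glimpse(image, fixation, size=4, out_size=2)
            for fixation in [(2, 2), (7, 7)]]
```

Each fixation point yields a different low-resolution view of the same input, which is the property the abstract ties to robustness.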

arXiv:2403.15977 [pdf, other]

doi 10.1109/TCDS.2024.3390597

Towards Two-Stream Foveation-based Active Vision Learning

Authors: Timur Ibrayev, Amitangshu Mukherjee, Sai Aparna Aketi, Kaushik Roy

Abstract: Deep neural network (DNN) based machine perception frameworks process the entire input in a one-shot manner to provide answers to both "what object is being observed" and "where it is located". In contrast, the "two-stream hypothesis" from neuroscience explains the neural processing in the human visual cortex as an active vision system that utilizes two separate regions of the brain to answer the what and the where questions. In this work, we propose a machine learning framework inspired by the "two-stream hypothesis" and explore the potential benefits that it offers. Specifically, the proposed framework models the following mechanisms: 1) ventral (what) stream focusing on the input regions perceived by the fovea part of an eye (foveation), 2) dorsal (where) stream providing visual guidance, and 3) iterative processing of the two streams to calibrate visual focus and process the sequence of focused image patches. The training of the proposed framework is accomplished by label-based DNN training for the ventral stream model and reinforcement learning for the dorsal stream model. We show that the two-stream foveation-based learning is applicable to the challenging task of weakly-supervised object localization (WSOL), where the training data is limited to the object class or its attributes. The framework is capable of both predicting the properties of an object and successfully localizing it by predicting its bounding box. We also show that, due to the independent nature of the two streams, the dorsal model can be applied on its own to unseen images to localize objects from different datasets.

Submitted 20 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted version of the article, 18 pages, 14 figures

Journal ref: IEEE Transactions on Cognitive and Developmental Systems, 2024

arXiv:2403.13082 [pdf, other]

Pruning for Improved ADC Efficiency in Crossbar-based Analog In-memory Accelerators

Authors: Timur Ibrayev, Isha Garg, Indranil Chakraborty, Kaushik Roy

Abstract: Deep learning has proved successful in many applications but suffers from high computational demands and requires custom accelerators for deployment. Crossbar-based analog in-memory architectures are attractive for acceleration of deep neural networks (DNN), due to their high data reuse and high efficiency enabled by combining storage and computation in memory. However, they require analog-to-digital converters (ADCs) to communicate crossbar outputs. ADCs consume a significant portion of energy and area of every crossbar processing unit, thus diminishing the potential efficiency benefits. Pruning is a well-studied technique to improve the efficiency of DNNs but requires modifications to be effective for crossbars. In this paper, we motivate crossbar-attuned pruning to target ADC-specific inefficiencies. This is achieved by identifying three key properties (dubbed D.U.B.) that induce sparsity that can be utilized to reduce ADC energy without sacrificing accuracy. The first property ensures that sparsity translates effectively to hardware efficiency by restricting sparsity levels to Discrete powers of 2. The other 2 properties encourage columns in the same crossbar to achieve both Unstructured and Balanced sparsity in order to amortize the accuracy drop. The desired D.U.B. sparsity is then achieved by regularizing the variance of $L_{0}$ norms of neighboring columns within the same crossbar. Our proposed implementation allows it to be directly used in end-to-end gradient-based training. We apply the proposed algorithm to convolutional layers of VGG11 and ResNet18 models, trained on CIFAR-10 and ImageNet datasets, and achieve up to 7.13x and 1.27x improvement, respectively, in ADC energy with less than 1% drop in accuracy.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 11 pages, 5 figures
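
The quantity named in the abstract above, the variance of the $L_{0}$ norms of columns within one crossbar, can be made concrete with a minimal sketch. It assumes a crossbar tile is stored as a 2D list of weights and uses an exact nonzero count; the function names are hypothetical. The exact count is not differentiable, so this is not the paper's training-time regularizer, which the abstract says is usable in gradient-based training and would therefore need some differentiable formulation.

```python
def column_l0_norms(tile):
    """Number of nonzero weights in each column of a crossbar tile.

    `tile` is a 2D list (rows x columns). Hypothetical helper for
    illustration only.
    """
    rows, cols = len(tile), len(tile[0])
    return [sum(1 for r in range(rows) if tile[r][c] != 0)
            for c in range(cols)]


def l0_variance(tile):
    """Population variance of the per-column L0 norms.

    A value of 0 means every column keeps the same number of nonzero
    weights, i.e. the sparsity is balanced across columns.
    """
    norms = column_l0_norms(tile)
    mean = sum(norms) / len(norms)
    return sum((n - mean) ** 2 for n in norms) / len(norms)


# Both tiles keep 4 of 8 weights, but distribute them differently.
balanced = [[1, 0], [0, 2], [3, 0], [0, 4]]    # column L0 norms: [2, 2]
unbalanced = [[1, 0], [2, 0], [3, 0], [0, 4]]  # column L0 norms: [3, 1]
```

Here `l0_variance(balanced)` is 0 while `l0_variance(unbalanced)` is positive, so penalizing this variance pushes columns toward equal nonzero counts.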

arXiv:2402.03073 [pdf, other]

Study of dark counts in optical superconducting transition-edge sensors

Authors: Laura Manenti, Carlo Pepe, Isaac Sarnoff, Tengiz Ibrayev, Panagiotis Oikonomou, Artem Knyazev, Eugenio Monticone, Hobey Garrone, Fiona Alder, Osama Fawwaz, Alexander J. Millar, Knut Dundas Morå, Hamad Shams, Francesco Arneodo, Mauro Rajteri

Abstract: Superconducting transition-edge sensors (TESs), known for their high single-photon detection efficiency and low background, are increasingly being used in rare event searches. We present the first comprehensive characterization of optical TES backgrounds, identifying three event types: high-energy, electrical noise, and photon-like events. We experimentally verify and simulate the source of the high-energy events. We develop an algorithm to isolate photon-like events, the expected signal in dark matter searches, achieving record-low photon-like dark count rates in the 0.8-3.2 eV energy range.

Submitted 7 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.17515 [pdf, other]

Towards Image Semantics and Syntax Sequence Learning

Authors: Chun Tao, Timur Ibrayev, Kaushik Roy

Abstract: Convolutional neural networks and vision transformers have achieved outstanding performance in machine perception, particularly for image classification. Although these image classifiers excel at predicting image-level class labels, they may not discriminate missing or shifted parts within an object. As a result, they may fail to detect corrupted images that involve missing or disarrayed semantic information in the object composition. On the contrary, human perception easily distinguishes such corruptions. To mitigate this gap, we introduce the concept of "image grammar", consisting of "image semantics" and "image syntax", to denote the semantics of parts or patches of an image and the order in which these parts are arranged to create a meaningful object. To learn the image grammar relative to a class of visual objects/scenes, we propose a weakly supervised two-stage approach. In the first stage, we use a deep clustering framework that relies on iterative clustering and feature refinement to produce part-semantic segmentation. In the second stage, we incorporate a recurrent bi-LSTM module to process a sequence of semantic segmentation patches to capture the image syntax. Our framework is trained to reason over patch semantics and detect faulty syntax. We benchmark the performance of several grammar learning models in detecting patch corruptions. Finally, we verify the capabilities of our framework in Celeb and SUNRGBD datasets and demonstrate that it can achieve a grammar validation accuracy of 70 to 90% in a wide variety of semantic and syntactical corruption scenarios.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 21 pages, 22 figures, 5 tables

arXiv:2204.13373 [pdf, other]

doi 10.1093/mnras/stac1191

NIHAO XXVIII: Collateral effects of AGN on dark matter concentration and stellar kinematics

Authors: Stefan Waterval, Sana Elgamal, Matteo Nori, Mario Pasquato, Andrea V. Macciò, Marvin Blank, Keri L. Dixon, Xi Kang, Tengiz Ibrayev

Abstract: Although active galactic nuclei (AGN) feedback is required in simulations of galaxies to regulate star formation, further downstream effects on the dark matter distribution of the halo and stellar kinematics of the central galaxy can be expected. We combine simulations of galaxies with and without AGN physics from the Numerical Investigation of a Hundred Astrophysical Objects (NIHAO) to investigate the effect of AGN on the dark matter profile and central stellar rotation of the host galaxies. Specifically, we study how the concentration-halo mass ($c-M$) relation and the stellar spin parameter ($λ_R$) are affected by AGN feedback. We find that AGN physics is crucial to reduce the central density of simulated massive ($\gtrsim 10^{12}$ M$_\odot$) galaxies and bring their concentration to agreement with results from the Spitzer Photometry & Accurate Rotation Curves (SPARC) sample. Similarly, AGN feedback has a key role in reproducing the dichotomy between slow and fast rotators as observed by the ATLAS$^{3\text{D}}$ survey. Without star formation suppression due to AGN feedback, the number of fast rotators strongly exceeds the observational constraints. Our study shows that there are several collateral effects that support the importance of AGN feedback in galaxy formation, and these effects can be used to constrain its implementation in numerical simulations.

Submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted for publication in MNRAS. 15 pages, 10 figures

arXiv:2008.12016 [pdf, other]

On the Intrinsic Robustness of NVM Crossbars Against Adversarial Attacks

Authors: Deboleena Roy, Indranil Chakraborty, Timur Ibrayev, Kaushik Roy

Abstract: The increasing computational demand of Deep Learning has propelled research in special-purpose inference accelerators based on emerging non-volatile memory (NVM) technologies. Such NVM crossbars promise fast and energy-efficient in-situ Matrix Vector Multiplication (MVM), thus alleviating the long-standing von Neumann bottleneck in today's digital hardware. However, the analog nature of computing in these crossbars is inherently approximate and results in deviations from ideal output values, which reduces the overall performance of Deep Neural Networks (DNNs) under normal circumstances. In this paper, we study the impact of these non-idealities under adversarial circumstances. We show that the non-ideal behavior of analog computing lowers the effectiveness of adversarial attacks, in both Black-Box and White-Box attack scenarios. In a non-adaptive attack, where the attacker is unaware of the analog hardware, we observe that analog computing offers a varying degree of intrinsic robustness, with a peak adversarial accuracy improvement of 35.34%, 22.69%, and 9.90% for white box PGD (epsilon=1/255, iter=30) for CIFAR-10, CIFAR-100, and ImageNet respectively. We also demonstrate "Hardware-in-Loop" adaptive attacks that circumvent this robustness by utilizing the knowledge of the NVM model.

Submitted 15 March, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: to appear in Proceedings of DAC, 2021
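
The abstract above quotes results for white box PGD with epsilon=1/255 and iter=30. As a rough illustration of what those two parameters control, here is a minimal L-infinity PGD sketch on a toy logistic model with an analytic gradient. The toy model, the function names, the step size of epsilon/4, and the [0, 1] input range are all assumptions made for this sketch; they are not taken from the paper, which attacks deep networks.

```python
import math


def logistic_loss(w, b, x, y):
    """Logistic loss of a linear model for a label y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log1p(math.exp(-y * score))


def loss_gradient(w, b, x, y):
    """Analytic gradient of the logistic loss with respect to the input x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    scale = -y / (1.0 + math.exp(y * score))
    return [scale * wi for wi in w]


def pgd_attack(w, b, x0, y, epsilon=1 / 255, steps=30, step_size=None):
    """L-infinity PGD: repeat a signed gradient ascent step on the loss,
    then project back into the epsilon-ball around x0 and into [0, 1]."""
    if step_size is None:
        step_size = epsilon / 4  # assumed value, not from the paper
    x = list(x0)
    for _ in range(steps):
        grad = loss_gradient(w, b, x, y)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            stepped = x[i] + step_size * sign
            # Projection onto the epsilon-ball centred at the clean input.
            stepped = min(max(stepped, x0[i] - epsilon), x0[i] + epsilon)
            # Keep the input in the assumed valid range [0, 1].
            x[i] = min(max(stepped, 0.0), 1.0)
    return x


# Toy example: three "pixels" and a fixed linear classifier.
w, b = [0.5, -1.0, 2.0], 0.0
x_clean, label = [0.2, 0.6, 0.4], 1
x_adv = pgd_attack(w, b, x_clean, label)
```

The adversarial input never moves more than epsilon from the clean input in any coordinate, while the loss on the true label increases.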

arXiv:1709.08184 [pdf, other]

On-chip Face Recognition System Design with Memristive Hierarchical Temporal Memory

Authors: Timur Ibrayev, Ulan Myrzakhan, Olga Krestinskaya, Aidana Irmanova, Alex Pappachen James

Abstract: Hierarchical Temporal Memory (HTM) is a new machine learning algorithm intended to mimic the working principle of the neocortex, the part of the human brain responsible for learning, classification, and making predictions. Although many works illustrate its effectiveness as a software algorithm, hardware design for HTM remains an open research problem. Hence, this work proposes an architecture for HTM Spatial Pooler and Temporal Memory with a learning mechanism, which creates a single image for each class based on important and unimportant features of all images in the training set. In turn, the reduction in the number of templates within the database reduces the memory requirements and increases the processing speed. Moreover, face recognition analysis indicates that for a large number of training images, the proposed design provides higher accuracy (83.5%) compared to the Spatial Pooler-only design presented in previous works.

Submitted 24 September, 2017; originally announced September 2017.

Comments: Journal of Intelligent and Fuzzy Systems, 2018

Showing 1–8 of 8 results for author: Ibrayev, T