Search | arXiv e-print repository

The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation

Abstract: In this paper, we study the offline RL problem with linear function approximation. Our main structural assumption is that the MDP has low inherent Bellman error, which stipulates that linear value functions have linear Bellman backups with respect to the greedy policy. This assumption is natural in that it is essentially the minimal assumption required for value iteration to succeed. We give a com… ▽ More In this paper, we study the offline RL problem with linear function approximation. Our main structural assumption is that the MDP has low inherent Bellman error, which stipulates that linear value functions have linear Bellman backups with respect to the greedy policy. This assumption is natural in that it is essentially the minimal assumption required for value iteration to succeed. We give a computationally efficient algorithm which succeeds under a single-policy coverage condition on the dataset, namely which outputs a policy whose value is at least that of any policy which is well-covered by the dataset. Even in the setting when the inherent Bellman error is 0 (termed linear Bellman completeness), our algorithm yields the first known guarantee under single-policy coverage. In the setting of positive inherent Bellman error ${\varepsilon_{\mathrm{BE}}} > 0$, we show that the suboptimality error of our algorithm scales with $\sqrt{\varepsilon_{\mathrm{BE}}}$. Furthermore, we prove that the scaling of the suboptimality with $\sqrt{\varepsilon_{\mathrm{BE}}}$ cannot be improved for any algorithm. Our lower bound stands in contrast to many other settings in reinforcement learning with misspecification, where one can typically obtain performance that degrades linearly with the misspecification error. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: RLC 2024

arXiv:2406.11640 [pdf, ps, other]

Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions

Authors: Noah Golowich, Ankur Moitra

Abstract: One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To ensure the success of value iteration, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. We… ▽ More One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To ensure the success of value iteration, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. We study the problem of learning an optimal policy under Bellman completeness in the online model of RL with linear function approximation. In the linear setting, while statistically efficient algorithms are known under Bellman completeness (e.g., Jiang et al. (2017); Zanette et al. (2020)), these algorithms all rely on the principle of global optimism which requires solving a nonconvex optimization problem. In particular, it has remained open as to whether computationally efficient algorithms exist. In this paper we give the first polynomial-time algorithm for RL under linear Bellman completeness when the number of actions is any constant. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: COLT 2024

arXiv:2406.02633 [pdf, ps, other]

Edit Distance Robust Watermarks for Language Models

Authors: Noah Golowich, Ankur Moitra

Abstract: Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual o… ▽ More Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both undetectability and robustness to edits when the alphabet size for the language model is allowed to grow as a polynomial in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which proceeds via first constructing pseudorandom codes satisfying undetectability and robustness properties analogous to those above; our key idea is to handle adversarial insertions and deletions by interpreting the symbols as indices into the codeword, which we call indexing pseudorandom codes. Additionally, our codes rely on weaker computational assumptions than used in previous work. Then we show that there is a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.00082 [pdf, other]

Structure learning of Hamiltonians from real-time evolution

Authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang

Abstract: We initiate the study of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m λ_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-studied under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $λ_a$, are unknown. But i… ▽ More We initiate the study of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m λ_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-studied under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $λ_a$, are unknown. But is it possible to learn a local Hamiltonian without prior knowledge of its interaction structure? We present a new, general approach to Hamiltonian learning that not only solves the challenging structure learning variant, but also resolves other open questions in the area, all while achieving the gold standard of Heisenberg-limited scaling. In particular, our algorithm recovers the Hamiltonian to $\varepsilon$ error with an evolution time scaling with $1/\varepsilon$, and has the following appealing properties: (1) it does not need to know the Hamiltonian terms; (2) it works beyond the short-range setting, extending to any Hamiltonian $H$ where the sum of terms interacting with a qubit has bounded norm; (3) it evolves according to $H$ in constant time $t$ increments, thus achieving constant time resolution. To our knowledge, no prior algorithm with Heisenberg-limited scaling existed with even one of these properties. As an application, we can also learn Hamiltonians exhibiting power-law decay up to accuracy $\varepsilon$ with total evolution time beating the standard limit of $1/\varepsilon^2$. △ Less

Submitted 30 April, 2024; originally announced May 2024.

Comments: 50 pages

arXiv:2404.15185 [pdf, other]

doi 10.1145/3649329.3655679

PIVOT- Input-aware Path Selection for Energy-efficient ViT Inference

Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Priyadarshini Panda

Abstract: The attention module in vision transformers(ViTs) performs intricate spatial correlations, contributing significantly to accuracy and delay. It is thereby important to modulate the number of attentions according to the input feature complexity for optimal delay-accuracy tradeoffs. To this end, we propose PIVOT - a co-optimization framework which selectively performs attention skip** based on the… ▽ More The attention module in vision transformers(ViTs) performs intricate spatial correlations, contributing significantly to accuracy and delay. It is thereby important to modulate the number of attentions according to the input feature complexity for optimal delay-accuracy tradeoffs. To this end, we propose PIVOT - a co-optimization framework which selectively performs attention skip** based on the input difficulty. For this, PIVOT employs a hardware-in-loop co-search to obtain optimal attention skip configurations. Evaluations on the ZCU102 MPSoC FPGA show that PIVOT achieves 2.7x lower EDP at 0.2% accuracy reduction compared to LVViT-S ViT. PIVOT also achieves 1.3% and 1.8x higher accuracy and throughput than prior works on traditional CPUs and GPUs. The PIVOT project can be found at https://github.com/Intelligent-Computing-Lab-Yale/PIVOT. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted to 61st ACM/IEEE Design Automation Conference (DAC '24), June 23--27, 2024, San Francisco, CA, USA (6 Pages)

arXiv:2404.11325 [pdf, ps, other]

On Learning Parities with Dependent Noise

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: In this expository note we show that the learning parities with noise (LPN) assumption is robust to weak dependencies in the noise distribution of small batches of samples. This provides a partial converse to the linearization technique of [AG11]. The material in this note is drawn from a recent work by the authors [GMR24], where the robustness guarantee was a key component in a cryptographic sepa… ▽ More In this expository note we show that the learning parities with noise (LPN) assumption is robust to weak dependencies in the noise distribution of small batches of samples. This provides a partial converse to the linearization technique of [AG11]. The material in this note is drawn from a recent work by the authors [GMR24], where the robustness guarantee was a key component in a cryptographic separation between reinforcement learning and supervised learning. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: This note draws heavily from arXiv:2404.03774

arXiv:2404.03774 [pdf, other]

Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: Supervised learning is often computationally easy in practice. But to what extent does this mean that other modes of learning, such as reinforcement learning (RL), ought to be computationally easy by extension? In this work we show the first cryptographic separation between RL and supervised learning, by exhibiting a class of block MDPs and associated decoding functions where reward-free explorati… ▽ More Supervised learning is often computationally easy in practice. But to what extent does this mean that other modes of learning, such as reinforcement learning (RL), ought to be computationally easy by extension? In this work we show the first cryptographic separation between RL and supervised learning, by exhibiting a class of block MDPs and associated decoding functions where reward-free exploration is provably computationally harder than the associated regression problem. We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs, even when given access to an oracle for this regression problem. It is known that being able to perform regression in block MDPs is necessary for finding a good policy; our results suggest that it is not sufficient. Our separation lower bound uses a new robustness property of the Learning Parities with Noise (LPN) hardness assumption, which is crucial in handling the dependent nature of RL data. We argue that separations and oracle lower bounds, such as ours, are a more meaningful way to prove hardness of learning because the constructions better reflect the practical reality that supervised learning by itself is often not the computational bottleneck. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: 112 pages, 3 figures

arXiv:2403.16850 [pdf, other]

High-Temperature Gibbs States are Unentangled and Efficiently Preparable

Authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang

Abstract: We show that thermal states of local Hamiltonians are separable above a constant temperature. Specifically, for a local Hamiltonian $H$ on a graph with degree $\mathfrak{d}$, its Gibbs state at inverse temperature $β$, denoted by $ρ=e^{-βH}/ \textrm{tr}(e^{-βH})$, is a classical distribution over product states for all $β< 1/(c\mathfrak{d})$, where $c$ is a constant. This sudden death of thermal e… ▽ More We show that thermal states of local Hamiltonians are separable above a constant temperature. Specifically, for a local Hamiltonian $H$ on a graph with degree $\mathfrak{d}$, its Gibbs state at inverse temperature $β$, denoted by $ρ=e^{-βH}/ \textrm{tr}(e^{-βH})$, is a classical distribution over product states for all $β< 1/(c\mathfrak{d})$, where $c$ is a constant. This sudden death of thermal entanglement upends conventional wisdom about the presence of short-range quantum correlations in Gibbs states. Moreover, we show that we can efficiently sample from the distribution over product states. In particular, for any $β< 1/( c \mathfrak{d}^3)$, we can prepare a state $ε$-close to $ρ$ in trace distance with a depth-one quantum circuit and $\textrm{poly}(n) \log(1/ε)$ classical overhead. A priori the task of preparing a Gibbs state is a natural candidate for achieving super-polynomial quantum speedups, but our results rule out this possibility above a fixed constant temperature. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.02586 [pdf, other]

ClipFormer: Key-Value Clip** of Transformers on Memristive Crossbars for Write Noise Mitigation

Authors: Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: Transformers have revolutionized various real-world applications from natural language processing to computer vision. However, traditional von-Neumann computing paradigm faces memory and bandwidth limitations in accelerating transformers owing to their massive model sizes. To this end, In-memory Computing (IMC) crossbars based on Non-volatile Memories (NVMs), due to their ability to perform highly… ▽ More Transformers have revolutionized various real-world applications from natural language processing to computer vision. However, traditional von-Neumann computing paradigm faces memory and bandwidth limitations in accelerating transformers owing to their massive model sizes. To this end, In-memory Computing (IMC) crossbars based on Non-volatile Memories (NVMs), due to their ability to perform highly parallelized Matrix-Vector-Multiplications (MVMs) with high energy-efficiencies, have emerged as a promising solution for accelerating transformers. However, analog MVM operations in crossbars introduce non-idealities, such as stochastic read & write noise, which affect the inference accuracy of the deployed transformers. Specifically, we find pre-trained Vision Transformers (ViTs) to be vulnerable on crossbars due to the impact of write noise on the dynamically-generated Key (K) and Value (V) matrices in the attention layers, an effect not accounted for in prior studies. We, thus, propose ClipFormer, a transformation on the K and V matrices during inference, to boost the non-ideal accuracies of pre-trained ViT models. ClipFormer requires no additional hardware and training overhead and is amenable to transformers deployed on any memristive crossbar platform. Our experiments on Imagenet-1k dataset using pre-trained DeiT-S transformers, subjected to standard training and variation-aware-training, show >10-40% higher non-ideal accuracies at the high write noise regime by applying ClipFormer. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: 9 pages, 10 figures, 3 tables, 1 appendix

arXiv:2401.08001 [pdf, other]

TT-SNN: Tensor Train Decomposition for Efficient Spiking Neural Network Training

Authors: Donghyun Lee, Ruokai Yin, Youngeun Kim, Abhishek Moitra, Yuhang Li, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have gained significant attention as a potentially energy-efficient alternative for standard neural networks with their sparse binary activation. However, SNNs suffer from memory and computation overhead due to spatio-temporal dynamics and multiple backpropagation computations across timesteps during training. To address this issue, we introduce Tensor Train Decompos… ▽ More Spiking Neural Networks (SNNs) have gained significant attention as a potentially energy-efficient alternative for standard neural networks with their sparse binary activation. However, SNNs suffer from memory and computation overhead due to spatio-temporal dynamics and multiple backpropagation computations across timesteps during training. To address this issue, we introduce Tensor Train Decomposition for Spiking Neural Networks (TT-SNN), a method that reduces model size through trainable weight decomposition, resulting in reduced storage, FLOPs, and latency. In addition, we propose a parallel computation pipeline as an alternative to the typical sequential tensor computation, which can be flexibly integrated into various existing SNN architectures. To the best of our knowledge, this is the first of its kind application of tensor decomposition in SNNs. We validate our method using both static and dynamic datasets, CIFAR10/100 and N-Caltech101, respectively. We also propose a TT-SNN-tailored training accelerator to fully harness the parallelism in TT-SNN. Our results demonstrate substantial reductions in parameter size (7.98X), FLOPs (9.25X), training time (17.7%), and training energy (28.3%) during training for the N-Caltech101 dataset, with negligible accuracy degradation. △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.03559 [pdf, other]

MCAIMem: a Mixed SRAM and eDRAM Cell for Area and Energy-efficient on-chip AI Memory

Authors: Duy-Thanh Nguyen, Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: AI chips commonly employ SRAM memory as buffers for their reliability and speed, which contribute to high performance. However, SRAM is expensive and demands significant area and energy consumption. Previous studies have explored replacing SRAM with emerging technologies like non-volatile memory, which offers fast-read memory access and a small cell area. Despite these advantages, non-volatile mem… ▽ More AI chips commonly employ SRAM memory as buffers for their reliability and speed, which contribute to high performance. However, SRAM is expensive and demands significant area and energy consumption. Previous studies have explored replacing SRAM with emerging technologies like non-volatile memory, which offers fast-read memory access and a small cell area. Despite these advantages, non-volatile memory's slow write memory access and high write energy consumption prevent it from surpassing SRAM performance in AI applications with extensive memory access requirements. Some research has also investigated eDRAM as an area-efficient on-chip memory with similar access times as SRAM. Still, refresh power remains a concern, leaving the trade-off between performance, area, and power consumption unresolved. To address this issue, our paper presents a novel mixed CMOS cell memory design that balances performance, area, and energy efficiency for AI memory by combining SRAM and eDRAM cells. We consider the proportion ratio of one SRAM and seven eDRAM cells in the memory to achieve area reduction using mixed CMOS cell memory. Additionally, we capitalize on the characteristics of DNN data representation and integrate asymmetric eDRAM cells to lower energy consumption. To validate our proposed MCAIMem solution, we conduct extensive simulations and benchmarking against traditional SRAM. Our results demonstrate that MCAIMem significantly outperforms these alternatives in terms of area and energy efficiency. Specifically, our MCAIMem can reduce the area by 48\% and energy consumption by 3.4$\times$ compared to SRAM designs, without incurring any accuracy loss. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2310.06845 [pdf, other]

RobustEdge: Low Power Adversarial Detection for Cloud-Edge Systems

Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda

Abstract: In practical cloud-edge scenarios, where a resource constrained edge performs data acquisition and a cloud system (having sufficient resources) performs inference tasks with a deep neural network (DNN), adversarial robustness is critical for reliability and ubiquitous deployment. Adversarial detection is a prime adversarial defence technique used in prior literature. However, in prior detection wo… ▽ More In practical cloud-edge scenarios, where a resource constrained edge performs data acquisition and a cloud system (having sufficient resources) performs inference tasks with a deep neural network (DNN), adversarial robustness is critical for reliability and ubiquitous deployment. Adversarial detection is a prime adversarial defence technique used in prior literature. However, in prior detection works, the detector is attached to the classifier model and both detector and classifier work in tandem to perform adversarial detection that requires a high computational overhead which is not available at the low-power edge. Therefore, prior works can only perform adversarial detection at the cloud and not at the edge. This means that in case of adversarial attacks, the unfavourable adversarial samples must be communicated to the cloud which leads to energy wastage at the edge device. Therefore, a low-power edge-friendly adversarial detection method is required to improve the energy efficiency of the edge and robustness of the cloud-based classifier. To this end, RobustEdge proposes Quantization-enabled Energy Separation (QES) training with "early detection and exit" to perform edge-based low cost adversarial detection. The QES-trained detector implemented at the edge blocks adversarial data transmission to the classifier model, thereby improving adversarial robustness and energy-efficiency of the Cloud-Edge system. △ Less

Submitted 5 September, 2023; originally announced October 2023.

Comments: IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)

arXiv:2310.02243 [pdf, other]

Learning quantum Hamiltonians at any temperature in polynomial time

Authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang

Abstract: We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $ρ= e^{-βH}/\textrm{tr}(e^{-βH})$ at a known inverse temperature $β>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $ε$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obta… ▽ More We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $ρ= e^{-βH}/\textrm{tr}(e^{-βH})$ at a known inverse temperature $β>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $ε$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $ε$ from polynomially many copies of the Gibbs state at any constant $β> 0$. Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian. △ Less

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.09457 [pdf, ps, other]

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $φ(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the ``kitchen s… ▽ More The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $φ(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the ``kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing all the relevant features, and the goal is to learn a near-optimal policy in only poly$(k,\log d)$ interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems. Along the way we introduce the notion of an emulator: a succinct approximate representation of the transitions that suffices for computing certain Bellman backups. Since linear MDPs are a non-parametric model, it is not even obvious whether polynomial-sized emulators exist. We show that they do exist and can be computed efficiently via convex programming. As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible. △ Less

Submitted 18 September, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.03388 [pdf, other]

Are SNNs Truly Energy-efficient? $-$ A Hardware Perspective

Authors: Abhiroop Bhattacharjee, Ruokai Yin, Abhishek Moitra, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have gained attention for their energy-efficient machine learning capabilities, utilizing bio-inspired activation functions and sparse binary spike-data representations. While recent SNN algorithmic advances achieve high accuracy on large-scale computer vision tasks, their energy-efficiency claims rely on certain impractical estimation metrics. This work studies two… ▽ More Spiking Neural Networks (SNNs) have gained attention for their energy-efficient machine learning capabilities, utilizing bio-inspired activation functions and sparse binary spike-data representations. While recent SNN algorithmic advances achieve high accuracy on large-scale computer vision tasks, their energy-efficiency claims rely on certain impractical estimation metrics. This work studies two hardware benchmarking platforms for large-scale SNN inference, namely SATA and SpikeSim. SATA is a sparsity-aware systolic-array accelerator, while SpikeSim evaluates SNNs implemented on In-Memory Computing (IMC) based analog crossbars. Using these tools, we find that the actual energy-efficiency improvements of recent SNN algorithmic works differ significantly from their estimated values due to various hardware bottlenecks. We identify and address key roadblocks to efficient SNN deployment on hardware, including repeated computations & data movements over timesteps, neuronal module overhead, and vulnerability of SNNs towards crossbar non-idealities. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: 5 pages

arXiv:2308.00664 [pdf, other]

doi 10.1109/JETCAS.2023.3327748

HyDe: A Hybrid PCM/FeFET/SRAM Device-search for Optimizing Area and Energy-efficiencies in Analog IMC Platforms

Authors: Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: Today, there are a plethora of In-Memory Computing (IMC) devices- SRAMs, PCMs & FeFETs, that emulate convolutions on crossbar-arrays with high throughput. Each IMC device offers its own pros & cons during inference of Deep Neural Networks (DNNs) on crossbars in terms of area overhead, programming energy and non-idealities. A design-space exploration is, therefore, imperative to derive a hybrid-dev… ▽ More Today, there are a plethora of In-Memory Computing (IMC) devices- SRAMs, PCMs & FeFETs, that emulate convolutions on crossbar-arrays with high throughput. Each IMC device offers its own pros & cons during inference of Deep Neural Networks (DNNs) on crossbars in terms of area overhead, programming energy and non-idealities. A design-space exploration is, therefore, imperative to derive a hybrid-device architecture optimized for accurate DNN inference under the impact of non-idealities from multiple devices, while maintaining competitive area & energy-efficiencies. We propose a two-phase search framework (HyDe) that exploits the best of all worlds offered by multiple devices to determine an optimal hybrid-device architecture for a given DNN topology. Our hybrid models achieve upto 2.30-2.74x higher TOPS/mm^2 at 22-26% higher energy-efficiencies than baseline homogeneous models for a VGG16 DNN topology. We further propose a feasible implementation of the HyDe-derived hybrid-device architectures in the 2.5D design space using chiplets to reduce design effort and cost in the hardware fabrication involving multiple technology processes. △ Less

Submitted 24 October, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted to IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS)

Journal ref: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 2023

arXiv:2307.06538 [pdf, ps, other]

Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems

Authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Morris Yau

Abstract: Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical… ▽ More Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models. △ Less

Submitted 23 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: ICML 2023

arXiv:2306.01993 [pdf, ps, other]

Provable benefits of score matching

Authors: Chirag Pabbaraju, Dhruv Rohatgi, Anish Sevekari, Holden Lee, Ankur Moitra, Andrej Risteski

Abstract: Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the ''score'' of the distribution, it sidesteps the need to compute this constant of proportionality (which is often intractable). While score matching and variants thereof are popular in practice, precise theoretical understanding of th… ▽ More Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the ''score'' of the distribution, it sidesteps the need to compute this constant of proportionality (which is often intractable). While score matching and variants thereof are popular in practice, precise theoretical understanding of the benefits and tradeoffs with maximum likelihood -- both computational and statistical -- are not well understood. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has a comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) Designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard. (2) Maximum likelihood has a statistical efficiency polynomial in the ambient dimension and the radius of the parameters of the family. (3) Minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: 25 Pages

arXiv:2305.18416 [pdf, other]

doi 10.1145/3583781.3590241

Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing

Authors: Abhiroop Bhattacharjee, Abhishek Moitra, Youngeun Kim, Yeshwanth Venkatesha, Priyadarshini Panda

Abstract: In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, most frequently… ▽ More In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, most frequently encountered non-idealities during inference include crossbar circuit-level parasitic resistances and device-level non-idealities such as stochastic read noise and temporal drift. In this work, our goal is to closely examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars and explore the feasibility of a nearly training-less solution via crossbar-aware fine-tuning of batchnorm parameters in real-time to mitigate the impact of the non-idealities. This enables reduction in hardware costs in terms of memory and training energy for IMC noise-aware retraining of the DNN weights on crossbars. △ Less

Submitted 28 May, 2023; originally announced May 2023.

Comments: Accepted in Great Lakes Symposium on VLSI 2023 (GLSVLSI 2023) conference

Journal ref: Great Lakes Symposium on VLSI 2023 (GLSVLSI 2023) conference

arXiv:2305.18360 [pdf, other]

Sharing Leaky-Integrate-and-Fire Neurons for Memory-Efficient Spiking Neural Networks

Authors: Youngeun Kim, Yuhang Li, Abhishek Moitra, Ruokai Yin, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have gained increasing attention as energy-efficient neural networks owing to their binary and asynchronous computation. However, their non-linear activation, that is Leaky-Integrate-and-Fire (LIF) neuron, requires additional memory to store a membrane voltage to capture the temporal dynamics of spikes. Although the required memory cost for LIF neurons significantly… ▽ More Spiking Neural Networks (SNNs) have gained increasing attention as energy-efficient neural networks owing to their binary and asynchronous computation. However, their non-linear activation, that is Leaky-Integrate-and-Fire (LIF) neuron, requires additional memory to store a membrane voltage to capture the temporal dynamics of spikes. Although the required memory cost for LIF neurons significantly increases as the input dimension goes larger, a technique to reduce memory for LIF neurons has not been explored so far. To address this, we propose a simple and effective solution, EfficientLIF-Net, which shares the LIF neurons across different layers and channels. Our EfficientLIF-Net achieves comparable accuracy with the standard SNNs while bringing up to ~4.3X forward memory efficiency and ~21.9X backward memory efficiency for LIF neurons. We conduct experiments on various datasets including CIFAR10, CIFAR100, TinyImageNet, ImageNet-100, and N-Caltech101. Furthermore, we show that our approach also offers advantages on Human Activity Recognition (HAR) datasets, which heavily rely on temporal information. △ Less

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.17346 [pdf, other]

Input-Aware Dynamic Timestep Spiking Neural Networks for Efficient In-Memory Computing

Authors: Yuhang Li, Abhishek Moitra, Tamar Geller, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have recently attracted widespread research interest as an efficient alternative to traditional Artificial Neural Networks (ANNs) because of their capability to process sparse and binary spike information and avoid expensive multiplication operations. Although the efficiency of SNNs can be realized on the In-Memory Computing (IMC) architecture, we show that the energ… ▽ More Spiking Neural Networks (SNNs) have recently attracted widespread research interest as an efficient alternative to traditional Artificial Neural Networks (ANNs) because of their capability to process sparse and binary spike information and avoid expensive multiplication operations. Although the efficiency of SNNs can be realized on the In-Memory Computing (IMC) architecture, we show that the energy cost and latency of SNNs scale linearly with the number of timesteps used on IMC hardware. Therefore, in order to maximize the efficiency of SNNs, we propose input-aware Dynamic Timestep SNN (DT-SNN), a novel algorithmic solution to dynamically determine the number of timesteps during inference on an input-dependent basis. By calculating the entropy of the accumulated output after each timestep, we can compare it to a predefined threshold and decide if the information processed at the current timestep is sufficient for a confident prediction. We deploy DT-SNN on an IMC architecture and show that it incurs negligible computational overhead. We demonstrate that our method only uses 1.46 average timesteps to achieve the accuracy of a 4-timestep static SNN while reducing the energy-delay-product by 80%. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: Published at Design & Automation Conferences (DAC) 2023

arXiv:2305.17223 [pdf, other]

Do We Really Need a Large Number of Visual Prompts?

Authors: Youngeun Kim, Yuhang Li, Abhishek Moitra, Ruokai Yin, Priyadarshini Panda

Abstract: Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), prepending learnable prompts to input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in addition… ▽ More Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), prepending learnable prompts to input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in additional computational overhead. In this paper, we analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture. Through theoretical and empirical analysis we show that adding more prompts does not lead to linear performance improvement. Further, we propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show that our approach reduces the number of prompts by ~70% while maintaining accuracy. △ Less

Submitted 12 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.09850 [pdf, other]

MINT: Multiplier-less INTeger Quantization for Energy Efficient Spiking Neural Networks

Authors: Ruokai Yin, Yuhang Li, Abhishek Moitra, Priyadarshini Panda

Abstract: We propose Multiplier-less INTeger (MINT) quantization, a uniform quantization scheme that efficiently compresses weights and membrane potentials in spiking neural networks (SNNs). Unlike previous SNN quantization methods, MINT quantizes memory-intensive membrane potentials to an extremely low precision (2-bit), significantly reducing the memory footprint. MINT also shares the quantization scaling… ▽ More We propose Multiplier-less INTeger (MINT) quantization, a uniform quantization scheme that efficiently compresses weights and membrane potentials in spiking neural networks (SNNs). Unlike previous SNN quantization methods, MINT quantizes memory-intensive membrane potentials to an extremely low precision (2-bit), significantly reducing the memory footprint. MINT also shares the quantization scaling factor between weights and membrane potentials, eliminating the need for multipliers required in conventional uniform quantization. Experimental results show that our method matches the accuracy of full-precision models and other state-of-the-art SNN quantization techniques while surpassing them in memory footprint reduction and hardware cost efficiency at deployment. For example, 2-bit MINT VGG-16 achieves 90.6% accuracy on CIFAR-10, with roughly 93.8% reduction in memory footprint from the full-precision model and 90% reduction in computation energy compared to vanilla uniform quantization at deployment. The code is available at https://github.com/Intelligent-Computing-Lab-Yale/MINT-Quantization. △ Less

Submitted 7 November, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: 6 pages. Accepted to 29th Asia and South Pacific Design Automation Conference (ASP-DAC 2024), nominated for best paper award

arXiv:2304.01954 [pdf, ps, other]

Strong spatial mixing for colorings on trees and its algorithmic applications

Authors: Zongchen Chen, Kuikui Liu, Nitya Mani, Ankur Moitra

Abstract: Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $Δ$-regular tree exhibits SSM whenever $q \ge Δ+1$. Moreover, it is widely believed that as long as SSM holds on boun… ▽ More Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $Δ$-regular tree exhibits SSM whenever $q \ge Δ+1$. Moreover, it is widely believed that as long as SSM holds on bounded-degree trees with $q$ colors, one would obtain an efficient sampler for $q$-colorings on all bounded-degree graphs via simple Markov chain algorithms. It is surprising that such a basic question is still open, even on trees, but then again it also highlights how much we still have to learn about random colorings. In this paper, we show the following: (1) For any $Δ\ge 3$, SSM holds for random $q$-colorings on trees of maximum degree $Δ$ whenever $q \ge Δ+ 3$. Thus we almost fully resolve the aforementioned conjecture. Our result substantially improves upon the previously best bound which requires $q \ge 1.59Δ+γ^*$ for an absolute constant $γ^* > 0$. (2) For any $Δ\ge 3$ and girth $g = Ω_Δ(1)$, we establish optimal mixing of the Glauber dynamics for $q$-colorings on graphs of maximum degree $Δ$ and girth $g$ whenever $q \ge Δ+3$. Our approach is based on a new general reduction from spectral independence on large-girth graphs to SSM on trees that is of independent interest. Using the same techniques, we also prove near-optimal bounds on weak spatial mixing (WSM), a closely-related notion to SSM, for the antiferromagnetic Potts model on trees. △ Less

Submitted 13 February, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 54 pages, 3 page appendix

arXiv:2303.17646 [pdf, other]

XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing

Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda

Abstract: The hardware-efficiency and accuracy of Deep Neural Networks (DNNs) implemented on In-memory Computing (IMC) architectures primarily depend on the DNN architecture and the peripheral circuit parameters. It is therefore essential to holistically co-search the network and peripheral parameters to achieve optimal performance. To this end, we propose XPert, which co-searches network architecture in ta… ▽ More The hardware-efficiency and accuracy of Deep Neural Networks (DNNs) implemented on In-memory Computing (IMC) architectures primarily depend on the DNN architecture and the peripheral circuit parameters. It is therefore essential to holistically co-search the network and peripheral parameters to achieve optimal performance. To this end, we propose XPert, which co-searches network architecture in tandem with peripheral parameters such as the type and precision of analog-to-digital converters, crossbar column sharing and the layer-specific input precision using an optimization-based design space exploration. Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower EDAP, 1.72x (1.62x) higher TOPS/W,1.93x (3x) higher TOPS/mm2 at 92.46% (56.7%) accuracy for CIFAR10 (TinyImagenet) datasets. The code for this paper is available at https://github.com/Intelligent-Computing-Lab-Yale/XPert. △ Less

Submitted 21 November, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: Accepted to Design and Automation Conference (DAC)

Journal ref: 60th DAC, 2023

arXiv:2302.07769 [pdf, other]

doi 10.1145/3593045

XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars

Authors: Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acceleration of Deep Neural Networks (DNNs) with high area and compute-efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limits the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks leading to… ▽ More Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acceleration of Deep Neural Networks (DNNs) with high area and compute-efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limits the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks leading to severe security threats in their large-scale deployment. Thus, finding adversarially robust DNN architectures for non-ideal crossbars is critical to the safe and secure deployment of DNNs on the edge. This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS that searches for hardware-efficient & adversarially robust neural architectures for non-ideal crossbar platforms. We use the one-shot Neural Architecture Search (NAS) approach to train a large Supernet with crossbar-awareness and sample adversarially robust Subnets therefrom, maintaining competitive hardware-efficiency. Our experiments on crossbars with benchmark datasets (SVHN, CIFAR10 & CIFAR100) show upto ~8-16% improvement in the adversarial robustness of the searched Subnets against a baseline ResNet-18 model subjected to crossbar-aware adversarial training. We benchmark our robust Subnets for Energy-Delay-Area-Products (EDAPs) using the Neurosim tool and find that with additional hardware-efficiency driven optimizations, the Subnets attain ~1.5-1.6x lower EDAPs than ResNet-18 baseline. △ Less

Submitted 15 April, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: Accepted to ACM Transactions on Embedded Computing Systems in April 2023

Journal ref: ACM Transactions on Embedded Computing Systems (2023)

arXiv:2302.06746 [pdf, other]

Workload-Balanced Pruning for Sparse Spiking Neural Networks

Authors: Ruokai Yin, Youngeun Kim, Yuhang Li, Abhishek Moitra, Nitin Satpute, Anna Hambitzer, Priyadarshini Panda

Abstract: Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Though the existing pruning methods can provide extremely high weight sparsity for deep SNNs, the high weight sparsity brings a workload imbalance problem. Specifically, the workload imbalance happens when a different number of non-zero weights are assig… ▽ More Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Though the existing pruning methods can provide extremely high weight sparsity for deep SNNs, the high weight sparsity brings a workload imbalance problem. Specifically, the workload imbalance happens when a different number of non-zero weights are assigned to hardware units running in parallel. This results in low hardware utilization and thus imposes longer latency and higher energy costs. In preliminary experiments, we show that sparse SNNs (~98% weight sparsity) can suffer as low as ~59% utilization. To alleviate the workload imbalance problem, we propose u-Ticket, where we monitor and adjust the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, thus guaranteeing the final ticket gets optimal utilization when deployed onto the hardware. Experiments indicate that our u-Ticket can guarantee up to 100% hardware utilization, thus reducing up to 76.9% latency and 63.8% energy cost compared to the non-utilization-aware LTH method. △ Less

Submitted 22 March, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 11 pages. Accepted to IEEE Transactions on Emerging Topics in Computational Intelligence (2024)

arXiv:2302.04712 [pdf, other]

DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

Authors: Duy-Thanh Nguyen, Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: With ever increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has significantly grown, and dot-products remain dominant in overall computation time. Most prior works are built on conventional dot-product where weighted input summation is used to represent the neuron operation. However, another implementation of dot-product based on… ▽ More With ever increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has significantly grown, and dot-products remain dominant in overall computation time. Most prior works are built on conventional dot-product where weighted input summation is used to represent the neuron operation. However, another implementation of dot-product based on the notion of angles and magnitudes in the Euclidean space has attracted limited attention. This paper proposes DeepCAM, an inference accelerator built on two critical innovations to alleviate the computation time bottleneck of convolutional neural networks. The first innovation is an approximate dot-product built on computations in the Euclidean space that can replace addition and multiplication with simple bit-wise operations. The second innovation is a dynamic size content addressable memory-based (CAM-based) accelerator to perform bit-wise operations and accelerate the CNNs with a lower computation time. Our experiments on benchmark image recognition datasets demonstrate that DeepCAM is up to 523x and 3498x faster than Eyeriss and traditional CPUs like Intel Skylake, respectively. Furthermore, the energy consumed by our DeepCAM approach is 2.16x to 109x less compared to Eyeriss. △ Less

Submitted 9 February, 2023; originally announced February 2023.

Comments: Accepted to Design, Automation and Test in Europe (DATE) Conference, 2023

Journal ref: Design, Automation and Test in Europe (DATE) Conference, 2023

arXiv:2301.09519 [pdf, ps, other]

A New Approach to Learning Linear Dynamical Systems

Authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Morris Yau

Abstract: Linear dynamical systems are the foundational statistical model upon which control theory is built. Both the celebrated Kalman filter and the linear quadratic regulator require knowledge of the system dynamics to provide analytic guarantees. Naturally, learning the dynamics of a linear dynamical system from linear measurements has been intensively studied since Rudolph Kalman's pioneering work in… ▽ More Linear dynamical systems are the foundational statistical model upon which control theory is built. Both the celebrated Kalman filter and the linear quadratic regulator require knowledge of the system dynamics to provide analytic guarantees. Naturally, learning the dynamics of a linear dynamical system from linear measurements has been intensively studied since Rudolph Kalman's pioneering work in the 1960's. Towards these ends, we provide the first polynomial time algorithm for learning a linear dynamical system from a polynomial length trajectory up to polynomial error in the system parameters under essentially minimal assumptions: observability, controllability, and marginal stability. Our algorithm is built on a method of moments estimator to directly estimate Markov parameters from which the dynamics can be extracted. Furthermore, we provide statistical lower bounds when our observability and controllability assumptions are violated. △ Less

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2210.12899 [pdf, other]

SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks

Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Runcong Kuang, Gokul Krishnan, Yu Cao, Priyadarshini Panda

Abstract: SNNs are an active research domain towards energy efficient machine intelligence. Compared to conventional ANNs, SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing pl… ▽ More SNNs are an active research domain towards energy efficient machine intelligence. Compared to conventional ANNs, SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing platforms. Today, In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory-wall bottleneck" prevalent in von-Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following have been overlooked- 1) the adverse effects of crossbar non-ideality on SNN performance due to repeated analog dot-product operations over multiple time-steps, 2) hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs. SpikeSim consists of a practical monolithic IMC architecture called SpikeFlow for map** SNNs. Additionally, the non-ideality computation engine (NICE) and energy-latency-area (ELA) engine performs hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on 65nm CMOS implementation and experiments on CIFAR10, CIFAR100 and TinyImagenet datasets, we find that the LIF/IF neuronal module has significant area contribution (>11% of the total hardware area). We propose SNN topological modifications leading to 1.24x and 10x reduction in the neuronal module's area and the overall energy-delay-product value, respectively. Furthermore, in this work, we perform a holistic comparison between IMC implemented ANN and SNNs and conclude that lower number of time-steps are the key to achieve higher throughput and energy-efficiency for SNNs compared to 4-bit ANNs. △ Less

Submitted 23 October, 2022; originally announced October 2022.

Comments: 14 pages, 22 figures

arXiv:2207.11903 [pdf, ps, other]

Minimax Rates for Robust Community Detection

Authors: Allen Liu, Ankur Moitra

Abstract: In this work, we study the problem of community detection in the stochastic block model with adversarial node corruptions. Our main result is an efficient algorithm that can tolerate an $ε$-fraction of corruptions and achieves error $O(ε) + e^{-\frac{C}{2} (1 \pm o(1))}$ where $C = (\sqrt{a} - \sqrt{b})^2$ is the signal-to-noise ratio and $a/n$ and $b/n$ are the inter-community and intra-community… ▽ More In this work, we study the problem of community detection in the stochastic block model with adversarial node corruptions. Our main result is an efficient algorithm that can tolerate an $ε$-fraction of corruptions and achieves error $O(ε) + e^{-\frac{C}{2} (1 \pm o(1))}$ where $C = (\sqrt{a} - \sqrt{b})^2$ is the signal-to-noise ratio and $a/n$ and $b/n$ are the inter-community and intra-community connection probabilities respectively. These bounds essentially match the minimax rates for the SBM without corruptions. We also give robust algorithms for $\mathbb{Z}_2$-synchronization. At the heart of our algorithm is a new semidefinite program that uses global information to robustly boost the accuracy of a rough clustering. Moreover, we show that our algorithms are doubly-robust in the sense that they work in an even more challenging noise model that mixes adversarial corruptions with unbounded monotone changes, from the semi-random model. △ Less

Submitted 25 July, 2022; originally announced July 2022.

Comments: To appear in FOCS 2022

arXiv:2207.02841 [pdf, other]

From algorithms to connectivity and back: finding a giant component in random k-SAT

Authors: Zongchen Chen, Nitya Mani, Ankur Moitra

Abstract: We take an algorithmic approach to studying the solution space geometry of relatively sparse random and bounded degree $k$-CNFs for large $k$. In the course of doing so, we establish that with high probability, a random $k$-CNF $Φ$ with $n$ variables and clause density $α= m/n \lesssim 2^{k/6}$ has a giant component of solutions that are connected in a graph where solutions are adjacent if they ha… ▽ More We take an algorithmic approach to studying the solution space geometry of relatively sparse random and bounded degree $k$-CNFs for large $k$. In the course of doing so, we establish that with high probability, a random $k$-CNF $Φ$ with $n$ variables and clause density $α= m/n \lesssim 2^{k/6}$ has a giant component of solutions that are connected in a graph where solutions are adjacent if they have Hamming distance $O_k(\log n)$ and that a similar result holds for bounded degree $k$-CNFs at similar densities. We are also able to deduce looseness results for random and bounded degree $k$-CNFs in a similar regime. Although our main motivation was understanding the geometry of the solution space, our methods have algorithmic implications. Towards that end, we construct an idealized block dynamics that samples solutions from a random $k$-CNF $Φ$ with density $α= m/n \lesssim 2^{k/52}$. We show this Markov chain can with high probability be implemented in polynomial time and by leveraging spectral independence, we also observe that it mixes relatively fast, giving a polynomial time algorithm to with high probability sample a uniformly random solution to a random $k$-CNF. Our work suggests that the natural route to pinning down when a giant component exists is to develop sharper algorithms for sampling solutions in random $k$-CNFs. △ Less

Submitted 15 July, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: 41 pages, 1 figure

MSC Class: 68W20; 68W25; 68W40; 68Q87 ACM Class: G.3; F.2.2

arXiv:2206.14754 [pdf, other]

Distilling Model Failures as Directions in Latent Space

Authors: Saachi Jain, Hannah Lawrence, Ankur Moitra, Aleksander Madry

Abstract: Existing methods for isolating hard subpopulations and spurious correlations in datasets often require human intervention. This can make these methods labor-intensive and dataset-specific. To address these shortcomings, we present a scalable method for automatically distilling a model's failure modes. Specifically, we harness linear classifiers to identify consistent error patterns, and, in turn,… ▽ More Existing methods for isolating hard subpopulations and spurious correlations in datasets often require human intervention. This can make these methods labor-intensive and dataset-specific. To address these shortcomings, we present a scalable method for automatically distilling a model's failure modes. Specifically, we harness linear classifiers to identify consistent error patterns, and, in turn, induce a natural representation of these failure modes as directions within the feature space. We demonstrate that this framework allows us to discover and automatically caption challenging subpopulations within the training dataset. Moreover, by combining our framework with off-the-shelf diffusion models, we can generate images that are especially challenging for the analyzed model, and thus can be used to perform synthetic data augmentation that helps remedy the model's failure modes. Code available at https://github.com/MadryLab/failure-directions △ Less

Submitted 2 December, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

arXiv:2206.09599 [pdf, other]

doi 10.1145/3531437.3539729

Examining the Robustness of Spiking Neural Networks on Non-ideal Memristive Crossbars

Authors: Abhiroop Bhattacharjee, Youngeun Kim, Abhishek Moitra, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial Neural Networks (ANNs) owing to their asynchronous, sparse, and binary information processing. To improve the energy-efficiency and throughput, SNNs can be implemented on memristive crossbars where Multiply-and-Accumulate (MAC) operations are realized in the analog domain using emerging Non-Volatile-Mem… ▽ More Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial Neural Networks (ANNs) owing to their asynchronous, sparse, and binary information processing. To improve the energy-efficiency and throughput, SNNs can be implemented on memristive crossbars where Multiply-and-Accumulate (MAC) operations are realized in the analog domain using emerging Non-Volatile-Memory (NVM) devices. Despite the compatibility of SNNs with memristive crossbars, there is little attention to study on the effect of intrinsic crossbar non-idealities and stochasticity on the performance of SNNs. In this paper, we conduct a comprehensive analysis of the robustness of SNNs on non-ideal crossbars. We examine SNNs trained via learning algorithms such as, surrogate gradient and ANN-SNN conversion. Our results show that repetitive crossbar computations across multiple time-steps induce error accumulation, resulting in a huge performance drop during SNN inference. We further show that SNNs trained with a smaller number of time-steps achieve better accuracy when deployed on memristive crossbars. △ Less

Submitted 20 June, 2022; originally announced June 2022.

Comments: Accepted in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2022

arXiv:2206.03446 [pdf, ps, other]

Learning in Observable POMDPs, without Computationally Intractable Oracles

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g. deterministic transitions) or assume access to an oracle for solving a hard optimistic planni… ▽ More Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g. deterministic transitions) or assume access to an oracle for solving a hard optimistic planning or estimation problem as a subroutine. In this work we develop the first oracle-free learning algorithm for POMDPs under reasonable assumptions. Specifically, we give a quasipolynomial-time end-to-end algorithm for learning in "observable" POMDPs, where observability is the assumption that well-separated distributions over states induce well-separated distributions over observations. Our techniques circumvent the more traditional approach of using the principle of optimism under uncertainty to promote exploration, and instead give a novel application of barycentric spanners to constructing policy covers. △ Less

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2205.14284 [pdf, other]

Provably Auditing Ordinary Least Squares in Low Dimensions

Authors: Ankur Moitra, Dhruv Rohatgi

Abstract: Measuring the stability of conclusions derived from Ordinary Least Squares linear regression is critically important, but most metrics either only measure local stability (i.e. against infinitesimal changes in the data), or are only interpretable under statistical assumptions. Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be remov… ▽ More Measuring the stability of conclusions derived from Ordinary Least Squares linear regression is critically important, but most metrics either only measure local stability (i.e. against infinitesimal changes in the data), or are only interpretable under statistical assumptions. Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion, specifically meaning that the sign of a particular coefficient of the estimated regressor changes. However, besides the trivial exponential-time algorithm, the only approach for computing this metric is a greedy heuristic that lacks provable guarantees under reasonable, verifiable assumptions; the heuristic provides a loose upper bound on the stability and also cannot certify lower bounds on it. We show that in the low-dimensional regime where the number of covariates is a constant but the number of samples is large, there are efficient algorithms for provably estimating (a fractional version of) this metric. Applying our algorithms to the Boston Housing dataset, we exhibit regression analyses where we can estimate the stability up to a factor of $3$ better than the greedy heuristic, and analyses where we can certify stability to drop** even a majority of the samples. △ Less

Submitted 5 June, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: 32 pages, 4 figures. Added acknowledgments/funding

arXiv:2204.05422 [pdf, other]

SATA: Sparsity-Aware Training Accelerator for Spiking Neural Networks

Authors: Ruokai Yin, Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. Recently, SNNs with backpropagation through time (BPTT) have achieved a higher accuracy result on image recognition tasks than other SNN training algorithms. Despite the success from the algorithm per… ▽ More Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. Recently, SNNs with backpropagation through time (BPTT) have achieved a higher accuracy result on image recognition tasks than other SNN training algorithms. Despite the success from the algorithm perspective, prior works neglect the evaluation of the hardware energy overheads of BPTT due to the lack of a hardware evaluation platform for this SNN training algorithm. Moreover, although SNNs have long been seen as an energy-efficient counterpart of ANNs, a quantitative comparison between the training cost of SNNs and ANNs is missing. To address the aforementioned issues, in this work, we introduce SATA (Sparsity-Aware Training Accelerator), a BPTT-based training accelerator for SNNs. The proposed SATA provides a simple and re-configurable systolic-based accelerator architecture, which makes it easy to analyze the training energy for BPTT-based SNN training algorithms. By utilizing the sparsity, SATA increases its computation energy efficiency by $5.58 \times$ compared to the one without using sparsity. Based on SATA, we show quantitative analyses of the energy efficiency of SNN training and compare the training cost of SNNs and ANNs. The results show that, on Eyeriss-like systolic-based architecture, SNNs consume $1.27\times$ more total energy with sparsities when compared to ANNs. We find that such high training energy cost is from time-repetitive convolution operations and data movements during backpropagation. Moreover, to propel the future SNN training algorithm design, we provide several observations on energy efficiency for different SNN-specific training parameters and propose an energy estimation framework for SNN training. Code for our framework is made publicly available. △ Less

Submitted 19 December, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 13 pages. Accepted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022)

arXiv:2204.05274 [pdf, other]

doi 10.1145/3489517.3530473

MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Authors: Abhiroop Bhattacharjee, Yeshwanth Venkatesha, Abhishek Moitra, Priyadarshini Panda

Abstract: Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in… ▽ More Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on a systolic-array hardware. Our experiments with benchmark datasets (child tasks)- CIFAR10, CIFAR100, and Fashion-MNIST, show that MIME achieves ~3.48x memory-efficiency and ~2.4-3.1x energy-savings compared to conventional multi-task inference in Pipelined task mode. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted in Design Automation Conference (DAC), 2022

Journal ref: 59th Design Automation Conference (DAC), 2022

arXiv:2202.04271 [pdf, other]

Adversarial Detection without Model Information

Authors: Abhishek Moitra, Youngeun Kim, Priyadarshini Panda

Abstract: Prior state-of-the-art adversarial detection works are classifier model dependent, i.e., they require classifier model outputs and parameters for training the detector or during adversarial detection. This makes their detection approach classifier model specific. Furthermore, classifier model outputs and parameters might not always be accessible. To this end, we propose a classifier model independ… ▽ More Prior state-of-the-art adversarial detection works are classifier model dependent, i.e., they require classifier model outputs and parameters for training the detector or during adversarial detection. This makes their detection approach classifier model specific. Furthermore, classifier model outputs and parameters might not always be accessible. To this end, we propose a classifier model independent adversarial detection method using a simple energy function to distinguish between adversarial and natural inputs. We train a standalone detector independent of the classifier model, with a layer-wise energy separation (LES) training to increase the separation between natural and adversarial energies. With this, we perform energy distribution-based adversarial detection. Our method achieves comparable performance with state-of-the-art detection works (ROC-AUC > 0.9) across a wide range of gradient, score and gaussian noise attacks on CIFAR10, CIFAR100 and TinyImagenet datasets. Furthermore, compared to prior works, our detection approach is light-weight, requires less amount of training data (40% of the actual dataset) and is transferable across different datasets. For reproducibility, we provide layer-wise energy separation training code at https://github.com/Intelligent-Computing-Lab-Yale/Energy-Separation-Training △ Less

Submitted 5 April, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: This paper has 14 pages of content and 2 pages of references

arXiv:2202.03133 [pdf, other]

Rate Coding or Direct Coding: Which One is Better for Accurate, Robust, and Energy-efficient Spiking Neural Networks?

Authors: Youngeun Kim, Hyoungseob Park, Abhishek Moitra, Abhiroop Bhattacharjee, Yeshwanth Venkatesha, Priyadarshini Panda

Abstract: Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore various coding techniques have been proposed to convert an image into temporal binary spikes. Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system as they show state-of-the-art performance on large-scale datasets. Despite their usage, there is… ▽ More Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore various coding techniques have been proposed to convert an image into temporal binary spikes. Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system as they show state-of-the-art performance on large-scale datasets. Despite their usage, there is little attention to comparing these two coding schemes in a fair manner. In this paper, we conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency. First, we compare the performance of two coding techniques with various architectures and datasets. Then, we measure the robustness of the coding techniques on two adversarial attack methods. Finally, we compare the energy-efficiency of two coding schemes on a digital hardware platform. Our results show that direct coding can achieve better accuracy especially for a small number of timesteps. In contrast, rate coding shows better robustness to adversarial attacks owing to the non-differentiable spike generation process. Rate coding also yields higher energy-efficiency than direct coding which requires multi-bit precision for the first layer. Our study explores the characteristics of two codings, which is an important design consideration for building SNNs. The code is made available at https://github.com/Intelligent-Computing-Lab-Yale/Rate-vs-Direct. △ Less

Submitted 12 April, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: Accepted to ICASSP2022

arXiv:2201.04735 [pdf, ps, other]

Planning in Observable POMDPs in Quasipolynomial Time

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: Partially Observable Markov Decision Processes (POMDPs) are a natural and general model in reinforcement learning that take into account the agent's uncertainty about its current state. In the literature on POMDPs, it is customary to assume access to a planning oracle that computes an optimal policy when the parameters are known, even though the problem is known to be computationally hard. Almost… ▽ More Partially Observable Markov Decision Processes (POMDPs) are a natural and general model in reinforcement learning that take into account the agent's uncertainty about its current state. In the literature on POMDPs, it is customary to assume access to a planning oracle that computes an optimal policy when the parameters are known, even though the problem is known to be computationally hard. Almost all existing planning algorithms either run in exponential time, lack provable performance guarantees, or require placing strong assumptions on the transition dynamics under every possible policy. In this work, we revisit the planning problem and ask: are there natural and well-motivated assumptions that make planning easy? Our main result is a quasipolynomial-time algorithm for planning in (one-step) observable POMDPs. Specifically, we assume that well-separated distributions on states lead to well-separated distributions on observations, and thus the observations are at least somewhat informative in each step. Crucially, this assumption places no restrictions on the transition dynamics of the POMDP; nevertheless, it implies that near-optimal policies admit quasi-succinct descriptions, which is not true in general (under standard hardness assumptions). Our analysis is based on new quantitative bounds for filter stability -- i.e. the rate at which an optimal filter for the latent state forgets its initialization. Furthermore, we prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis. △ Less

Submitted 23 March, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: 52 pages

arXiv:2112.06380 [pdf, ps, other]

Robust Voting Rules from Algorithmic Robust Statistics

Authors: Allen Liu, Ankur Moitra

Abstract: Maximum likelihood estimation furnishes powerful insights into voting theory, and the design of voting rules. However the MLE can usually be badly corrupted by a single outlying sample. This means that a single voter or a group of colluding voters can vote strategically and drastically affect the outcome. Motivated by recent progress in algorithmic robust statistics, we revisit the fundamental pro… ▽ More Maximum likelihood estimation furnishes powerful insights into voting theory, and the design of voting rules. However the MLE can usually be badly corrupted by a single outlying sample. This means that a single voter or a group of colluding voters can vote strategically and drastically affect the outcome. Motivated by recent progress in algorithmic robust statistics, we revisit the fundamental problem of estimating the central ranking in a Mallows model, but ask for an estimator that is provably robust, unlike the MLE. Our main result is an efficiently computable estimator that achieves nearly optimal robustness guarantees. In particular the robustness guarantees are dimension-independent in the sense that our overall accuracy does not depend on the number of alternatives being ranked. As an immediate consequence, we show that while the landmark Gibbard-Satterthwaite theorem tells us a strong impossiblity result about designing strategy-proof voting rules, there are quantitatively strong ways to protect against large coalitions if we assume that the remaining voters voters are honest and their preferences are sampled from a Mallows model. Our work also makes technical contributions to algorithmic robust statistics by designing new spectral filtering techniques that can exploit the intricate combinatorial dependencies in the Mallows model. △ Less

Submitted 16 July, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

arXiv:2111.06395 [pdf, ps, other]

Kalman Filtering with Adversarial Corruptions

Authors: Sitan Chen, Frederic Koehler, Ankur Moitra, Morris Yau

Abstract: Here we revisit the classic problem of linear quadratic estimation, i.e. estimating the trajectory of a linear dynamical system from noisy measurements. The celebrated Kalman filter gives an optimal estimator when the measurement noise is Gaussian, but is widely known to break down when one deviates from this assumption, e.g. when the noise is heavy-tailed. Many ad hoc heuristics have been employe… ▽ More Here we revisit the classic problem of linear quadratic estimation, i.e. estimating the trajectory of a linear dynamical system from noisy measurements. The celebrated Kalman filter gives an optimal estimator when the measurement noise is Gaussian, but is widely known to break down when one deviates from this assumption, e.g. when the noise is heavy-tailed. Many ad hoc heuristics have been employed in practice for dealing with outliers. In a pioneering work, Schick and Mitter gave provable guarantees when the measurement noise is a known infinitesimal perturbation of a Gaussian and raised the important question of whether one can get similar guarantees for large and unknown perturbations. In this work we give a truly robust filter: we give the first strong provable guarantees for linear quadratic estimation when even a constant fraction of measurements have been adversarially corrupted. This framework can model heavy-tailed and even non-stationary noise processes. Our algorithm robustifies the Kalman filter in the sense that it competes with the optimal algorithm that knows the locations of the corruptions. Our work is in a challenging Bayesian setting where the number of measurements scales with the complexity of what we need to estimate. Moreover, in linear dynamical systems past information decays over time. We develop a suite of new techniques to robustly extract information across different time steps and over varying time scales. △ Less

Submitted 11 November, 2021; originally announced November 2021.

Comments: 57 pages, comments welcome

arXiv:2110.13052 [pdf, ps, other]

Can Q-Learning be Improved with Advice?

Authors: Noah Golowich, Ankur Moitra

Abstract: Despite rapid progress in theoretical reinforcement learning (RL) over the last few years, most of the known guarantees are worst-case in nature, failing to take advantage of structure that may be known a priori about a given RL problem at hand. In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumve… ▽ More Despite rapid progress in theoretical reinforcement learning (RL) over the last few years, most of the known guarantees are worst-case in nature, failing to take advantage of structure that may be known a priori about a given RL problem at hand. In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumvented when information about the MDP, in the form of predictions about its optimal $Q$-value function, is given to the algorithm. We show that when the predictions about the optimal $Q$-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the set of state-action pairs with the set of state-action pairs on which the predictions are grossly inaccurate. This improvement holds for both uniform regret bounds and gap-based ones. Further, we are able to achieve this property with an algorithm that achieves sublinear regret when given arbitrary predictions (i.e., even those which are not a distillation). Our work extends a recent line of work on algorithms with predictions, which has typically focused on simple online problems such as caching and scheduling, to the more complex and general problem of reinforcement learning. △ Less

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2106.12021 [pdf, other]

doi 10.1109/TCSI.2021.3110487

DetectX -- Adversarial Input Detection using Current Signatures in Memristive XBar Arrays

Authors: Abhishek Moitra, Priyadarshini Panda

Abstract: Adversarial input detection has emerged as a prominent technique to harden Deep Neural Networks(DNNs) against adversarial attacks. Most prior works use neural network-based detectors or complex statistical analysis for adversarial detection. These approaches are computationally intensive and vulnerable to adversarial attacks. To this end, we propose DetectX - a hardware friendly adversarial detect… ▽ More Adversarial input detection has emerged as a prominent technique to harden Deep Neural Networks(DNNs) against adversarial attacks. Most prior works use neural network-based detectors or complex statistical analysis for adversarial detection. These approaches are computationally intensive and vulnerable to adversarial attacks. To this end, we propose DetectX - a hardware friendly adversarial detection mechanism using hardware signatures like Sum of column Currents (SoI) in memristive crossbars (XBar). We show that adversarial inputs have higher SoI compared to clean inputs. However, the difference is too small for reliable adversarial detection. Hence, we propose a dual-phase training methodology: Phase1 training is geared towards increasing the separation between clean and adversarial SoIs; Phase2 training improves the overall robustness against different strengths of adversarial attacks. For hardware-based adversarial detection, we implement the DetectX module using 32nm CMOS circuits and integrate it with a Neurosim-like analog crossbar architecture. We perform hardware evaluation of the Neurosim+DetectX system on the Neurosim platform using datasets-CIFAR10(VGG8), CIFAR100(VGG16) and TinyImagenet(ResNet18). Our experiments show that DetectX is 10x-25x more energy efficient and immune to dynamic adversarial attacks compared to previous state-of-the-art works. Moreover, we achieve high detection performance (ROC-AUC > 0.95) for strong white-box and black-box attacks. The code has been released at https://github.com/Intelligent-Computing-Lab-Yale/DetectX △ Less

Submitted 22 June, 2021; originally announced June 2021.

Comments: 14 pages, 13 figures

Journal ref: IEEE Transactions on Circuits and Systems 1- Regular Papers, 2021

arXiv:2106.08393 [pdf, ps, other]

Spoofing Generalization: When Can't You Trust Proprietary Models?

Authors: Ankur Moitra, Elchanan Mossel, Colin Sandon

Abstract: In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalizes to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model g that fits its training data and nothing else, but is indistinguishable from an accurate model f. We say that g strongly spoofs f if no po… ▽ More In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalizes to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model g that fits its training data and nothing else, but is indistinguishable from an accurate model f. We say that g strongly spoofs f if no polynomial-time algorithm can tell them apart. If instead we restrict to algorithms that run in $n^c$ time for some fixed $c$, we say that g c-weakly spoofs f. Our main results are 1. Under cryptographic assumptions, strong spoofing is possible and 2. For any c> 0, c-weak spoofing is possible unconditionally While the assumption of a malicious agent is an extreme scenario (hopefully companies training large models are not malicious), we believe that it sheds light on the inherent difficulties of blindly trusting large proprietary models or data. △ Less

Submitted 23 March, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.02774 [pdf, ps, other]

Robust Model Selection and Nearly-Proper Learning for GMMs

Authors: Jerry Li, Allen Liu, Ankur Moitra

Abstract: In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right but there are essentially no known efficient algorithms with provable guarantees let alone ones that can tolerate ad… ▽ More In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right but there are essentially no known efficient algorithms with provable guarantees let alone ones that can tolerate adversarial corruptions. In this work, we study the problem of robust model selection for univariate Gaussian mixture models (GMMs). Given $\textsf{poly}(k/ε)$ samples from a distribution that is $ε$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\widetilde{O}(k)$ components that approximates the distribution to within $\widetilde{O}(ε)$ in $\textsf{poly}(k/ε)$ time. Thus we are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than $k$ components (e.g. $k/ε^2$ components for kernel density estimates) or run in time exponential in $k$. Moreover, by adapting our techniques we obtain similar results for reconstructing Fourier-sparse signals. △ Less

Submitted 22 April, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2106.02680 [pdf, ps, other]

Algorithms from Invariants: Smoothed Analysis of Orbit Recovery over $SO(3)$

Authors: Allen Liu, Ankur Moitra

Abstract: In this work we study orbit recovery over $SO(3)$, where the goal is to recover a function on the sphere from noisy, randomly rotated copies of it. We assume that the function is a linear combination of low-degree spherical harmonics. This is a natural abstraction for the problem of recovering the three-dimensional structure of a molecule through cryo-electron tomography. For provably learning the… ▽ More In this work we study orbit recovery over $SO(3)$, where the goal is to recover a function on the sphere from noisy, randomly rotated copies of it. We assume that the function is a linear combination of low-degree spherical harmonics. This is a natural abstraction for the problem of recovering the three-dimensional structure of a molecule through cryo-electron tomography. For provably learning the parameters of a generative model, the method of moments is the standard workhorse of theoretical machine learning. It turns out that there is a natural incarnation of the method of moments for orbit recovery based on invariant theory. Bandeira et al. [BBSK+18] used invariant theory to give tight bounds on the sample complexity in terms of the noise level. However many of the key challenges remain: Can we prove bounds on the sample complexity that are polynomial in $n$, the dimension of the signal? The bounds in [BBSK+18] hide constants that have an unspecified dependence on $n$ and only hold in the limit as $σ^2 \rightarrow \infty$ where $σ^2$ is the variance of the noise. Moreover can we give efficient algorithms? We revisit these challenges from the perspective of smoothed analysis, where we assume that the coefficients of the signal, in the basis of spherical harmonics, are subject to small random perturbations. Our main result is a quasi-polynomial time algorithm for orbit recovery over $SO(3)$ in this model. Our approach is based on frequency marching, which sequentially solves linear systems to find higher degree coefficients. Our main technical contribution is to show that these linear systems have unique solutions, are well-conditioned, and that the error can be made to compound over at most a logarithmic number of rounds. We believe that our work takes an important first step towards uncovering the algorithmic implications of invariant theory. △ Less

Submitted 1 May, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2105.04003 [pdf, other]

doi 10.23919/DATE51398.2021.9474001

Efficiency-driven Hardware Optimization for Adversarially Robust Neural Networks

Authors: Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of Deep Neural Networks (DNNs) has become imperative. We will focus on how to address adversarial robustness for DNNs through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energy-spending component for DNNs, hard… ▽ More With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of Deep Neural Networks (DNNs) has become imperative. We will focus on how to address adversarial robustness for DNNs through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energy-spending component for DNNs, hardware approaches in the past have focused on optimizing the memory. One such approach is approximate digital CMOS memories with hybrid 6T-8T SRAM cells that enable supply voltage (Vdd) scaling yielding low-power operation, without significantly affecting the performance due to read/write failures incurred in the 6T cells. In this paper, we show how the bit-errors in the 6T cells of hybrid 6T-8T memories minimize the adversarial perturbations in a DNN. Essentially, we find that for different configurations of 8T-6T ratios and scaledVdd operation, noise incurred in the hybrid memory architectures is bound within specific limits. This hardware noise can potentially interfere in the creation of adversarial attacks in DNNs yielding robustness. Another memory optimization approach involves using analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements. However, crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. We will show how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization. △ Less

Submitted 9 May, 2021; originally announced May 2021.

Comments: 6 pages, 8 figures, 3 tables; Accepted in DATE 2021 conference. arXiv admin note: text overlap with arXiv:2008.11298

Journal ref: 2021 Design, Automation and Test in Europe (DATE) Conference

arXiv:2104.09665 [pdf, ps, other]

Learning GMMs with Nearly Optimal Robustness Guarantees

Authors: Allen Liu, Ankur Moitra

Abstract: In this work we solve the problem of robustly learning a high-dimensional Gaussian mixture model with $k$ components from $ε$-corrupted samples up to accuracy $\widetilde{O}(ε)$ in total variation distance for any constant $k$ and with mild assumptions on the mixture. This robustness guarantee is optimal up to polylogarithmic factors. The main challenge is that most earlier works rely on learning… ▽ More In this work we solve the problem of robustly learning a high-dimensional Gaussian mixture model with $k$ components from $ε$-corrupted samples up to accuracy $\widetilde{O}(ε)$ in total variation distance for any constant $k$ and with mild assumptions on the mixture. This robustness guarantee is optimal up to polylogarithmic factors. The main challenge is that most earlier works rely on learning individual components in the mixture, but this is impossible in our setting, at least for the types of strong robustness guarantees we are aiming for. Instead we introduce a new framework which we call {\em strong observability} that gives us a route to circumvent this obstacle. △ Less

Submitted 14 November, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Showing 1–50 of 105 results for author: Moitra, A