Search | arXiv e-print repository

Energy Transformer

Authors: Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov

Abstract: Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory. Attention is the power-house driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not st… ▽ More Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory. Attention is the power-house driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks. △ Less

Submitted 31 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Journal ref: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2006.11979 [pdf, other]

ELF: An Early-Exiting Framework for Long-Tailed Classification

Authors: Rahul Duggal, Scott Freitas, Sunny Dhamnani, Duen Horng Chau, Jimeng Sun

Abstract: The natural world often follows a long-tailed data distribution where only a few classes account for most of the examples. This long-tail causes classifiers to overfit to the majority class. To mitigate this, prior solutions commonly adopt class rebalancing strategies such as data resampling and loss resha**. However, by treating each example within a class equally, these methods fail to account… ▽ More The natural world often follows a long-tailed data distribution where only a few classes account for most of the examples. This long-tail causes classifiers to overfit to the majority class. To mitigate this, prior solutions commonly adopt class rebalancing strategies such as data resampling and loss resha**. However, by treating each example within a class equally, these methods fail to account for the important notion of example hardness, i.e., within each class some examples are easier to classify than others. To incorporate this notion of hardness into the learning process, we propose the EarLy-exiting Framework(ELF). During training, ELF learns to early-exit easy examples through auxiliary branches attached to a backbone network. This offers a dual benefit-(1) the neural network increasingly focuses on hard examples, since they contribute more to the overall network loss; and (2) it frees up additional model capacity to distinguish difficult examples. Experimental results on two large-scale datasets, ImageNet LT and iNaturalist'18, demonstrate that ELF can improve state-of-the-art accuracy by more than 3 percent. This comes with the additional benefit of reducing up to 20 percent of inference time FLOPS. ELF is complementary to prior work and can naturally integrate with a variety of existing methods to tackle the challenge of long-tailed distributions. △ Less

Submitted 13 September, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

arXiv:2001.11363 [pdf, other]

doi 10.1145/3366423.3380241

REST: Robust and Efficient Neural Networks for Sleep Monitoring in the Wild

Authors: Rahul Duggal, Scott Freitas, Cao Xiao, Duen Horng Chau, Jimeng Sun

Abstract: In recent years, significant attention has been devoted towards integrating deep learning technologies in the healthcare domain. However, to safely and practically deploy deep learning models for home health monitoring, two significant challenges must be addressed: the models should be (1) robust against noise; and (2) compact and energy-efficient. We propose REST, a new method that simultaneously… ▽ More In recent years, significant attention has been devoted towards integrating deep learning technologies in the healthcare domain. However, to safely and practically deploy deep learning models for home health monitoring, two significant challenges must be addressed: the models should be (1) robust against noise; and (2) compact and energy-efficient. We propose REST, a new method that simultaneously tackles both issues via 1) adversarial training and controlling the Lipschitz constant of the neural network through spectral regularization while 2) enabling neural network compression through sparsity regularization. We demonstrate that REST produces highly-robust and efficient models that substantially outperform the original full-sized models in the presence of noise. For the sleep staging task over single-channel electroencephalogram (EEG), the REST model achieves a macro-F1 score of 0.67 vs. 0.39 achieved by a state-of-the-art model in the presence of Gaussian noise while obtaining 19x parameter reduction and 15x MFLOPS reduction on two large, real-world EEG datasets. By deploying these models to an Android application on a smartphone, we quantitatively observe that REST allows models to achieve up to 17x energy reduction and 9x faster inference. We open-source the code repository with this paper: https://github.com/duggalrahul/REST. △ Less

Submitted 29 January, 2020; originally announced January 2020.

Comments: Accepted to WWW 2020

arXiv:2001.07769 [pdf, other]

Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning

Authors: Nilaksh Das, Haekyu Park, Zijie J. Wang, Fred Hohman, Robert Firstman, Emily Rogers, Duen Horng Chau

Abstract: Deep neural networks (DNNs) are increasingly powering high-stakes applications such as autonomous cars and healthcare; however, DNNs are often treated as "black boxes" in such applications. Recent research has also revealed that DNNs are highly vulnerable to adversarial attacks, raising serious concerns over deploying DNNs in the real world. To overcome these deficiencies, we are develo** Massif… ▽ More Deep neural networks (DNNs) are increasingly powering high-stakes applications such as autonomous cars and healthcare; however, DNNs are often treated as "black boxes" in such applications. Recent research has also revealed that DNNs are highly vulnerable to adversarial attacks, raising serious concerns over deploying DNNs in the real world. To overcome these deficiencies, we are develo** Massif, an interactive tool for deciphering adversarial attacks. Massif identifies and interactively visualizes neurons and their connections inside a DNN that are strongly activated or suppressed by an adversarial attack. Massif provides both a high-level, interpretable overview of the effect of an attack on a DNN, and a low-level, detailed description of the affected neurons. These tightly coupled views in Massif help people better understand which input features are most vulnerable or important for correct predictions. △ Less

Submitted 16 February, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

Comments: Appear in ACM Conference on Human Factors in Computing Systems (CHI) Late-Breaking Works 2020, 7 pages

arXiv:1904.05419 [pdf, other]

doi 10.1109/VAST47406.2019.8986948

FairVis: Visual Analytics for Discovering Intersectional Bias in Machine Learning

Authors: Ángel Alexander Cabrera, Will Epperson, Fred Hohman, Minsuk Kahng, Jamie Morgenstern, Duen Horng Chau

Abstract: The growing capability and accessibility of machine learning has led to its application to many real-world domains and data about people. Despite the benefits algorithmic systems may bring, models can reflect, inject, or exacerbate implicit and explicit societal biases into their outputs, disadvantaging certain demographic subgroups. Discovering which biases a machine learning model has introduced… ▽ More The growing capability and accessibility of machine learning has led to its application to many real-world domains and data about people. Despite the benefits algorithmic systems may bring, models can reflect, inject, or exacerbate implicit and explicit societal biases into their outputs, disadvantaging certain demographic subgroups. Discovering which biases a machine learning model has introduced is a great challenge, due to the numerous definitions of fairness and the large number of potentially impacted subgroups. We present FairVis, a mixed-initiative visual analytics system that integrates a novel subgroup discovery technique for users to audit the fairness of machine learning models. Through FairVis, users can apply domain knowledge to generate and investigate known subgroups, and explore suggested and similar subgroups. FairVis' coordinated views enable users to explore a high-level overview of subgroup performance and subsequently drill down into detailed investigation of specific subgroups. We show how FairVis helps to discover biases in two real datasets used in predicting income and recidivism. As a visual analytics system devoted to discovering bias in machine learning, FairVis demonstrates how interactive visualization may help data scientists and the general public understand and create more equitable algorithmic systems. △ Less

Submitted 1 September, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted as a VAST conference paper to IEEE VIS'19

arXiv:1902.00541 [pdf, other]

The Efficacy of SHIELD under Different Threat Models

Authors: Cory Cornelius, Nilaksh Das, Shang-Tse Chen, Li Chen, Michael E. Kounavis, Duen Horng Chau

Abstract: In this appraisal paper, we evaluate the efficacy of SHIELD, a compression-based defense framework for countering adversarial attacks on image classification models, which was published at KDD 2018. Here, we consider alternative threat models not studied in the original work, where we assume that an adaptive adversary is aware of the ensemble defense approach, the defensive pre-processing, and the… ▽ More In this appraisal paper, we evaluate the efficacy of SHIELD, a compression-based defense framework for countering adversarial attacks on image classification models, which was published at KDD 2018. Here, we consider alternative threat models not studied in the original work, where we assume that an adaptive adversary is aware of the ensemble defense approach, the defensive pre-processing, and the architecture and weights of the models used in the ensemble. We define scenarios with varying levels of threat and empirically analyze the proposed defense by varying the degree of information available to the attacker, spanning from a full white-box attack to the gray-box threat model described in the original work. To evaluate the robustness of the defense against an adaptive attacker, we consider the targeted-attack success rate of the Projected Gradient Descent (PGD) attack, which is a strong gradient-based adversarial attack proposed in adversarial machine learning research. We also experiment with training the SHIELD ensemble from scratch, which is different from re-training using a pre-trained model as done in the original work. We find that the targeted PGD attack has a success rate of 64.3% against the original SHIELD ensemble in the full white box scenario, but this drops to 48.9% if the models used in the ensemble are trained from scratch instead of being retrained. Our experiments further reveal that an ensemble whose models are re-trained indeed have higher correlation in the cosine similarity space, and models that are trained from scratch are less vulnerable to targeted attacks in the white-box and gray-box scenarios. △ Less

Submitted 2 August, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: Appraisal paper of existing method accepted for oral presentation at KDD LEMINCS 2019

arXiv:1809.01587 [pdf, other]

doi 10.1109/TVCG.2018.2864500

GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation

Authors: Minsuk Kahng, Nikhil Thorat, Duen Horng Chau, Fernanda Viégas, Martin Wattenberg

Abstract: Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology. While visual and interactive approaches have been successfully developed to help people more easily learn deep learning, most existing tools focus on simpler models. In this work, we present GAN Lab, the first interactive visualization tool designed fo… ▽ More Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology. While visual and interactive approaches have been successfully developed to help people more easily learn deep learning, most existing tools focus on simpler models. In this work, we present GAN Lab, the first interactive visualization tool designed for non-experts to learn and experiment with Generative Adversarial Networks (GANs), a popular class of complex deep learning models. With GAN Lab, users can interactively train generative models and visualize the dynamic training process's intermediate results. GAN Lab tightly integrates an model overview graph that summarizes GAN's structure, and a layered distributions view that helps users interpret the interplay between submodels. GAN Lab introduces new interactive experimentation features for learning complex deep learning models, such as step-by-step training at multiple levels of abstraction for understanding intricate training dynamics. Implemented using TensorFlow.js, GAN Lab is accessible to anyone via modern web browsers, without the need for installation or specialized hardware, overcoming a major practical challenge in deploying interactive tools for deep learning. △ Less

Submitted 5 September, 2018; originally announced September 2018.

Comments: This paper will be published in the IEEE Transactions on Visualization and Computer Graphics, 25(1), January 2019, and presented at IEEE VAST 2018

arXiv:1804.05810 [pdf, other]

doi 10.1007/978-3-030-10925-7_4

ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Authors: Shang-Tse Chen, Cory Cornelius, Jason Martin, Duen Horng Chau

Abstract: Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detecto… ▽ More Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be successfully adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that are consistently mis-detected by Faster R-CNN as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems. △ Less

Submitted 30 April, 2019; v1 submitted 16 April, 2018; originally announced April 2018.

Journal ref: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 52-68, 2018

arXiv:1801.06889 [pdf, other]

Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers

Authors: Fred Hohman, Minsuk Kahng, Robert Pienta, Duen Horng Chau

Abstract: Deep learning has recently seen rapid development and received significant attention due to its state-of-the-art performance on previously-thought hard problems. However, because of the internal complexity and nonlinear structure of deep neural networks, the underlying decision making processes for why these models are achieving such performance are challenging and sometimes mystifying to interpre… ▽ More Deep learning has recently seen rapid development and received significant attention due to its state-of-the-art performance on previously-thought hard problems. However, because of the internal complexity and nonlinear structure of deep neural networks, the underlying decision making processes for why these models are achieving such performance are challenging and sometimes mystifying to interpret. As deep learning spreads across domains, it is of paramount importance that we equip users of deep learning with tools for understanding when a model works correctly, when it fails, and ultimately how to improve its performance. Standardized toolkits for building neural networks have helped democratize deep learning; visual analytics systems have now been developed to support model explanation, interpretation, debugging, and improvement. We present a survey of the role of visual analytics in deep learning research, which highlights its short yet impactful history and thoroughly summarizes the state-of-the-art using a human-centered interrogative framework, focusing on the Five W's and How (Why, Who, What, How, When, and Where). We conclude by highlighting research directions and open research problems. This survey helps researchers and practitioners in both visual analytics and deep learning to quickly learn key aspects of this young and rapidly growing body of research, whose impact spans a diverse range of domains. △ Less

Submitted 14 May, 2018; v1 submitted 21 January, 2018; originally announced January 2018.

Comments: Under review for IEEE Transactions on Visualization and Computer Graphics (TVCG)

ACM Class: H.5.2; I.5.1.d; I.6.9.c; I.6.9.f; I.2.6.g

arXiv:1704.01942 [pdf, other]

ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models

Authors: Minsuk Kahng, Pierre Y. Andrews, Aditya Kalro, Duen Horng Chau

Abstract: While deep learning models have achieved state-of-the-art accuracies for many prediction tasks, understanding these models remains a challenge. Despite the recent interest in develo** visual tools to help users interpret deep learning models, the complexity and wide variety of models deployed in industry, and the large-scale datasets that they used, pose unique design challenges that are inadequ… ▽ More While deep learning models have achieved state-of-the-art accuracies for many prediction tasks, understanding these models remains a challenge. Despite the recent interest in develo** visual tools to help users interpret deep learning models, the complexity and wide variety of models deployed in industry, and the large-scale datasets that they used, pose unique design challenges that are inadequately addressed by existing work. Through participatory design sessions with over 15 researchers and engineers at Facebook, we have developed, deployed, and iteratively improved ActiVis, an interactive visualization system for interpreting large-scale deep learning models and results. By tightly integrating multiple coordinated views, such as a computation graph overview of the model architecture, and a neuron activation view for pattern discovery and comparison, users can explore complex deep neural network models at both the instance- and subset-level. ActiVis has been deployed on Facebook's machine learning platform. We present case studies with Facebook researchers and engineers, and usage scenarios of how ActiVis may work with different models. △ Less

Submitted 8 August, 2017; v1 submitted 6 April, 2017; originally announced April 2017.

Comments: Will be presented at IEEE VAST 2017 and published in IEEE Transactions on Visualization and Computer Graphics, 24(1)

arXiv:1506.06318 [pdf, ps, other]

Communication Efficient Distributed Agnostic Boosting

Authors: Shang-Tse Chen, Maria-Florina Balcan, Duen Horng Chau

Abstract: We consider the problem of learning from distributed data in the agnostic setting, i.e., in the presence of arbitrary forms of noise. Our main contribution is a general distributed boosting-based procedure for learning an arbitrary concept space, that is simultaneously noise tolerant, communication efficient, and computationally efficient. This improves significantly over prior works that were eit… ▽ More We consider the problem of learning from distributed data in the agnostic setting, i.e., in the presence of arbitrary forms of noise. Our main contribution is a general distributed boosting-based procedure for learning an arbitrary concept space, that is simultaneously noise tolerant, communication efficient, and computationally efficient. This improves significantly over prior works that were either communication efficient only in noise-free scenarios or computationally prohibitive. Empirical results on large synthetic and real-world datasets demonstrate the effectiveness and scalability of the proposed approach. △ Less

Submitted 18 November, 2016; v1 submitted 21 June, 2015; originally announced June 2015.

Showing 1–11 of 11 results for author: Chau, D H