-
Charting the Right Manifold: Manifold Mixup for Few-shot Learning
Authors:
Puneet Mangla,
Mayank Singh,
Abhishek Sinha,
Nupur Kumari,
Vineeth N Balasubramanian,
Balaji Krishnamurthy
Abstract:
Few-shot learning algorithms aim to learn model parameters capable of adapting to unseen classes with the help of only a few labeled examples. A recent regularization technique, Manifold Mixup, focuses on learning a general-purpose representation that is robust to small changes in the data distribution. Since the goal of few-shot learning is closely linked to robust representation learning, we study Manifold Mixup in this problem setting. Self-supervised learning is another technique that learns semantically meaningful features, using only the inherent structure of the data. This work investigates the role of learning a relevant feature manifold for few-shot tasks using self-supervision and regularization techniques. We observe that regularizing the feature manifold, enriched via self-supervised techniques, with Manifold Mixup significantly improves few-shot learning performance. We show that our proposed method S2M2 beats the current state-of-the-art accuracy on standard few-shot learning datasets like CIFAR-FS, CUB, mini-ImageNet and tiered-ImageNet by 3-8%. Through extensive experimentation, we show that the features learned using our approach generalize to complex few-shot evaluation tasks and cross-domain scenarios, and are robust against slight changes to the data distribution.
Submitted 18 January, 2020; v1 submitted 28 July, 2019;
originally announced July 2019.
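The Manifold Mixup regularizer named in the abstract above can be illustrated with a minimal NumPy sketch. It shows only the general idea of interpolating hidden representations and labels with a Beta-distributed coefficient; it is not the S2M2 training procedure. The function name `manifold_mixup`, the default `alpha`, and the choice to mix within a single batch via a random permutation are assumptions made for illustration.

```python
import numpy as np

def manifold_mixup(hidden, labels_onehot, alpha=2.0, rng=None):
    """Mix hidden representations and one-hot labels within a batch.

    hidden:        (batch, features) activations taken at some intermediate layer
    labels_onehot: (batch, classes) one-hot targets
    alpha:         Beta(alpha, alpha) parameter (illustrative default, an assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # mixing coefficient in [0, 1]
    perm = rng.permutation(hidden.shape[0])   # partner example for each row
    mixed_hidden = lam * hidden + (1.0 - lam) * hidden[perm]
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_hidden, mixed_labels, lam
```

In a training loop, the mixed activations would be passed through the remaining layers and the loss computed against the mixed labels.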
-
Extensible and Scalable Adaptive Sampling on Supercomputers
Authors:
Eugen Hruska,
Vivekanandan Balasubramanian,
Hyungro Lee,
Shantenu Jha,
Cecilia Clementi
Abstract:
The accurate sampling of protein dynamics is an ongoing challenge despite the utilization of High-Performance Computers (HPC) systems. Utilizing only "brute force" MD simulations requires an unacceptably long time to solution. Adaptive sampling methods allow a more effective sampling of protein dynamics than standard MD simulations. Depending on the restarting strategy the speed up can be more than one order of magnitude. One challenge limiting the utilization of adaptive sampling by domain experts is the relatively high complexity of efficiently running adaptive sampling on HPC systems. We discuss how the ExTASY framework can set up new adaptive sampling strategies, and reliably execute resulting workflows at scale on HPC platforms. Here the folding dynamics of four proteins are predicted with no a priori information.
Submitted 24 September, 2020; v1 submitted 16 July, 2019;
originally announced July 2019.
-
Reconfiguration Algorithms for High Precision Communications in Time Sensitive Networks: Time-Aware Shaper Configuration with IEEE 802.1Qcc (Extended Version)
Authors:
Ahmed Nasrallah,
Venkatraman Balasubramanian,
Akhilesh Thyagaturu,
Martin Reisslein,
Hesham ElBakoury
Abstract:
As new networking paradigms emerge for different networking applications, e.g., cyber-physical systems, and different services are handled under a converged data link technology, e.g., Ethernet, certain applications with mission critical traffic cannot coexist on the same physical networking infrastructure using traditional Ethernet packet-switched networking protocols. The IEEE 802.1Q Time Sensitive Networking (TSN) task group is developing protocol standards to provide deterministic properties on Ethernet based packet-switched networks. In particular, the IEEE 802.1Qcc, centralized management and control, and the IEEE 802.1Qbv, Time-Aware Shaper, can be used to manage and control scheduled traffic streams with periodic properties along with best-effort traffic on the same network infrastructure. In this paper, we investigate the effects of using the IEEE 802.1Qcc management protocol to accurately and precisely configure TAS enabled switches (with transmission windows governed by gate control lists (GCLs) with gate control entries (GCEs)) ensuring ultra-low latency, zero packet loss, and minimal jitter for scheduled TSN traffic. We examine both a centralized network/distributed user model (hybrid model) and a fully-distributed (decentralized) 802.1Qcc model on a typical industrial control network with the goal of maximizing scheduled traffic streams.
Submitted 27 June, 2019;
originally announced June 2019.
-
Submodular Batch Selection for Training Deep Neural Networks
Authors:
K J Joseph,
Vamshi Teja R,
Krishnakant Singh,
Vineeth N Balasubramanian
Abstract:
Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.
Submitted 20 June, 2019;
originally announced June 2019.
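Greedy maximization of a submodular objective, as mentioned in the abstract above, can be sketched as follows. The abstract does not give the objective, so this example assumes a generic one: a per-sample informativeness score plus a facility-location diversity term. The function name `greedy_select`, the `weight` parameter, and the objective itself are hypothetical and are not taken from the paper.

```python
import numpy as np

def greedy_select(utility, similarity, k, weight=1.0):
    """Greedily pick k indices maximizing an assumed objective

        f(S) = sum_{i in S} utility[i] + weight * sum_j max_{i in S} similarity[i, j]

    utility:    (n,) non-negative informativeness score per sample
    similarity: (n, n) non-negative pairwise similarities
    """
    utility = np.asarray(utility, dtype=float)
    similarity = np.asarray(similarity, dtype=float)
    n = utility.shape[0]
    chosen = np.zeros(n, dtype=bool)
    coverage = np.zeros(n)  # best similarity of each sample to the selected set
    selected = []
    for _ in range(k):
        # Marginal gain of adding each candidate to the current selection.
        gains = utility + weight * np.maximum(similarity - coverage, 0.0).sum(axis=1)
        gains[chosen] = -np.inf  # never pick the same sample twice
        best = int(np.argmax(gains))
        selected.append(best)
        chosen[best] = True
        coverage = np.maximum(coverage, similarity[best])
    return selected
```

Under this assumed objective, a sample that is highly informative but nearly identical to one already chosen has a small marginal gain, so the greedy step tends to prefer a less redundant sample.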
-
Automatic estimation of heading date of paddy rice using deep learning
Authors:
Sai Vikas Desai,
Vineeth N Balasubramanian,
Tokihiro Fukatsu,
Seishi Ninomiya,
Wei Guo
Abstract:
Accurate estimation of heading date of paddy rice greatly helps the breeders to understand the adaptability of different crop varieties in a given location. The heading date also plays a vital role in determining grain yield for research experiments. Visual examination of the crop is laborious and time consuming. Therefore, quick and precise estimation of heading date of paddy rice is highly essential. In this work, we propose a simple pipeline to detect regions containing flowering panicles from ground level RGB images of paddy rice. Given a fixed region size for an image, the number of regions containing flowering panicles is directly proportional to the number of flowering panicles present. Consequently, we use the flowering panicle region counts to estimate the heading date of the crop. The method is based on image classification using Convolutional Neural Networks (CNNs). We evaluated the performance of our algorithm on five time series image sequences of three different varieties of rice crops. When compared to the previous work on this dataset, the accuracy and general versatility of the method has been improved and heading date has been estimated with a mean absolute error of less than 1 day.
Submitted 19 June, 2019;
originally announced June 2019.
-
Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery
Authors:
Chaitanya Devaguptapu,
Ninad Akolekar,
Manuj M Sharma,
Vineeth N Balasubramanian
Abstract:
Can we improve detection in the thermal domain by borrowing features from rich domains like visual RGB? In this paper, we propose a pseudo-multimodal object detector trained on natural image domain data to help improve the performance of object detection in thermal images. We assume access to a large-scale dataset in the visual RGB domain and relatively smaller dataset (in terms of instances) in the thermal domain, as is common today. We propose the use of well-known image-to-image translation frameworks to generate pseudo-RGB equivalents of a given thermal image and then use a multi-modal architecture for object detection in the thermal image. We show that our framework outperforms existing benchmarks without the explicit need for paired training examples from the two domains. We also show that our framework has the ability to learn with less data from thermal domain when using our approach. Our code and pre-trained models are made available at https://github.com/tdchaitanya/MMTOD
Submitted 15 July, 2020; v1 submitted 21 May, 2019;
originally announced May 2019.
-
TSN Algorithms for Large Scale Networks: A Survey and Conceptual Comparison
Authors:
Ahmed Nasrallah,
Venkatraman Balasubramanian,
Akhilesh Thyagaturu,
Martin Reisslein,
Hesham ElBakoury
Abstract:
This paper provides a comprehensive survey of queueing and scheduling mechanisms for supporting large scale deterministic networks (LDNs). The survey finds that extensive mechanism design research and standards development for LDNs has been conducted over the past few years. However, these mechanism design studies have not been followed up with a comprehensive rigorous evaluation. The main outcome of this survey is a clear organization of the various research and standardization efforts towards queueing and scheduling mechanisms for LDNs as well as the identification of the main strands of mechanism development and their interdependencies. Based on this survey, it appears urgent to conduct a comprehensive rigorous simulation study of the main strands of mechanisms.
Submitted 19 June, 2019; v1 submitted 21 May, 2019;
originally announced May 2019.
-
Quantum Complexity of Time Evolution with Chaotic Hamiltonians
Authors:
Vijay Balasubramanian,
Matthew DeCross,
Arjun Kar,
Onkar Parrikar
Abstract:
We study the quantum complexity of time evolution in large-$N$ chaotic systems, with the SYK model as our main example. This complexity is expected to increase linearly for exponential time prior to saturating at its maximum value, and is related to the length of minimal geodesics on the manifold of unitary operators that act on Hilbert space. Using the Euler-Arnold formalism, we demonstrate that there is always a geodesic between the identity and the time evolution operator $e^{-iHt}$ whose length grows linearly with time. This geodesic is minimal until there is an obstruction to its minimality, after which it can fail to be a minimum either locally or globally. We identify a criterion - the Eigenstate Complexity Hypothesis (ECH) - which bounds the overlap between off-diagonal energy eigenstate projectors and the $k$-local operators of the theory, and use it to show that the linear geodesic will at least be a local minimum for exponential time. We show numerically that the large-$N$ SYK model (which is chaotic) satisfies ECH and thus has no local obstructions to linear growth of complexity for exponential time, as expected from holographic duality. In contrast, we also study the case with $N=2$ fermions (which is integrable) and find short-time linear complexity growth followed by oscillations. Our analysis relates complexity to familiar properties of physical theories like their spectra and the structure of energy eigenstates and has implications for the hypothesized computational complexity class separations PSPACE $\nsubseteq$ BQP/poly and PSPACE $\nsubseteq$ BQSUBEXP/subexp, and the "fast-forwarding" of quantum Hamiltonians.
Submitted 3 June, 2020; v1 submitted 14 May, 2019;
originally announced May 2019.
-
Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models
Authors:
Mayank Singh,
Abhishek Sinha,
Nupur Kumari,
Harshitha Machiraju,
Balaji Krishnamurthy,
Vineeth N Balasubramanian
Abstract:
Neural networks are vulnerable to adversarial attacks -- small, visually imperceptible crafted noise which, when added to the input, drastically changes the output. The most effective method of defending against these adversarial attacks is adversarial training. We analyze adversarially trained robust models to study their vulnerability against adversarial attacks at the level of the latent layers. Our analysis reveals that, in contrast to the input layer, which is robust to adversarial attack, the latent layers of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this information, we introduce a new technique, Latent Adversarial Training (LAT), which consists of fine-tuning the adversarially trained models to ensure robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for the construction of adversarial examples. LAT results in a minor improvement in test accuracy and leads to state-of-the-art adversarial accuracy against the universal first-order adversarial PGD attack, which is shown for the MNIST, CIFAR-10, and CIFAR-100 datasets.
Submitted 25 June, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
FANTrack: 3D Multi-Object Tracking with Feature Association Network
Authors:
Erkan Baser,
Venkateshwaran Balasubramanian,
Prarthana Bhattacharyya,
Krzysztof Czarnecki
Abstract:
We propose a data-driven approach to online multi-object tracking (MOT) that uses a convolutional neural network (CNN) for data association in a tracking-by-detection framework. The problem of multi-target tracking aims to assign noisy detections to a-priori unknown and time-varying number of tracked objects across a sequence of frames. A majority of the existing solutions focus on either tediously designing cost functions or formulating the task of data association as a complex optimization problem that can be solved effectively. Instead, we exploit the power of deep learning to formulate the data association problem as inference in a CNN. To this end, we propose to learn a similarity function that combines cues from both image and spatial features of objects. Our solution learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train. We evaluate our approach on the challenging KITTI dataset and show competitive results. Our code is available at https://git.uwaterloo.ca/wise-lab/fantrack.
Submitted 7 May, 2019;
originally announced May 2019.
-
Teaching GANs to Sketch in Vector Format
Authors:
Varshaneya V,
S Balasubramanian,
Vineeth N Balasubramanian
Abstract:
Sketching is more fundamental to human cognition than speech. Deep Neural Networks (DNNs) have achieved the state-of-the-art in speech-related tasks but have not made significant progress in generating stroke-based sketches, a.k.a. sketches in vector format. Though there are Variational Auto Encoders (VAEs) for generating sketches in vector format, there is no Generative Adversarial Network (GAN) architecture for the same. In this paper, we propose a standalone GAN architecture, SkeGAN, and a VAE-GAN architecture, VASkeGAN, for sketch generation in vector format. SkeGAN is a stochastic policy in Reinforcement Learning (RL), capable of generating both multidimensional continuous and discrete outputs. VASkeGAN hybridizes a VAE and a GAN, in order to couple the efficient representation of data by a VAE with the powerful generating capabilities of a GAN, to produce visually appealing sketches. We also propose a new metric called the Ske-score which quantifies the quality of vector sketches. We have validated that SkeGAN and VASkeGAN generate visually appealing sketches by using a Human Turing Test and the Ske-score.
Submitted 7 April, 2019;
originally announced April 2019.
-
RADICAL-Cybertools: Middleware Building Blocks for Scalable Science
Authors:
Vivek Balasubramanian,
Shantenu Jha,
Andre Merzky,
Matteo Turilli
Abstract:
RADICAL-Cybertools (RCT) are a set of software systems that serve as middleware to develop efficient and effective tools for scientific computing. Specifically, RCT enable executing many-task applications at extreme scale and on a variety of computing infrastructures. RCT are building blocks, designed to work as stand-alone systems, integrated among themselves or integrated with third-party systems. RCT enable innovative science in multiple domains, including but not limited to biophysics, climate science and particle physics, consuming hundreds of millions of core hours. This paper provides an overview of RCT systems, their impact, and the architectural principles and software engineering underlying RCT.
Submitted 5 April, 2019;
originally announced April 2019.
-
Middleware Building Blocks for Workflow Systems
Authors:
Matteo Turilli,
Vivek Balasubramanian,
Andre Merzky,
Ioannis Paraskevakos,
Shantenu Jha
Abstract:
This paper describes a building blocks approach to the design of scientific workflow systems. We discuss RADICAL-Cybertools as one implementation of the building blocks concept, showing how they are designed and developed in accordance with this approach. This paper offers three main contributions: (i) showing the relevance of the design principles underlying the building blocks approach to support scientific workflows on high performance computing platforms; (ii) illustrating a set of building blocks that enable multiple points of integration, "unifying" conceptual reasoning across otherwise very different tools and systems; and (iii) case studies discussing how RADICAL-Cybertools are integrated with existing workflow, workload, and general purpose computing systems and used to develop domain-specific workflow systems.
Submitted 27 June, 2019; v1 submitted 24 March, 2019;
originally announced March 2019.
-
What the odor is not: Estimation by elimination
Authors:
Vijay Singh,
Martin Tchernookov,
Vijay Balasubramanian
Abstract:
Olfactory systems use a small number of broadly sensitive receptors to combinatorially encode a vast number of odors. We propose a method of decoding such distributed representations by exploiting a statistical fact: receptors that do not respond to an odor carry more information than receptors that do because they signal the absence of all odorants that bind to them. Thus, it is easier to identify what the odor is not, rather than what the odor is. For realistic numbers of receptors, response functions, and odor complexity, this method of elimination turns an underconstrained decoding problem into a solvable one, allowing accurate determination of odorants in a mixture and their concentrations. We construct a neural network realization of our algorithm based on the structure of the olfactory pathway.
Submitted 25 August, 2021; v1 submitted 6 March, 2019;
originally announced March 2019.
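The elimination idea in the abstract above (a silent receptor rules out every odorant that binds to it) can be illustrated with a heavily simplified binary sketch. The binary binding matrix, the noise-free on/off responses, and the function names `receptor_response` and `eliminate` are all assumptions made for illustration; the abstract describes graded response functions and concentration estimates, which this toy example does not model.

```python
import numpy as np

def receptor_response(binding, present):
    """Binary response: a receptor is active if any present odorant binds to it.

    binding: (receptors, odorants) boolean matrix, True where an odorant binds
    present: (odorants,) boolean mask of the odorants in the mixture
    """
    return binding[:, present].any(axis=1)

def eliminate(binding, active):
    """Return a boolean mask of odorants that are NOT ruled out.

    An odorant is ruled out if it binds to at least one silent receptor.
    """
    silent = ~active
    ruled_out = binding[silent].any(axis=0)
    return ~ruled_out
```

In this simplified setting an odorant that is actually present is never eliminated, because every receptor it binds to is active; odorants that survive elimination are candidates, not confirmed components.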
-
Zero-Shot Task Transfer
Authors:
Arghya Pal,
Vineeth N Balasubramanian
Abstract:
In this work, we present a novel meta-learning algorithm, TTNet, that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). In order to adapt to novel zero-shot tasks, our meta-learner learns from the model parameters of known tasks (with ground truth) and the correlation of known tasks to zero-shot tasks. Such intuition finds its foothold in cognitive science, where a subject (human baby) can adapt to a novel concept (depth understanding) by correlating it with old concepts (hand movement or self-motion), without receiving explicit supervision. We evaluated our model on the Taskonomy dataset, with four tasks as zero-shot: surface-normal, room layout, depth, and camera pose estimation. These tasks were chosen based on the data acquisition complexity and the complexity associated with the learning process using a deep network. Our proposed methodology outperforms state-of-the-art models (which use ground truth) on each of our zero-shot tasks, showing promise on zero-shot task transfer. We also conducted extensive experiments to study the various choices of our methodology, and showed how the proposed method can also be used in transfer learning. To the best of our knowledge, this is the first such effort on zero-shot learning in the task space.
Submitted 4 March, 2019;
originally announced March 2019.
-
The size of the immune repertoire of bacteria
Authors:
Serena Bradde,
Armita Nourmohammad,
Sidhartha Goyal,
Vijay Balasubramanian
Abstract:
Some bacteria and archaea possess an immune system, based on the CRISPR-Cas mechanism, that confers adaptive immunity against phage. In such species, individual bacteria maintain a "cassette" of viral DNA elements called spacers as a memory of past infections. The typical cassette contains a few dozen spacers. Given that bacteria can have very large genomes, and since having more spacers should confer a better memory, it is puzzling that so little genetic space would be devoted by bacteria to their adaptive immune system. Here, we identify a fundamental trade-off between the size of the bacterial immune repertoire and effectiveness of response to a given threat, and show how this tradeoff imposes a limit on the optimal size of the CRISPR cassette.
Submitted 1 March, 2019;
originally announced March 2019.
-
Neural Network Attributions: A Causal Perspective
Authors:
Aditya Chattopadhyay,
Piyushi Manupriya,
Anirban Sarkar,
Vineeth N Balasubramanian
Abstract:
We propose a new attribution method for neural networks developed using first principles of causality (to the best of our knowledge, the first such). The neural network architecture is viewed as a Structural Causal Model, and a methodology to compute the causal effect of each feature on the output is presented. With reasonable assumptions on the causal structure of the input data, we propose algorithms to efficiently compute the causal effects, as well as scale the approach to data with large dimensionality. We also show how this method can be used for recurrent neural networks. We report experimental results on both simulated and real datasets showcasing the promise and usefulness of the proposed algorithm.
Submitted 3 July, 2019; v1 submitted 6 February, 2019;
originally announced February 2019.
-
DANTE: Deep AlterNations for Training nEural networks
Authors:
Vaibhav B Sinha,
Sneha Kudugunta,
Adepu Ravi Sankar,
Surya Teja Chavali,
Purushottam Kar,
Vineeth N Balasubramanian
Abstract:
We present DANTE, a novel method for training neural networks using the alternating minimization principle. DANTE provides an alternate perspective to traditional gradient-based backpropagation techniques commonly used to train deep networks. It utilizes an adaptation of quasi-convexity to cast training a neural network as a bi-quasi-convex optimization problem. We show that for neural network configurations with both differentiable (e.g. sigmoid) and non-differentiable (e.g. ReLU) activation functions, we can perform the alternations effectively in this formulation. DANTE can also be extended to networks with multiple hidden layers. In experiments on standard datasets, neural networks trained using the proposed method were found to be promising and competitive to traditional backpropagation techniques, both in terms of quality of the solution, as well as training speed.
Submitted 9 August, 2020; v1 submitted 1 February, 2019;
originally announced February 2019.
-
Low-Cost Transfer Learning of Face Tasks
Authors:
Thrupthi Ann John,
Isha Dua,
Vineeth N Balasubramanian,
C. V. Jawahar
Abstract:
Do we know what the different filters of a face network represent? Can we use this filter information to train other tasks without transfer learning? For instance, can age, head pose, emotion and other face related tasks be learned from face recognition network without transfer learning? Understanding the role of these filters allows us to transfer knowledge across tasks and take advantage of large data sets in related tasks. Given a pretrained network, we can infer which tasks the network generalizes for and the best way to transfer the information to a new task.
Submitted 9 January, 2019;
originally announced January 2019.
-
The dual of non-extremal area: differential entropy in higher dimensions
Authors:
Vijay Balasubramanian,
Charles Rabideau
Abstract:
The Ryu-Takayanagi formula relates entanglement entropy in a field theory to the area of extremal surfaces anchored to the boundary of a dual AdS space. It is interesting to ask if there is also an information theoretic interpretation of the areas of non-extremal surfaces that are not necessarily boundary-anchored. In general, the physics outside such surfaces is associated to observers restricted to a time-strip in the dual boundary field theory. When the latter is two-dimensional, it is known that the differential entropy associated to the strip computes the length of the dual bulk curve, and has an interpretation in terms of the information cost in Bell pairs of restoring correlations inaccessible to observers in the strip. A general realization of this formalism in higher dimensions is unknown. We first prove a no-go theorem eliminating candidate expressions for higher dimensional differential entropy based on entropic c-theorems. Then we propose a new formula in terms of an integral of shape derivatives of the entanglement entropy of ball shaped regions. Our proposal stems from the physical requirement that differential entropy must be locally finite and conformally invariant. Demanding cancellation of the well-known UV divergences of entanglement entropy in field theory guides us to our conjecture, which we test for surfaces in $AdS_4$. Our results suggest a candidate c-function for field theories in arbitrary dimensions.
Submitted 17 December, 2018;
originally announced December 2018.
-
Information flows in strongly coupled ABJM theory
Authors:
Vijay Balasubramanian,
Niko Jokela,
Arttu Pönni,
Alfonso V. Ramallo
Abstract:
We use holographic methods to characterize the RG flow of quantum information in a Chern-Simons theory coupled to massive fermions. First, we use entanglement entropy and mutual information between strips to derive the dimension of the RG-driving operator and a monotonic c-function. We then display a scaling regime where, unlike in a CFT, the mutual information between strips changes non-monotonically with strip width, vanishing in both IR and UV but rising to a maximum at intermediate scales. The associated information transitions also contribute to non-monotonicity in the conditional mutual information which characterizes the independence of neighboring strips after conditioning on a third. Finally, we construct a measure of extensivity which tests to what extent information that region A shares with regions B and C is additive. In general, mutual information is super-extensive in holographic theories, and we might expect super-extensivity to be maximized in CFTs since they are scale-free. Surprisingly, our massive theory is more super-extensive than a CFT in a range of scales near the UV limit, although it is less super-extensive than a CFT at all lower scales. Our analysis requires the full ten-dimensional dual gravity background, and the extremal surfaces computing entanglement entropy explore all of these dimensions.
Submitted 30 January, 2019; v1 submitted 23 November, 2018;
originally announced November 2018.
-
Binding Complexity and Multiparty Entanglement
Authors:
Vijay Balasubramanian,
Matthew DeCross,
Arjun Kar,
Onkar Parrikar
Abstract:
We introduce "binding complexity", a new notion of circuit complexity which quantifies the difficulty of distributing entanglement among multiple parties, each consisting of many local degrees of freedom. We define binding complexity of a given state as the minimal number of quantum gates that must act between parties to prepare it. To illustrate the new notion we compute it in a toy model for a scalar field theory, using certain multiparty entangled states which are analogous to configurations that are known in AdS/CFT to correspond to multiboundary wormholes. Pursuing this analogy, we show that our states can be prepared by the Euclidean path integral in $(0+1)$-dimensional quantum mechanics on graphs with wormhole-like structure. We compute the binding complexity of our states by adapting the Euler-Arnold approach to Nielsen's geometrization of gate counting, and find a scaling with entropy that resembles a result for the interior volume of holographic multiboundary wormholes. We also compute the binding complexity of general coherent states in perturbation theory, and show that for "double-trace deformations" of the Hamiltonian the effects resemble expansion of a wormhole interior in holographic theories.
Submitted 9 November, 2018;
originally announced November 2018.
-
Emergent classical spacetime from microstates of an incipient black hole
Authors:
Vijay Balasubramanian,
David Berenstein,
Aitor Lewkowycz,
Alexandra Miller,
Onkar Parrikar,
Charles Rabideau
Abstract:
Black holes have an enormous underlying space of microstates, but universal macroscopic physics characterized by mass, charge and angular momentum as well as a causally disconnected interior. This leads to two related puzzles: (1) How does the effective factorization of interior and exterior degrees of freedom emerge in gravity? and (2) How does the underlying degeneracy of states wind up having a geometric realization in the horizon area and in properties of the singularity? We explore these puzzles in the context of an incipient black hole in the AdS/CFT correspondence, the microstates of which are dual to half-BPS states of the $\mathcal{N}=4$ super-Yang-Mills theory. First, we construct a code subspace for this black hole and show how to organize it as a tensor product of a universal macroscopic piece (describing the exterior), and a factor corresponding to the microscopic degrees of freedom (describing the interior). We then study the classical phase space and symplectic form for low-energy excitations around the black hole. On the AdS side, we find that the symplectic form has a new physical degree of freedom at the stretched horizon of the black hole, reminiscent of soft hair, which is absent in the microstates. We explicitly show how such a soft mode emerges from the microscopic phase space in the dual CFT via a canonical transformation and how it encodes partial information about the microscopic degrees of freedom of the black hole.
Submitted 31 October, 2018;
originally announced October 2018.
-
C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis
Authors:
K J Joseph,
Arghya Pal,
Sailaja Rajanala,
Vineeth N Balasubramanian
Abstract:
Generating an image from its description is a challenging task worth solving because of its numerous practical applications ranging from image editing to virtual reality. All existing methods use a single caption to generate a plausible image. A single caption by itself can be limited and may not capture the variety of concepts and behavior that may be present in the image. We propose two deep generative models that generate an image by making use of multiple captions describing it. This is achieved by ensuring 'Cross-Caption Cycle Consistency' between the multiple captions and the generated image(s). We report quantitative and qualitative results on the standard Caltech-UCSD Birds (CUB) and Oxford-102 Flowers datasets to validate the efficacy of the proposed approach.
Submitted 20 September, 2018;
originally announced September 2018.
-
MASON: A Model AgnoStic ObjectNess Framework
Authors:
K J Joseph,
Vineeth N Balasubramanian
Abstract:
This paper proposes a simple, yet very effective method to localize dominant foreground objects in an image, to pixel-level precision. The proposed method 'MASON' (Model-AgnoStic ObjectNess) uses a deep convolutional network to generate category-independent and model-agnostic heat maps for any image. The network is not explicitly trained for the task, and hence, can be used off-the-shelf in tandem with any other network or task. We show that this framework scales to a wide variety of images, and illustrate the effectiveness of MASON in three varied application contexts.
Submitted 20 September, 2018;
originally announced September 2018.
-
Dynamic self-organized error-correction of grid cells by border cells
Authors:
Eli Pollock,
Niral Desai,
Xue-Xin Wei,
Vijay Balasubramanian
Abstract:
Grid cells in the entorhinal cortex are believed to establish their regular, spatially correlated firing patterns by path integration of the animal's motion. Mechanisms for path integration, e.g. in attractor network models, predict stochastic drift of grid responses, which is not observed experimentally. We demonstrate a biologically plausible mechanism of dynamic self-organization by which border cells, which fire at environmental boundaries, can correct such drift in grid cells. In our model, experience-dependent Hebbian plasticity during exploration allows border cells to learn connectivity to grid cells. Border cells in this learned network reset the phase of drifting grids. This error-correction mechanism is robust to environmental shape and complexity, including enclosures with interior barriers, and makes distinctive predictions for environmental deformation experiments. Our work demonstrates how diverse cell types in the entorhinal cortex could interact dynamically and adaptively to achieve robust path integration.
Submitted 4 August, 2018;
originally announced August 2018.
-
On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks
Authors:
Adepu Ravi Sankar,
Vishwak Srinivasan,
Vineeth N Balasubramanian
Abstract:
Theoretical analysis of the error landscape of deep neural networks has garnered significant interest in recent years. In this work, we theoretically study the importance of noise in the trajectories of gradient descent towards optimal solutions in multi-layer neural networks. We show that adding noise (in different ways) to a neural network while training increases the rank of the product of weight matrices of a multi-layer linear neural network. We thus study how adding noise can assist reaching a global optimum when the product matrix is full-rank (under certain conditions). We establish theoretical foundations between the noise induced into the neural network - either to the gradient, to the architecture, or to the input/output to a neural network - and the rank of product of weight matrices. We corroborate our theoretical findings with empirical results.
Submitted 21 July, 2018;
originally announced July 2018.
-
How a well-adapting immune system remembers
Authors:
Andreas Mayer,
Vijay Balasubramanian,
Aleksandra M. Walczak,
Thierry Mora
Abstract:
An adaptive agent predicting the future state of an environment must weigh trust in new observations against prior experiences. In this light, we propose a view of the adaptive immune system as a dynamic Bayesian machinery that updates its memory repertoire by balancing evidence from new pathogen encounters against past experience of infection to predict and prepare for future threats. This framework links the observed initial rapid increase of the memory pool early in life followed by a mid-life plateau to the ease of learning salient features of sparse environments. We also derive a modulated memory pool update rule in agreement with current vaccine response experiments. Our results suggest that pathogenic environments are sparse and that memory repertoires significantly decrease infection costs even with moderate sampling. The predicted optimal update scheme maps onto commonly considered competitive dynamics for antigen receptors.
Submitted 13 November, 2018; v1 submitted 14 June, 2018;
originally announced June 2018.
-
Entanglement versus entwinement in symmetric product orbifolds
Authors:
Vijay Balasubramanian,
Ben Craps,
Tim De Jonckheere,
Gábor Sárosi
Abstract:
We study the entanglement entropy of gauged internal degrees of freedom in a two dimensional symmetric product orbifold CFT, whose configurations consist of $N$ strands sewn together into "long" strings, with wavefunctions symmetrized under permutations. In earlier work a related notion of "entwinement" was introduced. Here we treat this system analogously to a system of $N$ identical particles. From an algebraic point of view, we point out that the reduced density matrix on $k$ out of $N$ particles is not associated with a subalgebra of operators, but rather with a linear subspace, which we explain is sufficient. In the orbifold CFT, we compute the entropy of a single strand in states holographically dual in the D1/D5 system to a conical defect geometry or a massless BTZ black hole and find a result identical to entwinement. We also calculate the entropy of two strands in the state that represents the conical defect; the result differs from entwinement. In this case, matching entwinement would require finding a gauge-invariant way to impose continuity across strands.
Submitted 30 January, 2019; v1 submitted 7 June, 2018;
originally announced June 2018.
-
A geometric attractor mechanism for self-organization of entorhinal grid modules
Authors:
Louis Kang,
Vijay Balasubramanian
Abstract:
Grid cells in the medial entorhinal cortex (MEC) respond when an animal occupies a periodic lattice of "grid fields" in the environment. The grids are organized in modules with spatial periods, or scales, clustered around discrete values separated by ratios in the range 1.2--2.0. We propose a mechanism that produces this modular structure through dynamical self-organization in the MEC. In attractor network models of grid formation, the grid scale of a single module is set by the distance of recurrent inhibition between neurons. We show that the MEC forms a hierarchy of discrete modules if a smooth increase in inhibition distance along its dorso-ventral axis is accompanied by excitatory interactions along this axis. Moreover, constant scale ratios between successive modules arise through geometric relationships between triangular grids and have values that fall within the observed range. We discuss how interactions required by our model might be tested experimentally.
Submitted 11 March, 2019; v1 submitted 4 June, 2018;
originally announced June 2018.
-
A competitive binding model predicts nonlinear responses of olfactory receptors to complex mixtures
Authors:
Vijay Singh,
Nicolle R. Murphy,
Vijay Balasubramanian,
Joel D. Mainland
Abstract:
In color vision, the quantitative rules for mixing lights to make a target color are well understood. By contrast, the rules for mixing odorants to make a target odor remain elusive. A solution to this problem in vision relied on characterizing receptor responses to different wavelengths of light and subsequently relating these responses to perception. In olfaction, experimentally measuring receptor responses to a representative set of complex mixtures is intractable due to the vast number of possibilities. To meet this challenge, we develop a biophysical model that predicts mammalian receptor responses to complex mixtures using responses to single odorants. The dominant nonlinearity in our model is competitive binding (CB): only one odorant molecule can attach to a receptor binding site at a time. This simple framework predicts receptor responses to mixtures of up to twelve monomolecular odorants to within 15\% of experimental observations and provides a powerful method for leveraging limited experimental data. Simple extensions of our model describe phenomena such as synergy, overshadowing, and inhibition. We demonstrate that the presence of such interactions can be identified via systematic deviations from the competitive binding model.
Submitted 4 February, 2019; v1 submitted 1 May, 2018;
originally announced May 2018.
-
Adaptive Ensemble Biomolecular Simulations at Scale
Authors:
Vivek Balasubramanian,
Travis Jensen,
Matteo Turilli,
Peter Kasson,
Michael Shirts,
Shantenu Jha
Abstract:
Recent advances in both theory and methods have created opportunities to simulate biomolecular processes more efficiently using adaptive ensemble simulations. Ensemble-based simulations are used widely to compute a number of individual simulation trajectories and analyze statistics across them. Adaptive ensemble simulations offer a further level of sophistication and flexibility by enabling high-level algorithms to control simulations based on intermediate results. Novel high-level algorithms require sophisticated approaches to utilize the intermediate data during runtime. Thus, there is a need for scalable software systems to support adaptive ensemble-based applications. We describe the operations in executing adaptive workflows, classify different types of adaptations, and describe challenges in implementing them in software tools. We enhance Ensemble Toolkit (EnTK) -- an ensemble execution system -- to support the scalable execution of adaptive workflows on HPC systems, and characterize the adaptation overhead in EnTK. We implement two high-level adaptive ensemble algorithms -- expanded ensemble and Markov state modeling -- and execute up to $2^{12}$ ensemble members on thousands of cores on three distinct HPC platforms. We highlight scientific advantages enabled by the novel capabilities of our approach. To the best of our knowledge, this is the first attempt at describing and implementing multiple adaptive ensemble workflows using a common conceptual and implementation framework.
Submitted 3 June, 2019; v1 submitted 12 April, 2018;
originally announced April 2018.
-
Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
Authors:
Arghya Pal,
Vineeth N Balasubramanian
Abstract:
Paucity of large curated hand-labeled training data for every domain-of-interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology to generate data as well as a curated aggregated label, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, and showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets.
Submitted 14 March, 2018;
originally announced March 2018.
-
Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification
Authors:
Vaibhav B Sinha,
Sukrut Rao,
Vineeth N Balasubramanian
Abstract:
Many real-world problems can now be effectively solved using supervised machine learning. A major roadblock is often the lack of an adequate quantity of labeled data for training. A possible solution is to assign the task of labeling data to a crowd, and then infer the true label using aggregation methods. A well-known approach for aggregation is the Dawid-Skene (DS) algorithm, which is based on the principle of Expectation-Maximization (EM). We propose a new simple, yet effective, EM-based algorithm, which can be interpreted as a `hard' version of DS, that allows much faster convergence while maintaining similar accuracy in aggregation. We show the use of this algorithm as a quick and effective technique for online, real-time sentiment annotation. We also prove that our algorithm converges to the estimated labels at a linear rate. Our experiments on standard datasets show a significant speedup in time taken for aggregation - up to $\sim$8x over Dawid-Skene and $\sim$6x over other fast EM methods, at competitive accuracy performance. The code for the implementation of the algorithms can be found at https://github.com/GoodDeeds/Fast-Dawid-Skene
Submitted 7 September, 2018; v1 submitted 7 March, 2018;
originally announced March 2018.
-
Adaptation of olfactory receptor abundances for efficient coding
Authors:
Tiberiu Tesileanu,
Simona Cocco,
Remi Monasson,
Vijay Balasubramanian
Abstract:
Olfactory receptor usage is highly heterogeneous, with some receptor types being orders of magnitude more abundant than others. We propose an explanation for this striking fact: the receptor distribution is tuned to maximally represent information about the olfactory environment in a regime of efficient coding that is sensitive to the global context of correlated sensor responses. This model predicts that in mammals, where olfactory sensory neurons are replaced regularly, receptor abundances should continuously adapt to odor statistics. Experimentally, increased exposure to odorants leads variously, but reproducibly, to increased, decreased, or unchanged abundances of different activated receptors. We demonstrate that this diversity of effects is required for efficient coding when sensors are broadly correlated, and provide an algorithm for predicting which olfactory receptors should increase or decrease in abundance following specific environmental changes. Finally, we give simple dynamical rules for neural birth and death processes that might underlie this adaptation.
Submitted 22 January, 2019; v1 submitted 28 January, 2018;
originally announced January 2018.
-
Comments on Entanglement Entropy in String Theory
Authors:
Vijay Balasubramanian,
Onkar Parrikar
Abstract:
Entanglement entropy for spatial subregions is difficult to define in string theory because of the extended nature of strings. Here we propose a definition for Bosonic open strings using the framework of string field theory. The key difference (compared to ordinary quantum field theory) is that the subregion is chosen inside a Cauchy surface in the "space of open string configurations". We first present a simple calculation of this entanglement entropy in free light-cone string field theory, ignoring subtleties related to the factorization of the Hilbert space. We reproduce the answer expected from an effective field theory point of view, namely a sum over the one-loop entanglement entropies corresponding to all the particle-excitations of the string, and further show that the full string theory regulates the ultraviolet divergences in the entanglement entropy. We then revisit the question of factorization of the Hilbert space by analyzing the covariant phase-space associated with a subregion in Witten's covariant string field theory. We show that the pure gauge (i.e., BRST exact) modes in the string field become dynamical at the entanglement cut. Thus, a proper definition of the entropy must involve an extended Hilbert space, with new stringy edge modes localized at the entanglement cut.
Submitted 26 January, 2018; v1 submitted 10 January, 2018;
originally announced January 2018.
-
Entanglement Entropy and the Colored Jones Polynomial
Authors:
Vijay Balasubramanian,
Matthew DeCross,
Jackson Fliss,
Arjun Kar,
Robert G. Leigh,
Onkar Parrikar
Abstract:
We study the multi-party entanglement structure of states in Chern-Simons theory created by performing the path integral on 3-manifolds with linked torus boundaries, called link complements. For gauge group $SU(2)$, the wavefunctions of these states (in a particular basis) are the colored Jones polynomials of the corresponding links. We first review the case of $U(1)$ Chern-Simons theory where these are stabilizer states, a fact we use to re-derive an explicit formula for the entanglement entropy across a general link bipartition. We then present the following results for $SU(2)$ Chern-Simons theory: (i) The entanglement entropy for a bipartition of a link gives a lower bound on the genus of surfaces in the ambient $S^3$ separating the two sublinks. (ii) All torus links (namely, links which can be drawn on the surface of a torus) have a GHZ-like entanglement structure -- i.e., partial traces leave a separable state. By contrast, through explicit computation, we test in many examples that hyperbolic links (namely, links whose complements admit hyperbolic structures) have W-like entanglement -- i.e., partial traces leave a non-separable state. (iii) Finally, we consider hyperbolic links in the complexified $SL(2,C)$ Chern-Simons theory, which is closely related to 3d Einstein gravity with a negative cosmological constant. In the limit of small Newton constant, we discuss how the entanglement structure is controlled by the Neumann-Zagier potential on the moduli space of hyperbolic structures on the link complement.
Submitted 3 January, 2018;
originally announced January 2018.
-
High-throughput Binding Affinity Calculations at Extreme Scales
Authors:
Jumana Dakka,
Matteo Turilli,
David W Wright,
Stefan J Zasada,
Vivek Balasubramanian,
Shunzhou Wan,
Peter V Coveney,
Shantenu Jha
Abstract:
Resistance to chemotherapy and molecularly targeted therapies is a major factor in limiting the effectiveness of cancer treatment. In many cases, resistance can be linked to genetic changes in target proteins, either pre-existing or evolutionarily selected during treatment. Key to overcoming this challenge is an understanding of the molecular determinants of drug binding. Using multi-stage pipelines of molecular simulations we can gain insights into the binding free energy and the residence time of a ligand, which can inform both stratified and personal treatment regimes and drug development. To support the scalable, adaptive and automated calculation of the binding free energy on high-performance computing resources, we introduce the High-throughput Binding Affinity Calculator (HTBAC). HTBAC uses a building block approach in order to attain both workflow flexibility and performance. We demonstrate close to perfect weak scaling to hundreds of concurrent multi-stage binding affinity calculation pipelines. This permits a rapid time-to-solution that is essentially invariant of the calculation protocol, size of candidate ligands and number of ensemble simulations. As such, HTBAC advances the state of the art of binding affinity calculations and protocols.
Submitted 13 February, 2018; v1 submitted 25 December, 2017;
originally announced December 2017.
-
ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
Authors:
Vishwak Srinivasan,
Adepu Ravi Sankar,
Vineeth N Balasubramanian
Abstract:
Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter $m$ which is always suggested to be set to less than $1$. Although the choice of $m < 1$ is justified only under very strong theoretical assumptions, it works well in practice even when the assumptions do not necessarily hold. In this paper, we propose a new momentum based method $\textit{ADINE}$, which relaxes the constraint of $m < 1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on $m$ by experimentally verifying that a higher momentum ($\ge 1$) can help escape saddles much faster. Using this motivation, we propose our method $\textit{ADINE}$ that helps weigh the previous updates more (by setting the momentum parameter $> 1$), evaluate our proposed algorithm on deep neural networks and show that $\textit{ADINE}$ helps the learning algorithm to converge much faster without compromising on the generalization error.
Submitted 20 December, 2017;
originally announced December 2017.
-
STWalk: Learning Trajectory Representations in Temporal Graphs
Authors:
Supriya Pandhre,
Himangi Mittal,
Manish Gupta,
Vineeth N Balasubramanian
Abstract:
Analyzing the temporal behavior of nodes in time-varying graphs is useful for many applications such as targeted advertising, community evolution and outlier detection. In this paper, we present a novel approach, STWalk, for learning trajectory representations of nodes in temporal graphs. The proposed framework makes use of structural properties of graphs at current and previous time-steps to learn effective node trajectory representations. STWalk performs random walks on a graph at a given time step (called space-walk) as well as on graphs from past time-steps (called time-walk) to capture the spatio-temporal behavior of nodes. We propose two variants of STWalk to learn trajectory representations. In one algorithm, we perform space-walk and time-walk as part of a single step. In the other variant, we perform space-walk and time-walk separately and combine the learned representations to get the final trajectory embedding. Extensive experiments on three real-world temporal graph datasets validate the effectiveness of the learned representations when compared to three baseline methods. We also show the goodness of the learned trajectory embeddings for change point detection, as well as demonstrate that arithmetic operations on these trajectory representations yield interesting and interpretable results.
Submitted 11 November, 2017;
originally announced November 2017.
-
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Authors:
Aditya Chattopadhyay,
Anirban Sarkar,
Prantik Howlader,
Vineeth N Balasubramanian
Abstract:
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems. However, these deep models are perceived as "black box" methods considering the lack of understanding of their internal functioning. There has been a significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide better visual explanations of CNN model predictions, in terms of better object localization as well as explaining occurrences of multiple object instances in a single image, when compared to state-of-the-art. We provide a mathematical derivation for the proposed method, which uses a weighted combination of the positive partial derivatives of the last convolutional layer feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label. Our extensive experiments and evaluations, both subjective and objective, on standard datasets showed that Grad-CAM++ provides promising human-interpretable visual explanations for a given CNN architecture across multiple tasks including classification, image caption generation and 3D action recognition; as well as in new settings such as knowledge distillation.
Submitted 9 November, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Authors:
Vivek Balasubramanian,
Matteo Turilli,
Weiming Hu,
Matthieu Lefebvre,
Wenjie Lei,
Guido Cervone,
Jeroen Tromp,
Shantenu Jha
Abstract:
Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.
Submitted 16 May, 2018; v1 submitted 23 October, 2017;
originally announced October 2017.
-
Attentive Semantic Video Generation using Captions
Authors:
Tanya Marwah,
Gaurav Mittal,
Vineeth N. Balasubramanian
Abstract:
This paper proposes a network architecture to perform variable length semantic video generation using captions. We adopt a new perspective towards video generation where we allow the captions to be combined with the long-term and short-term dependencies between video frames and thus generate a video in an incremental manner. Our experiments demonstrate our network architecture's ability to distinguish between objects, actions and interactions in a video and combine them to generate videos for unseen captions. The network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We also show that the network's ability to learn a latent representation allows it to generate videos in an unsupervised manner and perform other tasks such as action recognition. (Accepted at the International Conference on Computer Vision (ICCV) 2017)
Submitted 21 October, 2017; v1 submitted 20 August, 2017;
originally announced August 2017.
-
Disorder and the neural representation of complex odors: smelling in the real world
Authors:
Kamesh Krishnamurthy,
Ann M Hermundstad,
Thierry Mora,
Aleksandra M Walczak,
Vijay Balasubramanian
Abstract:
Animals smelling in the real world use a small number of receptors to sense a vast number of natural molecular mixtures, and proceed to learn arbitrary associations between odors and valences. Here, we propose a new interpretation of how the architecture of olfactory circuits is adapted to meet these immense complementary challenges. First, the diffuse binding of receptors to many molecules compresses a vast odor space into a tiny receptor space, while preserving similarity. Next, lateral interactions "densify" and decorrelate the response, enhancing robustness to noise. Finally, disordered projections from the periphery to the central brain reconfigure the densely packed information into a format suitable for flexible learning of associations and valences. We test our theory empirically using data from Drosophila. Our theory suggests that the neural processing of olfactory information differs from the other senses in its fundamental use of disorder.
Submitted 6 July, 2017;
originally announced July 2017.
-
Multiresolution Match Kernels for Gesture Video Classification
Authors:
Hemanth Venkateswara,
Vineeth N. Balasubramanian,
Prasanth Lade,
Sethuraman Panchanathan
Abstract:
The emergence of depth imaging technologies like the Microsoft Kinect has renewed interest in computational methods for gesture classification based on videos. For several years now, researchers have used the Bag-of-Features (BoF) as a primary method for generation of feature vectors from video data for recognition of gestures. However, the BoF method is a coarse representation of the information in a video, which often leads to poor similarity measures between videos. Besides, when features extracted from different spatio-temporal locations in the video are pooled to create histogram vectors in the BoF method, there is an intrinsic loss of their original locations in space and time. In this paper, we propose a new Multiresolution Match Kernel (MMK) for video classification, which can be considered as a generalization of the BoF method. We apply this procedure to hand gesture classification based on RGB-D videos of American Sign Language (ASL) hand gestures, and our results show the promise and usefulness of this new method.
Submitted 22 June, 2017;
originally announced June 2017.
-
Are Saddles Good Enough for Deep Learning?
Authors:
Adepu Ravi Sankar,
Vineeth N Balasubramanian
Abstract:
Recent years have seen a growing interest in understanding deep neural networks from an optimization perspective. It is understood now that converging to low-cost local minima is sufficient for such models to become effective in practice. However, in this work, we propose a new hypothesis based on recent theoretical findings and empirical studies that deep neural network models actually converge to saddle points with high degeneracy. Our findings from this work are new, and can have a significant impact on the development of gradient descent based methods for training deep networks. We validated our hypotheses using an extensive experimental evaluation on standard datasets such as MNIST and CIFAR-10, and also showed that recent efforts that attempt to escape saddles finally converge to saddles with high degeneracy, which we define as `good saddles'. We also verified the famous Wigner's Semicircle Law in our experimental results.
Submitted 7 June, 2017;
originally announced June 2017.
-
Heavy-Heavy-Light-Light correlators in Liouville theory
Authors:
Vijay Balasubramanian,
Alice Bernamonti,
Ben Craps,
Tim De Jonckheere,
Federico Galli
Abstract:
We compute four-point functions of two heavy and two "perturbatively heavy" operators in the semiclassical limit of Liouville theory on the sphere. We obtain these "Heavy-Heavy-Light-Light" (HHLL) correlators to leading order in the conformal weights of the light insertions in two ways: (a) via a path integral approach, combining different methods to evaluate correlation functions from complex solutions for the Liouville field, and (b) via the conformal block expansion. This latter approach identifies an integral over the continuum of normalizable states and a sum over an infinite tower of lighter discrete states, whose contribution we extract by analytically continuing standard results to our HHLL setting. The sum over this tower reproduces the sum over those complex saddlepoints of the path integral that contribute to the correlator. Our path integral computations reveal that when the two light operators are inserted at equal time in radial quantization, the leading-order HHLL correlator is independent of their separation, and more generally that at this order there is no short-distance singularity as the two light operators approach each other. The conformal block expansion likewise shows that in the discrete sum short-distance singularities are indeed absent for all intermediate states that contribute. In particular, the Virasoro vacuum block, which would have been singular at short distances, is not exchanged. The separation-independence of equal-time correlators is due to cancelations between the discrete contributions. These features lead to a Lorentzian singularity that, in conformal theories with anti-de Sitter (AdS) duals, would be associated to locality below the AdS scale.
Submitted 15 August, 2017; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Entanglement shadows in LLM geometries
Authors:
Vijay Balasubramanian,
Albion Lawrence,
Andrew Rolph,
Simon Ross
Abstract:
We find a new example of an asymptotically $AdS_5 \times S^5$ geometry which has an entanglement shadow: that is, a region of spacetime which no Ryu-Takayanagi minimal surface enters. Our example is a particular case of the supersymmetric LLM geometries. Our results illustrate how minimal surfaces, which holographically geometrize entanglement entropy, can fail to probe the whole of spacetime, posing a challenge for attempts to directly reconstruct holographic geometries from the entanglement entropies of the dual field theory. We also comment on the relation to previous investigations of minimal surfaces localised in the $S^5$ factor of $AdS_5 \times S^5$.
Submitted 11 October, 2017; v1 submitted 11 April, 2017;
originally announced April 2017.
-
Community-based Outlier Detection for Edge-attributed Graphs
Authors:
Supriya Pandhre,
Manish Gupta,
Vineeth N Balasubramanian
Abstract:
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. Beyond graph analysis tasks like graph query processing, link analysis, influence propagation, there has recently been some work in the area of outlier detection for information network data. Although various kinds of outliers have been studied for graph data, there is not much work on anomaly detection from edge-attributed graphs. In this paper, we introduce a method that detects novel outlier graph nodes by taking into account the node data and edge data simultaneously to detect anomalies. We model the problem as a community detection task, where outliers form a separate community. We propose a method that uses a probabilistic graph model (Hidden Markov Random Field) for joint modeling of nodes and edges in the network to compute Holistic Community Outliers (HCOutliers). Thus, our model presents a natural setting for heterogeneous graphs that have multiple edges/relationships between two nodes. EM (Expectation Maximization) is used to learn model parameters, and infer hidden community labels. Experimental results on synthetic datasets and the DBLP dataset show the effectiveness of our approach for finding novel outliers from networks.
Submitted 11 November, 2017; v1 submitted 30 December, 2016;
originally announced December 2016.
-
Echoes of chaos from string theory black holes
Authors:
Vijay Balasubramanian,
Ben Craps,
Bartłomiej Czech,
Gábor Sárosi
Abstract:
The strongly coupled D1-D5 conformal field theory is a microscopic model of black holes which is expected to have chaotic dynamics. Here, we study the weak coupling limit of the theory where it is integrable rather than chaotic. In this limit, the operators creating microstates of the lowest mass black hole are known exactly. We consider the time-ordered two-point function of light probes in these microstates, normalized by the same two-point function in vacuum. These correlators display a universal early-time decay followed by late-time sporadic behavior. To find a prescription for temporal coarse-graining of these late fluctuations we appeal to random matrix theory, where we show that a progressive time-average smooths the spectral form factor (a proxy for the 2-point function) in a typical draw of a random matrix. This coarse-grained quantity reproduces the matrix ensemble average to a good approximation. Employing this coarse-graining in the D1-D5 system, we find that the early-time decay is followed by a dip, a ramp and a plateau, in remarkable qualitative agreement with recent studies of the Sachdev-Ye-Kitaev (SYK) model. We study the timescales involved, comment on similarities and differences between our integrable model and the chaotic SYK model, and suggest ways to extend our results away from the integrable limit.
Submitted 3 April, 2017; v1 submitted 13 December, 2016;
originally announced December 2016.