-
CodeGemma: Open Code Models Based on Gemma
Authors:
CodeGemma Team,
Heri Zhao,
Jeffrey Hui,
Joshua Howland,
Nam Nguyen,
Siqi Zuo,
Andrea Hu,
Christopher A. Choquette-Choo,
**gyue Shen,
Joe Kelley,
Kshitij Bansal,
Luke Vilnis,
Mateo Wirth,
Paul Michel,
Peter Choy,
Pratik Joshi,
Ravin Kumar,
Sarmad Hashmi,
Shubham Agrawal,
Zhitao Gong,
Jane Fine,
Tris Warkentin,
Ale Jakse Hartman,
Bin Ni,
Kathy Korevec
, et al. (2 additional authors not shown)
Abstract:
This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open…
▽ More
This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and open-ended generation in latency-sensitive settings.
△ Less
Submitted 18 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
RadSegNet: A Reliable Approach to Radar Camera Fusion
Authors:
Kshitiz Bansal,
Keshav Rungta,
Dinesh Bharadia
Abstract:
Perception systems for autonomous driving have seen significant advancements in their performance over last few years. However, these systems struggle to show robustness in extreme weather conditions because sensors like lidars and cameras, which are the primary sensors in a sensor suite, see a decline in performance under these conditions. In order to solve this problem, camera-radar fusion syste…
▽ More
Perception systems for autonomous driving have seen significant advancements in their performance over last few years. However, these systems struggle to show robustness in extreme weather conditions because sensors like lidars and cameras, which are the primary sensors in a sensor suite, see a decline in performance under these conditions. In order to solve this problem, camera-radar fusion systems provide a unique opportunity for all weather reliable high quality perception. Cameras provides rich semantic information while radars can work through occlusions and in all weather conditions. In this work, we show that the state-of-the-art fusion methods perform poorly when camera input is degraded, which essentially results in losing the all-weather reliability they set out to achieve. Contrary to these approaches, we propose a new method, RadSegNet, that uses a new design philosophy of independent information extraction and truly achieves reliability in all conditions, including occlusions and adverse weather. We develop and validate our proposed system on the benchmark Astyx dataset and further verify these results on the RADIATE dataset. When compared to state-of-the-art methods, RadSegNet achieves a 27% improvement on Astyx and 41.46% increase on RADIATE, in average precision score and maintains a significantly better performance in adverse weather conditions
△ Less
Submitted 7 August, 2022;
originally announced August 2022.
-
Pointillism: Accurate 3D bounding box estimation with multi-radars
Authors:
Kshitiz Bansal,
Keshav Rungta,
Siyuan Zhu,
Dinesh Bharadia
Abstract:
Autonomous perception requires high-quality environment sensing in the form of 3D bounding boxes of dynamic objects. The primary sensors used in automotive systems are light-based cameras and LiDARs. However, they are known to fail in adverse weather conditions. Radars can potentially solve this problem as they are barely affected by adverse weather conditions. However, specular reflections of wir…
▽ More
Autonomous perception requires high-quality environment sensing in the form of 3D bounding boxes of dynamic objects. The primary sensors used in automotive systems are light-based cameras and LiDARs. However, they are known to fail in adverse weather conditions. Radars can potentially solve this problem as they are barely affected by adverse weather conditions. However, specular reflections of wireless signals cause poor performance of radar point clouds. We introduce Pointillism, a system that combines data from multiple spatially separated radars with an optimal separation to mitigate these problems. We introduce a novel concept of Cross Potential Point Clouds, which uses the spatial diversity induced by multiple radars and solves the problem of noise and sparsity in radar point clouds. Furthermore, we present the design of RP-net, a novel deep learning architecture, designed explicitly for radar's sparse data distribution, to enable accurate 3D bounding box estimation. The spatial techniques designed and proposed in this paper are fundamental to radars point cloud distribution and would benefit other radar sensing applications.
△ Less
Submitted 8 March, 2022;
originally announced March 2022.
-
Sequential Change Detection through Empirical Distribution and Universal Codes
Authors:
Vikrant Malik,
R. K. Bansal
Abstract:
Universal compression algorithms have been studied in the past for sequential change detection, where they have been used to estimate the post-change distribution in the modified version of the Cumulative Sum (CUSUM) Test. In this paper, we introduce a modified CUSUM test where the pre-change distribution is also unknown and an empirical version of the pre-change distribution is used to implement…
▽ More
Universal compression algorithms have been studied in the past for sequential change detection, where they have been used to estimate the post-change distribution in the modified version of the Cumulative Sum (CUSUM) Test. In this paper, we introduce a modified CUSUM test where the pre-change distribution is also unknown and an empirical version of the pre-change distribution is used to implement the algorithm. We present a study of various characteristics of this modified CUSUM Test and then prove its asymptotic optimality.
△ Less
Submitted 14 December, 2021;
originally announced December 2021.
-
Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
Authors:
Harsh Agarwal,
Keshav Bansal,
Abhinav Joshi,
Ashutosh Modi
Abstract:
Emotion Recognition in Conversations (ERC) is an important and active research area. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some stimuli evokes a change. There is a continuous ebb and flow of emotions in a conversation. Inspired by this obse…
▽ More
Emotion Recognition in Conversations (ERC) is an important and active research area. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some stimuli evokes a change. There is a continuous ebb and flow of emotions in a conversation. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component that improves performance. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications). We experiment with different variants of the model, and results show that the inclusion of emotion shift signal helps the model to outperform existing models for ERC on MOSEI and IEMOCAP datasets.
△ Less
Submitted 7 November, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
BAKSA at SemEval-2020 Task 9: Bolstering CNN with Self-Attention for Sentiment Analysis of Code Mixed Text
Authors:
Ayush Kumar,
Harsh Agarwal,
Keshav Bansal,
Ashutosh Modi
Abstract:
Sentiment Analysis of code-mixed text has diversified applications in opinion mining ranging from tagging user reviews to identifying social or political sentiments of a sub-population. In this paper, we present an ensemble architecture of convolutional neural net (CNN) and self-attention based LSTM for sentiment analysis of code-mixed tweets. While the CNN component helps in the classification of…
▽ More
Sentiment Analysis of code-mixed text has diversified applications in opinion mining ranging from tagging user reviews to identifying social or political sentiments of a sub-population. In this paper, we present an ensemble architecture of convolutional neural net (CNN) and self-attention based LSTM for sentiment analysis of code-mixed tweets. While the CNN component helps in the classification of positive and negative tweets, the self-attention based LSTM, helps in the classification of neutral tweets, because of its ability to identify correct sentiment among multiple sentiment bearing units. We achieved F1 scores of 0.707 (ranked 5th) and 0.725 (ranked 13th) on Hindi-English (Hinglish) and Spanish-English (Spanglish) datasets, respectively. The submissions for Hinglish and Spanglish tasks were made under the usernames ayushk and harsh_6 respectively.
△ Less
Submitted 21 July, 2020;
originally announced July 2020.
-
Mathematical Reasoning via Self-supervised Skip-tree Training
Authors:
Markus N. Rabe,
Dennis Lee,
Kshitij Bansal,
Christian Szegedy
Abstract:
We examine whether self-supervised language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical statements, such as type inference, suggesting missing assumptions and completing equalities. To train language models for formal mathematics, we propose a novel skip-t…
▽ More
We examine whether self-supervised language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical statements, such as type inference, suggesting missing assumptions and completing equalities. To train language models for formal mathematics, we propose a novel skip-tree task. We find that models trained on the skip-tree task show surprisingly strong mathematical reasoning abilities, and outperform models trained on standard skip-sequence tasks. We also analyze the models' ability to formulate new conjectures by measuring how often the predictions are provable and useful in other proofs.
△ Less
Submitted 12 August, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Reducing Commutativity Verification to Reachability with Differencing Abstractions
Authors:
Eric Koskinen,
Kshitij Bansal
Abstract:
Commutativity of data structure methods is of ongoing interest, with roots in the database community. In recent years commutativity has been shown to be a key ingredient to enabling multicore concurrency in contexts such as parallelizing compilers, transactional memory, speculative execution and, more broadly, software scalability. Despite this interest, it remains an open question as to how a dat…
▽ More
Commutativity of data structure methods is of ongoing interest, with roots in the database community. In recent years commutativity has been shown to be a key ingredient to enabling multicore concurrency in contexts such as parallelizing compilers, transactional memory, speculative execution and, more broadly, software scalability. Despite this interest, it remains an open question as to how a data structure's commutativity specification can be verified automatically from its implementation.
In this paper, we describe techniques to automatically prove the correctness of method commutativity conditions from data structure implementations. We introduce a new kind of abstraction that characterizes the ways in which the effects of two methods differ depending on the order in which the methods are applied, and abstracts away effects of methods that would be the same regardless of the order. We then describe a novel algorithm that reduces the problem to reachability, so that off-the-shelf program analysis tools can perform the reasoning necessary for proving commutativity. Finally, we describe a proof-of-concept implementation and experimental results, showing that our tool can verify commutativity of data structures such as a memory cell, counter, two-place Set, array-based stack, queue, and a rudimentary hash table. We conclude with a discussion of what makes a data structure's commutativity provable with today's tools and what needs to be done to prove more in the future.
△ Less
Submitted 17 April, 2020;
originally announced April 2020.
-
On-Device User Intent Prediction for Context and Sequence Aware Recommendation
Authors:
Benu Madhab Changmai,
Divija Nagaraju,
Debi Prasanna Mohanty,
Kriti Singh,
Kunal Bansal,
Sukumar Moharana
Abstract:
The pursuit of improved accuracy in recommender systems has led to the incorporation of user context. Context-aware recommender systems typically handle large amounts of data which must be uploaded and stored on the cloud, putting the user's personal information at risk. While there have been previous studies on privacy-sensitive and context-aware recommender systems, there has not been a full-fle…
▽ More
The pursuit of improved accuracy in recommender systems has led to the incorporation of user context. Context-aware recommender systems typically handle large amounts of data which must be uploaded and stored on the cloud, putting the user's personal information at risk. While there have been previous studies on privacy-sensitive and context-aware recommender systems, there has not been a full-fledged system deployed in an isolated mobile environment. We propose a secure and efficient on-device mechanism to predict a user's next intention. The knowledge of the user's real-time intention can help recommender systems to provide more relevant recommendations at the right moment. Our proposed algorithm is both context and sequence aware. We embed user intentions as weighted nodes in an n-dimensional vector space where each dimension represents a specific user context factor. Through a neighborhood searching method followed by a sequence matching algorithm, we search for the most relevant node to make the prediction. An evaluation of our methodology was done on a diverse real-world dataset where it was able to address practical scenarios like behavior drifts and sequential patterns efficiently and robustly. Our system also outperformed most of the state-of-the-art methods when evaluated for a similar problem domain on standard datasets.
△ Less
Submitted 18 September, 2019;
originally announced September 2019.
-
Mathematical Reasoning in Latent Space
Authors:
Dennis Lee,
Christian Szegedy,
Markus N. Rabe,
Sarah M. Loos,
Kshitij Bansal
Abstract:
We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. The set of rewrites (i.e. transformations) that can be successfully performed on a statement represents essential semantic features of the statement. We can compress this information by embedding the formula in a vector space, such that…
▽ More
We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. The set of rewrites (i.e. transformations) that can be successfully performed on a statement represents essential semantic features of the statement. We can compress this information by embedding the formula in a vector space, such that the vector associated with a statement can be used to predict whether a statement can be rewritten by other theorems. Predicting the embedding of a formula generated by some rewrite rule is naturally viewed as approximate reasoning in the latent space. In order to measure the effectiveness of this reasoning, we perform approximate deduction sequences in the latent space and use the resulting embedding to inform the semantic features of the corresponding formal statement (which is obtained by performing the corresponding rewrite sequence using real formulas). Our experiments show that graph neural networks can make non-trivial predictions about the rewrite-success of statements, even when they propagate predicted latent representations for several steps. Since our corpus of mathematical formulas includes a wide variety of mathematical disciplines, this experiment is a strong indicator for the feasibility of deduction in latent space in general.
△ Less
Submitted 25 September, 2019;
originally announced September 2019.
-
Learning to Reason in Large Theories without Imitation
Authors:
Kshitij Bansal,
Christian Szegedy,
Markus N. Rabe,
Sarah M. Loos,
Viktor Toman
Abstract:
In this paper, we demonstrate how to do automated theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. We suggest an exploration mechanism that mixes in additional premises selected by a tf-idf (term frequency-inverse document frequency) based lookup in a deep reinforcement learning scenario. This helps with exploring and learning whic…
▽ More
In this paper, we demonstrate how to do automated theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. We suggest an exploration mechanism that mixes in additional premises selected by a tf-idf (term frequency-inverse document frequency) based lookup in a deep reinforcement learning scenario. This helps with exploring and learning which premises are relevant for proving a new theorem. Our experiments show that the theorem prover trained with this exploration mechanism outperforms provers that are trained only on human proofs. It approaches the performance of a prover trained by a combination of imitation and reinforcement learning. We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thus explaining our design choices.
△ Less
Submitted 11 June, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Graph Representations for Higher-Order Logic and Theorem Proving
Authors:
Aditya Paliwal,
Sarah Loos,
Markus Rabe,
Kshitij Bansal,
Christian Szegedy
Abstract:
This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning. Higher-order logic is highly expressive and, eve…
▽ More
This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning. Higher-order logic is highly expressive and, even though it is well-structured with a clearly defined grammar and semantics, there still remains no well-established method to convert formulas into graph-based representations. In this paper, we consider several graphical representations of higher-order logic and evaluate them against the HOList benchmark for higher-order theorem proving.
△ Less
Submitted 12 September, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Hierarchical Policy Learning is Sensitive to Goal Space Design
Authors:
Zach Dwiel,
Madhavun Candadai,
Mariano Phielipp,
Arjun K. Bansal
Abstract:
Hierarchy in reinforcement learning agents allows for control at multiple time scales yielding improved sample efficiency, the ability to deal with long time horizons and transferability of sub-policies to tasks outside the training distribution. It is often implemented as a master policy providing goals to a sub-policy. Ideally, we would like the goal-spaces to be learned, however, properties of…
▽ More
Hierarchy in reinforcement learning agents allows for control at multiple time scales yielding improved sample efficiency, the ability to deal with long time horizons and transferability of sub-policies to tasks outside the training distribution. It is often implemented as a master policy providing goals to a sub-policy. Ideally, we would like the goal-spaces to be learned, however, properties of optimal goal spaces still remain unknown and consequently there is no method yet to learn optimal goal spaces. Motivated by this, we systematically analyze how various modifications to the ground-truth goal-space affect learning in hierarchical models with the aim of identifying important properties of optimal goal spaces. Our results show that, while rotation of ground-truth goal spaces and noise had no effect, having additional unnecessary factors significantly impaired learning in hierarchical models.
△ Less
Submitted 25 June, 2019; v1 submitted 4 May, 2019;
originally announced May 2019.
-
HOList: An Environment for Machine Learning of Higher-Order Theorem Proving
Authors:
Kshitij Bansal,
Sarah M. Loos,
Markus N. Rabe,
Christian Szegedy,
Stewart Wilcox
Abstract:
We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement l…
▽ More
We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment. HOL Light comes with a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture, from which we derive a challenging benchmark for automated reasoning. We also present a deep reinforcement learning driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.
△ Less
Submitted 1 November, 2019; v1 submitted 5 April, 2019;
originally announced April 2019.
-
Automated Performance Assessment in Transoesophageal Echocardiography with Convolutional Neural Networks
Authors:
Evangelos B. Mazomenos,
Kamakshi Bansal,
Bruce Martin,
Andrew Smith,
Susan Wright,
Danail Stoyanov
Abstract:
Transoesophageal echocardiography (TEE) is a valuable diagnostic and monitoring imaging modality. Proper image acquisition is essential for diagnosis, yet current assessment techniques are solely based on manual expert review. This paper presents a supervised deep learn ing framework for automatically evaluating and grading the quality of TEE images. To obtain the necessary dataset, 38 participant…
▽ More
Transoesophageal echocardiography (TEE) is a valuable diagnostic and monitoring imaging modality. Proper image acquisition is essential for diagnosis, yet current assessment techniques are solely based on manual expert review. This paper presents a supervised deep learn ing framework for automatically evaluating and grading the quality of TEE images. To obtain the necessary dataset, 38 participants of varied experience performed TEE exams with a high-fidelity virtual reality (VR) platform. Two Convolutional Neural Network (CNN) architectures, AlexNet and VGG, structured to perform regression, were finetuned and validated on manually graded images from three evaluators. Two different scoring strategies, a criteria-based percentage and an overall general impression, were used. The developed CNN models estimate the average score with a root mean square accuracy ranging between 84%-93%, indicating the ability to replicate expert valuation. Proposed strategies for automated TEE assessment can have a significant impact on the training process of new TEE operators, providing direct feedback and facilitating the development of the necessary dexterous skills.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.
-
Automatic Generation of Precise and Useful Commutativity Conditions (Extended Version)
Authors:
Kshitij Bansal,
Eric Koskinen,
Omer Tripp
Abstract:
Reasoning about commutativity between data-structure operations is an important problem with applications including parallelizing compilers, optimistic parallelization and, more recently, Ethereum smart contracts. There have been research results on automatic generation of commutativity conditions, yet we are unaware of any fully automated technique to generate conditions that are both sound and e…
▽ More
Reasoning about commutativity between data-structure operations is an important problem with applications including parallelizing compilers, optimistic parallelization and, more recently, Ethereum smart contracts. There have been research results on automatic generation of commutativity conditions, yet we are unaware of any fully automated technique to generate conditions that are both sound and effective.
We have designed such a technique, driven by an algorithm that iteratively refines a conservative approximation of the commutativity (and non-commutativity) condition for a pair of methods into an increasingly precise version. The algorithm terminates if/when the entire state space has been considered, and can be aborted at any time to obtain a partial yet sound commutativity condition. We have generalized our work to left-/right-movers and proved relative completeness. We describe aspects of our technique that lead to useful commutativity conditions, including how predicates are selected during refinement and heuristics that impact the output shape of the condition.
We have implemented our technique in a prototype open-source tool Servois. Our algorithm produces quantifier-free queries that are dispatched to a back-end SMT solver. We evaluate Servois through two case studies: (i) We synthesize commutativity conditions for a range of data structures including Set, HashTable, Accumulator, Counter, and Stack. (ii) We consider an Ethereum smart contract called BlockKing, and show that Servois can detect serious concurrency-related vulnerabilities and guide developers to construct robust and efficient implementations.
△ Less
Submitted 23 February, 2018;
originally announced February 2018.
-
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
Authors:
Scott Cyphers,
Arjun K. Bansal,
Anahita Bhiwandiwalla,
Jayaram Bobba,
Matthew Brookhart,
Avijit Chakraborty,
Will Constable,
Christian Convey,
Leona Cook,
Omar Kanawi,
Robert Kimball,
Jason Knight,
Nikolay Korovaiko,
Varun Kumar,
Yixing Lao,
Christopher R. Lishka,
Jaikrishnan Menon,
Jennifer Myers,
Sandeep Aswath Narayana,
Adam Procter,
Tristan J. Webb
Abstract:
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the tr…
▽ More
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations).
△ Less
Submitted 29 January, 2018; v1 submitted 24 January, 2018;
originally announced January 2018.
-
Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
Authors:
Urs Köster,
Tristan J. Webb,
Xin Wang,
Marcel Nassar,
Arjun K. Bansal,
William H. Constable,
Oğuz H. Elibol,
Scott Gray,
Stewart Hall,
Luke Hornof,
Amir Khosrowshahi,
Carey Kloss,
Ruby J. Pai,
Naveen Rao
Abstract:
Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the F…
▽ More
Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference.
△ Less
Submitted 2 December, 2017; v1 submitted 6 November, 2017;
originally announced November 2017.
-
Reasoning with Finite Sets and Cardinality Constraints in SMT
Authors:
Kshitij Bansal,
Clark Barrett,
Andrew Reynolds,
Cesare Tinelli
Abstract:
We consider the problem of deciding the satisfiability of quantifier-free formulas in the theory of finite sets with cardinality constraints. Sets are a common high-level data structure used in programming; thus, such a theory is useful for modeling program constructs directly. More importantly, sets are a basic construct of mathematics and thus natural to use when formalizing the properties of co…
▽ More
We consider the problem of deciding the satisfiability of quantifier-free formulas in the theory of finite sets with cardinality constraints. Sets are a common high-level data structure used in programming; thus, such a theory is useful for modeling program constructs directly. More importantly, sets are a basic construct of mathematics and thus natural to use when formalizing the properties of computational systems. We develop a calculus describing a modular combination of a procedure for reasoning about membership constraints with a procedure for reasoning about cardinality constraints. Cardinality reasoning involves tracking how different sets overlap. For efficiency, we avoid considering Venn regions directly, as done in previous work. Instead, we develop a novel technique wherein potentially overlap** regions are considered incrementally as needed, using a graph to track the interaction among the different regions. The calculus has been designed to facilitate its implementation within SMT solvers based on the DPLL($T$) architecture. Our experimental results demonstrate that the new techniques are competitive with previous techniques and can scale much better on certain classes of problems.
△ Less
Submitted 31 October, 2018; v1 submitted 20 February, 2017;
originally announced February 2017.
-
Unifying Geometric Features and Facial Action Units for Improved Performance of Facial Expression Analysis
Authors:
Mehdi Ghayoumi,
Arvind K Bansal
Abstract:
Previous approaches to model and analyze facial expression analysis use three different techniques: facial action units, geometric features and graph based modelling. However, previous approaches have treated these technique separately. There is an interrelationship between these techniques. The facial expression analysis is significantly improved by utilizing these map**s between major geometri…
▽ More
Previous approaches to model and analyze facial expression analysis use three different techniques: facial action units, geometric features and graph based modelling. However, previous approaches have treated these technique separately. There is an interrelationship between these techniques. The facial expression analysis is significantly improved by utilizing these map**s between major geometric features involved in facial expressions and the subset of facial action units whose presence or absence are unique to a facial expression. This paper combines dimension reduction techniques and image classification with search space pruning achieved by this unique subset of facial action units to significantly prune the search space. The performance results on the publicly facial expression database shows an improvement in performance by 70% over time while maintaining the emotion recognition correctness.
△ Less
Submitted 2 June, 2016;
originally announced June 2016.
-
On Deciding Local Theory Extensions via E-matching
Authors:
Kshitij Bansal,
Andrew Reynolds,
Tim King,
Clark Barrett,
Thomas Wies
Abstract:
Satisfiability Modulo Theories (SMT) solvers incorporate decision procedures for theories of data types that commonly occur in software. This makes them important tools for automating verification problems. A limitation frequently encountered is that verification problems are often not fully expressible in the theories supported natively by the solvers. Many solvers allow the specification of appl…
▽ More
Satisfiability Modulo Theories (SMT) solvers incorporate decision procedures for theories of data types that commonly occur in software. This makes them important tools for automating verification problems. A limitation frequently encountered is that verification problems are often not fully expressible in the theories supported natively by the solvers. Many solvers allow the specification of application-specific theories as quantified axioms, but their handling is incomplete outside of narrow special cases.
In this work, we show how SMT solvers can be used to obtain complete decision procedures for local theory extensions, an important class of theories that are decidable using finite instantiation of axioms. We present an algorithm that uses E-matching to generate instances incrementally during the search, significantly reducing the number of generated instances compared to eager instantiation strategies. We have used two SMT solvers to implement this algorithm and conducted an extensive experimental evaluation on benchmarks derived from verification conditions for heap-manipulating programs. We believe that our results are of interest to both the users of SMT solvers as well as their developers.
△ Less
Submitted 27 August, 2015;
originally announced August 2015.
-
Discovering Hidden Factors of Variation in Deep Networks
Authors:
Brian Cheung,
Jesse A. Livezey,
Arjun K. Bansal,
Bruno A. Olshausen
Abstract:
Deep learning has enjoyed a great deal of success because of its ability to learn useful features for tasks such as classification. But there has been less exploration in learning the factors of variation apart from the classification signal. By augmenting autoencoders with simple regularization terms during training, we demonstrate that standard deep architectures can discover and explicitly repr…
▽ More
Deep learning has enjoyed a great deal of success because of its ability to learn useful features for tasks such as classification. But there has been less exploration in learning the factors of variation apart from the classification signal. By augmenting autoencoders with simple regularization terms during training, we demonstrate that standard deep architectures can discover and explicitly represent factors of variation beyond those relevant for categorization. We introduce a cross-covariance penalty (XCov) as a method to disentangle factors like handwriting style for digits and subject identity in faces. We demonstrate this on the MNIST handwritten digit database, the Toronto Faces Database (TFD) and the Multi-PIE dataset by generating manipulated instances of the data. Furthermore, we demonstrate these deep networks can extrapolate `hidden' variation in the supervised signal.
△ Less
Submitted 17 June, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
-
On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm
Authors:
Siddharth Jain,
R. K. Bansal
Abstract:
The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities under two different scenarios, i) \emph{zero entropy} sources that are characterized by strong long-term memory, and ii) the processes with weak memory as descri…
▽ More
The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities under two different scenarios, i) \emph{zero entropy} sources that are characterized by strong long-term memory, and ii) the processes with weak memory as described through various mixing conditions.
For zero entropy sources, a general statement on match length is obtained. It is used in the proof of almost sure optimality of Fixed Shift Variant of Lempel-Ziv (FSLZ) and SWLZ algorithms given in literature. Through an example of stationary and ergodic processes generated by an irrational rotation we establish that for a window of size $n_w$, a compression ratio given by $O(\frac{\log n_w}{{n_w}^a})$ where $a$ depends on $n_w$ and approaches 1 as $n_w \rightarrow \infty$, is obtained under the application of FSLZ and SWLZ algorithms. Also, we give a general expression for the compression ratio for a class of stationary and ergodic processes with zero entropy.
Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of the \emph{normalized} version of recurrence times and establish the \emph{large deviation property} (LDP) for a class of mixing processes. Also, an estimator of entropy based on recurrence times is proposed for which large deviation principle is proved for sources satisfying similar mixing conditions.
△ Less
Submitted 6 November, 2014; v1 submitted 5 November, 2014;
originally announced November 2014.
-
K-Algorithm A Modified Technique for Noise Removal in Handwritten Documents
Authors:
Kanika Bansal,
Rajiv Kumar
Abstract:
OCR has been an active research area since last few decades. OCR performs the recognition of the text in the scanned document image and converts it into editable form. The OCR process can have several stages like pre-processing, segmentation, recognition and post processing. The pre-processing stage is a crucial stage for the success of OCR, which mainly deals with noise removal. In the present pa…
▽ More
OCR has been an active research area since last few decades. OCR performs the recognition of the text in the scanned document image and converts it into editable form. The OCR process can have several stages like pre-processing, segmentation, recognition and post processing. The pre-processing stage is a crucial stage for the success of OCR, which mainly deals with noise removal. In the present paper, a modified technique for noise removal named as K-Algorithm has been proposed, which has two stages as filtering and binarization. The proposed technique shows improvised results in comparison to median filtering technique.
△ Less
Submitted 6 June, 2013;
originally announced June 2013.
-
On Match Lengths and the Asymptotic Behavior of Sliding Window Lempel-Ziv Algorithm for Zero Entropy Sequences
Authors:
Siddharth Jain,
Rakesh Kumar Bansal
Abstract:
The Sliding Window Lempel-Ziv (SWLZ) algorithm has been studied from various perspectives in information theory literature. In this paper, we provide a general law which defines the asymptotics of match length for stationary and ergodic zero entropy processes. Moreover, we use this law to choose the match length $L_o$ in the almost sure optimality proof of Fixed Shift Variant of Lempel-Ziv (FSLZ)…
▽ More
The Sliding Window Lempel-Ziv (SWLZ) algorithm has been studied from various perspectives in information theory literature. In this paper, we provide a general law which defines the asymptotics of match length for stationary and ergodic zero entropy processes. Moreover, we use this law to choose the match length $L_o$ in the almost sure optimality proof of Fixed Shift Variant of Lempel-Ziv (FSLZ) and SWLZ algorithms given in literature. First, through an example of stationary and ergodic processes generated by irrational rotation we establish that for a window size of $n_w$ a compression ratio given by $O(\frac{\log n_w}{{n_w}^a})$ where $a$ is arbitrarily close to 1 and $0 < a < 1$, is obtained under the application of FSLZ and SWLZ algorithms. Further, we give a general expression for the compression ratio for a class of stationary and totally ergodic processes with zero entropy.
△ Less
Submitted 17 May, 2013; v1 submitted 5 March, 2013;
originally announced March 2013.
-
On Large Deviation Property of Recurrence Times
Authors:
Siddharth Jain,
Rakesh Kumar Bansal
Abstract:
We extend the study by Ornstein and Weiss on the asymptotic behavior of the normalized version of recurrence times and establish the large deviation property for a certain class of mixing processes. Further, an estimator for entropy based on recurrence times is proposed for which large deviation behavior is proved for stationary and ergodic sources satisfying similar mixing conditions.
We extend the study by Ornstein and Weiss on the asymptotic behavior of the normalized version of recurrence times and establish the large deviation property for a certain class of mixing processes. Further, an estimator for entropy based on recurrence times is proposed for which large deviation behavior is proved for stationary and ergodic sources satisfying similar mixing conditions.
△ Less
Submitted 17 May, 2013; v1 submitted 5 March, 2013;
originally announced March 2013.
-
A Note on the Complexity of Model-Checking Bounded Multi-Pushdown Systems
Authors:
Kshitij Bansal,
Stéphane Demri
Abstract:
In this note, we provide complexity characterizations of model checking multi-pushdown systems. Multi-pushdown systems model recursive concurrent programs in which any sequential process has a finite control. We consider three standard notions for boundedness: context boundedness, phase boundedness and stack ordering. The logical formalism is a linear-time temporal logic extending well-known logic…
▽ More
In this note, we provide complexity characterizations of model checking multi-pushdown systems. Multi-pushdown systems model recursive concurrent programs in which any sequential process has a finite control. We consider three standard notions for boundedness: context boundedness, phase boundedness and stack ordering. The logical formalism is a linear-time temporal logic extending well-known logic CaRet but dedicated to multi-pushdown systems in which abstract operators (related to calls and returns) such as those for next-time and until are parameterized by stacks. We show that the problem is EXPTIME-complete for context-bounded runs and unary encoding of the number of context switches; we also prove that the problem is 2EXPTIME-complete for phase-bounded runs and unary encoding of the number of phase switches. In both cases, the value k is given as an input (whence it is not a constant of the model-checking problem), which makes a substantial difference in the complexity. In certain cases, our results improve previous complexity results.
△ Less
Submitted 6 December, 2012;
originally announced December 2012.
-
Geographic Information Systems in Evaluation and Visualization of Construction Schedule
Authors:
V. K. Bansal,
Mahesh Pal
Abstract:
Commercially available scheduling tools such as Primavera and Microsoft Project fail to provide information pertaining to the spatial aspects of construction project. A methodology using geographical information systems (GIS) is developed to represent spatial aspects of the construction progress graphically by synchronizing it with construction schedule. The spatial aspects are depicted by 3D mo…
▽ More
Commercially available scheduling tools such as Primavera and Microsoft Project fail to provide information pertaining to the spatial aspects of construction project. A methodology using geographical information systems (GIS) is developed to represent spatial aspects of the construction progress graphically by synchronizing it with construction schedule. The spatial aspects are depicted by 3D model developed in AutoCAD and construction schedule is generated using Microsoft Excel. Spatial and scheduling information are linked together into the GIS environment (ArcGIS). The GIS-based system developed in this study may help in better understanding the schedule along with its spatial aspects.
△ Less
Submitted 25 March, 2008;
originally announced March 2008.
-
A Coding Theorem Characterizing Renyi's Entropy through Variable-to-Fixed Length Codes
Authors:
Vaneet Aggarwal,
R. K. Bansal
Abstract:
This paper has been withdrawn
This paper has been withdrawn
△ Less
Submitted 28 October, 2006; v1 submitted 8 July, 2006;
originally announced July 2006.