Search | arXiv e-print repository

This&That: Language-Gesture Controlled Video Generation for Robot Planning

Authors: Boyang Wang, Nikhil Sridhar, Chao Feng, Mark Van der Merwe, Adam Fishman, Nima Fazeli, Jeong Joon Park

Abstract: We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intents, and 3) translating visual planning into robot actions. We propose language-gesture conditioning to generate videos, which is both simpler and clearer than existing language-only methods, especially in complex and uncertain environments. We then suggest a behavioral cloning design that seamlessly incorporates the video plans. This&That demonstrates state-of-the-art effectiveness in addressing the above three challenges, and justifies the use of video generation as an intermediate representation for generalizable task planning and execution. Project website: https://cfeng16.github.io/this-and-that/.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2405.18377 [pdf, other]

LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models

Authors: Anthony Sarah, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

Abstract: The abilities of modern large language models (LLMs) in solving natural language processing, complex reasoning, sentiment analysis and other tasks have been extraordinary which has prompted their extensive adoption. Unfortunately, these abilities come with very high memory and computational costs which precludes the use of LLMs on most hardware platforms. To mitigate this, we propose an effective method of finding Pareto-optimal network architectures based on LLaMA2-7B using one-shot NAS. In particular, we fine-tune LLaMA2-7B only once and then apply genetic algorithm-based search to find smaller, less computationally complex network architectures. We show that, for certain standard benchmark tasks, the pre-trained LLaMA2-7B network is unnecessarily large and complex. More specifically, we demonstrate a 1.5x reduction in model size and 1.3x speedup in throughput for certain tasks with negligible drop in accuracy. In addition to finding smaller, higher-performing network architectures, our method does so more effectively and efficiently than certain pruning or sparsification techniques. Finally, we demonstrate how quantization is complementary to our method and that the size and complexity of the networks we find can be further decreased using quantization. We believe that our work provides a way to automatically create LLMs which can be used on less expensive and more readily available hardware platforms.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2312.13301 [pdf, other]

SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture Search

Authors: Sharath Nittur Sridhar, Maciej Szankin, Fang Chen, Sairam Sundaresan, Anthony Sarah

Abstract: Recent one-shot Neural Architecture Search algorithms rely on training a hardware-agnostic super-network tailored to a specific task and then extracting efficient sub-networks for different hardware platforms. Popular approaches separate the training of super-networks from the search for sub-networks, often employing predictors to alleviate the computational overhead associated with search. Additionally, certain methods also incorporate the quantization policy within the search space. However, while the quantization policy search for convolutional neural networks is well studied, the extension of these methods to transformers and especially foundation models remains under-explored. In this paper, we demonstrate that by using multi-objective search algorithms paired with lightly trained predictors, we can efficiently search for both the sub-network architecture and the corresponding quantization policy and outperform their respective baselines across different performance objectives such as accuracy, model size, and latency. Specifically, we demonstrate that our approach performs well across both uni-modal (ViT and BERT) and multi-modal (BEiT-3) transformer-based architectures as well as convolutional architectures (ResNet). For certain networks, we demonstrate an improvement of up to $4.80x$ and $3.44x$ for latency and model size respectively, without degradation in accuracy compared to the fully quantized INT8 baselines.

Submitted 19 December, 2023; originally announced December 2023.
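
Several abstracts in this listing refer to multi-objective search and Pareto-optimal sub-networks. As a rough illustration only, the sketch below filters a list of candidate (accuracy, latency) pairs down to the non-dominated set. The function name `pareto_front`, the two objectives chosen, and the candidate values are all hypothetical; none of this is taken from the papers' implementations.

```python
from typing import List, Tuple

# Hypothetical candidate type: (accuracy, latency_ms).
# Higher accuracy is better; lower latency is better.
Candidate = Tuple[float, float]


def pareto_front(points: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, preserving input order."""
    front = []
    for i, (acc_i, lat_i) in enumerate(points):
        # A candidate is dominated if another one is at least as good on
        # both objectives and strictly better on at least one of them.
        dominated = any(
            (acc_j >= acc_i and lat_j <= lat_i)
            and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((acc_i, lat_i))
    return front


# Made-up numbers, purely for illustration.
candidates = [(0.80, 12.0), (0.78, 9.0), (0.75, 11.0), (0.82, 20.0), (0.78, 10.0)]
front = pareto_front(candidates)
print(front)
```

This quadratic scan is adequate for small candidate sets; evolutionary search methods typically maintain the front incrementally instead.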

arXiv:2308.15609 [pdf, other]

InstaTune: Instantaneous Neural Architecture Search During Fine-Tuning

Authors: Sharath Nittur Sridhar, Souvik Kundu, Sairam Sundaresan, Maciej Szankin, Anthony Sarah

Abstract: One-Shot Neural Architecture Search (NAS) algorithms often rely on training a hardware-agnostic super-network for a domain-specific task. Optimal sub-networks are then extracted from the trained super-network for different hardware platforms. However, training super-networks from scratch can be extremely time-consuming and compute-intensive, especially for large models that rely on a two-stage training process of pre-training and fine-tuning. State-of-the-art pre-trained models are available for a wide range of tasks, but their large sizes significantly limit their applicability on various hardware platforms. We propose InstaTune, a method that leverages off-the-shelf pre-trained weights for large models and generates a super-network during the fine-tuning stage. InstaTune has multiple benefits. Firstly, since the process happens during fine-tuning, it minimizes the overall time and compute resources required for NAS. Secondly, the sub-networks extracted are optimized for the target task, unlike prior work that optimizes on the pre-training objective. Finally, InstaTune is easy to "plug and play" in existing frameworks. By using multi-objective evolutionary search algorithms along with lightly trained predictors, we find Pareto-optimal sub-networks that outperform their respective baselines across different performance objectives such as accuracy and MACs. Specifically, we demonstrate that our approach performs well across both unimodal (ViT and BERT) and multi-modal (BEiT-3) transformer-based architectures.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2307.11764 [pdf, other]

Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT

Authors: Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

Abstract: Large pre-trained language models have recently gained significant traction due to their improved performance on various down-stream tasks like text classification and question answering, requiring only few epochs of fine-tuning. However, their large model sizes often prohibit their applications on resource-constrained edge devices. Existing solutions of yielding parameter-efficient BERT models largely rely on compute-exhaustive training and fine-tuning. Moreover, they often rely on additional compute heavy models to mitigate the performance gap. In this paper, we present Sensi-BERT, a sensitivity driven efficient fine-tuning of BERT models that can take an off-the-shelf pre-trained BERT model and yield highly parameter-efficient models for downstream tasks. In particular, we perform sensitivity analysis to rank each individual parameter tensor, that then is used to trim them accordingly during fine-tuning for a given parameter or FLOPs budget. Our experiments show the efficacy of Sensi-BERT across different downstream tasks including MNLI, QQP, QNLI, SST-2 and SQuAD, showing better performance at similar or smaller parameter budget compared to various alternatives.

Submitted 31 August, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures, 2 tables

arXiv:2304.14912 [pdf, other]

Human Activity Recognition Using Self-Supervised Representations of Wearable Data

Authors: Maximilien Burq, Niranjan Sridhar

Abstract: Automated and accurate human activity recognition (HAR) using body-worn sensors enables practical and cost-efficient remote monitoring of Activities of Daily Living (ADL), which are shown to provide clinical insights across multiple therapeutic areas. Development of accurate algorithms for HAR is hindered by the lack of large real-world labeled datasets. Furthermore, algorithms seldom work beyond the specific sensor on which they are prototyped, prompting debate about whether accelerometer-based HAR is even possible [Tong et al., 2020]. Here we develop a 6-class HAR model with strong performance when evaluated on real-world datasets not seen during training. Our model is based on a frozen self-supervised representation learned on a large unlabeled dataset, combined with a shallow multi-layer perceptron with temporal smoothing. The model obtains in-dataset state-of-the-art performance on the Capture24 dataset ($κ = 0.86$). Out-of-distribution (OOD) performance is $κ = 0.7$, with both the representation and the perceptron models being trained on data from a different sensor. This work represents a key step towards device-agnostic HAR models, which can help contribute to increased standardization of model evaluation in the HAR field.

Submitted 26 April, 2023; originally announced April 2023.

Comments: this article expands work introduced in arXiv:2112.12272
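
The abstract above reports agreement scores as $κ$ without naming the statistic. Assuming it denotes Cohen's kappa (an assumption, not something the listing states), the sketch below shows how such a score could be computed from true and predicted activity labels. The function name `cohens_kappa` and the toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Note: undefined (division by zero) when expected == 1, i.e. when both
    # labelings use one and the same single class throughout.
    return (observed - expected) / (1.0 - expected)


# Toy labels, purely for illustration.
y_true = ["walk", "walk", "sit", "sit"]
y_pred = ["walk", "sit", "sit", "sit"]
kappa = cohens_kappa(y_true, y_pred)
print(kappa)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.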

arXiv:2302.03523 [pdf, other]

Sparse Mixture Once-for-all Adversarial Training for Efficient In-Situ Trade-Off Between Accuracy and Robustness of DNNs

Authors: Souvik Kundu, Sairam Sundaresan, Sharath Nittur Sridhar, Shunlin Lu, Han Tang, Peter A. Beerel

Abstract: Existing deep neural networks (DNNs) that achieve state-of-the-art (SOTA) performance on both clean and adversarially-perturbed images rely on either activation or weight conditioned convolution operations. However, such conditional learning costs additional multiply-accumulate (MAC) or addition operations, increasing inference memory and compute costs. To that end, we present a sparse mixture once for all adversarial training (SMART), that allows a model to train once and then in-situ trade-off between accuracy and robustness, that too at a reduced compute and parameter overhead. In particular, SMART develops two expert paths, for clean and adversarial images, respectively, that are then conditionally trained via respective dedicated sets of binary sparsity masks. Extensive evaluations on multiple image classification datasets across different models show SMART to have up to 2.72x fewer non-zero parameters costing proportional reduction in compute overhead, while yielding SOTA accuracy-robustness trade-off. Additionally, we present insightful observations in designing sparse masks to successfully condition on both clean and perturbed images.

Submitted 27 December, 2022; originally announced February 2023.

Comments: 5 pages, 5 figures, 2 tables

arXiv:2205.10358 [pdf, other]

A Hardware-Aware Framework for Accelerating Neural Architecture Search Across Modalities

Authors: Daniel Cummings, Anthony Sarah, Sharath Nittur Sridhar, Maciej Szankin, Juan Pablo Munoz, Sairam Sundaresan

Abstract: Recent advances in Neural Architecture Search (NAS) such as one-shot NAS offer the ability to extract specialized hardware-aware sub-network configurations from a task-specific super-network. While considerable effort has been employed towards improving the first stage, namely, the training of the super-network, the search for derivative high-performing sub-networks is still under-explored. Popular methods decouple the super-network training from the sub-network search and use performance predictors to reduce the computational burden of searching on different hardware platforms. We propose a flexible search framework that automatically and efficiently finds optimal sub-networks that are optimized for different performance metrics and hardware configurations. Specifically, we show how evolutionary algorithms can be paired with lightly trained objective predictors in an iterative cycle to accelerate architecture search in a multi-objective setting for various modalities including machine translation and image classification.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2202.12954 [pdf, other]

A Hardware-Aware System for Accelerating Deep Neural Network Optimization

Authors: Anthony Sarah, Daniel Cummings, Sharath Nittur Sridhar, Sairam Sundaresan, Maciej Szankin, Tristan Webb, J. Pablo Munoz

Abstract: Recent advances in Neural Architecture Search (NAS) which extract specialized hardware-aware configurations (a.k.a. "sub-networks") from a hardware-agnostic "super-network" have become increasingly popular. While considerable effort has been employed towards improving the first stage, namely, the training of the super-network, the search for derivative high-performing sub-networks is still largely under-explored. For example, some recent network morphism techniques allow a super-network to be trained once and then have hardware-specific networks extracted from it as needed. These methods decouple the super-network training from the sub-network search and thus decrease the computational burden of specializing to different hardware platforms. We propose a comprehensive system that automatically and efficiently finds sub-networks from a pre-trained super-network that are optimized to different performance metrics and hardware configurations. By combining novel search tactics and algorithms with intelligent use of predictors, we significantly decrease the time needed to find optimal sub-networks from a given super-network. Further, our approach does not require the super-network to be refined for the target task a priori, thus allowing it to interface with any super-network. We demonstrate through extensive experiments that our system works seamlessly with existing state-of-the-art super-network training methods in multiple domains. Moreover, we show how novel search tactics paired with evolutionary algorithms can accelerate the search process for ResNet50, MobileNetV3 and Transformer while maintaining objective space Pareto front diversity and demonstrate an 8x faster search result than the state-of-the-art Bayesian optimization WeakNAS approach.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.12934 [pdf, other]

Accelerating Neural Architecture Exploration Across Modalities Using Genetic Algorithms

Authors: Daniel Cummings, Sharath Nittur Sridhar, Anthony Sarah, Maciej Szankin

Abstract: Neural architecture search (NAS), the study of automating the discovery of optimal deep neural network architectures for tasks in domains such as computer vision and natural language processing, has seen rapid growth in the machine learning research community. While there have been many recent advancements in NAS, there is still a significant focus on reducing the computational cost incurred when validating discovered architectures by making search more efficient. Evolutionary algorithms, specifically genetic algorithms, have a history of usage in NAS and continue to gain popularity versus other optimization approaches as a highly efficient way to explore the architecture objective space. Most NAS research efforts have centered around computer vision tasks and only recently have other modalities, such as the rapidly growing field of natural language processing, been investigated in depth. In this work, we show how genetic algorithms can be paired with lightly trained objective predictors in an iterative cycle to accelerate multi-objective architectural exploration in a way that works in the modalities of both machine translation and image classification.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.12411 [pdf, other]

TrimBERT: Tailoring BERT for Trade-offs

Authors: Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan

Abstract: Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and fine-tuning which limits wider adoptability. While self-attention layers have been well-studied, a strong justification for inclusion of the intermediate layers which follow them remains missing in the literature. In this work, we show that reducing the number of intermediate layers in BERT-Base results in minimal fine-tuning accuracy loss of downstream tasks while significantly decreasing model size and training time. We further mitigate two key bottlenecks, by replacing all softmax operations in the self-attention layers with a computationally simpler alternative and removing half of all layernorm operations. This further decreases the training time while maintaining a high level of fine-tuning accuracy.

Submitted 24 February, 2022; originally announced February 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2012.11881

arXiv:2112.12272 [pdf]

Human Activity Recognition on wrist-worn accelerometers using self-supervised neural networks

Authors: Niranjan Sridhar, Lance Myers

Abstract: Measures of Activity of Daily Living (ADL) are an important indicator of overall health but difficult to measure in-clinic. Automated and accurate human activity recognition (HAR) using wrist-worn accelerometers enables practical and cost-efficient remote monitoring of ADL. Key obstacles in developing high-quality HAR are the lack of large labeled datasets and the performance loss when applying models trained on small curated datasets to the continuous stream of heterogeneous data in real life. In this work we design a self-supervised learning paradigm to create a robust representation of accelerometer data that can generalize across devices and subjects. We demonstrate that this representation can separate activities of daily living and achieve strong HAR accuracy (on multiple benchmark datasets) using very few labels. We also propose a segmentation algorithm which can identify segments of salient activity and boost HAR accuracy on continuous real-life data.

Submitted 22 December, 2021; originally announced December 2021.

arXiv:2103.12335 [pdf, other]

Model Based Control of Commercial-Off-TheShelf (COTS) Unmanned Rotorcraft for BrickWall Construction

Authors: Nithya Sridhar, Sai Abhinay. N, Chaithanya Krishna. B, Shubhankar Shobhit, Kaushik Das, Debasish Ghose

Abstract: This work proposes a systematic framework for modelling and controller design of a Commercial-Off-The-Shelf (COTS) unmanned rotorcraft using control theory and principles, for brick wall construction. With point-to-point navigation as the primary application, command velocities in the three axes of the Unmanned Aerial Vehicle (UAV) are considered as inputs of the system while its actual velocities are system outputs. Using the sine and step response data acquired from a Hardware-in-Loop (HiL) test simulator, the considered system was modelled in individual axes with the help of the proposed framework. This model was employed for controller design where a sliding mode controller was chosen to satisfy certain requirements of the application like robustness, flexibility and accuracy. The model was validated using step response data and produced a deviation of only 9%. Finally, the controller results from field tests showed fine control with accuracy up to 8 cm. Sliding Mode Control (SMC) was also compared with a linear controller derived from iterative experimentation and was seen to perform better than the latter in terms of accuracy and robustness to parametric variations and wind disturbances.

Submitted 23 March, 2021; originally announced March 2021.

Comments: MBZIRC Symposium 2020

arXiv:2012.11881 [pdf, other]

Undivided Attention: Are Intermediate Layers Necessary for BERT?

Authors: Sharath Nittur Sridhar, Anthony Sarah

Abstract: In recent times, BERT-based models have been extremely successful in solving a variety of natural language processing (NLP) tasks such as reading comprehension, natural language inference, sentiment analysis, etc. All BERT-based architectures have a self-attention block followed by a block of intermediate layers as the basic building component. However, a strong justification for the inclusion of these intermediate layers remains missing in the literature. In this work we investigate the importance of intermediate layers on the overall network performance of downstream tasks. We show that reducing the number of intermediate layers and modifying the architecture for BERT-BASE results in minimal loss in fine-tuning accuracy for downstream tasks while decreasing the number of parameters and training time of the model. Additionally, we use centered kernel alignment and probing linear classifiers to gain insight into our architectural modifications and justify that removal of intermediate layers has little impact on the fine-tuned accuracy.

Submitted 4 April, 2023; v1 submitted 22 December, 2020; originally announced December 2020.

arXiv:2012.09904 [pdf, other]

Attention-based Image Upsampling

Authors: Souvik Kundu, Hesham Mostafa, Sharath Nittur Sridhar, Sairam Sundaresan

Abstract: Convolutional layers are an integral part of many deep neural network solutions in computer vision. Recent work shows that replacing the standard convolution operation with mechanisms based on self-attention leads to improved performance on image classification and object detection tasks. In this work, we show how attention mechanisms can be used to replace another canonical operation: strided transposed convolution. We term our novel attention-based operation attention-based upsampling since it increases/upsamples the spatial dimensions of the feature maps. Through experiments on single image super-resolution and joint-image upsampling tasks, we show that attention-based upsampling consistently outperforms traditional upsampling methods based on strided transposed convolution or based on adaptive filters while using fewer parameters. We show that the inherent flexibility of the attention mechanism, which allows it to use separate sources for calculating the attention coefficients and the attention targets, makes attention-based upsampling a natural choice when fusing information from multiple image modalities.

Submitted 17 December, 2020; originally announced December 2020.

arXiv:1904.09348 [pdf, other]

Compact Scene Graphs for Layout Composition and Patch Retrieval

Authors: Subarna Tripathi, Sharath Nittur Sridhar, Sairam Sundaresan, Hanlin Tang

Abstract: Structured representations such as scene graphs serve as an efficient and compact representation that can be used for downstream rendering or retrieval tasks. However, existing efforts to generate realistic images from scene graphs perform poorly on scene composition for cluttered or complex scenes. We propose two contributions to improve the scene composition. First, we enhance the scene graph representation with heuristic-based relations, which add minimal storage overhead. Second, we use extreme points representation to supervise the learning of the scene composition network. These methods achieve significantly higher performance over existing work (69.0% vs 51.2% in relation score metric). We additionally demonstrate how scene graphs can be used to retrieve pose-constrained image patches that are semantically similar to the source query. Improving structured scene graph representations for rendering or retrieval is an important step towards realistic image generation.

Submitted 19 April, 2019; originally announced April 2019.

Comments: To appear in CVPRW 2019 (CEFRL)

arXiv:1804.06511 [pdf, other]

Fast Weight Long Short-Term Memory

Authors: T. Anderson Keller, Sharath Nittur Sridhar, Xin Wang

Abstract: Associative memory using fast weights is a short-term memory mechanism that substantially improves the memory capacity and time scale of recurrent neural networks (RNNs). As recent studies introduced fast weights only to regular RNNs, it is unknown whether fast weight memory is beneficial to gated RNNs. In this work, we report a significant synergy between long short-term memory (LSTM) networks and fast weight associative memories. We show that this combination, in learning associative retrieval tasks, results in much faster training and lower test error, a performance boost most prominent at high memory task difficulties.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1610.01983 [pdf, other]

Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?

Authors: Matthew Johnson-Roberson, Charles Barto, Rounak Mehta, Sharath Nittur Sridhar, Karl Rosaen, Ram Vasudevan

Abstract: Deep learning has rapidly transformed the state of the art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have relied upon massive amounts of human annotated training data. This time consuming process has begun impeding the progress of these deep learning efforts. This paper describes a method to incorporate photo-realistic computer images from a simulation engine to rapidly generate annotated data that can be used for the training of machine learning algorithms. We demonstrate that a state of the art architecture, which is trained only using these synthetic annotations, performs better than the identical architecture trained on human annotated real-world data, when tested on the KITTI data set for vehicle detection. By training machine learning algorithms on a rich virtual world, real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning's application to sensor-based classification problems like those that appear in self-driving cars. The source code and data to train and validate the networks described in this paper are made available for researchers.

Submitted 25 February, 2017; v1 submitted 6 October, 2016; originally announced October 2016.

Comments: Proceedings of International Conference on Robotics and Automation (ICRA) 2017, 8 pages

arXiv:1509.07543 [pdf, other]

On Optimizing Human-Machine Task Assignments

Authors: Andreas Veit, Michael Wilber, Rajan Vaish, Serge Belongie, James Davis, Vishal Anand, Anshu Aviral, Prithvijit Chakrabarty, Yash Chandak, Sidharth Chaturvedi, Chinmaya Devaraj, Ankit Dhall, Utkarsh Dwivedi, Sanket Gupte, Sharath N. Sridhar, Karthik Paga, Anuj Pahuja, Aditya Raisinghani, Ayush Sharma, Shweta Sharma, Darpana Sinha, Nisarg Thakkar, K. Bala Vignesh, Utkarsh Verma, Kanniganti Abhishek, et al. (26 additional authors not shown)

Abstract: When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers. However, if researchers wish to integrate the crowd with "off-the-shelf" machine classifiers, this deep integration is not always possible. This work explores two strategies to increase accuracy and decrease cost under this setting. First, we show that reordering tasks presented to the human can create a significant accuracy improvement. Further, we show that greedily choosing parameters to maximize machine accuracy is sub-optimal, and joint optimization of the combined system improves performance.

Submitted 24 September, 2015; originally announced September 2015.

Comments: HCOMP 2015 Work in Progress

arXiv:1507.07838 [pdf, other]

Shifting Behaviour of Users: Towards Understanding the Fundamental Law of Social Networks

Authors: Yayati Gupta, S. R. S. Iyengar, Jaspal Singh Saini, Nidhi Sridhar

Abstract: Social Networking Sites (SNSs) are powerful marketing and communication tools. There are hundreds of SNSs that have entered and exited the market over time. The coexistence of multiple SNSs is a rarely observed phenomenon. Most coexisting SNSs either serve different purposes for its users or have cultural differences among them. The introduction of a new SNS with a better set of features can lead to the demise of an existing SNS, as observed in the transition from Orkut to Facebook. The paper proposes a model for analyzing the transition of users from one SNS to another, when a new SNS is introduced in the system. The game theoretic model proposed considers two major factors in determining the success of a new SNS. The first being time that an old SNS gets to stabilise. We study whether the time that a SNS like Facebook received to monopolize its reach had a distinguishable effect. The second factor is the set of features showcased by the new SNS. The results of the model are also experimentally verified with data collected by means of a survey.

Submitted 7 November, 2015; v1 submitted 28 July, 2015; originally announced July 2015.

Showing 1–20 of 20 results for author: Sridhar, N