Showing 1–30 of 30 results for author: Chintala, S

  1. arXiv:2403.07870  [pdf, other]

    cs.RO

    OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation

    Authors: Aadhithya Iyer, Zhuoran Peng, Yinlong Dai, Irmak Guzey, Siddhant Haldar, Soumith Chintala, Lerrel Pinto

    Abstract: Open-sourced, user-friendly tools form the bedrock of scientific advancement across disciplines. The widespread adoption of data-driven learning has led to remarkable progress in multi-fingered dexterity, bimanual manipulation, and applications ranging from logistics to home robotics. However, existing data collection platforms are often proprietary, costly, or tailored to specific robotic morphol…

    Submitted 12 March, 2024; originally announced March 2024.

  2. arXiv:2402.13957  [pdf]

    cs.SD cs.LG eess.AS

    Advancing Audio Fingerprinting Accuracy Addressing Background Noise and Distortion Challenges

    Authors: Navin Kamuni, Sathishkumar Chintala, Naveen Kunchakuri, Jyothi Swaroop Arlagadda Narasimharaju, Venkat Kumar

    Abstract: Audio fingerprinting, exemplified by pioneers like Shazam, has transformed digital audio recognition. However, existing systems struggle with accuracy in challenging conditions, limiting broad applicability. This research proposes an AI and ML integrated audio fingerprinting algorithm to enhance accuracy. Built on the Dejavu Project's foundations, the study emphasizes real-world scenario simulatio…

    Submitted 1 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Journal ref: 2024 IEEE 18th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 2024, pp. 341-345

  3. arXiv:2401.18040  [pdf]

    cs.CL cs.AI

    Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability

    Authors: Navin Kamuni, Hardik Shah, Sathishkumar Chintala, Naveen Kunchakuri, Sujatha Alla

    Abstract: End-to-end multi-task dialogue systems are usually designed with separate modules for the dialogue pipeline. Among these, the policy module is essential for deciding what to do in response to user input. This policy is trained by reinforcement learning algorithms by taking advantage of an environment in which an agent receives feedback in the form of a reward signal. The current dialogue systems,…

    Submitted 25 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: 6 pages, 1 figure, 18th IEEE International Conference on Semantic Computing

  4. arXiv:2311.16098  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    On Bringing Robots Home

    Authors: Nur Muhammad Mahi Shafiullah, Anant Rai, Haritheja Etukuru, Yiqian Liu, Ishan Misra, Soumith Chintala, Lerrel Pinto

    Abstract: Throughout history, we have successfully integrated various machines into our homes. Dishwashers, laundry machines, stand mixers, and robot vacuums are a few recent examples. However, these machines excel at performing only a single task effectively. The concept of a "generalist machine" in homes - a domestic assistant that can adapt and learn from our needs, all while remaining cost-effective - h…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project website and videos are available at https://dobb-e.com, technical documentation for getting started is available at https://docs.dobb-e.com, and code is released at https://github.com/notmahi/dobb-e

  5. arXiv:2309.12300  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    See to Touch: Learning Tactile Dexterity through Visual Incentives

    Authors: Irmak Guzey, Yinlong Dai, Ben Evans, Soumith Chintala, Lerrel Pinto

    Abstract: Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation f…

    Submitted 21 September, 2023; originally announced September 2023.

  6. arXiv:2303.12076  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

    Authors: Irmak Guzey, Ben Evans, Soumith Chintala, Lerrel Pinto

    Abstract: Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that either operate on visual observations or state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Video and code can be accessed here: https://tactile-dexterity.github.io/

  7. arXiv:2212.00922  [pdf, other]

    cs.RO cs.CV cs.LG

    Navigating to Objects in the Real World

    Authors: Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

    Abstract: Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches rea…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: 39 pages, 19 figures and tables, submitted to Science Robotics

  8. arXiv:2210.06463  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC cs.LG

    Holo-Dex: Teaching Dexterity with Immersive Mixed Reality

    Authors: Sridhar Pandian Arunachalam, Irmak Güzey, Soumith Chintala, Lerrel Pinto

    Abstract: A fundamental challenge in teaching robots is to provide an effective interface for human teachers to demonstrate useful skills to a robot. This challenge is exacerbated in dexterous manipulation, where teaching high-dimensional, contact-rich behaviors often require esoteric teleoperation tools. In this work, we present Holo-Dex, a framework for dexterous manipulation that places a teacher in an i…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Data, code and videos are available at https://holo-dex.github.io

  9. arXiv:2210.05663  [pdf, other]

    cs.RO cs.CV

    CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

    Authors: Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam

    Abstract: We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such…

    Submitted 22 May, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Code, video, and interactive demonstrations available at https://mahis.life/clip-fields. Accepted for publication at Robotics: Science and Systems 2023 in Daegu, Korea

  10. arXiv:2110.15018  [pdf, other]

    eess.AS cs.SD

    TorchAudio: Building Blocks for Audio and Speech Processing

    Authors: Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi

    Abstract: This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically dif…

    Submitted 16 February, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  11. arXiv:2101.10384  [pdf, other]

    cs.RO cs.AI

    droidlet: modular, heterogenous, multi-modal agents

    Authors: Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam

    Abstract: In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale. But most of these systems are: (a) isolated (perception, speech, or language only); (b) trained on static datasets. On the other hand, in the field of robotics, large-scale learning has always been difficult. Supervision is hard to gather and real world physical interacti…

    Submitted 25 January, 2021; originally announced January 2021.

  12. arXiv:2006.15704  [pdf, other]

    cs.DC cs.LG

    PyTorch Distributed: Experiences on Accelerating Data Parallel Training

    Authors: Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala

    Abstract: This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. D…

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: To appear in VLDB 2020

  13. arXiv:1912.01703  [pdf, other]

    cs.LG cs.MS stat.ML

    PyTorch: An Imperative Style, High-Performance Deep Learning Library

    Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

    Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: 12 pages, 3 figures, NeurIPS 2019

  14. arXiv:1910.01727  [pdf, other]

    cs.LG stat.ML

    Generalized Inner Loop Meta-Learning

    Authors: Edward Grefenstette, Brandon Amos, Denis Yarats, Phu Mon Htut, Artem Molchanov, Franziska Meier, Douwe Kiela, Kyunghyun Cho, Soumith Chintala

    Abstract: Many (but not all) approaches self-qualifying as "meta-learning" in deep learning and reinforcement learning fit a common pattern of approximating the solution to a nested optimization problem. In this paper, we give a formalization of this shared pattern, which we call GIMLI, prove its general requirements, and derive a general-purpose algorithm for implementing similar approaches. Based on this…

    Submitted 7 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 17 pages, 3 figures, 1 algorithm

  15. arXiv:1702.04770  [pdf, other]

    cs.CL cs.LG cs.NE

    Training Language Models Using Target-Propagation

    Authors: Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

    Abstract: While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experim…

    Submitted 15 February, 2017; originally announced February 2017.

  16. arXiv:1701.08435  [pdf, other]

    cs.LG cs.CV

    Transformation-Based Models of Video Sequences

    Authors: Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

    Abstract: In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison…

    Submitted 6 February, 2023; v1 submitted 29 January, 2017; originally announced January 2017.

  17. arXiv:1701.07875  [pdf, other]

    stat.ML cs.LG

    Wasserstein GAN

    Authors: Martin Arjovsky, Soumith Chintala, Léon Bottou

    Abstract: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical wor…

    Submitted 6 December, 2017; v1 submitted 26 January, 2017; originally announced January 2017.

  18. arXiv:1611.08408  [pdf, other]

    cs.CV

    Semantic Segmentation using Adversarial Networks

    Authors: Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek

    Abstract: Adversarial training has been shown to produce state of the art results for generative image modeling. In this paper we propose an adversarial training approach to train semantic segmentation models. We train a convolutional semantic segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network. The…

    Submitted 25 November, 2016; originally announced November 2016.

    Journal ref: NIPS Workshop on Adversarial Training, Dec 2016, Barcelona, Spain

  19. arXiv:1611.00625  [pdf, other]

    cs.LG cs.AI

    TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games

    Authors: Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, Nicolas Usunier

    Abstract: We present TorchCraft, a library that enables deep learning research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by making it easier to control these games from a machine learning framework, here Torch. This white paper argues for using RTS games as a benchmark for AI research, and describes the design and components of TorchCraft.

    Submitted 3 November, 2016; v1 submitted 1 November, 2016; originally announced November 2016.

    ACM Class: I.2.1

  20. arXiv:1609.02993  [pdf, other]

    cs.AI cs.LG

    Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

    Authors: Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala

    Abstract: We consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members during a battle. From a reinforcement learning point of view, these scenarios are challenging because the state-action space is very large, and because there is no o…

    Submitted 26 November, 2016; v1 submitted 9 September, 2016; originally announced September 2016.

    Comments: 18 pages, 1 figure (2 plots), 2 tables

    ACM Class: I.2.1; I.2.6

  21. arXiv:1605.08179  [pdf, other]

    stat.ML cs.CV

    Discovering Causal Signals in Images

    Authors: David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou

    Abstract: This paper establishes the existence of observable footprints that reveal the "causal dispositions" of the object categories appearing in collections of images. We achieve this goal in two steps. First, we take a learning approach to observational causal discovery, and build a classifier that achieves state-of-the-art performance on finding the causal direction between pairs of random variables, g…

    Submitted 31 October, 2017; v1 submitted 26 May, 2016; originally announced May 2016.

  22. arXiv:1604.02135  [pdf, other]

    cs.CV

    A MultiPath Network for Object Detection

    Authors: Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár

    Abstract: The recent COCO object detection dataset presents several new challenges for object detection. In particular, it contains objects at a broad range of scales, less prototypical images, and requires more precise localization. To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple…

    Submitted 8 August, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

  23. arXiv:1511.07401  [pdf, other]

    cs.LG cs.AI cs.NE

    MazeBase: A Sandbox for Learning from Games

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

    Abstract: This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these…

    Submitted 7 January, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

  24. arXiv:1511.06434  [pdf, other]

    cs.LG cs.CV

    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    Authors: Alec Radford, Luke Metz, Soumith Chintala

    Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversa…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under review as a conference paper at ICLR 2016

  25. arXiv:1506.08230  [pdf, other]

    cs.LG cs.NE

    Convolutional networks and learning invariant to homogeneous multiplicative scalings

    Authors: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

    Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification st…

    Submitted 16 February, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

    Comments: 12 pages, 6 figures, 4 tables

    Journal ref: Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017

  26. arXiv:1506.05751  [pdf, other]

    cs.CV

    Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

    Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

    Abstract: In this paper we introduce a generative parametric model capable of producing high quality samples of natural images. Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach (Goodfellow e…

    Submitted 18 June, 2015; originally announced June 2015.

  27. arXiv:1503.03438  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    A mathematical motivation for complex-valued convolutional networks

    Authors: Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

    Abstract: A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-v…

    Submitted 12 December, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

    Comments: 11 pages, 3 figures; this is the retitled version submitted to the journal, "Neural Computation"

    Journal ref: Neural Computation, 28 (5): 815-825, May 2016

  28. arXiv:1412.7580  [pdf, ps, other]

    cs.LG cs.DC cs.NE

    Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

    Authors: Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, Yann LeCun

    Abstract: We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier Transform convolution implementations: one based on NVIDIA's cuFFT library, and another based on a Facebook authored FFT implementation, fbfft, that provides significant speedups over cuFFT (over 1.5x) for whole CNNs. Both of t…

    Submitted 10 April, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

    Comments: Camera ready for ICLR2015

  29. arXiv:1212.0142  [pdf, ps, other]

    cs.CV cs.LG

    Pedestrian Detection with Unsupervised Multi-Stage Feature Learning

    Authors: Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann LeCun

    Abstract: Pedestrian detection is a problem of considerable practical interest. Adding to the list of successful applications of deep learning methods to vision, we report state-of-the-art and competitive results on all major pedestrian datasets with a convolutional network model. The model uses a few new twists, such as multi-stage features, connections that skip layers to integrate global shape informatio…

    Submitted 2 April, 2013; v1 submitted 1 December, 2012; originally announced December 2012.

    Comments: 12 pages

  30. arXiv:1204.3968  [pdf, ps, other]

    cs.CV cs.LG cs.NE

    Convolutional Neural Networks Applied to House Numbers Digit Classification

    Authors: Pierre Sermanet, Soumith Chintala, Yann LeCun

    Abstract: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). ConvNets are hierarchical feature learning neural networks whose structure is biologically inspired. Unlike many popular vision approaches that are hand-designed, ConvNets can automatically learn a unique set of features optimized for a given task. We augmented the traditional ConvNet architecture by lea…

    Submitted 17 April, 2012; originally announced April 2012.

    Comments: 4 pages, 6 figures, 2 tables