
Showing 1–11 of 11 results for author: Koppula, H S

Searching in archive cs.
  1. arXiv:2309.10707  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

    Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

    Abstract: While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from…

    Submitted 18 September, 2023; originally announced September 2023.

  2. arXiv:2303.14885  [pdf, other]

    eess.AS cs.LG cs.SD

    Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

    Authors: Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

    Abstract: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  3. arXiv:2110.02891  [pdf, other]

    cs.LG cs.SD eess.AS

    Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

    Authors: Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel

    Abstract: Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, under an unsupervised-style setting, typical training algorithms f…

    Submitted 30 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: ICML 2022

  4. arXiv:1601.00740  [pdf, other]

    cs.RO cs.CV cs.LG

    Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture

    Authors: Ashesh Jain, Hema S Koppula, Shane Soh, Bharad Raghavan, Avi Singh, Ashutosh Saxena

    Abstract: Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers beforehand can alert drivers before they perform the maneuver and also give ADA…

    Submitted 5 January, 2016; originally announced January 2016.

    Comments: Journal Version (ICCV and ICRA combination with more system details) http://brain4cars.com

  5. arXiv:1509.05016  [pdf, other]

    cs.CV cs.AI cs.RO

    Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture

    Authors: Ashesh Jain, Avi Singh, Hema S Koppula, Shane Soh, Ashutosh Saxena

    Abstract: Anticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We introduce a sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams. Our architecture consists of Recurrent Ne…

    Submitted 16 September, 2015; originally announced September 2015.

    Comments: Follow-up of ICCV 2015 Brain4Cars http://www.brain4cars.com

  6. arXiv:1504.02789  [pdf, other]

    cs.CV

    Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models

    Authors: Ashesh Jain, Hema S. Koppula, Bharad Raghavan, Shane Soh, Ashutosh Saxena

    Abstract: Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers beforehand can alert drivers before they perform the maneuver and also give ADA…

    Submitted 19 September, 2015; v1 submitted 10 April, 2015; originally announced April 2015.

    Comments: ICCV 2015, http://brain4cars.com

  7. arXiv:1412.0691  [pdf, other]

    cs.AI cs.RO

    RoboBrain: Large-Scale Knowledge Engine for Robots

    Authors: Ashutosh Saxena, Ashesh Jain, Ozan Sener, Aditya Jami, Dipendra K. Misra, Hema S. Koppula

    Abstract: In this paper we introduce a knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks. Building such an engine brings with it the challenge of dealing with multiple data modalities including symbols, natural language, haptic senses, robot trajectories, visual features and many others. The \textit{knowledge} stored in the engine comes from mult…

    Submitted 12 April, 2015; v1 submitted 1 December, 2014; originally announced December 2014.

    Comments: 10 pages, 9 figures

  8. arXiv:1210.1207  [pdf, other]

    cs.RO cs.AI cs.CV

    Learning Human Activities and Object Affordances from RGB-D Videos

    Authors: Hema Swetha Koppula, Rudhir Gupta, Ashutosh Saxena

    Abstract: Understanding human activities and object affordances are two very important skills, especially for personal robots which operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human, and more importantly, of their interactions with the objects in the form of associated affordances. Given a RG…

    Submitted 5 May, 2013; v1 submitted 4 October, 2012; originally announced October 2012.

    Comments: arXiv admin note: substantial text overlap with arXiv:1208.0967

  9. arXiv:1208.0967  [pdf, ps, other]

    cs.CV

    Human Activity Learning using Object Affordances from RGB-D Videos

    Authors: Hema Swetha Koppula, Rudhir Gupta, Ashutosh Saxena

    Abstract: Human activities comprise several sub-activities performed in a sequence and involve interactions with various objects. This makes reasoning about the object affordances a central task for activity recognition. In this work, we consider the problem of jointly labeling the object affordances and human activities from RGB-D videos. We frame the problem as a Markov Random Field where the nodes repres…

    Submitted 4 August, 2012; originally announced August 2012.

  10. arXiv:1111.5358  [pdf, other]

    cs.RO cs.AI cs.CV

    Contextually Guided Semantic Labeling and Search for 3D Point Clouds

    Authors: Abhishek Anand, Hema Swetha Koppula, Thorsten Joachims, Ashutosh Saxena

    Abstract: RGB-D cameras, which give an RGB image together with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the 3D point cloud of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including the local visual appearance and shape cu…

    Submitted 5 September, 2012; v1 submitted 22 November, 2011; originally announced November 2011.

    Comments: arXiv admin note: substantial text overlap with arXiv:1106.5551

  11. arXiv:1106.5551  [pdf, other]

    cs.RO

    Labeling 3D scenes for Personal Assistant Robots

    Authors: Hema Swetha Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

    Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. We use this data to build 3D point clouds of a full scene. In this paper, we address the task of labeling objects in this 3D point cloud of a complete indoor scene such as an office. We propose a graphical model that captures various features and contextual relations, including the local visual…

    Submitted 27 June, 2011; originally announced June 2011.