
Showing 1–50 of 91 results for author: Karaman, S

  1. arXiv:2406.04300

    cs.RO

    Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models

    Authors: Phat Nguyen, Tsun-Hsuan Wang, Zhang-Wei Hong, Sertac Karaman, Daniela Rus

    Abstract: Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems, such as autonomous vehicles. Yet, the task of modeling the trajectories of other vehicles to simulate diverse and meaningful close interactions remains prohibitively costly. Adopting language descriptions to generate driving behaviors emerges as a promising strategy, offering a scalable a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 14 pages, 7 figures

  2. arXiv:2405.05956

    cs.RO cs.CV

    Probing Multimodal LLMs as World Models for Driving

    Authors: Shiva Sreeram, Tsun-Hsuan Wang, Alaa Maalouf, Guy Rosman, Sertac Karaman, Daniela Rus

    Abstract: We provide a sober look at the application of Multimodal Large Language Models (MLLMs) within the domain of autonomous driving and challenge/verify some common assumptions, focusing on their ability to reason and interpret dynamic driving scenarios through sequences of images/frames in a closed-loop control environment. Despite the significant advancements in MLLMs like GPT-4V, their performance i…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: https://github.com/sreeramsa/DriveSim https://www.youtube.com/watch?v=Fs8jgngOJzU

  3. arXiv:2404.01400

    cs.RO

    NVINS: Robust Visual Inertial Navigation Fused with NeRF-augmented Camera Pose Regressor and Uncertainty Quantification

    Authors: Juyeop Han, Lukas Lao Beyer, Guilherme V. Cavalheiro, Sertac Karaman

    Abstract: In recent years, Neural Radiance Fields (NeRF) have emerged as a powerful tool for 3D reconstruction and novel view synthesis. However, the computational cost of NeRF rendering and degradation in quality due to the presence of artifacts pose significant challenges for its application in real-time and robust robotic tasks, especially on embedded systems. This paper introduces a novel framework that…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures, 2 tables

  4. arXiv:2403.08152

    cs.RO

    Multi-Fidelity Reinforcement Learning for Time-Optimal Quadrotor Re-planning

    Authors: Gilhyun Ryou, Geoffrey Wang, Sertac Karaman

    Abstract: High-speed online trajectory planning for UAVs poses a significant challenge due to the need for precise modeling of complex dynamics while also being constrained by computational limitations. This paper presents a multi-fidelity reinforcement learning method (MFRL) that aims to effectively create a realistic dynamics model and simultaneously train a planning policy that can be readily deployed in…

    Submitted 12 March, 2024; originally announced March 2024.

  5. arXiv:2310.17642

    cs.RO cs.CV cs.LG

    Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

    Authors: Tsun-Hsuan Wang, Alaa Maalouf, Wei Xiao, Yutong Ban, Alexander Amini, Guy Rosman, Sertac Karaman, Daniela Rus

    Abstract: As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundation…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Project webpage: https://drive-anywhere.github.io Explainer video: https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be

  6. GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model

    Authors: Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze

    Abstract: Energy consumption of memory accesses dominates the compute energy in energy-constrained robots which require a compact 3D map of the environment to achieve autonomy. Recent mapping frameworks only focused on reducing the map size while incurring significant memory usage during map construction due to multi-pass processing of each depth image. In this work, we present a memory-efficient continuous…

    Submitted 19 January, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 17 pages, 12 figures, 3 tables

    Journal ref: IEEE Transactions on Robotics 40 (2024) 1339-1355

  7. arXiv:2305.16502

    cs.RO

    Learning When to Ask for Help: Efficient Interactive Navigation via Implicit Uncertainty Estimation

    Authors: Ifueko Igbinedion, Sertac Karaman

    Abstract: Robots operating alongside humans often encounter unfamiliar environments that make autonomous task completion challenging. Though improving models and increasing dataset size can enhance a robot's performance in unseen environments, data collection and model refinement may be impractical in every environment. Approaches that utilize human demonstrations through manual operation can aid in refinem…

    Submitted 7 June, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    ACM Class: I.2.9

    Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA) 2024

  8. arXiv:2305.14797

    cs.RO

    Multi-Abstractive Neural Controller: An Efficient Hierarchical Control Architecture for Interactive Driving

    Authors: Xiao Li, Igor Gilitschenski, Guy Rosman, Sertac Karaman, Daniela Rus

    Abstract: As learning-based methods make their way from perception systems to planning/control stacks, robot control systems have started to enjoy the benefits that data-driven methods provide. Because control systems directly affect the motion of the robot, data-driven methods, especially black box approaches, need to be used with caution considering aspects such as stability and interpretability. In this…

    Submitted 24 May, 2023; originally announced May 2023.

  9. Studying the Impact of Semi-Cooperative Drivers on Overall Highway Flow

    Authors: Noam Buckman, Sertac Karaman, Daniela Rus

    Abstract: Semi-cooperative behaviors are intrinsic properties of human drivers and should be considered for autonomous driving. In addition, new autonomous planners can consider the social value orientation (SVO) of human drivers to generate socially-compliant trajectories. Yet the overall impact on traffic flow for this new class of planners remains to be understood. In this work, we present a study of implic…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: 8 pages. Accepted at IEEE Intelligent Vehicle (IV) Symposium 2023

  10. Infrastructure-based End-to-End Learning and Prevention of Driver Failure

    Authors: Noam Buckman, Shiva Sreeram, Mathias Lechner, Yutong Ban, Ramin Hasani, Sertac Karaman, Daniela Rus

    Abstract: Intelligent intersection managers can improve safety by detecting dangerous drivers or failure modes in autonomous vehicles, warning oncoming vehicles as they approach an intersection. In this work, we present FailureNet, a recurrent neural network trained end-to-end on trajectories of both nominal and reckless drivers in a scaled miniature city. FailureNet observes the poses of vehicles as they a…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 8 pages. Accepted to ICRA 2023

  11. arXiv:2302.00243

    cs.RO

    Agility and Target Distribution in the Dynamic Stochastic Traveling Salesman Problem

    Authors: Aviv Adler, Oren Gal, Sertac Karaman

    Abstract: An important variant of the classic Traveling Salesman Problem (TSP) is the Dynamic TSP, in which a system with dynamic constraints is tasked with visiting a set of n target locations (in any order) in the shortest amount of time. Such tasks arise naturally in many robotic motion planning problems, particularly in exploration, surveillance and reconnaissance, and classical TSP algorithms on graphs…

    Submitted 31 January, 2023; originally announced February 2023.

    Comments: 106 pages

    MSC Class: 60D05 (Primary)

  12. arXiv:2212.07013

    cs.RO cs.LG

    Learning and Predicting Multimodal Vehicle Action Distributions in a Unified Probabilistic Model Without Labels

    Authors: Charles Richter, Patrick R. Barragán, Sertac Karaman

    Abstract: We present a unified probabilistic model that learns a representative set of discrete vehicle actions and predicts the probability of each action given a particular scenario. Our model also enables us to estimate the distribution over continuous trajectories conditioned on a scenario, representing what each discrete action would look like if executed in that scenario. While our primary objective i…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: Presented at the Fresh Perspectives on the Future of Autonomous Driving workshop, ICRA 2022

  13. arXiv:2212.03298

    cs.NI cs.RO

    WiSwarm: Age-of-Information-based Wireless Networking for Collaborative Teams of UAVs

    Authors: Vishrant Tripathi, Igor Kadota, Ezra Tal, Muhammad Shahir Rahman, Alexander Warren, Sertac Karaman, Eytan Modiano

    Abstract: The Age-of-Information (AoI) metric has been widely studied in the theoretical communication networks and queuing systems literature. However, experimental evaluation of its applicability to complex real-world time-sensitive systems is largely lacking. In this work, we develop, implement, and evaluate an AoI-based application layer middleware that enables the customization of WiFi networks to the…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: To be presented at IEEE INFOCOM 2023

  14. Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time

    Authors: Keshav Gupta, Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze

    Abstract: Exploration tasks are essential to many emerging robotics applications, ranging from search and rescue to space exploration. The planning problem for exploration requires determining the best locations for future measurements that will enhance the fidelity of the map, for example, by reducing its total entropy. A widely-studied technique involves computing the Mutual Information (MI) between the c…

    Submitted 7 October, 2022; originally announced October 2022.

  15. arXiv:2207.13218

    cs.RO eess.SY

    Global Incremental Flight Control for Agile Maneuvering of a Tailsitter Flying Wing

    Authors: Ezra Tal, Sertac Karaman

    Abstract: This paper proposes a novel control law for accurate tracking of agile trajectories using a tailsitter flying wing unmanned aerial vehicle (UAV) that transitions between vertical take-off and landing (VTOL) and forward flight. The global control formulation enables maneuvering throughout the flight envelope, including uncoordinated flight with sideslip. Differential flatness of the nonlinear tails…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: 24 pages, 11 figures, videos of the experiments at https://aera.mit.edu/projects/TailsitterAerobatics

  16. arXiv:2207.03524

    cs.RO eess.SY

    Aerobatic Trajectory Generation for a VTOL Fixed-Wing Aircraft Using Differential Flatness

    Authors: Ezra Tal, Gilhyun Ryou, Sertac Karaman

    Abstract: This paper proposes a novel algorithm for aerobatic trajectory generation for a vertical take-off and landing (VTOL) tailsitter flying wing aircraft. The algorithm differs from existing approaches for fixed-wing trajectory generation, as it considers a realistic six-degree-of-freedom (6DOF) flight dynamics model, including aerodynamics equations. Using a global dynamics model enables the generatio…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: 14 pages, 17 figures, video of experiments available at https://aera.mit.edu/projects/TailsitterAerobatics

  17. arXiv:2206.00726

    cs.RO

    Cooperative Multi-Agent Trajectory Generation with Modular Bayesian Optimization

    Authors: Gilhyun Ryou, Ezra Tal, Sertac Karaman

    Abstract: We present a modular Bayesian optimization framework that efficiently generates time-optimal trajectories for a cooperative multi-agent system, such as a team of UAVs. Existing methods for multi-agent trajectory generation often rely on overly conservative constraints to reduce the complexity of this high-dimensional planning problem, leading to suboptimal solutions. We propose a novel modular str…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Accepted to appear at Robotics: Science and Systems 2022. Video at https://youtu.be/rxQiNeXvLTc

  18. arXiv:2205.09117

    cs.LG cs.RO eess.SY

    Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

    Authors: Ryan Sander, Wilko Schwarting, Tim Seyde, Igor Gilitschenski, Sertac Karaman, Daniela Rus

    Abstract: Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates tra…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: Accepted to L4DC 2022

  19. arXiv:2202.10433

    cs.MA eess.SY

    The Role of Heterogeneity in Autonomous Perimeter Defense Problems

    Authors: Aviv Adler, Oscar Mickelin, Ragesh K. Ramachandran, Gaurav S. Sukhatme, Sertac Karaman

    Abstract: When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on d…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: 27 pages, 9 figures

  20. arXiv:2111.12137

    cs.RO cs.CV cs.LG

    Learning Interactive Driving Policies via Data-driven Simulation

    Authors: Tsun-Hsuan Wang, Alexander Amini, Wilko Schwarting, Igor Gilitschenski, Sertac Karaman, Daniela Rus

    Abstract: Data-driven simulators promise high data-efficiency for driving policy learning. When used for modelling interactions, this data-efficiency becomes a bottleneck: Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We address this challenge by proposing a simulation method that uses in-painted ado vehicles for learning robust driving policie…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: The first two authors contributed equally to this work. Code is available here: http://vista.csail.mit.edu/

  21. arXiv:2111.12083

    cs.RO cs.CV cs.LG

    VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles

    Authors: Alexander Amini, Tsun-Hsuan Wang, Igor Gilitschenski, Wilko Schwarting, Zhijian Liu, Song Han, Sertac Karaman, Daniela Rus

    Abstract: Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for aut…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: First two authors contributed equally. Code and project website are available here: https://vista.csail.mit.edu

  22. arXiv:2109.02865

    cs.CV

    Journalistic Guidelines Aware News Image Captioning

    Authors: Xuewen Yang, Svebor Karaman, Joel Tetreault, Alex Jaimes

    Abstract: The task of news article image captioning aims to generate descriptive and informative captions for news article images. Unlike conventional image captions that simply describe the content of the image in general terms, news image captions follow journalistic guidelines and rely heavily on named entities to describe the image content, often drawing context from the whole article they are associate…

    Submitted 10 September, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

    Journal ref: EMNLP 2021

  23. arXiv:2109.00642

    cs.CV

    Searching for Efficient Multi-Stage Vision Transformers

    Authors: Yi-Lun Liao, Sertac Karaman, Vivienne Sze

    Abstract: Vision Transformer (ViT) demonstrates that Transformer for natural language processing can be applied to computer vision tasks and result in comparable performance to convolutional neural networks (CNN), which have been studied and adopted in computer vision for years. This naturally raises the question of how the performance of ViT can be advanced with design techniques of CNN. To this end, we pr…

    Submitted 1 September, 2021; originally announced September 2021.

  24. arXiv:2107.02384

    cs.RO eess.SY

    Multi-Modal Motion Planning Using Composite Pose Graph Optimization

    Authors: L. Lao Beyer, N. Balabanska, E. Tal, S. Karaman

    Abstract: In this paper, we present a motion planning framework for multi-modal vehicle dynamics. Our proposed algorithm employs transcription of the optimization objective function, vehicle dynamics, and state and control constraints into sparse factor graphs, which -- combined with mode transition constraints -- constitute a composite pose graph. By formulating the multi-modal motion planning problem in c…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: 7 pages, 6 figures, to be included in proceedings of IEEE International Conference on Robotics and Automation 2021

  25. arXiv:2105.09932

    cs.RO cs.CV

    Efficient and Robust LiDAR-Based End-to-End Navigation

    Authors: Zhijian Liu, Alexander Amini, Sibo Zhu, Sertac Karaman, Song Han, Daniela Rus

    Abstract: Deep learning has been used to demonstrate end-to-end neural network learning for autonomous vehicle control from raw sensory input. While LiDAR sensors provide reliably accurate information, existing end-to-end driving solutions are mainly based on cameras since processing 3D data requires a large memory footprint and computation cost. On the other hand, increasing the robustness of these systems…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: ICRA 2021. The first two authors contributed equally to this work. Project page: https://le2ed.mit.edu/

  26. arXiv:2103.10888

    eess.SY

    Feedback from Pixels: Output Regulation via Learning-Based Scene View Synthesis

    Authors: Murad Abu-Khalaf, Sertac Karaman, Daniela Rus

    Abstract: We propose a novel controller synthesis involving feedback from pixels, whereby the measurement is a high dimensional signal representing a pixelated image with Red-Green-Blue (RGB) values. The approach neither requires feature extraction, nor object detection, nor visual correspondence. The control policy does not involve the estimation of states or similar latent representations. Instead, tracki…

    Submitted 23 April, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: Submitted to L4DC on November-20-2020; Accepted on March-5-2021

  27. arXiv:2102.09812

    cs.LG cs.AI cs.RO

    Deep Latent Competition: Learning to Race Using Visual Control Policies in Latent Space

    Authors: Wilko Schwarting, Tim Seyde, Igor Gilitschenski, Lucas Liebenwein, Ryan Sander, Sertac Karaman, Daniela Rus

    Abstract: Learning competitive behaviors in multi-agent settings such as racing requires long-term reasoning about potential adversarial interactions. This paper presents Deep Latent Competition (DLC), a novel reinforcement learning algorithm that learns competitive visual control policies through self-play in imagination. The DLC agent imagines multi-agent interaction sequences in the compact latent space…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: Wilko, Tim, and Igor contributed equally to this work; published in Conference on Robot Learning 2020

  28. arXiv:2102.09661

    math.NA

    Recovering orthogonal tensors under arbitrarily strong, but locally correlated, noise

    Authors: Oscar Mickelin, Sertac Karaman

    Abstract: We consider the problem of recovering an orthogonally decomposable tensor with a subset of elements distorted by noise with arbitrarily large magnitude. We focus on the particular case where each mode in the decomposition is corrupted by noise vectors with components that are correlated locally, i.e., with nearby components. We show that this deterministic tensor completion problem has the unusual…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: 20 pages, 6 figures

    MSC Class: 65F99; 15A69

  29. arXiv:2012.00928

    eess.SY

    Low Cost, Educational Internal Combustion Engine Electronic Control Unit Hardware-in-the-Loop Test Systems

    Authors: Sertac Karaman, Levent Guvenc

    Abstract: Different hardware platforms and their associated real time operating systems that can be used in an educational laboratory for illustrating engine electronic control unit hardware in the loop testing are presented and compared in this paper. A Matlab graphical user interface prepared for generating synthetic crank and camshaft angular position sensor signals to be fed to the engine electronic con…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 6 pages, 11 figures, 1 table

  30. arXiv:2010.14641

    cs.LG cs.AI cs.RO

    Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles

    Authors: Tim Seyde, Wilko Schwarting, Sertac Karaman, Daniela Rus

    Abstract: Learning complex robot behaviors through interaction requires structured exploration. Planning should target interactions with the potential to optimize long-term performance, while only reducing uncertainty where conducive to this objective. This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term…

    Submitted 11 December, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

  31. arXiv:2006.02513

    cs.RO

    Multi-Fidelity Black-Box Optimization for Time-Optimal Quadrotor Maneuvers

    Authors: Gilhyun Ryou, Ezra Tal, Sertac Karaman

    Abstract: We consider the problem of generating a time-optimal quadrotor trajectory that attains a set of prescribed waypoints. This problem is challenging since the optimal trajectory is located on the boundary of the set of dynamically feasible trajectories. This boundary is hard to model as it involves limitations of the entire system, including hardware and software, in agile high-speed flight. In this…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: Accepted to appear at Robotics: Science and Systems 2020. Video at https://youtu.be/igwULi_H1Kg

  32. arXiv:2005.13986

    cs.RO cs.CV

    Perception-aware time optimal path parameterization for quadrotors

    Authors: Igor Spasojevic, Varun Murali, Sertac Karaman

    Abstract: The increasing popularity of quadrotors has given rise to a class of predominantly vision-driven vehicles. This paper addresses the problem of perception-aware time optimal path parametrization for quadrotors. Although many different choices of perceptual modalities are available, the low weight and power budgets of quadrotor systems make a camera ideal for on-board navigation and estimation algo…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: Accepted to appear at ICRA 2020

  33. arXiv:2001.02359

    cs.CV

    Weakly Supervised Visual Semantic Parsing

    Authors: Alireza Zareian, Svebor Karaman, Shih-Fu Chang

    Abstract: Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval. Nevertheless, existing SGG methods require millions of manually annotated bounding boxes for training, and are computationally inefficient, as they exhaustively process all pai…

    Submitted 31 March, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: To be presented at CVPR 2020 (oral paper)

  34. arXiv:2001.02314

    cs.CV

    Bridging Knowledge Graphs to Generate Scene Graphs

    Authors: Alireza Zareian, Svebor Karaman, Shih-Fu Chang

    Abstract: Scene graphs are powerful representations that parse images into their abstract semantic elements, i.e., objects and their interactions, which facilitates visual comprehension and explainable reasoning. On the other hand, commonsense knowledge graphs are rich repositories that encode how the world is structured, and how general concepts interact. In this paper, we present a unified formulation of…

    Submitted 18 July, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: To be presented at ECCV 2020

  35. arXiv:1912.06785

    cs.RO cs.CV cs.LG

    Deep Context Maps: Agent Trajectory Prediction using Location-specific Latent Maps

    Authors: Igor Gilitschenski, Guy Rosman, Arjun Gupta, Sertac Karaman, Daniela Rus

    Abstract: In this paper, we propose a novel approach for agent motion prediction in cluttered environments. One of the main challenges in predicting agent motion is accounting for location and context-specific information. Our main contribution is the concept of learning context maps to improve the prediction task. Context maps are a set of location-specific latent maps that are trained alongside the predic…

    Submitted 19 June, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

  36. arXiv:1912.04462

    cs.CV

    Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

    Authors: Shiyuan Huang, Xudong Lin, Svebor Karaman, Shih-Fu Chang

    Abstract: Two-stream networks have achieved great success in video recognition. A two-stream network combines a spatial stream of RGB frames and a temporal stream of Optical Flow to make predictions. However, the temporal redundancy of RGB frames as well as the high-cost of optical flow computation creates challenges for both the performance and efficiency. Recent works instead use modern compressed video m…

    Submitted 12 December, 2019; v1 submitted 9 December, 2019; originally announced December 2019.

  37. arXiv:1909.10673

    stat.ML cs.LG math.ST

    A Theory of Uncertainty Variables for State Estimation and Inference

    Authors: Rajat Talak, Sertac Karaman, Eytan Modiano

    Abstract: We develop a new framework of uncertainty variables to model uncertainty. An uncertainty variable is characterized by an uncertainty set, in which its realization is bound to lie, while the conditional uncertainty is characterized by a set map, from a given realization of a variable to a set of possible realizations of another variable. We prove Bayes' law and the law of total probability equivale…

    Submitted 9 December, 2019; v1 submitted 23 September, 2019; originally announced September 2019.

  38. arXiv:1909.06963

    cs.RO cs.GT cs.MA

    Stochastic Dynamic Games in Belief Space

    Authors: Wilko Schwarting, Alyssa Pierson, Sertac Karaman, Daniela Rus

    Abstract: Information gathering while interacting with other agents under sensing and motion uncertainty is critical in domains such as driving, service robots, racing, or surveillance. The interests of agents may be at odds with others, resulting in a stochastic non-cooperative dynamic game. Agents must predict others' future actions without communication, incorporate their actions into these predictions,…

    Submitted 12 May, 2021; v1 submitted 15 September, 2019; originally announced September 2019.

    Comments: Accepted in IEEE Transactions on Robotics (T-RO) 2021

    Journal ref: IEEE Transactions on Robotics (T-RO) 2021

  39. Multi-resolution Low-rank Tensor Formats

    Authors: Oscar Mickelin, Sertac Karaman

    Abstract: We describe a simple, black-box compression format for tensors with a multiscale structure. By representing the tensor as a sum of compressed tensors defined on increasingly coarse grids, we capture low-rank structures on each grid-scale, and we show how this leads to an increase in compression for a fixed accuracy. We devise an alternating algorithm to represent a given tensor in the multiresolut…

    Submitted 17 August, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

    Comments: 29 pages, 9 figures

    MSC Class: 65F99; 15A69

    Journal ref: SIAM J. Matrix Anal. Appl., 41(3), 1086-1114. (2020)

  40. arXiv:1907.06515

    cs.CV eess.IV

    Detecting and Simulating Artifacts in GAN Fake Images

    Authors: Xu Zhang, Svebor Karaman, Shih-Fu Chang

    Abstract: To detect GAN generated images, conventional supervised machine learning algorithms require collection of a number of real and fake images from the targeted GAN model. However, the specific model used by the attacker is often unavailable. To address this, we propose a GAN simulator, AutoGAN, which can simulate the artifacts produced by the common pipeline shared by several popular GAN models. Addi…

    Submitted 15 October, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

    Comments: This is an extended version of our original AutoGAN paper, which will appear in WIFS 2019

  41. arXiv:1906.06407

    math.NA math.FA

    Optimal orthogonal approximations to symmetric tensors cannot always be chosen symmetric

    Authors: Oscar Mickelin, Sertac Karaman

    Abstract: We study the problem of finding orthogonal low-rank approximations of symmetric tensors. In the case of matrices, the approximation is a truncated singular value decomposition which is then symmetric. Moreover, for rank-one approximations of tensors of any dimension, a classical result proven by Banach in 1938 shows that the optimal approximation can always be chosen to be symmetric. In contrast t…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: 20 pages

    MSC Class: 15A18; 15A69; 41A29

  42. arXiv:1905.11524

    eess.SY

    Shared Linear Quadratic Regulation Control: A Reinforcement Learning Approach

    Authors: Murad Abu-Khalaf, Sertac Karaman, Daniela Rus

    Abstract: We propose controller synthesis for state regulation problems in which a human operator shares control with an autonomy system, running in parallel. The autonomy system continuously improves over human action, with minimal intervention, and can take over full-control. It additively combines user input with an adaptive optimal corrective signal. It is adaptive in that it neither estimates nor requi…

    Submitted 20 September, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Accepted by IEEE CDC 2019

  43. FlightGoggles: A Modular Framework for Photorealistic Camera, Exteroceptive Sensor, and Dynamics Simulation

    Authors: Winter Guerra, Ezra Tal, Varun Murali, Gilhyun Ryou, Sertac Karaman

    Abstract: FlightGoggles is a photorealistic sensor simulator for perception-driven robotic vehicles. The key contributions of FlightGoggles are twofold. First, FlightGoggles provides photorealistic exteroceptive sensor simulation using graphics assets generated with photogrammetry. Second, it provides the ability to combine (i) synthetic exteroceptive measurements generated in silico in real time and (ii) v…

    Submitted 28 May, 2021; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Initial version appeared at IROS 2019. Supplementary material can be found at https://flightgoggles.mit.edu. Revision includes description of new FlightGoggles features, such as a photogrammetric model of the MIT Stata Center, new rendering settings, and a Python API

  44. arXiv:1905.02238  [pdf, other]

    cs.RO cs.IT

    FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping

    Authors: Zhengdong Zhang, Trevor Henderson, Sertac Karaman, Vivienne Sze

    Abstract: Exploration tasks are embedded in many robotics applications, such as search and rescue and space exploration. Information-based exploration algorithms aim to find the most informative trajectories by maximizing an information-theoretic metric, such as the mutual information between the map and potential future measurements. Unfortunately, most existing information-based exploration algorithms are…

    Submitted 6 May, 2019; originally announced May 2019.

  45. arXiv:1904.04968  [pdf, other]

    cs.RO eess.SY math.OC

    Asymptotic Optimality of a Time Optimal Path Parametrization Algorithm

    Authors: Igor Spasojevic, Varun Murali, Sertac Karaman

    Abstract: Time Optimal Path Parametrization is the problem of minimizing the time interval during which an actuation constrained agent can traverse a given path. Recently, an efficient linear-time algorithm for solving this problem was proposed. However, its optimality was proved for only a strict subclass of problems solved optimally by more computationally intensive approaches based on convex programming.…

    Submitted 9 April, 2019; originally announced April 2019.

  46. arXiv:1903.03273  [pdf, other]

    cs.CV cs.RO

    FastDepth: Fast Monocular Depth Estimation on Embedded Systems

    Authors: Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman, Vivienne Sze

    Abstract: Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow f…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: Accepted for presentation at ICRA 2019. 8 pages, 6 figures, 7 tables

  47. arXiv:1903.01545  [pdf, other]

    cs.CV cs.IR cs.MM

    Unsupervised Rank-Preserving Hashing for Large-Scale Image Retrieval

    Authors: Svebor Karaman, Xudong Lin, Xuefeng Hu, Shih-Fu Chang

    Abstract: We propose an unsupervised hashing method which aims to produce binary codes that preserve the ranking induced by a real-valued representation. Such compact hash codes enable the complete elimination of real-valued feature storage and allow for significant reduction of the computation complexity and storage cost of large-scale image retrieval applications. Specifically, we learn a neural network-b…

    Submitted 4 March, 2019; originally announced March 2019.

  48. arXiv:1811.11683  [pdf, other]

    cs.CV cs.CL cs.LG eess.IV

    Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

    Authors: Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang

    Abstract: We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. We exploit multiple levels of feature maps of a Deep Convolutional Neural Network, as well as contextualized word and sentence embeddings extracted from a character-based language model. Following dedicated non-linear mappings for visual features at each level, wo…

    Submitted 29 May, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: Accepted in CVPR 2019

  49. Variational End-to-End Navigation and Localization

    Authors: Alexander Amini, Guy Rosman, Sertac Karaman, Daniela Rus

    Abstract: Deep learning has revolutionized the ability to learn "end-to-end" autonomous vehicle control directly from raw sensory data. While there have been recent extensions to handle forms of navigation instruction, these works are unable to capture the full distribution of possible actions that could be taken and to reason about localization of the robot within the environment. In this paper, we extend…

    Submitted 11 June, 2019; v1 submitted 25 November, 2018; originally announced November 2018.

    Comments: Published in IEEE International Conference on Robotics and Automation (ICRA) 2019. Best Paper Award Finalist

    Journal ref: 2019 International Conference on Robotics and Automation (ICRA)

  50. arXiv:1810.04371  [pdf, other]

    cs.NI

    Can Determinacy Minimize Age of Information?

    Authors: Rajat Talak, Sertac Karaman, Eytan Modiano

    Abstract: Age-of-information (AoI) is a newly proposed performance metric of information freshness. It differs from the traditional delay metric, because it is destination centric and measures the time that elapsed since the last received fresh information update was generated at the source. AoI has been analyzed for several queueing models, and the problem of optimizing AoI over arrival and service rates h…

    Submitted 14 January, 2019; v1 submitted 10 October, 2018; originally announced October 2018.