Showing 1–15 of 15 results for author: Pantofaru, C

Searching in archive cs.
  1. arXiv:2205.04334

    cs.CV

    Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

    Authors: Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser

    Abstract: We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). Each object is represented by an oriented 3D bounding box and a multi-layer perceptron (MLP) that takes position, direction, and time and outputs density and radiance. The background stuff is represented by a similar MLP that additional…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 paper. See the project page at https://abhijitkundu.info/projects/pnf

  2. arXiv:2202.04901

    cs.CV

    FILM: Frame Interpolation for Large Motion

    Authors: Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, Brian Curless

    Abstract: We present a frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion. Recent methods use multiple networks to estimate optical flow or depth and a separate network dedicated to frame synthesis. This is often complex and requires scarce optical flow or depth ground-truth. In this work, we present a single unified network, distin…

    Submitted 16 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: Accepted to ECCV 2022. Project website: https://film-net.github.io. Code: https://github.com/google-research/frame-interpolation. YouTube: https://www.youtube.com/watch?v=OAD-BieIjH4

  3. arXiv:2110.11325

    cs.CV

    Learning 3D Semantic Segmentation with only 2D Image Supervision

    Authors: Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser

    Abstract: With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast,…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: Accepted to 3DV 2021 (Oral)

  4. A Step Toward More Inclusive People Annotations for Fairness

    Authors: Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru

    Abstract: The Open Images Dataset contains approximately 9 million images and is a widely accepted dataset for computer vision research. As is common practice for large datasets, the annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image. In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP…

    Submitted 5 May, 2021; originally announced May 2021.

    Journal ref: AIES (2021)

  5. arXiv:2007.13138

    cs.CV eess.IV

    Virtual Multi-view Fusion for 3D Semantic Segmentation

    Authors: Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru

    Abstract: Semantic segmentation of 3D meshes is an important problem for 3D scene understanding. In this paper we revisit the classic multiview representation of 3D meshes and study several techniques that make them effective for 3D semantic segmentation of meshes. Given a 3D mesh reconstructed from RGBD sensors, our method effectively chooses different virtual views of the 3D mesh and renders multiple 2D c…

    Submitted 26 July, 2020; originally announced July 2020.

    Comments: To appear in ECCV 2020

  6. arXiv:2007.12392

    cs.CV cs.LG eess.IV

    An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

    Authors: Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A Ross, Thomas Funkhouser, Alireza Fathi

    Abstract: Detecting objects in 3D LiDAR data is a core technology for autonomous driving and other robotics applications. Although LiDAR data is acquired over time, most of the 3D object detection algorithms propose object bounding boxes independently for each frame and neglect the useful information available in the temporal domain. To address this problem, in this paper we propose a sparse LSTM-based mult…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: To appear in ECCV 2020

  7. arXiv:2007.10323

    cs.CV cs.LG cs.RO

    Pillar-based Object Detection for Autonomous Driving

    Authors: Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Thomas Funkhouser, Justin Solomon

    Abstract: We present a simple and flexible object detection framework optimized for autonomous driving. Building on the observation that point clouds in this application are extremely sparse, we propose a practical pillar-based approach to fix the imbalance issue caused by anchors. In particular, our algorithm incorporates a cylindrical projection into multi-view feature learning, predicts bounding box para…

    Submitted 26 July, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

    Comments: Accepted to ECCV 2020

  8. arXiv:2004.01170

    cs.CV

    DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

    Authors: Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

    Abstract: We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. Previous methods often make domain-specific design decisions, for example projecting points into a bird's-eye view image in autonomous driving scenarios. In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes. The core novelty of our method is a fast, single-pass architecture that b…

    Submitted 6 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: To appear in CVPR 2020

  9. arXiv:1901.01342

    cs.CV cs.MM cs.SD eess.AS

    AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

    Authors: Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, Arkadiusz Stopczynski, Cordelia Schmid, Zhonghua Xi, Caroline Pantofaru

    Abstract: Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual dataset for this task has constrained algorithm evaluations with respect to data diversity, environments, and accuracy. This has made com…

    Submitted 24 May, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

  10. arXiv:1808.00606

    cs.SD eess.AS

    AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

    Authors: Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

    Abstract: Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or…

    Submitted 23 August, 2018; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Interspeech, 2018

  11. arXiv:1706.00079

    cs.MM cs.CV cs.SD

    Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

    Authors: Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy

    Abstract: In this paper, we present a system that associates faces with voices in a video by fusing information from the audio and visual signals. The thesis underlying our work is that an extremely simple approach to generating (weak) speech clusters can be combined with visual signals to effectively associate faces and voices by aggregating statistics across a video. This approach does not need any traini…

    Submitted 31 May, 2017; originally announced June 2017.

  12. arXiv:1705.08421

    cs.CV

    AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

    Authors: Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

    Abstract: This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual…

    Submitted 30 April, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: To appear in CVPR 2018. See the dataset page at https://research.google.com/ava/ for details

  13. arXiv:1606.08955

    cs.MM

    Leveraging Contextual Cues for Generating Basketball Highlights

    Authors: Vinay Bettadapura, Caroline Pantofaru, Irfan Essa

    Abstract: The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cues derived from the environment that the game is be…

    Submitted 29 June, 2016; originally announced June 2016.

    Comments: Proceedings of ACM Multimedia 2016

    ACM Class: H.3.1; I.2.10

  14. Egocentric Field-of-View Localization Using First-Person Point-of-View Devices

    Authors: Vinay Bettadapura, Irfan Essa, Caroline Pantofaru

    Abstract: We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization. We define egocentric FOV localization as capturing the visual information from a person's field-of-view in a given environment and transferring this information onto a reference corpus of images and videos of the same space, hence det…

    Submitted 7 October, 2015; originally announced October 2015.

    Comments: 8 pages; in Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision (WACV 2015)

  15. arXiv:1507.00302

    cs.CV

    Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

    Authors: Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang

    Abstract: We present a method for learning an embedding that places images of humans in similar poses nearby. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation c…

    Submitted 1 July, 2015; originally announced July 2015.