Showing 1–50 of 58 results for author: Schwing, A G

  1. arXiv:2404.07991

    cs.CV

    GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

    Authors: Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang

    Abstract: We introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh r…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024; project page: https://wenj.github.io/GoMAvatar/

  2. arXiv:2404.03657

    cs.CV cs.AI

    OW-VISCap: Open-World Video Instance Segmentation and Captioning

    Authors: Anwesa Choudhuri, Girish Chowdhary, Alexander G. Schwing

    Abstract: Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project page: https://anwesachoudhuri.github.io/OpenWorldVISCap/

  3. arXiv:2312.02189

    cs.CV cs.AI

    StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

    Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang, Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma

    Abstract: In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the…

    Submitted 1 December, 2023; originally announced December 2023.

  4. arXiv:2311.01331

    cs.LG cs.AI

    Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

    Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

    Abstract: In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction E…

    Submitted 9 June, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: 25 pages. Accepted to ICML 2024

  5. arXiv:2311.01329

    cs.LG cs.AI

    A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

    Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

    Abstract: Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize divergence of state occupancy bet…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 35 pages; Accepted as a poster for NeurIPS2023

  6. arXiv:2310.08587

    cs.CV

    Pseudo-Generalized Dynamic View Synthesis from a Video

    Authors: Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Angel Bautista, Joshua M. Susskind, Alexander G. Schwing

    Abstract: Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to our best…

    Submitted 19 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Originally titled as "Is Generalized Dynamic Novel View Synthesis from Monocular Videos Possible Today?"; Project page: https://xiaoming-zhao.github.io/projects/pgdvs

  7. arXiv:2305.13650

    cs.LG cs.AI

    Robust Model-Based Optimization for Challenging Fitness Landscapes

    Authors: Saba Ghaffari, Ehsan Saleh, Alexander G. Schwing, Yu-Xiong Wang, Martin D. Burke, Saurabh Sinha

    Abstract: Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recogni…

    Submitted 27 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  8. arXiv:2210.09496

    cs.LG cs.AI

    CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations

    Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

    Abstract: Although reinforcement learning has found widespread use in dense reward settings, training autonomous agents with sparse rewards remains challenging. To address this difficulty, prior work has shown promising results when using not only task-specific demonstrations but also task-agnostic albeit somewhat related demonstrations. In most cases, the available demonstrations are distilled into an impl…

    Submitted 21 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 27 pages; published as NeurIPS 2022 poster paper

  9. arXiv:2210.08001

    cs.CV cs.AI cs.LG

    Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks

    Authors: Renan A. Rojas-Gomez, Teck-Yian Lim, Alexander G. Schwing, Minh N. Do, Raymond A. Yeh

    Abstract: We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be trained end-to-end from data and generalizes existing handcrafted downsampling layers. It is widely applicable as it can be integrated into any convolutional network by replacing down/upsampling layers. We evaluate LPS on ima…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  10. arXiv:2210.05825

    cs.CV cs.AI

    Controllable Radiance Fields for Dynamic Face Synthesis

    Authors: Peiye Zhuang, Liqian Ma, Oluwasanmi Koyejo, Alexander G. Schwing

    Abstract: Recent work on 3D-aware image synthesis has achieved compelling results using advances in neural rendering. However, 3D-aware synthesis of face dynamics hasn't received much attention. Here, we study how to explicitly control generative model synthesis of face dynamics exhibiting non-rigid motion (e.g., facial expression change), while simultaneously ensuring 3D-awareness. For this we propose a Co…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to 3DV 2022. 13 pages, 15 figures

  11. arXiv:2210.04287

    cs.CV

    Learning to Decompose Visual Features with Latent Textual Prompts

    Authors: Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander G. Schwing, Heng Ji

    Abstract: Recent advances in pre-training vision-language models like CLIP have shown great potential in learning transferable visual representations. Nonetheless, for downstream inference, CLIP-like models suffer from either 1) degraded accuracy and robustness in the case of inaccurate text descriptions during retrieval-based inference (the challenge for zero-shot protocol); or 2) breaking the well-establi…

    Submitted 9 October, 2022; originally announced October 2022.

  12. arXiv:2208.02817

    cs.CV cs.AI

    Occupancy Planes for Single-view RGB-D Human Reconstruction

    Authors: Xiaoming Zhao, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing

    Abstract: Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification. Specifically, a set of 3D locations within the view-frustum of the camera are first projected independently onto the image and a corresponding feature is subsequently extracted for each 3D location. The feature of each 3D location is then used to classify independently whether the corres…

    Submitted 1 December, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: AAAI2023; Code: https://github.com/Xiaoming-Zhao/oplanes

  13. arXiv:2207.14289

    cs.CV cs.AI

    Initialization and Alignment for Adversarial Texture Optimization

    Authors: Xiaoming Zhao, Zhizhen Zhao, Alexander G. Schwing

    Abstract: While recovery of geometry from image and video data has received a lot of attention in computer vision, methods to capture the texture for a given geometry are less mature. Specifically, classical methods for texture generation often assume clean geometry and reasonably well-aligned image data. While very recent methods, e.g., adversarial texture optimization, better handle lower-quality data obt…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Project Page: https://xiaoming-zhao.github.io/projects/advtex_init_align/

  14. arXiv:2207.10642

    cs.CV cs.AI

    Generative Multiplane Images: Making a 2D GAN 3D-Aware

    Authors: Xiaoming Zhao, Fangchang Ma, David Güera, Zhile Ren, Alexander G. Schwing, Alex Colburn

    Abstract: What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a 'ge…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: ECCV2022; Project Page: https://xiaoming-zhao.github.io/projects/gmpi/

  15. arXiv:2207.07115

    cs.CV

    XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

    Authors: Ho Kei Cheng, Alexander G. Schwing

    Abstract: We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin…

    Submitted 18 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. Project page: https://hkchengrex.github.io/XMem

  16. arXiv:2205.14929

    cs.CV cs.AI cs.GR cs.LG

    Neural Volumetric Object Selection

    Authors: Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang

    Abstract: We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF). Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views. To achieve this result, we propose a novel voxel fe…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 camera ready

  17. arXiv:2205.06111

    cs.AI cs.CL

    Asking for Knowledge: Training RL Agents to Query External Knowledge Using Language

    Authors: Iou-Jen Liu, Xingdi Yuan, Marc-Alexandre Côté, Pierre-Yves Oudeyer, Alexander G. Schwing

    Abstract: To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This is exacerbated as few present-day environments support querying for knowledge. In order to study how agents can be taught to query external knowledge via language, we first introduce two n…

    Submitted 3 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: ICML 2022; Project page: https://ioujenliu.github.io/AFK/

  18. arXiv:2204.03643

    cs.CV

    Total Variation Optimization Layers for Computer Vision

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing

    Abstract: Optimization within a layer of a deep-net has emerged as a new direction for deep-net layer design. However, there are two main challenges when applying these layers to computer vision tasks: (a) which optimization problem within a layer is useful?; (b) how to ensure that computation within a layer remains efficient? To study question (a), in this work, we propose total variation (TV) minimization…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  19. arXiv:2204.03640

    cs.LG cs.CV

    Equivariance Discovery by Learned Parameter-Sharing

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing

    Abstract: Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: AISTATS 2022

  20. arXiv:2112.10764

    cs.CV cs.AI cs.LG

    Mask2Former for Video Instance Segmentation

    Authors: Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

    Abstract: We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouT…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: Code and models: https://github.com/facebookresearch/Mask2Former

  21. arXiv:2112.02091

    cs.CV cs.AI cs.GR cs.LG

    Class-agnostic Reconstruction of Dynamic Objects from Videos

    Authors: Zhongzheng Ren, Xiaoming Zhao, Alexander G. Schwing

    Abstract: We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos. Compared to prior work, our problem setting is more realistic yet more challenging for three reasons: 1) due to occlusion or camera settings an object of interest may never be entirely visible, but we aim to reconstruct the complete shape; 2) we aim to handle different object dynamics i…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021

  22. arXiv:2112.01527

    cs.CV cs.AI cs.LG

    Masked-attention Mask Transformer for Universal Image Segmentation

    Authors: Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar

    Abstract: Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmenta…

    Submitted 15 June, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. Project page/code/models: https://bowenc0221.github.io/mask2former

  23. arXiv:2108.03319

    cs.AI

    Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning

    Authors: Iou-Jen Liu, Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Solving complex real-world tasks, e.g., autonomous fleet control, often involves a coordinated team of multiple agents which learn strategies from visual inputs via reinforcement learning. Many existing multi-agent reinforcement learning (MARL) algorithms however don't scale to environments where agents operate on visual inputs. To address this issue, algorithmically, recent works have focused on…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: IROS 2021; Project page: https://ioujenliu.github.io/SemanticTracklets/

  24. arXiv:2107.11444

    cs.AI

    Cooperative Exploration for Multi-Agent Deep Reinforcement Learning

    Authors: Iou-Jen Liu, Unnat Jain, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, existing methods suffer from a common challenge: agents struggle to ide…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: ICML 2021; Project Page: https://ioujenliu.github.io/CMAE/

  25. arXiv:2107.06278

    cs.CV

    Per-Pixel Classification is Not All You Need for Semantic Segmentation

    Authors: Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

    Abstract: Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this ob…

    Submitted 31 October, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021, Spotlight. Project page: https://bowenc0221.github.io/maskformer

  26. arXiv:2105.14710

    cs.LG stat.ML

    Robustifying $\ell_\infty$ Adversarial Training to the Union of Perturbation Models

    Authors: Ameya D. Patil, Michael Tuttle, Alexander G. Schwing, Naresh R. Shanbhag

    Abstract: Classical adversarial training (AT) frameworks are designed to achieve high adversarial accuracy against a single attack type, typically $\ell_\infty$ norm-bounded perturbations. Recent extensions in AT have focused on defending against the union of multiple perturbations but this benefit is obtained at the expense of a significant (up to $10\times$) increase in training complexity over single-att…

    Submitted 11 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

  27. arXiv:2105.08612

    cs.CV cs.GR cs.LG

    SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data

    Authors: Yuan-Ting Hu, Jiahong Wang, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Extracting detailed 3D information of objects from video data is an important goal for holistic scene understanding. While recent methods have shown impressive results when reconstructing meshes of objects from a single image, results often remain ambiguous as part of the object is unobserved. Moreover, existing image-based datasets for mesh reconstruction don't permit to study models which integr…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Oral

  28. arXiv:2105.06461

    cs.CV cs.AI cs.LG cs.MM

    3D Spatial Recognition without Spatially Labeled 3D

    Authors: Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar

    Abstract: We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self and cross-task consistency losses. We show that in conjunction with standard multiple-in…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: CVPR 2021

  29. arXiv:2105.06441

    cs.CV cs.AI cs.IR

    DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization

    Authors: Safa Messaoud, Ismini Lourentzou, Assma Boughoula, Mona Zehni, Zhizhen Zhao, Chengxiang Zhai, Alexander G. Schwing

    Abstract: The recent growth of web video sharing platforms has increased the demand for systems that can efficiently browse, retrieve and summarize video content. Query-aware multi-video summarization is a promising technique that caters to this demand. In this work, we introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS, that jointly optimizes multiple…

    Submitted 13 May, 2021; originally announced May 2021.

  30. arXiv:2102.01187

    cs.CV

    Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation

    Authors: Peiye Zhuang, Oluwasanmi Koyejo, Alexander G. Schwing

    Abstract: Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Net (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from attribute edits that are entangled, glo…

    Submitted 28 March, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted to ICLR 2021. 14 pages, 15 figures

  31. arXiv:2012.09849

    cs.LG cs.AI

    High-Throughput Synchronous Deep RL

    Authors: Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Deep reinforcement learning (RL) is computationally demanding and requires processing of many data points. Synchronous methods enjoy training stability while having lower data throughput. In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to `stale policies.' To combine the advantages of both methods we propose High-Throughput…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: Accepted to NeurIPS 2020; Project page: https://ioujenliu.github.io/HTS-RL/

  32. arXiv:2010.10804

    cs.CV cs.AI cs.LG

    UFO$^2$: A Unified Framework towards Omni-supervised Object Detection

    Authors: Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz

    Abstract: Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags. However, real-world annotations are often diverse in form, which challenges these existing works. In this paper, we present UFO$^2$, a unified object detection framework that can handle different forms o…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: ECCV 2020

  33. arXiv:2007.01293

    cs.LG cs.CV stat.ML

    Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

    Authors: Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study how to use a different weight for every unlabeled example. Manual tuning of all those weights -- as done in prior work -- is no longer possible. Instead, we adjus…

    Submitted 29 October, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: NeurIPS camera ready

  34. arXiv:2005.01508

    cs.CV

    Can We Learn Heuristics For Graphical Model Inference Using Reinforcement Learning?

    Authors: Safa Messaoud, Maghav Kumar, Alexander G. Schwing

    Abstract: Combinatorial optimization is frequently used in computer vision. For instance, in applications like semantic segmentation, human pose estimation and action recognition, programs are formulated for solving inference in Conditional Random Fields (CRFs) to produce a structured output that is consistent with visual features of the image. However, solving inference in CRFs is in general intractable, a…

    Submitted 4 May, 2020; v1 submitted 27 April, 2020; originally announced May 2020.

    Comments: CVPR 2020 (Oral)

    MSC Class: I.4.6; I.2.6

  35. arXiv:2004.04725

    cs.CV cs.LG eess.IV

    Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection

    Authors: Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz

    Abstract: Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training. However, major challenges remain: (1) differentiation of object instances can be ambiguous; (2) detectors tend to focus on discriminative parts rather than entire objects; (3) without ground truth, object proposals have to be redundant for high recalls, caus…

    Submitted 21 October, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  36. arXiv:1911.00029

    cs.CV cs.LG

    Chirality Nets for Human Pose Regression

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Alexander G. Schwing

    Abstract: We propose Chirality Nets, a family of deep nets that is equivariant to the "chirality transform," i.e., the transformation to create a chiral pair. Through parameter sharing, odd and even symmetry, we propose and prove variants of standard building blocks of deep nets that satisfy the equivariance property, including fully connected layers, convolutional layers, batch-normalization, and LSTM/GRU…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Accepted to NeurIPS2019

  37. arXiv:1911.00025

    cs.LG cs.CV stat.ML

    PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

    Authors: Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Sample efficiency and scalability to a large number of agents are two important goals for multi-agent reinforcement learning systems. Recent works got us closer to those goals, addressing non-stationarity of the environment from a single agent's perspective by utilizing a deep net critic which depends on all observations and actions. The critic input concatenates agent observations and actions in…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Accepted to CORL2019

  38. arXiv:1910.14673

    cs.CV cs.LG stat.ML

    Co-Generation with GANs using AIS based HMC

    Authors: Tiantian Fang, Alexander G. Schwing

    Abstract: Inferring the most likely configuration for a subset of variables of a joint distribution given the remaining ones - which we refer to as co-generation - is an important challenge that is computationally demanding for all but the simplest settings. This task has received a considerable amount of attention, particularly for classical ways of modeling distributions like structured prediction. In con…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: Accepted to NeurIPS 2019

  39. arXiv:1910.14671

    cs.CV cs.CL cs.LG

    TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines

    Authors: Jingxiang Lin, Unnat Jain, Alexander G. Schwing

    Abstract: Reasoning is an important ability that we learn from a very early age. Yet, reasoning is extremely hard for algorithms. Despite impressive recent progress that has been reported on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, recently, the new visual commonsense rea…

    Submitted 9 January, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: Accepted to NeurIPS 2019. Project page: https://deanplayerljx.github.io/tabvcr

  40. arXiv:1907.06134

    cs.CV cs.LG eess.IV

    FMRI data augmentation via synthesis

    Authors: Peiye Zhuang, Alexander G. Schwing, Sanmi Koyejo

    Abstract: We present an empirical evaluation of fMRI data augmentation via synthesis. For synthesis we use generative models trained on real neuroimaging data to produce novel task-dependent functional brain images. Analyzed generative models include classic approaches such as the Gaussian mixture model (GMM), and modern implicit generative models such as the generative adversarial network (GAN) and the v…

    Submitted 13 July, 2019; originally announced July 2019.

  41. arXiv:1904.05878

    cs.LG stat.ML

    Knowledge Flow: Improve Upon Your Teachers

    Authors: Iou-Jen Liu, Jian Peng, Alexander G. Schwing

    Abstract: A zoo of deep nets is available these days for almost any given task, and it is increasingly unclear which net to start with when addressing a new task, or which net to use as an initialization for fine-tuning a new model. To address this issue, in this paper, we develop knowledge flow which moves 'knowledge' from multiple deep nets, referred to as teachers, to a new deep net model, called the stu…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: Accepted to ICLR 2019

  42. arXiv:1811.00538

    cs.CV

    Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

    Authors: Medhini Narasimhan, Svetlana Lazebnik, Alexander G. Schwing

    Abstract: Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction a novel `fact-based' visual question answering (FVQA) task has been introduced recently along with a large set of curated facts which link two entitie…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: Accepted to NIPS 2018

  43. arXiv:1809.02129  [pdf, other]

    cs.CV cs.LG

    Structural Consistency and Controllability for Diverse Colorization

    Authors: Safa Messaoud, David Forsyth, Alexander G. Schwing

    Abstract: Colorizing a given gray-level image is an important task in the media and advertising industry. Due to the ambiguity inherent to colorization (many shades are often plausible), recent approaches started to explicitly model diversity. However, one of the most obvious artifacts, structural inconsistency, is rarely considered by existing methods which predict chrominance independently for every pixel…

    Submitted 6 September, 2018; originally announced September 2018.

    Comments: Accepted to ECCV 2018

  44. arXiv:1809.01125  [pdf, other]

    cs.CV

    Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation

    Authors: Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing

    Abstract: Unsupervised video segmentation plays an important role in a wide variety of applications from object identification to compression. However, to date, fast motion, motion blur and occlusions pose significant challenges. To address these challenges for unsupervised video segmentation, we develop a novel saliency estimation technique as well as a novel neighborhood graph, based on optical flow and e…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: Accepted to ECCV 2018

  45. arXiv:1809.01124  [pdf, other]

    cs.CV cs.AI

    Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering

    Authors: Medhini Narasimhan, Alexander G. Schwing

    Abstract: Question answering is an important task for autonomous agents and virtual assistants alike and was shown to support the disabled in efficiently navigating an overwhelming environment. Many existing methods focus on observation-based questions, ignoring our ability to seamlessly combine observed content with general knowledge. To understand interactions with a knowledge base, a dataset has been int…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: Accepted to ECCV 2018

  46. arXiv:1809.01123  [pdf, other]

    cs.CV cs.LG

    VideoMatch: Matching based Video Object Segmentation

    Authors: Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing

    Abstract: Video object segmentation is challenging yet important in a wide variety of applications for video analysis. Recent works formulate video object segmentation as a prediction task using deep nets to achieve appealing state-of-the-art performance. Due to the formulation as a prediction task, most of these methods require fine-tuning during test time, such that the deep nets memorize the appearance o…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: Accepted to ECCV 2018

  47. arXiv:1809.00681  [pdf, other]

    cs.CV

    Diverse and Coherent Paragraph Generation from Images

    Authors: Moitreya Chatterjee, Alexander G. Schwing

    Abstract: Paragraph generation from images, which has gained popularity recently, is an important task for video summarization, editing, and support of the disabled. Traditional image captioning methods fall short on this front, since they aren't designed to generate long informative descriptions. Moreover, the vanilla approach of simply concatenating multiple short sentences, possibly synthesized from a cl…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: Camera Ready Version of ECCV 2018 paper; Coupled with supplementary

  48. arXiv:1803.11209  [pdf, other]

    cs.CV

    Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts

    Authors: Raymond A. Yeh, Jinjun Xiong, Wen-mei W. Hwu, Minh N. Do, Alexander G. Schwing

    Abstract: Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all po…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to NIPS 2017

  49. arXiv:1803.11187  [pdf, other]

    cs.CV

    MaskRNN: Instance Level Video Object Segmentation

    Authors: Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing

    Abstract: Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance -- a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent c…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to NIPS 2017

  50. arXiv:1803.11185  [pdf, other]

    cs.CV

    Unsupervised Textual Grounding: Linking Words to Image Concepts

    Authors: Raymond A. Yeh, Minh N. Do, Alexander G. Schwing

    Abstract: Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. To train these deep net based approaches, access to a large-scale da…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to CVPR 2018