
Showing 101–150 of 321 results for author: Yuille, A

  1. arXiv:2111.13241

    cs.CV

    Learning from Temporal Gradient for Semi-supervised Action Recognition

    Authors: Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, Yingwei Li

    Abstract: Semi-supervised video action recognition tends to enable deep neural networks to achieve remarkable performance even with very limited labeled data. However, existing methods are mainly transferred from current image-based methods (e.g., FixMatch). Without specifically utilizing the temporal dynamics and inherent multimodal attributes, their results could be suboptimal. To better leverage the enco…

    Submitted 23 April, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: CVPR 2022

  2. arXiv:2111.09833

    cs.CV

    TransMix: Attend to Mix for Vision Transformers

    Authors: Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

    Abstract: Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs) since they can easily overfit. However, previous mixup-based methods have an underlying prior knowledge that the linearly interpolated ratio of targets should be kept the same as the ratio proposed in input interpolation. This may lead to a strange phenomenon t…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Code will be made publicly available at https://github.com/Beckschen/TransMix
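    As a point of reference for the abstract above, the conventional mixup baseline it critiques can be sketched as follows. This is a minimal, hypothetical illustration of standard mixup, not TransMix itself: it assumes inputs are flat lists of floats and targets are one-hot lists, and the names `mixup` and `sample_lambda` are placeholders. The point it shows is that a single ratio `lam` is applied to both the inputs and the targets.

```python
import random
from typing import List, Tuple


def mixup(x1: List[float], y1: List[float],
          x2: List[float], y2: List[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Standard mixup (illustrative): inputs and targets share the same ratio `lam`."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Linearly interpolate the inputs.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    # Interpolate the targets with the very same ratio; this coupling is
    # the prior that the abstract questions.
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the mixing ratio from Beta(alpha, alpha), as in the usual mixup recipe."""
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_lambda(0.2, rng)
    x, y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam)
    print(lam, x, y)
```

    In this baseline the target weights always equal the input weights, regardless of how much of each image actually survives in the mixed input.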

  3. arXiv:2111.07950

    cs.CV

    Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

    Authors: Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

    Abstract: Although deep learning methods have achieved advanced video object recognition performance in recent years, perceiving heavily occluded objects in a video is still a very challenging task. To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario. OVIS consists of 296k high-quality instance masks and…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: Accepted by NeurIPS 2021 Datasets and Benchmarks Track. arXiv admin note: text overlap with arXiv:2102.01558

    MSC Class: 68T07; 68T45

  4. arXiv:2111.07832

    cs.CV

    iBOT: Image BERT Pre-Training with Online Tokenizer

    Authors: Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

    Abstract: The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces. In this work, we study masked image modeling (MIM) and indicate the advantages and challenges of using a semantically meaningful visual tokenizer. We present a self-supervised framework iBOT that can perform masked…

    Submitted 27 January, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

  5. arXiv:2111.07547

    cs.CV

    Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

    Authors: Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille

    Abstract: Recently, self-attention operators have shown superior performance as a stand-alone building block for vision models. However, existing self-attention models are often hand-designed, modified from CNNs, and obtained by stacking one operator only. A wider range of architecture space which combines different self-attention operators and convolution is rarely explored. In this paper, we explore this…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: BMVC 2021

  6. arXiv:2111.05464

    cs.CV

    Are Transformers More Robust Than CNNs?

    Authors: Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie

    Abstract: Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Nonetheless, surprisingly, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared a…

    Submitted 9 November, 2021; originally announced November 2021.

  7. arXiv:2110.14213

    cs.CV

    Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

    Authors: Angtian Wang, Shenxiao Mei, Alan Yuille, Adam Kortylewski

    Abstract: We study the problem of learning to estimate the 3D object pose from a few labelled examples and a collection of unlabelled data. Our main contribution is a learning framework, neural view synthesis and matching, that can transfer the 3D pose annotation from the labelled to unlabelled images reliably, despite unseen 3D views and nuisance variations such as the object shape, texture, illumination o…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021; Code is available at https://github.com/Angtian/NeuralVS

  8. arXiv:2110.13846

    cs.CV

    A Light-weight Interpretable Compositional Model for Nuclei Detection and Weakly-Supervised Segmentation

    Authors: Yixiao Zhang, Adam Kortylewski, Qing Liu, Seyoun Park, Benjamin Green, Elizabeth Engle, Guillermo Almodovar, Ryan Walk, Sigfredo Soto-Diaz, Janis Taube, Alex Szalay, Alan Yuille

    Abstract: The field of computational pathology has witnessed great advancements since deep neural networks have been widely applied. These networks usually require large numbers of annotated data to train vast parameters. However, it takes significant effort to annotate a large histopathology dataset. We introduce a light-weight and interpretable model for nuclei detection and weakly-supervised segmentation…

    Submitted 9 August, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  9. arXiv:2110.07118

    cs.CV

    Nuisance-Label Supervision: Robustness Improvement by Free Labels

    Authors: Xinyue Wei, Weichao Qiu, Yi Zhang, Zihao Xiao, Alan Yuille

    Abstract: In this paper, we present a Nuisance-label Supervision (NLS) module, which can make models more robust to nuisance factor variations. Nuisance factors are those irrelevant to a task, and an ideal model should be invariant to them. For example, an activity recognition model should perform consistently regardless of the change of clothes and background. But our experiments show existing models are f…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: ICCV 2021 Workshop

  10. arXiv:2110.00519

    cs.CV cs.CL

    Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images

    Authors: Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille

    Abstract: While neural symbolic methods demonstrate impressive performance in visual question answering on synthetic images, their performance suffers on real images. We identify that the long-tail distribution of visual concepts and unequal importance of reasoning steps in real data are the two key obstacles that limit the models' real-world potentials. To address these challenges, we propose a new paradig…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: To appear in ICCV 2021; Code at https://github.com/Lizw14/CaliCO.git

  11. arXiv:2109.12265

    cs.CV cs.AI

    Label-Assemble: Leveraging Multiple Datasets with Partial Labels

    Authors: Mintong Kang, Bowen Li, Zengle Zhu, Yongyi Lu, Elliot K. Fishman, Alan L. Yuille, Zongwei Zhou

    Abstract: The success of deep learning relies heavily on large labeled datasets, but we often only have access to several small datasets associated with partial labels. To address this problem, we propose a new initiative, "Label-Assemble", that aims to unleash the full potential of partial labels from an assembly of public datasets. We discovered that learning from negative examples facilitates both comput…

    Submitted 14 May, 2023; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: ISBI 2023

  12. arXiv:2109.11572

    eess.IV cs.CV

    SAME: Deformable Image Registration based on Self-supervised Anatomical Embeddings

    Authors: Fengze Liu, Ke Yan, Adam Harrison, Dazhou Guo, Le Lu, Alan Yuille, Lingyun Huang, Guotong Xie, Jing Xiao, Xianghua Ye, Dakai Jin

    Abstract: In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work is built on top of a recent algorithm SAM, which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method is named SAME, which breaks down image registration into three steps: affine transformation, coarse deformation, and deep d…

    Submitted 23 September, 2021; originally announced September 2021.

  13. arXiv:2109.05211

    cs.CV

    RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

    Authors: Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, Dacheng Tao

    Abstract: Deep neural networks (DNNs) are vulnerable to adversarial noises, which motivates the benchmark of model robustness. Existing benchmarks mainly focus on evaluating defenses, but there are no comprehensive studies of how architecture design and training techniques affect robustness. Comprehensively benchmarking their relationships is beneficial for better understanding and developing robust DNNs. T…

    Submitted 13 January, 2022; v1 submitted 11 September, 2021; originally announced September 2021.

  14. arXiv:2108.10312

    cs.CV

    Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

    Authors: Chenxu Luo, Xiaodong Yang, Alan Yuille

    Abstract: 3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles. Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a heuristic matching step for the detection association. In this paper, we present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection an…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  15. arXiv:2107.05637

    cs.CV

    Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms

    Authors: Chenglin Yang, Siyuan Qiao, Adam Kortylewski, Alan Yuille

    Abstract: Self-Attention has become prevalent in computer vision models. Inspired by fully connected Conditional Random Fields (CRFs), we decompose self-attention into local and context terms. They correspond to the unary and binary terms in CRF and are implemented by attention mechanisms with projection matrices. We observe that the unary terms only make small contributions to the outputs, and meanwhile st…

    Submitted 28 November, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  16. arXiv:2106.09748

    cs.CV

    DeepLab2: A TensorFlow Library for Deep Labeling

    Authors: Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

    Abstract: DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the sta…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 4-page technical report. The first three authors contributed equally to this work

  17. arXiv:2106.05554

    cs.CV

    Progressive Stage-wise Learning for Unsupervised Feature Representation Enhancement

    Authors: Zefan Li, Chenxi Liu, Alan Yuille, Bingbing Ni, Wenjun Zhang, Wen Gao

    Abstract: Unsupervised learning methods have recently shown their competitiveness against supervised training. Typically, these methods use a single objective to train the entire network. But one distinct advantage of unsupervised over supervised learning is that the former possesses more variety and freedom in designing the objective. In this work, we explore new dimensions of unsupervised learning by prop…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted by the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021

  18. arXiv:2106.04569

    cs.CV cs.AI cs.CY cs.LG

    Simulated Adversarial Testing of Face Recognition Models

    Authors: Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

    Abstract: Most machine learning models are validated and tested on fixed datasets. This can give an incomplete picture of the capabilities and weaknesses of the model. Such weaknesses can be revealed at test time in the real world. The risks involved in such failures can be loss of profits, loss of time or even loss of life in certain critical applications. In order to alleviate this issue, simulators can b…

    Submitted 31 May, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  19. arXiv:2106.02277

    cs.CV

    Glance-and-Gaze Vision Transformer

    Authors: Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan Yuille, Wei Shen

    Abstract: Recently, there emerges a series of vision Transformers, which show superior performance with a more compact model size than conventional convolutional neural networks, thanks to the strong ability of Transformers to model long-range dependencies. However, the advantages of vision Transformers also come with a price: Self-attention, the core part of Transformer, has a quadratic complexity to the i…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: Code and models will be made available at https://github.com/yucornetto/GG-Transformer

  20. arXiv:2106.00209

    cs.CV

    Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning

    Authors: Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille

    Abstract: Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce. However, most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets. In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations. In particular, we decouple the trainin…

    Submitted 10 December, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

  21. Learning Inductive Attention Guidance for Partially Supervised Pancreatic Ductal Adenocarcinoma Prediction

    Authors: Yan Wang, Peng Tang, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille

    Abstract: Pancreatic ductal adenocarcinoma (PDAC) is the third most common cause of cancer death in the United States. Predicting tumors like PDACs (including both classification and segmentation) from medical images by deep learning is becoming a growing trend, but usually a large number of annotated data are required for training, which is very labor-intensive and time-consuming. In this paper, we conside…

    Submitted 31 May, 2021; originally announced May 2021.

  22. arXiv:2105.07065

    cs.AI cs.LG

    Visual analogy: Deep learning versus compositional models

    Authors: Nicholas Ichien, Qing Liu, Shuhao Fu, Keith J. Holyoak, Alan Yuille, Hongjing Lu

    Abstract: Is analogical reasoning a task that must be learned to solve from scratch by applying deep learning models to massive numbers of reasoning problems? Or are analogies solved by computing similarities between structured representations of analogs? We address this question by comparing human performance on visual analogies created using images of familiar three-dimensional objects (cars and their sub…

    Submitted 14 May, 2021; originally announced May 2021.

  23. arXiv:2104.10195

    eess.IV cs.CV

    Auto-FedAvg: Learnable Federated Averaging for Multi-Institutional Medical Image Segmentation

    Authors: Yingda Xia, Dong Yang, Wenqi Li, Andriy Myronenko, Daguang Xu, Hirofumi Obinata, Hitoshi Mori, Peng An, Stephanie Harmon, Evrim Turkbey, Baris Turkbey, Bradford Wood, Francesca Patella, Elvira Stellato, Gianpaolo Carrafiello, Anna Ierardi, Alan Yuille, Holger Roth

    Abstract: Federated learning (FL) enables collaborative model training while preserving each participant's privacy, which is particularly beneficial to the medical field. FedAvg is a standard algorithm that uses fixed weights, often originating from the dataset sizes at each client, to aggregate the distributed learned models on a server during the FL process. However, non-identical data distribution across…

    Submitted 20 April, 2021; originally announced April 2021.
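    For orientation, the fixed-weight FedAvg aggregation that the abstract above describes, with each client's weight derived from its dataset size, can be sketched as below. This is a minimal illustration under stated assumptions: each client model is represented as a hypothetical dictionary mapping parameter names to flat lists of floats, and `fedavg` is a placeholder name. It shows only the baseline aggregation step, not the learnable weighting that Auto-FedAvg proposes.

```python
from typing import Dict, List

# Hypothetical model representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], num_samples: List[int]) -> Params:
    """Aggregate client models with fixed weights proportional to local dataset size."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need one sample count per client model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    # Fixed aggregation weights n_k / n; they do not adapt during training.
    coeffs = [n / total for n in num_samples]
    aggregated: Params = {}
    for name, values in client_params[0].items():
        aggregated[name] = [
            sum(c * params[name][i] for c, params in zip(coeffs, client_params))
            for i in range(len(values))
        ]
    return aggregated


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    # The second client holds three times as much data, so it gets weight 0.75.
    print(fedavg(clients, [100, 300]))
```

    Because the weights depend only on dataset sizes, a client with a very different data distribution still receives the same influence on the server model as any other client of equal size.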

  24. arXiv:2104.08683

    cs.CV

    Self-Supervised Pillar Motion Learning for Autonomous Driving

    Authors: Chenxu Luo, Xiaodong Yang, Alan Yuille

    Abstract: Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments. Recently, there has been a growing interest in estimating class-agnostic motion directly from point clouds. Current motion estimation methods usually require a vast amount of annotated training data from self-driving scenes. However, manually labeling p…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  25. arXiv:2104.07645

    cs.CV

    A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

    Authors: Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

    Abstract: Recent work has made significant progress on using implicit functions, as a continuous representation for 3D rigid object shape reconstruction. However, much less effort has been devoted to modeling general articulated objects. Compared to rigid objects, articulated objects have higher degrees of freedom, which makes it hard to generalize to unseen shapes. To deal with the large shape variance, we…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Our project page is available at: https://jitengmu.github.io/A-SDF/

  26. arXiv:2103.15858

    eess.IV cs.CV

    CateNorm: Categorical Normalization for Robust Medical Image Segmentation

    Authors: Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou

    Abstract: Batch normalization (BN) uniformly shifts and scales the activations based on the statistics of a batch of images. However, the intensity distribution of the background pixels often dominates the BN statistics because the background accounts for a large proportion of the entire image. This paper focuses on enhancing BN with the intensity distribution of foreground pixels, the one that really matte…

    Submitted 4 August, 2022; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: Accepted by MICCAI 2022 Workshop on Domain Adaptation and Representation Transfer (DART)
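    To make the abstract's point about batch statistics concrete, here is a minimal sketch of standard training-mode batch normalization for a single channel, written in plain Python for illustration. The function name `batch_norm` and its defaults (`gamma`, `beta`, `eps`) are assumptions, and this is the ordinary BN baseline, not CateNorm. When most values in the batch come from background pixels, the batch mean and variance are pulled toward the background.

```python
import math
from typing import List


def batch_norm(x: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize one channel's activations with batch statistics, then scale and shift."""
    if not x:
        raise ValueError("batch must be non-empty")
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance, as typically used for training-mode batch statistics.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Every activation is shifted and scaled by the same batch-wide statistics.
    return [gamma * (v - mean) * inv_std + beta for v in x]


if __name__ == "__main__":
    # Mostly background values (0.0) with a single foreground value (10.0).
    batch = [0.0] * 9 + [10.0]
    print(sum(batch) / len(batch))  # the batch mean is dominated by the background
    print(batch_norm(batch))
```

    With nine background values of 0.0 and one foreground value of 10.0, the batch mean is 1.0, far closer to the background than to the foreground, which is the imbalance the abstract describes.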

  27. arXiv:2103.14098

    cs.CV

    Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles

    Authors: Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille

    Abstract: Part segmentations provide a rich and detailed part-level description of objects. However, their annotation requires an enormous amount of work, which makes it difficult to apply standard deep learning methods. In this paper, we propose the idea of learning part segmentation through unsupervised domain adaptation (UDA) from synthetic data. We first introduce UDA-Part, a comprehensive part segmenta…

    Submitted 3 April, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: CVPR 2022 (Oral)

  28. arXiv:2103.12886

    cs.CV

    Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

    Authors: Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

    Abstract: Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions. We show that these issues can be better addressed by training with weakly labeled videos instead of images. In videos…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 14 pages, 8 figures, accepted by CVPR 2021

  29. arXiv:2103.05525

    eess.IV cs.CV

    Multi-phase Deformable Registration for Time-dependent Abdominal Organ Variations

    Authors: Seyoun Park, Elliot K. Fishman, Alan L. Yuille

    Abstract: The human body is a complex dynamic system composed of various sub-dynamic parts. In particular, thoracic and abdominal organs have complex internal shape variations with different frequencies for various reasons, such as respiration with fast motion and peristalsis with slower motion. CT protocols for abdominal lesions are multi-phase scans for various tumor detection to use different vascular contrast, h…

    Submitted 8 March, 2021; originally announced March 2021.

  30. arXiv:2103.05170

    cs.CV

    Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining

    Authors: Jieneng Chen, Ke Yan, Yu-Dong Zhang, Youbao Tang, Xun Xu, Shuwen Sun, Qiuping Liu, Lingyun Huang, Jing Xiao, Alan L. Yuille, Ya Zhang, Le Lu

    Abstract: The boundary of tumors (hepatocellular carcinoma, or HCC) contains rich semantics: capsular invasion, visibility, smoothness, folding and protuberance, etc. Capsular invasion on tumor boundary has proven to be clinically correlated with the prognostic indicator, microvascular invasion (MVI). Investigating tumor boundary semantics has tremendous clinical values. In this paper, we propose the first…

    Submitted 8 March, 2021; originally announced March 2021.

  31. arXiv:2102.11343

    cs.LG cs.CV

    Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Map**

    Authors: Prakhar Kaushik, Alex Gain, Adam Kortylewski, Alan Yuille

    Abstract: Catastrophic forgetting in neural networks is a significant problem for continual learning. A majority of the current methods replay previous data during training, which violates the constraints of an ideal continual learning system. Additionally, current approaches that deal with forgetting ignore the problem of catastrophic remembering, i.e. the worsening ability to discriminate between data fro…

    Submitted 22 February, 2021; originally announced February 2021.

  32. arXiv:2102.09559

    cs.CV

    CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

    Authors: Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille, Fan Yang

    Abstract: Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high-precision pseudo-labels on minority classes. By exploiting this property, in this work, we propose Class-Rebalancing Self-Training (CReST), a simple yet e…

    Submitted 17 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: To appear in CVPR 2021. Code release: https://github.com/google-research/crest

  33. arXiv:2102.04306

    cs.CV

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

    Authors: Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L. Yuille, Yuyin Zhou

    Abstract: Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrate…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: 13 pages, 3 figures

  34. arXiv:2102.01558

    cs.CV

    Occluded Video Instance Segmentation: A Benchmark

    Authors: Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

    Abstract: Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usua…

    Submitted 17 May, 2022; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: IJCV 2022. Project page at https://songbai.site/ovis

    MSC Class: 68T07; 68T45

  35. arXiv:2101.12378

    cs.CV

    NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation

    Authors: Angtian Wang, Adam Kortylewski, Alan Yuille

    Abstract: 3D pose estimation is a challenging but important task in computer vision. In this work, we show that standard deep learning approaches to 3D pose estimation are not robust when objects are partially occluded or viewed from a previously unseen pose. Inspired by the robustness of generative vision models to partial occlusion, we propose to integrate deep neural networks with 3D generative represent…

    Submitted 4 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: Accepted by ICLR 2021. Code is publicly available

  36. arXiv:2101.11878

    cs.CV

    CORL: Compositional Representation Learning for Few-Shot Classification

    Authors: Ju He, Adam Kortylewski, Alan Yuille

    Abstract: Few-shot image classification consists of two consecutive learning processes: 1) In the meta-learning stage, the model acquires a knowledge base from a set of training classes. 2) During meta-testing, the acquired knowledge is used to recognize unseen classes from very few examples. Inspired by the compositional representation of objects in humans, we train a neural network architecture that expli…

    Submitted 16 December, 2022; v1 submitted 28 January, 2021; originally announced January 2021.

  37. arXiv:2012.07181

    cs.CV

    Meticulous Object Segmentation

    Authors: Chenglin Yang, Yilin Wang, Jianming Zhang, He Zhang, Zhe Lin, Alan Yuille

    Abstract: Compared with common image segmentation tasks targeted at low-resolution images, higher resolution detailed image segmentation receives much less attention. In this paper, we propose and study a task named Meticulous Object Segmentation (MOS), which is focused on segmenting well-defined foreground objects with elaborate shapes in high resolution images (e.g. 2k - 4k). To this end, we propose the M…

    Submitted 13 December, 2020; originally announced December 2020.

  38. arXiv:2012.06722

    cs.CV

    Mask Guided Matting via Progressive Refinement Network

    Authors: Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille

    Abstract: We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance. MG Matting leverages a network (PRN) design which encourages the matting model to provide self-guidance to progressively refine the uncertain regions through the decoding process. A series of guidance mask perturbation operations are also introduced in the training to further enhance its r…

    Submitted 1 April, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Comments: CVPR 2021, code available at https://github.com/yucornetto/MGMatting

  39. arXiv:2012.05258

    cs.CV

    ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

    Authors: Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    Abstract: In this paper, we present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring the point clouds from perspective image sequences while providing each point with instance-level semantic interpretations. Solving this problem requires the vision models to predict the spatial location, semantic class, and…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Video: https://youtu.be/XR4HFiwwao0 GitHub: https://github.com/joe-siyuan-qiao/ViP-DeepLab

  40. arXiv:2012.02107

    cs.CV

    Robust Instance Segmentation through Reasoning about Multi-Object Occlusion

    Authors: Xiaoding Yuan, Adam Kortylewski, Yihong Sun, Alan Yuille

    Abstract: Analyzing complex scenes with Deep Neural Networks is a challenging task, particularly when images contain multiple objects that partially occlude each other. Existing approaches to image analysis mostly process objects independently and do not take into account the relative occlusion of nearby objects. In this paper, we propose a deep network for multi-object instance segmentation that is robust…

    Submitted 1 April, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: Accepted by CVPR 2021

  41. arXiv:2012.00759

    cs.CV

    MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

    Authors: Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    Abstract: We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-De…

    Submitted 12 July, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: CVPR 2021

  42. arXiv:2012.00558

    cs.CV

    Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks

    Authors: Christian Cosgrove, Adam Kortylewski, Chenglin Yang, Alan Yuille

    Abstract: Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification. While progress has been made in defending against imperceptible attacks, it remains unclear how patch-based attacks can be resisted. In this work, we study two different approaches for defending against black-box patch attacks. First, we show that adversarial training, which is…

    Submitted 1 December, 2020; originally announced December 2020.

  43. arXiv:2012.00313  [pdf, other

    cs.CV

    Unsupervised Part Discovery via Feature Alignment

    Authors: Mengqi Guo, Yutong Bai, Zhishuai Zhang, Adam Kortylewski, Alan Yuille

    Abstract: Understanding objects in terms of their individual parts is important, because it enables a precise understanding of the objects' geometrical structure, and enhances object recognition when the object is seen in a novel pose or under partial occlusion. However, the manual annotation of parts in large scale datasets is time consuming and expensive. In this paper, we aim at discovering object parts…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 10 pages, 9 figures, submitted to CVPR 2021

  44. arXiv:2012.00088  [pdf, other

    cs.CV cs.RO

    Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation

    Authors: Qihao Liu, Weichao Qiu, Weiyao Wang, Gregory D. Hager, Alan L. Yuille

    Abstract: We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori, and then adapt it to the task of category-independent articulated object pose estimation. We combine a classical geometric formulation with deep learning and extend the use of epipolar constraint to multi-rigid-body systems…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: 10 pages, 3 figures

  45. arXiv:2011.14150  [pdf, other

    cs.CV

    Batch Normalization with Enhanced Linear Transformation

    Authors: Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

    Abstract: Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions. In this paper, we demonstrate properly enhancing this linear transformation module can effectively improve the ability of BN. Specifically, rather than using a single neuron, we propose to additionally con…

    Submitted 28 November, 2020; originally announced November 2020.

    Comments: 12 pages. The code is available at https://github.com/yuhuixu1993/BNET

  46. arXiv:2011.13046  [pdf, other

    cs.CV

    Can Temporal Information Help with Contrastive Self-Supervised Learning?

    Authors: Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille

    Abstract: Leveraging temporal information has been regarded as essential for developing video understanding models. However, how to properly incorporate temporal information into the recent successful instance discrimination based contrastive self-supervised learning (CSL) framework remains unclear. As an intuitive solution, we find that directly applying temporal augmentations does not help, or even impair…

    Submitted 25 November, 2020; originally announced November 2020.

  47. Volumetric Medical Image Segmentation: A 3D Deep Coarse-to-fine Framework and Its Adversarial Examples

    Authors: Yingwei Li, Zhuotun Zhu, Yuyin Zhou, Yingda Xia, Wei Shen, Elliot K. Fishman, Alan L. Yuille

    Abstract: Although deep neural networks have been a dominant method for many 2D vision tasks, it is still challenging to apply them to 3D tasks, such as medical image segmentation, due to the limited amount of annotated 3D data and limited computational resources. In this chapter, by rethinking the strategy to apply 3D Convolutional Neural Networks to segment medical images, we propose a novel 3D-based coar…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1712.00201

  48. arXiv:2010.13175  [pdf, other

    cs.CV

    Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model

    Authors: Yihong Sun, Adam Kortylewski, Alan Yuille

    Abstract: Amodal completion is a visual task that humans perform easily but which is difficult for computer vision algorithms. The aim is to segment those object boundaries which are occluded and hence invisible. This task is particularly challenging for deep neural networks because data is difficult to obtain and annotate. Therefore, we formulate amodal segmentation as an out-of-task and out-of-distributio…

    Submitted 9 July, 2022; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: CVPR 2022

  49. arXiv:2010.05981  [pdf, other

    cs.CV

    Shape-Texture Debiased Neural Network Training

    Authors: Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

    Abstract: Shape and texture are two prominent and complementary cues for recognizing objects. Nonetheless, Convolutional Neural Networks are often biased towards either texture or shape, depending on the training dataset. Our ablation shows that such bias degenerates model performance. Motivated by this observation, we develop a simple algorithm for shape-texture debiased learning. To prevent models from ex…

    Submitted 30 March, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: ICLR 2021. The code is available here: https://github.com/LiYingwei/ShapeTextureDebiasedTraining

  50. arXiv:2010.02217  [pdf, other

    cs.CV

    CO2: Consistent Contrast for Unsupervised Visual Representation Learning

    Authors: Chen Wei, Huiyu Wang, Wei Shen, Alan Yuille

    Abstract: Contrastive learning has been adopted as a core method for unsupervised visual representation learning. Without human annotation, the common practice is to perform an instance discrimination task: Given a query image crop, this task labels crops from the same image as positives, and crops from other randomly sampled images as negatives. An important limitation of this label assignment strategy is…

    Submitted 5 October, 2020; originally announced October 2020.