Showing 1–27 of 27 results for author: Kirillov, A

Searching in archive cs.
  1. arXiv:2404.05666  [pdf, other]

    cs.CV

    YaART: Yet Another ART Rendering Technology

    Authors: Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov

    Abstract: In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Prompts and additional information are available on the project page, see https://ya.ru/ai/art/paper-yaart-v1

  2. arXiv:2306.05411  [pdf, other]

    cs.CV

    R-MAE: Regions Meet Masked Autoencoders

    Authors: Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

    Abstract: In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions. Specifically, we design an architecture which efficiently addresses the one-to-many mapping between images and regions,…

    Submitted 4 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  3. arXiv:2304.02643  [pdf, other]

    cs.CV cs.AI cs.LG

    Segment Anything

    Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

    Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project web-page: https://segment-anything.com

  4. arXiv:2202.04639  [pdf, other]

    cs.CV

    Point-Level Region Contrast for Object Detection Pre-Training

    Authors: Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg

    Abstract: In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection. This approach is motivated by the two key factors in detection: localization and recognition. While accurate localization favors models that operate at the pixel- or point-level, correct recognition typically relies on a more holistic, region-level view of objects. Incorpo…

    Submitted 18 April, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: CVPR 2022 (Oral)

  5. arXiv:2112.12750  [pdf, other]

    cs.CV

    SLIP: Self-supervision meets Language-Image Pre-training

    Authors: Norman Mu, Alexander Kirillov, David Wagner, Saining Xie

    Abstract: Recent work has shown that self-supervised pre-training leads to improvements over supervised learning on challenging visual recognition tasks. CLIP, an exciting new approach to learning with language supervision, demonstrates promising performance on a wide variety of benchmarks. In this work, we explore whether self-supervised learning can aid in the use of language supervision for visual repres…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: Code: https://github.com/facebookresearch/SLIP

  6. arXiv:2112.10764  [pdf, other]

    cs.CV cs.AI cs.LG

    Mask2Former for Video Instance Segmentation

    Authors: Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

    Abstract: We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouT…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: Code and models: https://github.com/facebookresearch/Mask2Former

  7. arXiv:2112.01527  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked-attention Mask Transformer for Universal Image Segmentation

    Authors: Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar

    Abstract: Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmenta…

    Submitted 15 June, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. Project page/code/models: https://bowenc0221.github.io/mask2former

  8. arXiv:2112.01520  [pdf, other]

    cs.CV

    Recognizing Scenes from Novel Viewpoints

    Authors: Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari

    Abstract: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects. In this work, we attempt to endow machines with this ability. We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoint…

    Submitted 2 December, 2021; originally announced December 2021.

  9. arXiv:2107.06278  [pdf, other]

    cs.CV

    Per-Pixel Classification is Not All You Need for Semantic Segmentation

    Authors: Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

    Abstract: Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this ob…

    Submitted 31 October, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021, Spotlight. Project page: https://bowenc0221.github.io/maskformer

  10. arXiv:2104.06404  [pdf, other]

    cs.CV

    Pointly-Supervised Instance Segmentation

    Authors: Bowen Cheng, Omkar Parkhi, Alexander Kirillov

    Abstract: We propose an embarrassingly simple point annotation scheme to collect weak supervision for instance segmentation. In addition to bounding boxes, we collect binary labels for a set of points uniformly sampled inside each bounding box. We show that the existing instance segmentation models developed for full mask supervision can be seamlessly trained with point-based supervision collected via our s…

    Submitted 15 June, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: CVPR 2022, Oral. Project page: https://bowenc0221.github.io/point-sup

  11. arXiv:2103.16562  [pdf, other]

    cs.CV

    Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

    Authors: Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

    Abstract: We present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality. We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects. The new quality me…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2021, project page: https://bowenc0221.github.io/boundary-iou

  12. arXiv:2102.11273  [pdf, other]

    cs.CV cs.LG

    On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness

    Authors: Eric Mintun, Alexander Kirillov, Saining Xie

    Abstract: Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision. Recently, several new data augmentations have been proposed that significantly improve performance on ImageNet-C, a benchmark of such corruptions. However, there is still a lack of basic understanding on the relationship between data augmen…

    Submitted 19 November, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: Code available at https://github.com/facebookresearch/augmentation-corruption

  13. arXiv:2102.01066  [pdf, other]

    cs.CV

    Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details

    Authors: Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick

    Abstract: By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and averaged. On one hand, this is desirable as it treats all classes equally. On the other hand, it ignores cross-category confidence calibration, a key property in real-world use cases. Unfortunately, under important conditions (i.e., large vocabulary, high in…

    Submitted 15 March, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

  14. arXiv:2101.02702  [pdf, other]

    cs.CV

    TrackFormer: Multi-Object Tracking with Transformers

    Authors: Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer

    Abstract: The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce TrackFormer, an end-to-end trainable MOT approach based on an encoder-decoder Transformer architecture. Our model achieves data association between frames via atten…

    Submitted 29 April, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

  15. arXiv:2005.12872  [pdf, other]

    cs.CV

    End-to-End Object Detection with Transformers

    Authors: Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

    Abstract: We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DET…

    Submitted 28 May, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  16. arXiv:1912.08193  [pdf, other]

    cs.CV

    PointRend: Image Segmentation as Rendering

    Authors: Alexander Kirillov, Yuxin Wu, Kaiming He, Ross Girshick

    Abstract: We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network…

    Submitted 16 February, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

    Comments: Technical Report

  17. arXiv:1904.01569  [pdf, other]

    cs.CV cs.LG

    Exploring Randomly Wired Neural Networks for Image Recognition

    Authors: Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He

    Abstract: Neural networks for image recognition have evolved through extensive manual design from simple chain-like models to structures with multiple wiring paths. The success of ResNets and DenseNets is due in large part to their innovative wiring plans. Now, neural architecture search (NAS) studies are exploring the joint optimization of wiring and operation types, however, the space of possible wirings…

    Submitted 8 April, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

    Comments: Technical report

  18. arXiv:1901.02446  [pdf, other]

    cs.CV

    Panoptic Feature Pyramid Networks

    Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

    Abstract: The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we…

    Submitted 10 April, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: accepted to CVPR 2019

  19. arXiv:1805.09559  [pdf, other]

    cs.IR cs.CL

    WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration

    Authors: Alexander Kirillov, Natalia Krizhanovsky, Andrew Krizhanovsky

    Abstract: The problem of word sense disambiguation (WSD) is considered in the article. Given a set of synonyms (synsets) and sentences with these synonyms. It is necessary to select the meaning of the word in the sentence automatically. 1285 sentences were tagged by experts, namely, one of the dictionary meanings was selected by experts for target words. To solve the WSD-problem, an algorithm based on a new…

    Submitted 18 June, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: 15 pages, 1 table, 15 figures, accepted in the journal Transactions of Karelian Research Centre of the Russian Academy of Sciences

    MSC Class: 68T50 ACM Class: I.5.3; H.3.1; H.3.3

    Journal ref: Transactions of Karelian Research Centre RAS. No. 7. 2018. P. 149-163

  20. arXiv:1803.01580  [pdf, ps, other]

    cs.CL cs.IR

    Calculated attributes of synonym sets

    Authors: Andrew Krizhanovsky, Alexander Kirillov

    Abstract: The goal of formalization, proposed in this paper, is to bring together, as near as possible, the theoretic linguistic problem of synonym conception and the computer linguistic methods based generally on empirical intuitive unjustified factors. Using the word vector representation we have proposed the geometric approach to mathematical modeling of synonym set (synset). The word embedding is based…

    Submitted 5 March, 2018; originally announced March 2018.

    Comments: 6 pages, 2 tables, 2 figures, preprint

    MSC Class: 68T50 ACM Class: I.5.3; H.3.1; H.3.3

  21. arXiv:1801.00868  [pdf, other]

    cs.CV

    Panoptic Segmentation

    Authors: Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár

    Abstract: We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems.…

    Submitted 10 April, 2019; v1 submitted 2 January, 2018; originally announced January 2018.

    Comments: accepted to CVPR 2019

  22. Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation

    Authors: Omid Hosseini Jafari, Oliver Groth, Alexander Kirillov, Michael Ying Yang, Carsten Rother

    Abstract: This paper addresses the task of designing a modular neural network architecture that jointly solves different tasks. As an example we use the tasks of depth estimation and semantic segmentation given a single RGB image. The main focus of this work is to analyze the cross-modality influence between depth and semantic prediction maps on their joint refinement. While most previous works solely focus…

    Submitted 26 February, 2017; originally announced February 2017.

    Comments: Accepted to ICRA 2017

  23. arXiv:1612.02287  [pdf, other]

    cs.CV

    Global Hypothesis Generation for 6D Object Pose Estimation

    Authors: Frank Michel, Alexander Kirillov, Eric Brachmann, Alexander Krull, Stefan Gumhold, Bogdan Savchynskyy, Carsten Rother

    Abstract: This paper addresses the task of estimating the 6D pose of a known 3D object from a single RGB-D image. Most modern approaches solve this task in three steps: i) Compute local features; ii) Generate a pool of pose-hypotheses; iii) Select and refine a pose from the pool. This work focuses on the second step. While all existing approaches generate the hypotheses pool via local reasoning, e.g. RANSAC…

    Submitted 2 January, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

  24. arXiv:1611.08272  [pdf, other]

    cs.CV

    InstanceCut: from Edges to Instances with MultiCut

    Authors: Alexander Kirillov, Evgeny Levinkov, Bjoern Andres, Bogdan Savchynskyy, Carsten Rother

    Abstract: This work addresses the task of instance-aware semantic segmentation. Our key motivation is to design a simple method with a new modelling-paradigm, which therefore has a different trade-off between advantages and disadvantages compared to known approaches. Our approach, we term InstanceCut, represents the problem by two output modalities: (i) an instance-agnostic semantic segmentation and (ii) al…

    Submitted 24 November, 2016; originally announced November 2016.

    Comments: The code would be released at https://github.com/alexander-kirillov/InstanceCut

  25. arXiv:1611.04399  [pdf, other]

    cs.CV cs.DM

    Joint Graph Decomposition and Node Labeling: Problem, Algorithms, Applications

    Authors: Evgeny Levinkov, Jonas Uhrig, Siyu Tang, Mohamed Omran, Eldar Insafutdinov, Alexander Kirillov, Carsten Rother, Thomas Brox, Bernt Schiele, Bjoern Andres

    Abstract: We state a combinatorial optimization problem whose feasible solutions define both a decomposition and a node labeling of a given graph. This problem offers a common mathematical abstraction of seemingly unrelated computer vision tasks, including instance-separating semantic segmentation, articulated human body pose estimation and multiple object tracking. Conceptually, the problem we state genera…

    Submitted 21 February, 2017; v1 submitted 14 November, 2016; originally announced November 2016.

  26. arXiv:1606.07015  [pdf, other]

    cs.CV

    Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization

    Authors: Alexander Kirillov, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy

    Abstract: We consider the problem of jointly inferring the M-best diverse labelings for a binary (high-order) submodular energy of a graphical model. Recently, it was shown that this problem can be solved to a global optimum, for many practically interesting diversity measures. It was noted that the labelings are, so-called, nested. This nestedness property also holds for labelings of a class of parametric…

    Submitted 23 June, 2016; v1 submitted 22 June, 2016; originally announced June 2016.

  27. arXiv:1511.05067  [pdf, other]

    cs.CV

    Joint Training of Generic CNN-CRF Models with Stochastic Optimization

    Authors: Alexander Kirillov, Dmitrij Schlesinger, Shuai Zheng, Bogdan Savchynskyy, Philip H. S. Torr, Carsten Rother

    Abstract: We propose a new CNN-CRF end-to-end learning framework, which is based on joint stochastic optimization with respect to both Convolutional Neural Network (CNN) and Conditional Random Field (CRF) parameters. While stochastic gradient descent is a standard technique for CNN training, it was not used for joint models so far. We show that our learning method is (i) general, i.e. it applies to arbitrar…

    Submitted 14 September, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: ACCV2016