Showing 1–50 of 109 results for author: Belongie, S

Searching in archive cs.
  1. arXiv:2406.19238  [pdf, other]

    cs.CL cs.CY cs.LG

    Revealing Fine-Grained Values and Opinions in Large Language Models

    Authors: Dustin Wright, Arnav Arora, Nadav Borenstein, Srishti Yadav, Serge Belongie, Isabelle Augenstein

    Abstract: Uncovering latent values and opinions in large language models (LLMs) can help identify biases and mitigate potential harm. Recently, this has been approached by presenting LLMs with survey questions and quantifying their stances towards morally and politically charged statements. However, the stances generated by LLMs can vary greatly depending on how they are prompted, and there are many ways to…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 28 pages, 20 figures, 7 tables

  2. arXiv:2406.04898  [pdf, other]

    cs.CV

    Labeled Data Selection for Category Discovery

    Authors: Bingchen Zhao, Nico Lang, Serge Belongie, Oisin Mac Aodha

    Abstract: Category discovery methods aim to find novel categories in unlabeled visual data. At training time, a set of labeled and unlabeled images are provided, where the labels correspond to the categories present in the images. The labeled data provides guidance during training by indicating what types of visual properties and features are relevant for performing discovery in the unlabeled data. As a res…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.04332  [pdf, other]

    cs.CV cs.LG

    Coarse-To-Fine Tensor Trains for Compact Visual Representations

    Authors: Sebastian Loeschcke, Dan Wang, Christian Leth-Espensen, Serge Belongie, Michael J. Kastoryano, Sagie Benaim

    Abstract: The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://sebulo.github.io/PuTT_website/

  4. arXiv:2405.16528  [pdf, other]

    cs.LG cs.CL

    LoQT: Low Rank Adapters for Quantized Training

    Authors: Sebastian Loeschcke, Mads Toftrup, Michael J. Kastoryano, Serge Belongie, Vésteinn Snæbjarnarson

    Abstract: Training of large neural networks requires significant computational resources. Despite advances using low-rank adapters and quantization, pretraining of models such as LLMs on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose LoQT, a method for efficiently training quantized models. L…

    Submitted 26 May, 2024; originally announced May 2024.

  5. arXiv:2405.02771  [pdf, other]

    cs.CV cs.AI cs.LG

    MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

    Authors: Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang

    Abstract: The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scal…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Data and code is available on the project page: https://vishalned.github.io/mmearth

  6. arXiv:2403.00592  [pdf, other]

    cs.CV

    Rethinking Few-shot 3D Point Cloud Semantic Segmentation

    Authors: Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie

    Abstract: This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS), with a focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution. The former arises from non-uniform point sampling, allowing models to distinguish the density disparities between foreground and background for easier segmentation. The latter results from sampling only 2,048 p…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  7. arXiv:2311.05006  [pdf, other]

    cs.CV cs.LG

    Familiarity-Based Open-Set Recognition Under Adversarial Attacks

    Authors: Philip Enevoldsen, Christian Gundersen, Nico Lang, Serge Belongie, Christian Igel

    Abstract: Open-set recognition (OSR), the identification of novel categories, can be a critical component when deploying classification models in real-world applications. Recent work has shown that familiarity-based scoring rules such as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are strong baselines when the closed-set accuracy is high. However, one of the potential weaknesses o…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Published in: The 2nd Workshop and Challenges for Out-of-Distribution Generalization in Computer Vision, ICCV 2023

  8. arXiv:2309.10359  [pdf, other]

    cs.CL

    Prompt, Condition, and Generate: Classification of Unsupported Claims with In-Context Learning

    Authors: Peter Ebert Christensen, Srishti Yadav, Serge Belongie

    Abstract: Unsupported and unfalsifiable claims we encounter in our daily lives can influence our view of the world. Characterizing, summarizing, and -- more generally -- making sense of such claims, however, can be challenging. In this work, we focus on fine-grained debate topics and formulate a new task of distilling, from such claims, a countable set of narratives. We present a crowdsourced dataset of 12…

    Submitted 19 September, 2023; originally announced September 2023.

  9. arXiv:2308.16900  [pdf, other]

    cs.LG

    Learning to Taste: A Multimodal Wine Dataset

    Authors: Thoranna Bender, Simon Moe Sørensen, Alireza Kashani, K. Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, Frederik Warburg

    Abstract: We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique bottlings, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor anno…

    Submitted 15 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted to NeurIPS 2023. See project page: https://thoranna.github.io/learning_to_taste/

  10. arXiv:2305.02360  [pdf, other]

    cs.CV cs.AI

    Fashionpedia-Ads: Do Your Favorite Advertisements Reveal Your Fashion Taste?

    Authors: Mengyun Shi, Claire Cardie, Serge Belongie

    Abstract: Consumers are exposed to advertisements across many different domains on the internet, such as fashion, beauty, car, food, and others. On the other hand, fashion represents second highest e-commerce shopping category. Does consumer digital record behavior on various fashion ad images reveal their fashion taste? Does ads from other domains infer their fashion taste as well? In this paper, we study…

    Submitted 3 May, 2023; originally announced May 2023.

  11. arXiv:2305.02307  [pdf, other]

    cs.CV cs.AI cs.DB

    Fashionpedia-Taste: A Dataset towards Explaining Human Fashion Taste

    Authors: Mengyun Shi, Serge Belongie, Claire Cardie

    Abstract: Existing fashion datasets do not consider the multi-facts that cause a consumer to like or dislike a fashion image. Even two consumers like a same fashion image, they could like this image for total different reasons. In this paper, we study the reason why a consumer like a certain fashion image. Towards this goal, we introduce an interpretability dataset, Fashionpedia-taste, consist of rich annot…

    Submitted 3 May, 2023; originally announced May 2023.

  12. arXiv:2303.17155  [pdf, other]

    cs.CV cs.AI

    Discriminative Class Tokens for Text-to-Image Diffusion Models

    Authors: Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

    Abstract: Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This approach has two disadvantages: (i) supervised da…

    Submitted 10 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  13. arXiv:2302.04862  [pdf, other]

    cs.CV cs.LG

    Polynomial Neural Fields for Subband Decomposition and Manipulation

    Authors: Guandao Yang, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

    Abstract: Neural fields have emerged as a new paradigm for representing signals, thanks to their ability to do it compactly while being easy to optimize. In most applications, however, neural fields are treated like black boxes, which precludes many signal manipulation tasks. In this paper, we propose a new class of neural fields called polynomial neural fields (PNFs). The key advantage of a PNF is that it…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to NeurIPS 2022

  14. arXiv:2212.10564  [pdf, other]

    cs.CL cs.AI cs.LG

    Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

    Authors: Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

    Abstract: Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger…

    Submitted 12 April, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: NAACL Findings 2024

  15. arXiv:2211.15673  [pdf, other]

    cs.LG

    PyTorch Adapt

    Authors: Kevin Musgrave, Serge Belongie, Ser-Nam Lim

    Abstract: PyTorch Adapt is a library for domain adaptation, a type of machine learning algorithm that re-purposes existing models to work in new domains. It is a fully-featured toolkit, allowing users to create a complete train/test pipeline in a few lines of code. It is also modular, so users can import just the parts they need, and not worry about being locked into a framework. One defining feature of thi…

    Submitted 28 November, 2022; originally announced November 2022.

  16. arXiv:2211.09782  [pdf, other]

    cs.CV cs.CR cs.LG

    Assessing Neural Network Robustness via Adversarial Pivotal Tuning

    Authors: Peter Ebert Christensen, Vésteinn Snæbjarnarson, Andrea Dittadi, Serge Belongie, Sagie Benaim

    Abstract: The robustness of image classifiers is essential to their deployment in the real world. The ability to assess this resilience to manipulations or deviations from the training data is thus crucial. These modifications have traditionally consisted of minimal changes that still manage to fool classifiers, and modern approaches are increasingly robust to them. Semantic manipulations that modify elemen…

    Submitted 6 January, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Major changes include new experiments in Table 1 on page 5 and Table 2-4 on page 6, new figure 5 on page 8. Paper accepted at WACV (oral)

  17. arXiv:2209.00495  [pdf, other]

    cs.CL cs.LG cs.SI

    Searching for Structure in Unfalsifiable Claims

    Authors: Peter Ebert Christensen, Frederik Warburg, Menglin Jia, Serge Belongie

    Abstract: Social media platforms give rise to an abundance of posts and comments on every topic imaginable. Many of these posts express opinions on various aspects of society, but their unfalsifiable nature makes them ill-suited to fact-checking pipelines. In this work, we aim to distill such posts into a small set of narratives that capture the essential claims related to a given topic. Understanding and v…

    Submitted 19 August, 2022; originally announced September 2022.

    Comments: 30 pages, 9 main Figures, 5 main Tables Website: https://captaine.github.io/Searching-for-Structure-in-Unfalsifiable-Claims/ Github repo: https://github.com/captainE/Searching-for-Structure-in-Unfalsifiable-Claims

  18. arXiv:2208.07360  [pdf, other]

    cs.CV cs.LG

    Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation

    Authors: Kevin Musgrave, Serge Belongie, Ser-Nam Lim

    Abstract: Changes to hyperparameters can have a dramatic effect on model accuracy. Thus, the tuning of hyperparameters plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the evaluation of model checkpoints, which is done through the use of "validators". In a supervised setting, these validators evaluate checkpoints by computing accuracy on…

    Submitted 17 May, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

    Comments: This paper was previously titled Benchmarking Validation Methods for Unsupervised Domain Adaptation. This version contains new experiments, analysis, and figures

  19. arXiv:2207.10664  [pdf, other]

    cs.CV cs.LG

    Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

    Authors: Grant Van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie

    Abstract: We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorization on images, the counterparts in audio and video fine-grained categorization are relatively unexplored. To encourage advancements in this space, we have carefully constructed the SSW60 datas…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: ECCV 2022 Camera Ready

  20. arXiv:2207.10225  [pdf, other]

    cs.CV cs.LG

    On Label Granularity and Object Localization

    Authors: Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha

    Abstract: Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation w…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  21. arXiv:2207.07646  [pdf, other]

    cs.CV cs.LG

    Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

    Authors: Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui

    Abstract: Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition. In this work, we extend this paradigm by leveraging motion and audio that naturally exist in video. We present MOV, a simple yet effective method for Multimodal Open-Vocabulary video classification. In M…

    Submitted 15 July, 2022; originally announced July 2022.

  22. arXiv:2206.12396  [pdf, other]

    cs.CV

    Text-Driven Stylization of Video Objects

    Authors: Sebastian Loeschcke, Serge Belongie, Sagie Benaim

    Abstract: We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the resulting stylization must preserve both the global semantics of the object and its fine-grained details,…

    Submitted 27 June, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

  23. arXiv:2206.02776  [pdf, other]

    cs.CV

    Volumetric Disentanglement for 3D Scene Manipulation

    Authors: Sagie Benaim, Frederik Warburg, Peter Ebert Christensen, Serge Belongie

    Abstract: Recently, advances in differential volumetric rendering enabled significant breakthroughs in the photo-realistic and fine-detailed reconstruction of complex 3D scenes, which is key for many virtual reality applications. However, in the context of augmented reality, one may also wish to effect semantic manipulations or augmentations of objects within a scene. To this end, we propose a volumetric fr…

    Submitted 6 June, 2022; originally announced June 2022.

  24. arXiv:2203.12119  [pdf, other]

    cs.CV

    Visual Prompt Tuning

    Authors: Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, Ser-Nam Lim

    Abstract: The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amo…

    Submitted 20 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: ECCV2022

  25. arXiv:2202.04036  [pdf, other]

    cs.CV

    Residual Aligned: Gradient Optimization for Non-Negative Image Synthesis

    Authors: Flora Yu Shen, Katie Luo, Guandao Yang, Harald Haraldsson, Serge Belongie

    Abstract: In this work, we address an important problem of optical see through (OST) augmented reality: non-negative image synthesis. Most of the image generation methods fail under this condition, since they assume full control over each pixel and cannot create darker pixels by adding light. In order to solve the non-negative image generation problem in AR image synthesis, prior works have attempted to uti…

    Submitted 8 February, 2022; originally announced February 2022.

  26. arXiv:2202.00659  [pdf, other]

    cs.CV cs.GR

    Stay Positive: Non-Negative Image Synthesis for Augmented Reality

    Authors: Katie Luo, Guandao Yang, Wenqi Xian, Harald Haraldsson, Bharath Hariharan, Serge Belongie

    Abstract: In applications such as optical see-through and projector augmented reality, producing images amounts to solving non-negative image generation, where one can only add light to an existing image. Most image generation methods, however, are ill-suited to this problem setting, as they make the assumption that one can assign arbitrary color to each pixel. In fact, naive application of existing methods…

    Submitted 1 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 10050-10060

  27. arXiv:2201.03546  [pdf, other]

    cs.CV cs.CL cs.LG

    Language-driven Semantic Segmentation

    Authors: Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl

    Abstract: We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding…

    Submitted 2 April, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  28. arXiv:2112.08459  [pdf, other]

    cs.CV

    Rethinking Nearest Neighbors for Visual Classification

    Authors: Menglin Jia, Bor-Chun Chen, Zuxuan Wu, Claire Cardie, Serge Belongie, Ser-Nam Lim

    Abstract: Neural network classifiers have become the de-facto choice for current "pre-train then fine-tune" paradigms of visual classification. In this paper, we investigate k-Nearest-Neighbor (k-NN) classifiers, a classical model-free learning method from the pre-deep learning era, as an augmentation to modern neural network based approaches. As a lazy learning method, k-NN simply aggregates the distance b…

    Submitted 17 December, 2021; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Modified paragraph spacing

  29. arXiv:2112.04480  [pdf, other]

    cs.CV cs.LG

    Exploring Temporal Granularity in Self-Supervised Video Representation Learning

    Authors: Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

    Abstract: This work presents a self-supervised learning framework named TeG to explore Temporal Granularity in learning video representations. In TeG, we sample a long clip from a video and a short clip that lies inside the long clip. We then extract their dense temporal embeddings. The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between co…

    Submitted 8 December, 2021; originally announced December 2021.

  30. arXiv:2111.15672  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation: A Reality Check

    Authors: Kevin Musgrave, Serge Belongie, Ser-Nam Lim

    Abstract: Interest in unsupervised domain adaptation (UDA) has surged in recent years, resulting in a plethora of new algorithms. However, as is often the case in fast-moving fields, baseline algorithms are not tested to the extent that they should be. Furthermore, little attention has been paid to validation methods, i.e. the methods for estimating the accuracy of a model in the absence of target domain la…

    Submitted 30 November, 2021; originally announced November 2021.

  31. arXiv:2111.07950  [pdf, other]

    cs.CV

    Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

    Authors: Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

    Abstract: Although deep learning methods have achieved advanced video object recognition performance in recent years, perceiving heavily occluded objects in a video is still a very challenging task. To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario. OVIS consists of 296k high-quality instance masks and…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: Accepted by NeurIPS 2021 Datasets and Benchmarks Track. arXiv admin note: text overlap with arXiv:2102.01558

    MSC Class: 68T07; 68T45

  32. arXiv:2111.06119  [pdf, other]

    cs.CV cs.LG

    Fine-Grained Image Analysis with Deep Learning: A Survey

    Authors: Xiu-Shen Wei, Yi-Zhe Song, Oisin Mac Aodha, Jianxin Wu, Yuxin Peng, Jinhui Tang, Jian Yang, Serge Belongie

    Abstract: Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it…

    Submitted 19 November, 2021; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE TPAMI

  33. arXiv:2109.02765  [pdf, other]

    cs.CV cs.CR cs.LG

    Robustness and Generalization via Generative Adversarial Training

    Authors: Omid Poursaeed, Tianxing Jiang, Harry Yang, Serge Belongie, SerNam Lim

    Abstract: While deep neural networks have achieved remarkable success in various computer vision tasks, they often fail to generalize to new domains and subtle variations of input images. Several defenses have been proposed to improve the robustness against these variations. However, current defenses can only withstand the specific attack used in training, and the models often remain vulnerable to other inp…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: ICCV 2021. arXiv admin note: substantial text overlap with arXiv:1911.09058

  34. arXiv:2108.13246  [pdf, other]

    cs.CV

    LUAI Challenge 2021 on Learning to Understand Aerial Images

    Authors: Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, et al. (11 additional authors not shown)

    Abstract: This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This cha…

    Submitted 17 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: 7 pages, 2 figures, accepted by ICCVW 2021

  35. arXiv:2106.13804  [pdf, other]

    cs.CV cs.AI cs.LG

    SITTA: Single Image Texture Translation for Data Augmentation

    Authors: Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie

    Abstract: Recent advances in data augmentation enable one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on a variety of datasets, with results evaluated largely in a subjective manner. Relatively few works in this area, however, study the potential use of image synthesis methods for recognition ta…

    Submitted 14 January, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: Learning from Limited and Imperfect Data (L2ID) Workshop, ECCV 2022

  36. arXiv:2105.13808  [pdf, other]

    cs.CV

    The Herbarium 2021 Half-Earth Challenge Dataset

    Authors: Riccardo de Lutio, Damon Little, Barbara Ambrose, Serge Belongie

    Abstract: Herbarium sheets present a unique view of the world's botanical history, evolution, and diversity. This makes them an all-important data source for botanical research. With the increased digitisation of herbaria worldwide and the advances in the fine-grained classification domain that can facilitate automatic identification of herbarium specimens, there are a lot of opportunities for supporting re…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: FGVC8 Workshop at CVPR 2021

  37. arXiv:2105.05837  [pdf, other]

    cs.CV cs.LG

    When Does Contrastive Visual Representation Learning Work?

    Authors: Elijah Cole, Xuan Yang, Kimberly Wilber, Oisin Mac Aodha, Serge Belongie

    Abstract: Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on ImageNet are now relatively well understood, the field still lacks widely accepted best practices for replicating this success on other datasets. As a first step in this direction, we study contrastive…

    Submitted 4 April, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: CVPR 2022

  38. arXiv:2104.07767  [pdf, other]

    cs.CV cs.LG

    Exploring Visual Engagement Signals for Representation Learning

    Authors: Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim

    Abstract: Visual engagement in social media platforms comprises interactions with photo posts including comments, shares, and likes. In this paper, we leverage such visual engagement clues as supervisory signals for representation learning. However, learning from engagement signals is non-trivial as it is not clear how to bridge the gap between low-level visual information and high-level social interactions…

    Submitted 14 August, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: ICCV2021 camera ready

  39. arXiv:2104.07659  [pdf, other]

    cs.CV

    GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds

    Authors: Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu

    Abstract: We present GANcraft, an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those created in Minecraft. Our method takes a semantic block world as input, where each block is assigned a semantic label such as dirt, grass, or water. We represent the world as a continuous volumetric function and train our model to render view-consistent photor…

    Submitted 15 April, 2021; originally announced April 2021.

  40. arXiv:2103.16483  [pdf, other]

    cs.CV

    Benchmarking Representation Learning for Natural World Image Collections

    Authors: Grant Van Horn, Elijah Cole, Sara Beery, Kimberly Wilber, Serge Belongie, Oisin Mac Aodha

    Abstract: Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision. However, to date the vast majority of these approaches have restricted themselves to training on standard benchmark datasets such as ImageNet. We argue that fine-grained visual categorization problems, such a…

    Submitted 8 June, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  41. Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges

    Authors: Jian Ding, Nan Xue, Gui-Song Xia, Xiang Bai, Wen Yang, Michael Ying Yang, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang

    Abstract: In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we prese…

    Submitted 4 December, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: Accepted to IEEE TPAMI

    ACM Class: I.4.8

  42. arXiv:2102.01558  [pdf, other]

    cs.CV

    Occluded Video Instance Segmentation: A Benchmark

    Authors: Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

    Abstract: Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usua…

    Submitted 17 May, 2022; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: IJCV 2022. Project page at https://songbai.site/ovis

    MSC Class: 68T07; 68T45

  43. arXiv:2011.05558  [pdf, other]

    cs.CV cs.SI

    Intentonomy: a Dataset and Study towards Human Intent Understanding

    Authors: Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim

    Abstract: An image is worth a thousand words, conveying information that goes beyond the physical visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can help the recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These imag…

    Submitted 27 March, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: CVPR2021

  44. arXiv:2008.09164  [pdf, other]

    cs.CV cs.LG

    PyTorch Metric Learning

    Authors: Kevin Musgrave, Serge Belongie, Ser-Nam Lim

    Abstract: Deep metric learning algorithms have a wide variety of applications, but implementing these algorithms can be tedious and time consuming. PyTorch Metric Learning is an open source library that aims to remove this barrier for both researchers and practitioners. The modular and flexible design allows users to easily try out different combinations of algorithms in their existing code. It also comes w…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Code and documentation is available at https://www.github.com/KevinMusgrave/pytorch-metric-learning

  45. arXiv:2008.06520  [pdf, other]

    cs.CV cs.LG

    Learning Gradient Fields for Shape Generation

    Authors: Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, Bharath Hariharan

    Abstract: In this work, we propose a novel technique to generate shapes from point cloud data. A point cloud can be viewed as samples from a distribution of 3D points whose density is concentrated near the surface of the shape. Point cloud generation thus amounts to moving randomly sampled points to high-density areas. We generate point clouds by performing stochastic gradient ascent on an unnormalized prob…

    Submitted 18 August, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: Published in ECCV 2020 (Spotlight); Project page: https://www.cs.cornell.edu/~ruojin/ShapeGF/

  46. arXiv:2008.03800  [pdf, other]

    cs.CV cs.LG

    Spatiotemporal Contrastive Video Representation Learning

    Authors: Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

    Abstract: We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn spatiotemporal visual representations from unlabeled videos. Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away. We study what makes for good data augmen…

    Submitted 5 April, 2021; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: CVPR2021 Camera ready

  47. arXiv:2004.12276  [pdf, other]

    cs.CV cs.LG eess.IV

    Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

    Authors: Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie

    Abstract: In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on th…

    Submitted 18 July, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: eccv2020

  48. arXiv:2004.11958  [pdf]

    cs.CV cs.AI cs.LG eess.IV

    The Plant Pathology 2020 challenge dataset to classify foliar disease of apples

    Authors: Ranjita Thapa, Noah Snavely, Serge Belongie, Awais Khan

    Abstract: Apple orchards in the U.S. are under constant threat from a large number of pathogens and insects. Appropriate and timely deployment of disease management depends on early disease detection. Incorrect and delayed diagnosis can result in either excessive or inadequate use of chemicals, with increased production costs, environmental, and health impacts. We have manually captured 3,651 high-quality,…

    Submitted 24 April, 2020; originally announced April 2020.

    Comments: 11 pages, 5 figures, Kaggle competition website: https://www.kaggle.com/c/plant-pathology-2020-fgvc7, CVPR fine-grained visual categorization website: https://sites.google.com/view/fgvc7/competitions

    ACM Class: I.2.1; I.2.10

  49. arXiv:2004.03080  [pdf, other]

    cs.CV eess.IV

    End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

    Authors: Rui Qian, Divyansh Garg, Yan Wang, Yurong You, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao

    Abstract: Reliable and accurate 3D object detection is a necessity for safe autonomous driving. Although LiDAR sensors can provide accurate 3D point cloud estimates of the environment, they are also prohibitively expensive for many settings. Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stere…

    Submitted 14 May, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020)

  50. arXiv:2004.02869  [pdf, other]

    cs.CV cs.LG

    DualSDF: Semantic Shape Manipulation using a Two-Level Representation

    Authors: Zekun Hao, Hadar Averbuch-Elor, Noah Snavely, Serge Belongie

    Abstract: We are seeing a Cambrian explosion of 3D shape representations for use in machine learning. Some representations seek high expressive power in capturing high-resolution detail. Other approaches seek to represent shapes as compositions of simple parts, which are intuitive for people to understand and easy to edit and manipulate. However, it is difficult to achieve both fidelity and interpretability…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: Published in CVPR 2020