
Showing 1–50 of 79 results for author: Vasconcelos, N

  1. arXiv:2406.18554  [pdf, other]

    cs.CV cs.LG

    Planted: a dataset for planted forest identification from multi-satellite time series

    Authors: Luis Miguel Pazos-Outón, Cristina Nader Vasconcelos, Anton Raichuk, Anurag Arnab, Dan Morris, Maxim Neumann

    Abstract: Protecting and restoring forest ecosystems is critical for biodiversity conservation and carbon sequestration. Forest monitoring on a global scale is essential for prioritizing and assessing conservation efforts. Satellite-based remote sensing is the only viable solution for providing global coverage, but to date, large-scale forest monitoring is limited to single modalities and single time points…

    Submitted 24 May, 2024; originally announced June 2024.

  2. arXiv:2405.16759  [pdf, other]

    cs.CV cs.LG

    Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

    Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

    Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm…

    Submitted 26 May, 2024; originally announced May 2024.

  3. arXiv:2405.03190  [pdf, other]

    cs.CV

    Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval

    Authors: Jiacheng Cheng, Hijung Valentina Shin, Nuno Vasconcelos, Bryan Russell, Fabian Caba Heilbron

    Abstract: In recent years, dual-encoder vision-language models (e.g., CLIP) have achieved remarkable text-to-image retrieval performance. However, we discover that these models usually produce very different retrievals for a pair of paraphrased queries. Such behavior might render the retrieval system less predictable and lead to user frustration. In this work, we consider the task of paraphrased te…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2404.16029  [pdf, other]

    cs.CV

    Editable Image Elements for Controllable Synthesis

    Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park

    Abstract: Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high dimensional noise input space of diffusion models is not naturally suited for image inversion or spatial editing. In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model. Concretely, we…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Project page: https://jitengmu.github.io/Editable_Image_Elements/

  5. arXiv:2403.20236  [pdf, other]

    cs.CV

    Long-Tailed Anomaly Detection with Learnable Class Names

    Authors: Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos

    Abstract: Anomaly detection (AD) aims to identify defective images and localize their defects (if any). Ideally, AD models should be able to detect defects over many image classes; without relying on hard-coded class names that can be uninformative or inconsistent across datasets; learn without anomaly supervision; and be robust to the long-tailed distributions of real-world applications. To address these c…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: This paper is accepted to CVPR 2024. The supplementary material is included. The long-tailed dataset split is available at https://zenodo.org/records/10854201

  6. arXiv:2401.13992  [pdf, other]

    cs.CV

    Diffusion-based Data Augmentation for Object Counting Problems

    Authors: Zhen Wang, Yuelei Li, Jia Wan, Nuno Vasconcelos

    Abstract: Crowd counting is an important problem in computer vision due to its wide range of applications in image understanding. Currently, this problem is typically addressed using deep learning approaches, such as Convolutional Neural Networks (CNNs) and Transformers. However, deep networks are data-driven and are prone to overfitting, especially when the available labeled crowd dataset is limited. To ov…

    Submitted 25 January, 2024; originally announced January 2024.

  7. arXiv:2312.07286  [pdf, ps, other]

    q-bio.NC physics.bio-ph

    State-dependent complexity of the local field potential in the primary visual cortex

    Authors: Rafael M. Jungmann, Thaís Feliciano, Leandro A. A. Aguiar, Carina Soares-Cunha, Bárbara Coimbra, Ana João Rodrigues, Mauro Copelli, Fernanda S. Matias, Nivaldo A. P. de Vasconcelos, Pedro V. Carelli

    Abstract: The local field potential (LFP) is a measure of the combined activity of neurons within a region of brain tissue. While biophysical modeling schemes for LFP in cortical circuits are well established, there is a marked lack of understanding of LFP properties across the states that cortical circuits assume over long periods. Here we use a symbolic information approach to determine the…

    Submitted 13 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

  8. arXiv:2312.00412  [pdf, other]

    cs.CV

    SCHEME: Scalable Channel Mixer for Vision Transformers

    Authors: Deepak Sridhar, Yunsheng Li, Nuno Vasconcelos

    Abstract: Vision Transformers have achieved impressive performance in many vision tasks. While the token mixer or attention block has been studied in great detail, much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP), which accounts for a significant portion of the model parameters and computation. In this work, we show that the dense MLP connections can be replac…

    Submitted 28 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Preprint

  9. arXiv:2306.08200  [pdf, other]

    cs.CV cs.LG

    POP: Prompt Of Prompts for Continual Learning

    Authors: Zhiyuan Hu, Jiancheng Lyu, Dashan Gao, Nuno Vasconcelos

    Abstract: Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datas…

    Submitted 13 June, 2023; originally announced June 2023.

  10. arXiv:2306.05689  [pdf, other]

    cs.CV

    Single-Stage Visual Relationship Learning using Conditional Queries

    Authors: Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

    Abstract: Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead, and typically hinders end-to-end optimization. To address this, recent research attempts to train single-stage…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2022

  11. arXiv:2306.02240  [pdf, other]

    cs.CV

    ProTeCt: Prompt Tuning for Taxonomic Open Set Classification

    Authors: Tz-Ying Wu, Chih-Hui Ho, Nuno Vasconcelos

    Abstract: Visual-language foundation models, like CLIP, learn generalized representations that enable zero-shot open-set classification. Few-shot adaptation methods, based on prompt tuning, have been shown to further improve performance on downstream datasets. However, these methods do not fare well in the taxonomic open set (TOS) setting, where the classifier is asked to make predictions from label sets ac…

    Submitted 28 March, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2024

  12. arXiv:2304.14401  [pdf, other]

    cs.CV

    ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs

    Authors: Jiteng Mu, Shen Sang, Nuno Vasconcelos, Xiaolong Wang

    Abstract: While NeRF-based human representations have shown impressive novel view synthesis results, most methods still rely on a large number of images/views for training. In this work, we propose a novel animatable NeRF called ActorsNeRF. It is first pre-trained on diverse human subjects, and then adapted with few-shot monocular video frames for a new actor with unseen poses. Building on previous genera…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Project page: https://jitengmu.github.io/ActorsNeRF/

  13. arXiv:2304.08809  [pdf, other]

    cs.CV

    SViTT: Temporal Learning of Sparse Video-Text Transformers

    Authors: Yi Li, Kyle Min, Subarna Tripathi, Nuno Vasconcelos

    Abstract: Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data, recent work has revealed the strong tendency of video-text models towards frame-based spatial representations, while temporal reasoning remains largely unsolved. In this work, we identify several key challenges in temporal learning of video-t…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  14. arXiv:2304.05547  [pdf, other]

    cs.LG cs.CV

    Taxonomic Class Incremental Learning

    Authors: Yuzhao Chen, Zonghuan Li, Zhiyuan Hu, Nuno Vasconcelos

    Abstract: The problem of continual learning has attracted rising attention in recent years. However, few works have questioned the commonly used learning setup, based on a task curriculum of random classes. This differs significantly from human continual learning, which is guided by taxonomic curricula. In this work, we propose the Taxonomic Class Incremental Learning (TCIL) problem. In TCIL, the task sequenc…

    Submitted 11 April, 2023; originally announced April 2023.

  15. arXiv:2303.12696  [pdf, other]

    cs.CV cs.AI

    Dense Network Expansion for Class Incremental Learning

    Authors: Zhiyuan Hu, Yunsheng Li, Jiancheng Lyu, Dashan Gao, Nuno Vasconcelos

    Abstract: The problem of class incremental learning (CIL) is considered. State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task. While effective from a computational standpoint, these methods lead to models that grow quickly with the number of tasks. A new NE method, dense network expansion (DNE), is proposed to achieve a better trade…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  16. arXiv:2303.05068  [pdf, other]

    cs.CV

    Toward Unsupervised Realistic Visual Question Answering

    Authors: Yuwei Zhang, Chih-Hui Ho, Nuno Vasconcelos

    Abstract: The problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs), is studied. We first point out two drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training. To resolve the first drawback, we propose a new testing dataset, RGQA, which combi…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: Yuwei Zhang and Chih-Hui Ho contributed equally to this work

  17. arXiv:2212.05630  [pdf, other]

    cs.CV

    DISCO: Adversarial Defense with Local Implicit Functions

    Authors: Chih-Hui Ho, Nuno Vasconcelos

    Abstract: The problem of adversarial defenses for image classification, where the goal is to robustify a classifier against adversarial examples, is considered. Inspired by the hypothesis that these examples lie beyond the natural image manifold, a novel aDversarIal defenSe with local impliCit functiOns (DISCO) is proposed to remove adversarial perturbations by localized manifold projections. DISCO consumes…

    Submitted 11 December, 2022; originally announced December 2022.

    Comments: Accepted to NeurIPS 2022

  18. arXiv:2211.07912  [pdf, other]

    cs.CV

    YORO -- Lightweight End to End Visual Grounding

    Authors: Chih-Hui Ho, Srikar Appalaraju, Bhavan Jasani, R. Manmatha, Nuno Vasconcelos

    Abstract: We present YORO, a multi-modal transformer encoder-only architecture for the Visual Grounding (VG) task. This task involves localizing, in an image, an object referred to via natural language. Unlike the recent trend in the literature of using multi-stage approaches that sacrifice speed for accuracy, YORO seeks a better trade-off between speed and accuracy by embracing a single-stage design, without…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted to the ECCV Workshop on the International Challenge on Compositional and Multimodal Perception

  19. arXiv:2207.03520  [pdf, other]

    cs.CV

    Should All Proposals be Treated Equally in Object Detection?

    Authors: Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Pei Yu, Jing Yin, Lu Yuan, Zicheng Liu, Nuno Vasconcelos

    Abstract: The complexity-precision trade-off of an object detector is a critical problem for resource-constrained vision tasks. Previous works have emphasized detectors implemented with efficient backbones. The impact on this trade-off of proposal processing by the detection head is investigated in this work. It is hypothesized that improved detection efficiency requires a paradigm shift, towards the unequa…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  20. arXiv:2206.14801  [pdf, other]

    cs.LG

    Meta-Learning over Time for Destination Prediction Tasks

    Authors: Mark Tenzer, Zeeshan Rasheed, Khurram Shafique, Nuno Vasconcelos

    Abstract: A need to understand and predict vehicles' behavior underlies both public and private goals in the transportation domain, including urban planning and management, ride-sharing services, and intelligent transportation systems. Individuals' preferences and intended destinations vary throughout the day, week, and year: for example, bars are most popular in the evenings, and beaches are most popular i…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 10 pages, 8 figures. Submitted to SIGSPATIAL 2022

  21. arXiv:2206.00100  [pdf, other]

    cs.CV cs.CL

    VALHALLA: Visual Hallucination for Machine Translation

    Authors: Yi Li, Rameswar Panda, Yoon Kim, Chun-Fu Chen, Rogerio Feris, David Cox, Nuno Vasconcelos

    Abstract: Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a…

    Submitted 31 May, 2022; originally announced June 2022.

    Comments: CVPR 2022

  22. arXiv:2204.03634  [pdf, other]

    cs.CV cs.LG

    Class-Incremental Learning with Strong Pre-trained Models

    Authors: Tz-Ying Wu, Gurumurthy Swaminathan, Zhizhong Li, Avinash Ravichandran, Nuno Vasconcelos, Rahul Bhotika, Stefano Soatto

    Abstract: Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and incremental learning can be d…

    Submitted 12 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR 2022, code is available at https://github.com/amazon-research/sp-cil

  23. arXiv:2203.16521  [pdf, other]

    cs.CV

    CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

    Authors: Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

    Abstract: Recent advances show that Generative Adversarial Networks (GANs) can synthesize images with smooth variations along semantically meaningful latent directions, such as pose, expression, layout, etc. While this indicates that GANs implicitly learn pixel-level correspondences across images, few studies have explored how to extract them explicitly. In this work, we introduce Coordinate GAN (CoordGAN), a st…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Project page: https://jitengmu.github.io/CoordGAN/

  24. arXiv:2203.16089  [pdf, other]

    cs.CV

    Omni-DETR: Omni-Supervised Object Detection with Transformers

    Authors: Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto

    Abstract: We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations, such as image tags, counts, points, etc., for object detection. This is enabled by a unified architecture, Omni-DETR, based on recent progress in student-teacher frameworks and end-to-end transformer-based object detection. Under this unified architecture, differen…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR 2022

  25. arXiv:2110.12108  [pdf, other]

    cs.LG cs.AI cs.CV

    ConformalLayers: A non-linear sequential neural network with associative layers

    Authors: Eduardo Vera Sousa, Leandro A. F. Fernandes, Cristina Nader Vasconcelos

    Abstract: Convolutional Neural Networks (CNNs) have been widely applied. However, as CNNs grow, the number of arithmetic operations and the memory footprint also increase. Furthermore, typical non-linear activation functions do not allow associativity of the operations encoded by consecutive layers, preventing the simplification of intermediate steps by combining them. We present a new activation function that a…

    Submitted 9 November, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Best Paper on Pattern Recognition and Related Field at SIBGRAPI 2021 -- 34th Conference on Graphics, Patterns and Images

  26. arXiv:2110.04931  [pdf, other]

    cs.CV

    BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning

    Authors: Zhirui Dai, Yuepeng Jiang, Yi Li, Bo Liu, Antoni B. Chan, Nuno Vasconcelos

    Abstract: Social distancing, an essential public health measure to limit the spread of contagious diseases, has gained significant attention since the outbreak of the COVID-19 pandemic. In this work, the problem of visual social distancing compliance assessment in busy public areas, with wide field-of-view cameras, is considered. A dataset of crowd scenes with people annotations under a bird's eye view (BEV…

    Submitted 12 October, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at International Conference on Computer Vision, 2021

  27. arXiv:2108.10992  [pdf, other]

    cs.CV

    OOWL500: Overcoming Dataset Collection Bias in the Wild

    Authors: Brandon Leung, Chih-Hui Ho, Amir Persekian, David Orozco, Yen Chang, Erik Sandstrom, Bo Liu, Nuno Vasconcelos

    Abstract: The hypothesis that image datasets gathered online "in the wild" can produce biased object recognizers, e.g. preferring professional photography or certain viewing angles, is studied. A new "in the lab" data collection infrastructure is proposed consisting of a drone which captures images as it circles around objects. Crucially, the control provided by this setup and the natural camera shake inher…

    Submitted 24 August, 2021; originally announced August 2021.

  28. arXiv:2108.09911  [pdf, other]

    cs.CV

    Black-Box Test-Time Shape REFINEment for Single View 3D Reconstruction

    Authors: Brandon Leung, Chih-Hui Ho, Nuno Vasconcelos

    Abstract: Much recent progress has been made in reconstructing the 3D shape of an object from an image of it, i.e. single view 3D reconstruction. However, it has been suggested that current methods simply adopt a "nearest-neighbor" strategy, instead of genuinely understanding the shape behind the input image. In this paper, we rigorously show that for many state-of-the-art methods, this issue manifests as (…

    Submitted 22 August, 2021; originally announced August 2021.

  29. arXiv:2108.09668  [pdf, other]

    cs.CV

    Learning of Visual Relations: The Devil is in the Tails

    Authors: Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

    Abstract: Significant effort has been recently devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited for long-tailed…

    Submitted 22 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021

  30. arXiv:2108.07903  [pdf, other]

    cs.CV cs.GR

    Spatially and color consistent environment lighting estimation using deep neural networks for mixed reality

    Authors: Bruno Augusto Dorta Marques, Esteban Walter Gonzalez Clua, Anselmo Antunes Montenegro, Cristina Nader Vasconcelos

    Abstract: The representation of consistent mixed reality (XR) environments requires adequate real and virtual illumination composition in real time. Estimating the lighting of a real scenario is still a challenge. Due to the ill-posed nature of the problem, classical inverse-rendering techniques tackle the problem for simple lighting setups. However, those assumptions do not satisfy the current state-of-the-art…

    Submitted 17 August, 2021; originally announced August 2021.

  31. arXiv:2108.05894  [pdf, other]

    cs.CV cs.LG

    MicroNet: Improving Image Recognition with Extremely Low FLOPs

    Authors: Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

    Abstract: This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification). We found that two factors, sparse connectivity and dynamic activation function, are effective at improving accuracy. The former avoids the significant reduction of network width, while the latter mitigates the detriment of reduction in n…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: ICCV 2021, code is available at https://github.com/liyunsheng13/micronet. arXiv admin note: substantial text overlap with arXiv:2011.12289

  32. arXiv:2105.00133  [pdf, other]

    cs.CV

    Semi-supervised Long-tailed Recognition using Alternate Sampling

    Authors: Bo Liu, Haoxiang Li, Hao Kang, Nuno Vasconcelos, Gang Hua

    Abstract: The main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes. While techniques have been proposed to achieve a more balanced training loss and to improve data variation in tail classes with synthesized samples, we instead leverage readily available unlabeled data to boost recognition accuracy. The idea leads to a new recognition sett…

    Submitted 30 April, 2021; originally announced May 2021.

  33. arXiv:2105.00131  [pdf, other]

    cs.CV

    GistNet: a Geometric Structure Transfer Network for Long-Tailed Recognition

    Authors: Bo Liu, Haoxiang Li, Hao Kang, Gang Hua, Nuno Vasconcelos

    Abstract: The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered. It is hypothesized that the well-known tendency of standard classifier training to overfit to popular classes can be exploited for effective transfer learning. Rather than eliminating this overfitting, e.g. by adopting popular class-balanced sampling methods, the learning algorithm s…

    Submitted 30 April, 2021; originally announced May 2021.

  34. arXiv:2105.00127  [pdf, other]

    cs.CV

    Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition

    Authors: Bo Liu, Haoxiang Li, Hao Kang, Gang Hua, Nuno Vasconcelos

    Abstract: The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered. While training with class-balanced sampling has been shown effective for this problem, it is known to over-fit to few-shot classes. It is hypothesized that this is due to the repeated sampling of examples and can be addressed by feature space augmentation. A new feature augmentation…

    Submitted 30 April, 2021; originally announced May 2021.

  35. arXiv:2105.00125  [pdf, other]

    cs.CV

    Sparse Pose Trajectory Completion

    Authors: Bo Liu, Mandar Dixit, Roland Kwitt, Gang Hua, Nuno Vasconcelos

    Abstract: We propose a method to learn, even using a dataset where objects appear only in sparsely sampled views (e.g. Pix3D), the ability to synthesize a pose trajectory for an arbitrary reference image. This is achieved with a cross-modal pose trajectory transfer mechanism. First, a domain transfer function is trained to predict, from an RGB image of the object, its 2D depth map. Then, a set of image view…

    Submitted 30 April, 2021; originally announced May 2021.

  36. arXiv:2104.07645  [pdf, other]

    cs.CV

    A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

    Authors: Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

    Abstract: Recent work has made significant progress on using implicit functions as a continuous representation for 3D rigid object shape reconstruction. However, much less effort has been devoted to modeling general articulated objects. Compared to rigid objects, articulated objects have higher degrees of freedom, which makes it hard to generalize to unseen shapes. To deal with the large shape variance, we…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Our project page is available at: https://jitengmu.github.io/A-SDF/

  37. arXiv:2104.05895  [pdf, other]

    cs.CV

    IMAGINE: Image Synthesis by Image-Guided Model Inversion

    Authors: Pei Wang, Yijun Li, Krishna Kumar Singh, Jingwan Lu, Nuno Vasconcelos

    Abstract: We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images from only a single training sample. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations via matching multi-level feature representations in the classifier, associated with adversarial training with an external…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Published in CVPR 2021

  38. arXiv:2104.05623  [pdf, other]

    cs.CV eess.IV

    Rethinking and Improving the Robustness of Image Style Transfer

    Authors: Pei Wang, Yijun Li, Nuno Vasconcelos

    Abstract: Extensive research in neural style transfer methods has shown that the correlation between features extracted by a pre-trained VGG network has a remarkable ability to capture the visual style of an image. Surprisingly, however, this stylization quality is not robust and often degrades significantly when applied to features from more advanced and lightweight networks, such as those in the ResNet fa…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: Published in CVPR 2021 (Oral)

  39. arXiv:2103.15916  [pdf, other]

    cs.CV

    Robust Audio-Visual Instance Discrimination

    Authors: Pedro Morgado, Ishan Misra, Nuno Vasconcelos

    Abstract: We present a self-supervised learning method to learn audio and video representations. Prior work uses the natural correspondence between audio and video to define a standard cross-modal instance discrimination task, where a model is trained to match representations from the two modalities. However, the standard approach introduces two sources of training noise. First, audio-visual correspondences…

    Submitted 29 March, 2021; originally announced March 2021.

  40. arXiv:2103.10583  [pdf, other]

    cs.CV

    Dynamic Transfer for Multi-Source Domain Adaptation

    Authors: Yunsheng Li, Lu Yuan, Yinpeng Chen, Pei Wang, Nuno Vasconcelos

    Abstract: Recent work on multi-source domain adaptation focuses on learning a domain-agnostic model whose parameters are static. However, such a static model struggles to handle conflicts across multiple domains and suffers from performance degradation in both the source domains and the target domain. In this paper, we present dynamic transfer to address domain conflicts, where the model parameters are…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR 2021

  41. arXiv:2103.08756  [pdf, other]

    cs.CV

    Revisiting Dynamic Convolution via Matrix Decomposition

    Authors: Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

    Abstract: Recent research in dynamic convolution shows a substantial performance boost for efficient CNNs, due to the adaptive aggregation of K static convolution kernels. It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging. In this paper, we revisit it from a new perspective…

    Submitted 15 March, 2021; originally announced March 2021.

    Comments: Accepted by ICLR 2021

  42. arXiv:2011.12289  [pdf, other]

    cs.CV cs.LG

    MicroNet: Towards Image Recognition with Extremely Low FLOPs

    Authors: Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

    Abstract: In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e.g. 6 MFLOPs on ImageNet classification). Such a low cost network is highly desired on edge devices, yet usually suffers from a significant performance degradation. We handle the extremely low FLOPs based upon two design principles: (a) avoiding the reduction of network w…

    Submitted 24 November, 2020; originally announced November 2020.

  43. arXiv:2011.01819  [pdf, other]

    cs.CV

    Learning Representations from Audio-Visual Spatial Alignment

    Authors: Pedro Morgado, Yi Li, Nuno Vasconcelos

    Abstract: We introduce a novel self-supervised pretext task for learning representations from audio-visual content. Prior work on audio-visual representation learning leverages correspondences at the video level. Approaches based on audio-visual correspondence (AVC) predict whether audio and video clips originate from the same or different video instances. Audio-visual temporal synchronization (AVTS) furthe…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: To appear at Advances in Neural Information Processing Systems (NeurIPS), 2020

  44. arXiv:2010.12050

    cs.CV

    Contrastive Learning with Adversarial Examples

    Authors: Chih-Hui Ho, Nuno Vasconcelos

    Abstract: Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations. It uses pairs of augmentations of unlabeled training examples to define a classification task for pretext learning of a deep embedding. Despite extensive work on augmentation procedures, prior work does not address the selection of challenging negative pairs, as images within a sampled ba…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  45. arXiv:2007.13912

    cs.CV

    Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings

    Authors: Pedro Morgado, Yunsheng Li, Jose Costa Pereira, Mohammad Saberian, Nuno Vasconcelos

    Abstract: Image hash codes are produced by binarizing the embeddings of convolutional neural networks (CNN) trained for either classification or retrieval. While proxy embeddings achieve good performance on both tasks, they are non-trivial to binarize, due to a rotational ambiguity that encourages non-binary embeddings. The use of a fixed set of proxies (weights of the CNN classification layer) is proposed…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted at International Journal of Computer Vision

  46. arXiv:2007.13813

    q-bio.NC cond-mat.dis-nn cond-mat.stat-mech nlin.AO physics.bio-ph

    Subsampled directed-percolation models explain scaling relations experimentally observed in the brain

    Authors: Tawan T. A. Carvalho, Antonio J. Fontenele, Mauricio Girardi-Schappo, Thais Feliciano, Leandro A. A. Aguiar, Thais P. L. Silva, Nivaldo A. P. de Vasconcelos, Pedro V. Carelli, Mauro Copelli

    Abstract: Recent experimental results on spike avalanches measured in the urethane-anesthetized rat cortex have revealed scaling relations that indicate a phase transition at a specific level of cortical firing rate variability. The scaling relations point to critical exponents whose values differ from those of a branching process, which has been the canonical model employed to understand brain criticality.…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: 15 pages, 9 figures, submitted to Frontiers in Neural Circuits

    Journal ref: Front. Neural Circuits 14, 83 (2021)

  47. arXiv:2007.09898

    cs.CV

    Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier

    Authors: Tz-Ying Wu, Pedro Morgado, Pei Wang, Chih-Hui Ho, Nuno Vasconcelos

    Abstract: Long-tail recognition tackles the naturally non-uniform data distributions found in real-world scenarios. While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes. Humans, however, are less affected by this since, when confronted with uncertain examples, they simply opt to provide coarser predictions. Motivated by this, a deep realistic taxonomic c…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: Accepted to ECCV 2020

  48. arXiv:2005.13713

    cs.CV

    Few-Shot Open-Set Recognition using Meta-Learning

    Authors: Bo Liu, Hao Kang, Haoxiang Li, Gang Hua, Nuno Vasconcelos

    Abstract: The problem of open-set recognition is considered. While previous approaches only consider this problem in the context of large-scale classifier training, we seek a unified solution for this and the low-shot classification setting. It is argued that the classic softmax classifier is a poor solution for open-set recognition, since it tends to overfit on the training classes. Randomization is then p…

    Submitted 7 June, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

  49. Cooperative Monitoring and Dissemination of Urban Events Supported by Dynamic Clustering of Vehicles

    Authors: Everaldo Andrade, Kevin Veloso, Nathália Vasconcelos, Aldri Santos, Fernando Matos

    Abstract: Critical urban events occur at random and must be handled quickly by public authorities to maintain the proper operation of cities. The main challenges for efficient handling of an event lie precisely in its random nature, and in the speed and accuracy with which its occurrence is notified to the authority. The pervasiveness of vehicles in urban environments, and thei…

    Submitted 13 May, 2020; originally announced May 2020.

    Comments: This work has been submitted to the Pervasive and Mobile Computing (PMC) journal for possible publication

  50. arXiv:2004.12943

    cs.CV

    Audio-Visual Instance Discrimination with Cross-Modal Agreement

    Authors: Pedro Morgado, Nuno Vasconcelos, Ishan Misra

    Abstract: We present a self-supervised learning approach to learn audio-visual representations from video and audio. Our method uses contrastive learning for cross-modal discrimination of video from audio and vice-versa. We show that optimizing for cross-modal discrimination, rather than within-modal discrimination, is important to learn good representations from video and audio. With this simple but powerf…

    Submitted 29 March, 2021; v1 submitted 27 April, 2020; originally announced April 2020.