
Showing 1–50 of 78 results for author: Del Bimbo, A

  1. arXiv:2405.02951

    cs.CV cs.IR

    iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

    Authors: Lorenzo Agnolucci, Alberto Baldrati, Marco Bertini, Alberto Del Bimbo

    Abstract: Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption. The reliance of supervised methods on labor-intensive manually labeled datasets hinders their broad applicability. In this work, we introduce a new task, Zero-Shot…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Extended version of the ICCV2023 paper arXiv:2303.15247

  2. arXiv:2405.02581

    cs.CV

    Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements

    Authors: Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo

    Abstract: Learning compatible representations enables the interchangeable use of semantic features as models are updated over time. This is particularly relevant in search and retrieval systems where it is crucial to avoid reprocessing of the gallery images with the updated model. While recent research has shown promising empirical evidence, there is still a lack of comprehensive theoretical understanding a…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR24 as Poster Highlight

  3. arXiv:2402.11631

    cs.CV cs.ET

    Neuromorphic Face Analysis: a Survey

    Authors: Federico Becattini, Lorenzo Berlincioni, Luca Cultrera, Alberto Del Bimbo

    Abstract: Neuromorphic sensors, also known as event cameras, are a class of imaging devices mimicking the function of biological visual systems. Unlike traditional frame-based cameras, which capture fixed images at discrete intervals, neuromorphic sensors continuously generate events that represent changes in light intensity or motion in the visual field with high temporal resolution and low latency. These…

    Submitted 22 April, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to Pattern Recognition Letters

  4. arXiv:2402.11627

    cs.CV cs.IR

    Interactive Garment Recommendation with User in the Loop

    Authors: Federico Becattini, Xiaolin Chen, Andrea Puccia, Haokun Wen, Xuemeng Song, Liqiang Nie, Alberto Del Bimbo

    Abstract: Recommending fashion items often leverages rich user profiles and makes targeted suggestions based on past history and previous purchases. In this paper, we work under the assumption that no prior knowledge is given about a user. We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit. We present a reinforcement learning…

    Submitted 18 February, 2024; originally announced February 2024.

  5. arXiv:2401.16058

    cs.CV

    Neuromorphic Valence and Arousal Estimation

    Authors: Lorenzo Berlincioni, Luca Cultrera, Federico Becattini, Alberto Del Bimbo

    Abstract: Recognizing faces and their underlying emotions is an important aspect of biometrics. In fact, estimating emotional states from faces has been tackled from several angles in the literature. In this paper, we follow the novel route of using neuromorphic data to predict valence and arousal values from faces. Due to the difficulty of gathering event-based annotated videos, we leverage an event camera…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Submitted to Journal of Ambient Intelligence and Humanized Computing

  6. arXiv:2311.04263

    cs.CV

    Perceptual Quality Improvement in Videoconferencing using Keyframes-based GAN

    Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

    Abstract: In recent years, videoconferencing has taken a fundamental role in interpersonal relations, both for personal and business purposes. Lossy video compression algorithms are the enabling technology for videoconferencing, as they reduce the bandwidth required for real-time video streaming. However, lossy video compression decreases the perceived visual quality. Thus, many techniques for reducing…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: IEEE Transactions on Multimedia 2023 (IEEE TMM 2023)

  7. arXiv:2311.04261

    cs.CV cs.MM

    Restoration of Analog Videos Using Swin-UNet

    Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

    Abstract: In this paper, we present a system to restore analog videos of historical archives. These videos often contain severe visual degradation due to the deterioration of their tape supports that require costly and slow manual interventions to recover the original content. The proposed system uses a multi-frame approach and is able to deal with severe tape mistracking, which results in completely scramb…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: ACM MM 2022 (Demo)

  8. arXiv:2310.20650

    cs.CV cs.RO

    Addressing Limitations of State-Aware Imitation Learning for Autonomous Driving

    Authors: Luca Cultrera, Federico Becattini, Lorenzo Seidenari, Pietro Pala, Alberto Del Bimbo

    Abstract: Conditional Imitation learning is a common and effective approach to train autonomous driving agents. However, two issues limit the full potential of this approach: (i) the inertia problem, a special case of causal confusion where the agent mistakenly correlates low speed with no acceleration, and (ii) low correlation between offline and online performance due to the accumulation of small errors t…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Submitted to IEEE Transactions on Intelligent Vehicles

  9. arXiv:2310.20621

    cs.CV

    Deepfake detection by exploiting surface anomalies: the SurFake approach

    Authors: Andrea Ciamarra, Roberto Caldelli, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: The ever-increasing use of synthetically generated content in different sectors of our everyday life, one for all media information, poses a strong need for deepfake detection tools in order to avoid the proliferation of altered messages. The process to identify manipulated content, in particular images and videos, is basically performed by looking for the presence of some inconsistencies and/or a…

    Submitted 17 April, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

  10. arXiv:2310.20593

    cs.CV

    FLODCAST: Flow and Depth Forecasting via Multimodal Recurrent Architectures

    Authors: Andrea Ciamarra, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: Forecasting motion and spatial positions of objects is of fundamental importance, especially in safety-critical settings such as autonomous driving. In this work, we address the issue by forecasting two different modalities that carry complementary information, namely optical flow and depth. To this end, we propose FLODCAST, a flow and depth forecasting model that leverages a multitask recurrent arc…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Submitted to Pattern Recognition

  11. arXiv:2310.14926

    cs.CV cs.MM

    Reference-based Restoration of Digitized Analog Videotapes

    Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

    Abstract: Analog magnetic tapes have been the main video data storage device for several decades. Videos stored on analog videotapes exhibit unique degradation patterns caused by tape aging and reader device malfunctioning that are different from those observed in film and digital video restoration tasks. In this work, we present a reference-based approach for the resToration of digitized Analog videotaPEs…

    Submitted 3 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: WACV2024

  12. arXiv:2310.14918

    cs.CV

    ARNIQA: Learning Distortion Manifold for Image Quality Assessment

    Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

    Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to develop methods to measure image quality in alignment with human perception without the need for a high-quality reference image. In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsi…

    Submitted 4 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: WACV2024

  13. arXiv:2310.08368

    cs.CV

    Mapping Memes to Words for Multimodal Hateful Meme Classification

    Authors: Giovanni Burbi, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto Del Bimbo

    Abstract: Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions. However, some memes take a malicious turn, promoting hateful content and perpetuating discrimination. Detecting hateful memes within this multimodal context is a challenging task that requires understanding the intertwin…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: ICCV2023 CLVL Workshop

  14. Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval

    Authors: Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo

    Abstract: Given the recent advances in multimodal image pretraining where visual models trained with semantically dense textual supervision tend to have better generalization capabilities than those trained using categorical attributes or through unsupervised techniques, in this work we investigate how the recent CLIP model can be applied to several tasks in the artwork domain. We perform exhaustive experiments on…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Proc. of Florence Heri-Tech 2022: The Future of Heritage Science and Technologies: ICT and Digital Heritage, 2022

  15. DiffDefense: Defending against Adversarial Attacks via Diffusion Models

    Authors: Hondamunige Prasanna Silva, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: This paper presents a novel reconstruction method that leverages Diffusion Models to protect machine learning classifiers against adversarial attacks, all without requiring any modifications to the classifiers themselves. The susceptibility of machine learning models to minor input perturbations renders them vulnerable to adversarial attacks. While diffusion-based methods are typically disregarded…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Paper published at ICIAP23

    Journal ref: ICIAP 2023

  16. arXiv:2308.12914

    cs.CV

    3D Pose Nowcasting: Forecast the Future to Improve the Present

    Authors: Alessandro Simoni, Francesco Marchetti, Guido Borghi, Federico Becattini, Lorenzo Seidenari, Roberto Vezzani, Alberto Del Bimbo

    Abstract: Technologies to enable safe and effective collaboration and coexistence between humans and robots have gained significant importance in the last few years. A critical component useful for realizing this collaborative paradigm is the understanding of human and robot 3D poses using non-invasive systems. Therefore, in this paper, we propose a novel vision-based system leveraging depth data to accurat…

    Submitted 18 November, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  17. arXiv:2308.11485

    cs.CV

    Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

    Authors: Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

    Abstract: Given a query composed of a reference image and a relative caption, the Composed Image Retrieval goal is to retrieve images visually similar to the reference one that integrate the modifications expressed by the caption. Given that recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP mo…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted in ACM Transactions on Multimedia Computing Communications and Applications (TOMM)

  18. arXiv:2308.07151

    cs.CV

    Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage

    Authors: Dario Cioni, Lorenzo Berlincioni, Federico Becattini, Alberto del Bimbo

    Abstract: Cultural heritage applications and advanced machine learning models are creating a fruitful synergy to provide effective and accessible ways of interacting with artworks. Smart audio-guides, personalized art-related content and gamification approaches are just a few examples of how technology can be exploited to provide additional value to artists or exhibitions. Nonetheless, from a machine learni…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023 4th Workshop on e-Heritage

  19. arXiv:2307.14063

    cs.CV

    ECO: Ensembling Context Optimization for Vision-Language Models

    Authors: Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto Del Bimbo

    Abstract: Image recognition has recently witnessed a paradigm shift, where vision-language models are now used to perform few-shot classification based on textual prompts. Among these, the CLIP model has shown remarkable capabilities for zero-shot transfer by matching an image and a custom textual prompt in its latent space. This has paved the way for several works that focus on engineering or learning text…

    Submitted 26 July, 2023; originally announced July 2023.

  20. arXiv:2306.01081

    cs.CV cs.AI cs.MM

    4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks

    Authors: Lorenzo Berlincioni, Stefano Berretti, Marco Bertini, Alberto Del Bimbo

    Abstract: Time varying sequences of 3D point clouds, or 4D point clouds, are now being acquired at an increasing pace in several applications (e.g., LiDAR in autonomous or assisted driving). In many cases, such volume of data is transmitted, thus requiring that proper compression tools are applied to either reduce the resolution or the bandwidth. In this paper, we propose a new solution for upscaling and re…

    Submitted 1 June, 2023; originally announced June 2023.

  21. arXiv:2304.08098

    cs.CV cs.IR

    Transformer-based Graph Neural Networks for Outfit Generation

    Authors: Federico Becattini, Federico Maria Teotini, Alberto Del Bimbo

    Abstract: Suggesting complementary clothing items to compose an outfit is a process of emerging interest, yet it involves a fine understanding of fashion trends and visual aesthetics. Previous works have mainly focused on recommendation by scoring visual appeal and representing garments as ordered sequences or as collections of pairwise-compatible items. This limits the full usage of relations among clothes…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in IEEE Transactions on Emerging Topics in Computing

  22. arXiv:2304.06351

    cs.CV

    Neuromorphic Event-based Facial Expression Recognition

    Authors: Lorenzo Berlincioni, Luca Cultrera, Chiara Albisani, Lisa Cresti, Andrea Leonardo, Sara Picchioni, Federico Becattini, Alberto Del Bimbo

    Abstract: Recently, event cameras have shown large applicability in several computer vision fields especially concerning tasks that require high temporal resolution. In this work, we investigate the usage of such kind of data for emotion recognition by presenting NEFER, a dataset for Neuromorphic Event-based Facial Expression Recognition. NEFER is composed of paired RGB and event videos representing human f…

    Submitted 13 April, 2023; originally announced April 2023.

  23. arXiv:2304.00500

    cs.CV cs.AI cs.MM

    Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images

    Authors: Roberto Amoroso, Davide Morelli, Marcella Cornia, Lorenzo Baraldi, Alberto Del Bimbo, Rita Cucchiara

    Abstract: Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. While these models have numerous benefits across various sectors, they have also raised concerns about the potential misuse of fake images and cast new pressures on fake image detection. In this work, we pioneer a systematic study on deepfake detection generated by s…

    Submitted 21 May, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: ACM Transactions on Multimedia Computing, Communications and Applications (2024)

  24. arXiv:2303.15247

    cs.CV cs.CL cs.IR

    Zero-Shot Composed Image Retrieval with Textual Inversion

    Authors: Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto Del Bimbo

    Abstract: Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images. The high effort and cost required for labeling datasets for CIR hamper the widespread usage of existing methods, as they rely on supervised learning. In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), th…

    Submitted 19 August, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: ICCV2023

  25. arXiv:2301.06116

    cs.CV cs.LG

    Maximally Compact and Separated Features with Regular Polytope Networks

    Authors: Federico Pernici, Matteo Bruni, Claudio Baecchi, Alberto Del Bimbo

    Abstract: Convolutional Neural Networks (CNNs) trained with the Softmax loss are widely used classification models for several vision tasks. Typically, a learnable transformation (i.e. the classifier) is placed at the end of such models returning class scores that are further normalized into probabilities by Softmax. This learnable transformation has a fundamental role in determining the network internal fe…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: DEEPVISION 2019 CVPR 2019, LONG BEACH Sunday, 16th June @ Room "Terrace Theater" https://sites.google.com/view/deepvision2019/program https://openaccess.thecvf.com/content_CVPRW_2019/html/Deep_Vision_Workshop/Pernici_Maximally_Compact_and_Separated_Features_with_Regular_Polytope_Networks_CVPRW_2019_paper.html. arXiv admin note: text overlap with arXiv:1902.10441

  26. arXiv:2211.09032

    cs.CV cs.LG

    CL2R: Compatible Lifelong Learning Representations

    Authors: Niccolo Biondi, Federico Pernici, Matteo Bruni, Daniele Mugnai, Alberto Del Bimbo

    Abstract: In this paper, we propose a method to partially mimic natural intelligence for the problem of lifelong learning representations that are compatible. We take the perspective of a learning agent that is interested in recognizing object instances in an open dynamic universe in a way in which any update to its internal feature representation does not render the features in the gallery unusable for vis…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Published on ACM TOMM 2022

  27. Forecasting Future Instance Segmentation with Learned Optical Flow and Warping

    Authors: Andrea Ciamarra, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: For an autonomous vehicle it is essential to observe the ongoing dynamics of a scene and consequently predict imminent future scenarios to ensure safety to itself and others. This can be done using different sensors and modalities. In this paper we investigate the usage of optical flow for predicting future semantic segmentations. To do so we propose a model that forecasts flow fields autoregressi…

    Submitted 6 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Paper published as Poster at ICIAP21

    Journal ref: ICIAP 2022

  28. Learning advisor networks for noisy image classification

    Authors: Simone Ricci, Tiberio Uricchio, Alberto Del Bimbo

    Abstract: In this paper, we introduced the novel concept of advisor network to address the problem of noisy labels in image classification. Deep neural networks (DNN) are prone to performance reduction and overfitting problems on training data with noisy annotations. Weighting loss methods aim to mitigate the influence of noisy labels during the training, completely removing their contribution. This discard…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Paper published as Poster at ICIAP21

    Journal ref: ICIAP 2022

  29. arXiv:2210.16807

    cs.CV

    The Florence 4D Facial Expression Dataset

    Authors: F. Principi, S. Berretti, C. Ferrari, N. Otberdout, M. Daoudi, A. Del Bimbo

    Abstract: Human facial expressions change dynamically, so their recognition / analysis should be conducted by accounting for the temporal evolution of face deformations either in 2D or 3D. While abundant 2D video data do exist, this is not the case in 3D, where few 3D dynamic (4D) datasets were released for public use. The negative consequence of this scarcity of data is amplified by current deep learning b…

    Submitted 30 October, 2022; originally announced October 2022.

  30. arXiv:2209.01813

    cs.CV

    Automatic Estimation of Self-Reported Pain by Trajectory Analysis in the Manifold of Fixed Rank Positive Semi-Definite Matrices

    Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Pietro Pala, Alberto Del Bimbo, Zakia Hammal

    Abstract: We propose an automatic method to estimate self-reported pain based on facial landmarks extracted from videos. For each video sequence, we decompose the face into four different regions and the pain intensity is measured by modeling the dynamics of facial movement using the landmarks of these regions. A formulation based on Gram matrices is used for representing the trajectory of landmarks on the…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

    Comments: To appear in IEEE Transactions On Affective Computing, it is an extension of our paper arXiv:2006.13882

  31. arXiv:2208.00725

    cs.CV cs.IR

    Fashion Recommendation Based on Style and Social Events

    Authors: Federico Becattini, Lavinia De Divitiis, Claudio Baecchi, Alberto Del Bimbo

    Abstract: Fashion recommendation is often framed as the task of finding complementary items given a query garment or retrieving outfits that are suitable for a given user. In this work we address the problem by adding an additional semantic layer based on the style of the proposed dressing. We model style according to two important aspects: the mood and the emotion concealed behind color combination patte…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: submitted to Multimedia Tools and Applications. Data available at: https://github.com/fedebecat/Fashion4Events

  32. arXiv:2208.00050

    cs.CV

    Generating Multiple 4D Expression Transitions by Learning Face Landmark Trajectories

    Authors: Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo

    Abstract: In this work, we address the problem of 4D facial expressions generation. This is usually addressed by animating a neutral 3D face to reach an expression peak, and then get back to the neutral state. In the real world though, people show more complex expressions, and switch from one expression to another. We thus propose a new model that generates transitions between different expressions, and syn…

    Submitted 18 May, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

    Comments: This preprint is an extension of CVPR 2022 paper arXiv:2105.07463

  33. arXiv:2207.12101

    cs.CV cs.CL

    Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?

    Authors: Pietro Bongini, Federico Becattini, Alberto Del Bimbo

    Abstract: The use of Deep Learning and Computer Vision in the Cultural Heritage domain is becoming highly relevant in the last few years with lots of applications about audio smart guides, interactive museums and augmented reality. All these technologies require lots of data to work effectively and be useful for the user. In the context of artworks, such data is annotated by experts in an expensive and time…

    Submitted 19 May, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

  34. arXiv:2206.11759

    cs.CV cs.GR

    What makes you, you? Analyzing Recognition by Swapping Face Parts

    Authors: Claudio Ferrari, Matteo Serpentoni, Stefano Berretti, Alberto Del Bimbo

    Abstract: Deep learning advanced face recognition to an unprecedented accuracy. However, understanding how local parts of the face affect the overall recognition performance is still mostly unclear. Among others, face swap has been experimented to this end, but just for the entire face. In this paper, we propose to swap facial parts as a way to disentangle the recognition relevance of different face parts,…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted for publication at 26th International Conference on Pattern Recognition (ICPR), 2022

  35. arXiv:2206.03086

    cs.CV

    Online Deep Clustering with Video Track Consistency

    Authors: Alessandra Alfani, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets. Their main drawback however is that these methods are hardly able to recognize visual features of the same object if it is simply rotated or the perspective of the camera changes. To overcome this limitation and at the same time exploit a useful sour…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted at ICPR2022 as oral

  36. Contrastive Supervised Distillation for Continual Representation Learning

    Authors: Tommaso Barletti, Niccolo' Biondi, Federico Pernici, Matteo Bruni, Alberto Del Bimbo

    Abstract: In this paper, we propose a novel training procedure for the continual representation learning problem in which a neural network model is sequentially learned to alleviate catastrophic forgetting in visual search tasks. Our method, called Contrastive Supervised Distillation (CSD), reduces feature forgetting while learning discriminative features. This is achieved by leveraging labels information i…

    Submitted 10 June, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Paper published as Oral and awarded as Best Student Paper at ICIAP21

    Journal ref: ICIAP 2021

  37. arXiv:2203.12446

    cs.CV

    SMEMO: Social Memory for Trajectory Forecasting

    Authors: Francesco Marchetti, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: Effective modeling of human interactions is of utmost importance when forecasting behaviors such as future trajectories. Each individual, with its motion, influences surrounding agents since everyone obeys unwritten social rules such as collision avoidance or group following. In this paper we model such interactions, which constantly evolve through time, by looking at the problem from an algo…

    Submitted 18 February, 2024; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)

  38. arXiv:2202.12690

    cs.CV

    On Modality Bias Recognition and Reduction

    Authors: Yangyang Guo, Liqiang Nie, Harry Cheng, Zhiyong Cheng, Mohan Kankanhalli, Alberto Del Bimbo

    Abstract: Making each modality in multi-modal data contribute is of vital importance to learning a versatile multi-modal model. Existing methods, however, are often dominated by one or a few modalities during model training, resulting in sub-optimal performance. In this paper, we refer to this problem as modality bias and attempt to study it in the context of multi-modal classification systematically and c…

    Submitted 26 September, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: Accepted by ToMM

  39. CoReS: Compatible Representations via Stationarity

    Authors: Niccolo Biondi, Federico Pernici, Matteo Bruni, Alberto Del Bimbo

    Abstract: Compatible features enable the direct comparison of old and new learned features allowing to use them interchangeably over time. In visual search systems, this eliminates the need to extract new features from the gallery-set when the representation model is upgraded with novel data. This has a big value in real applications as re-indexing the gallery-set can be computationally expensive when the g…

    Submitted 28 March, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. Code: https://github.com/NiccoBiondi/cores-compatibility

  40. arXiv:2110.05848

    cs.CV

    Fine-Grained Adversarial Semi-supervised Learning

    Authors: Daniele Mugnai, Federico Pernici, Francesco Turchini, Alberto Del Bimbo

    Abstract: In this paper we exploit Semi-Supervised Learning (SSL) to increase the amount of training data to improve the performance of Fine-Grained Visual Categorization (FGVC). This problem has not been investigated in the past in spite of prohibitive annotation costs that FGVC requires. Our approach leverages unlabeled data with an adversarial optimization strategy in which the internal features represen…

    Submitted 12 October, 2021; originally announced October 2021.

    Journal ref: ACM Transactions on Multimedia Computing, Communications, and Applications 2021

  41. arXiv:2106.13603

    cs.CV

    Partially fake it till you make it: mixing real and fake thermal images for improved object detection

    Authors: Francesco Bongini, Lorenzo Berlincioni, Marco Bertini, Alberto Del Bimbo

    Abstract: In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes. We show the performance of the proposed system in the context of object detection in thermal videos, a domain where 1) training datasets are very limited compared to visible spectrum datasets and 2) creating full realistic s…

    Submitted 25 June, 2021; originally announced June 2021.

  42. arXiv:2105.07463

    cs.CV cs.AI

    Sparse to Dense Dynamic 3D Facial Expression Generation

    Authors: Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo

    Abstract: In this paper, we propose a solution to the task of generating dynamic 3D facial expressions from a neutral 3D face and an expression label. This involves solving two sub-problems: (i) modeling the temporal dynamics of expressions, and (ii) deforming the neutral mesh to obtain the expressive counterpart. We represent the temporal evolution of expressions using the motion of a sparse set of 3D landm…

    Submitted 3 March, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: paper accepted at CVPR 2022

  43. Learning Group Activities from Skeletons without Individual Action Labels

    Authors: Fabio Zappardino, Tiberio Uricchio, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: To understand human behavior we must not just recognize individual actions but model possibly complex group activity and interactions. Hierarchical models obtain the best results in group activity recognition but require fine-grained individual action annotations at the actor level. In this paper we show that using only skeletal data we can train a state-of-the-art end-to-end system using only gro…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: ICPR 2020

  44. arXiv:2105.01993  [pdf, other]

    cs.CV

    AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

    Authors: Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto Del Bimbo

    Abstract: A number of studies point out that current Visual Question Answering (VQA) models are severely affected by the language prior problem, which refers to blindly making predictions based on the language shortcut. Some efforts have been devoted to overcoming this issue with delicate models. However, there is no research to address it from the angle of the answer feature space learning, despite the…

    Submitted 5 May, 2021; originally announced May 2021.

  45. Regular Polytope Networks

    Authors: Federico Pernici, Matteo Bruni, Claudio Baecchi, Alberto Del Bimbo

    Abstract: Neural networks are widely used as a model for classification in a large variety of tasks. Typically, a learnable transformation (i.e. the classifier) is placed at the end of such models returning a value for each class used for classification. This transformation plays an important role in determining how the generated features change during the learning process. In this work, we argue that this…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1902.10441

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2021

  46. arXiv:2102.02005  [pdf, other]

    cs.CV

    Robust pedestrian detection in thermal imagery using synthesized images

    Authors: My Kieu, Lorenzo Berlincioni, Leonardo Galteri, Marco Bertini, Andrew D. Bagdanov, Alberto Del Bimbo

    Abstract: In this paper we propose a method for improving pedestrian detection in the thermal domain using two stages: first, a generative data augmentation approach is used, then a domain adaptation method using generated data adapts an RGB pedestrian detector. Our model, based on the Least-Squares Generative Adversarial Network, is trained to synthesize realistic thermal versions of input RGB images which…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted at ICPR2020

  47. arXiv:2012.06200  [pdf, other]

    cs.CV cs.IR

    Garment Recommendation with Memory Augmented Neural Networks

    Authors: Lavinia De Divitiis, Federico Becattini, Claudio Baecchi, Alberto Del Bimbo

    Abstract: Fashion plays a pivotal role in society. Combining garments appropriately is essential for people to communicate their personality and style. Also, different events require outfits to be carefully chosen to comply with underlying social clothing rules. Therefore, combining garments appropriately might not be trivial. The fashion industry has turned this into a massive source of income, relying on…

    Submitted 11 December, 2020; originally announced December 2020.

  48. arXiv:2010.08948  [pdf, other]

    cs.CV cs.RO

    Multiple Future Prediction Leveraging Synthetic Trajectories

    Authors: Lorenzo Berlincioni, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: Trajectory prediction is an important task, especially in autonomous driving. The ability to forecast the position of other moving agents can lead to effective planning, ensuring safety for the autonomous vehicle as well as for the observed entities. In this work we propose a data-driven approach based on Markov Chains to generate synthetic trajectories, which are useful for training a multiple f…

    Submitted 18 October, 2020; originally announced October 2020.

    Comments: Accepted at ICPR2020

  49. arXiv:2010.08946  [pdf, other]

    cs.CV

    Temporal Binary Representation for Event-Based Action Recognition

    Authors: Simone Undri Innocenti, Federico Becattini, Federico Pernici, Alberto Del Bimbo

    Abstract: In this paper we present an event aggregation strategy to convert the output of an event camera into frames processable by traditional Computer Vision algorithms. The proposed method first generates sequences of intermediate binary representations, which are then losslessly transformed into a compact format by simply applying a binary-to-decimal conversion. This strategy allows us to encode tempor…

    Submitted 18 October, 2020; originally announced October 2020.

    Comments: Accepted at ICPR2020

  50. Class-incremental Learning with Pre-allocated Fixed Classifiers

    Authors: Federico Pernici, Matteo Bruni, Claudio Baecchi, Francesco Turchini, Alberto Del Bimbo

    Abstract: In class-incremental learning, a learning agent faces a stream of data with the goal of learning new classes while not forgetting previous ones. Neural networks are known to suffer under this setting, as they forget previously acquired knowledge. To address this problem, effective methods exploit past data stored in an episodic memory while expanding the final classifier nodes to accommodate the n…

    Submitted 5 August, 2023; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: ICPR 2021 (figure and typos fixed)