Showing 1–50 of 68 results for author: Del Bue, A

  1. arXiv:2407.05003  [pdf, other]

    cs.HC

    DCitizens Roles Unveiled: SIG Navigating Identities in Digital Civics and the Spectrum of Societal Impact

    Authors: Anna R. L. Carter, Kyle Montague, Reem Talhouk, Shaun Lawson, Hugo Nicolau, Ana Cristina Pires, Markus Rohde, Alessio Del Bue, Tiffany Knearem

    Abstract: The DCitizens SIG aims to navigate ethical dimensions in forthcoming Digital Civics projects, ensuring enduring benefits and community resilience. Additionally, it seeks to shape the future landscape of digital civics for ethical and sustainable interventions. As we dive into these interactive processes, a challenge arises of discerning authentic intentions and validating perspectives. This explor…

    Submitted 6 July, 2024; originally announced July 2024.

  2. arXiv:2406.15833  [pdf, other]

    cs.RO

    XBG: End-to-end Imitation Learning for Autonomous Behaviour in Human-Robot Interaction and Collaboration

    Authors: Carlos Cardenas-Perez, Giulio Romualdi, Mohamed Elobaid, Stefano Dafarra, Giuseppe L'Erario, Silvio Traversaro, Pietro Morerio, Alessio Del Bue, Daniele Pucci

    Abstract: This paper presents XBG (eXteroceptive Behaviour Generation), a multimodal end-to-end Imitation Learning (IL) system for a whole-body autonomous humanoid robot used in real-world Human-Robot Interaction (HRI) scenarios. The main contribution of this paper is an architecture for learning HRI behaviours using a data-driven approach. Through teleoperation, a diverse dataset is collected, comprising d…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Submitted to RA-L https://ami-iit.github.io/xbg/

  3. arXiv:2406.05080  [pdf, other]

    cs.RO cs.AI cs.CL

    I2EDL: Interactive Instruction Error Detection and Localization

    Authors: Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

    Abstract: In the Vision-and-Language Navigation in Continuous Environments (VLN-CE) task, the human user guides an autonomous agent to reach a target goal via a series of low-level actions following a textual instruction in natural language. However, most existing methods do not address the likely case where users may make mistakes when providing such instruction (e.g. "turn left" instead of "turn right").…

    Submitted 23 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at IEEE RO-MAN 2024

  4. arXiv:2404.12784  [pdf, other]

    cs.CV cs.LG

    Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation

    Authors: Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue

    Abstract: We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting on it the Gaussians before $α$ blending their…

    Submitted 19 April, 2024; originally announced April 2024.

  5. arXiv:2404.10574  [pdf, other]

    cs.CV cs.AI cs.LG

    Uncertainty-guided Open-Set Source-Free Unsupervised Domain Adaptation with Target-private Class Segregation

    Authors: Mattia Litrico, Davide Talon, Sebastiano Battiato, Alessio Del Bue, Mario Valerio Giuffrida, Pietro Morerio

    Abstract: Standard Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target but usually requires simultaneous access to both source and target data. Moreover, UDA approaches commonly assume that source and target domains share the same labels space. Yet, these two assumptions are hardly satisfied in real-world scenarios. This paper considers the mor…

    Submitted 16 April, 2024; originally announced April 2024.

  6. arXiv:2404.01053  [pdf, other]

    cs.CV

    HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior

    Authors: David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue

    Abstract: We present HAHA - a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high fidelity rendering. We demonstrate its efficiency to animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to app…

    Submitted 1 April, 2024; originally announced April 2024.

  7. arXiv:2403.12682  [pdf, other]

    cs.CV cs.RO

    IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model

    Authors: Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

    Abstract: We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess that is proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted ICRA 2024, Project page: https://mbortolon97.github.io/iffnerf/

  8. arXiv:2403.10700  [pdf, other]

    cs.RO cs.AI cs.CL

    Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

    Authors: Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

    Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 3 figures, 8 pages

  9. arXiv:2403.09830  [pdf, other]

    cs.LG cs.AI

    Towards the Reusability and Compositionality of Causal Representations

    Authors: Davide Talon, Phillip Lippe, Stuart James, Alessio Del Bue, Sara Magliacane

    Abstract: Causal Representation Learning (CRL) aims at identifying high-level causal factors and their relationships from high-dimensional observations, e.g., images. While most CRL works focus on learning causal representations in a single environment, in this work we instead propose a first step towards learning causal representations from temporal sequences of images that can be adapted in a new environm…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to the 3rd Conference on Causal Learning and Reasoning (CLeaR 2024)

  10. arXiv:2403.08586  [pdf, other]

    cs.CV

    PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections

    Authors: Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

    Abstract: Robustly estimating camera poses from a set of images is a fundamental task which remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose, and in…

    Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  11. arXiv:2402.19302  [pdf, other]

    cs.CV

    DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

    Authors: Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari, Pietro Morerio, Alessio Del Bue

    Abstract: Reassembly tasks play a fundamental role in many fields and multiple approaches exist to solve specific reassembly problems. In this context, we posit that a general unified model can effectively address them all, irrespective of the input data type (images, 3D, etc.). We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks using a diffusion…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted at CVPR2024

  12. arXiv:2311.04058  [pdf, other]

    cs.CV

    mmFUSION: Multimodal Fusion for 3D Objects Detection

    Authors: Javed Ahmad, Alessio Del Bue

    Abstract: Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems. Camera and LiDAR are the most commonly used sensors, and usually, their fusion happens at the early or late stages of 3D detectors with the help of regions of interest (RoIs). On the other hand, fusion at the intermediate level is more adaptive because it does not need RoIs from modalities but is complex as…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: 10 pages,

  13. arXiv:2310.02201  [pdf, other]

    cs.CV

    Learnable Data Augmentation for One-Shot Unsupervised Domain Adaptation

    Authors: Julio Ivan Davila Carrazco, Pietro Morerio, Alessio Del Bue, Vittorio Murino

    Abstract: This paper presents a classification framework based on learnable data augmentation to tackle the One-Shot Unsupervised Domain Adaptation (OS-UDA) problem. OS-UDA is the most challenging setting in Domain Adaptation, as only one single unlabeled target sample is assumed to be available for model adaptation. Driven by such single sample, our method LearnAug-UDA learns how to augment source data, ma…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted to The 34th British Machine Vision Conference (BMVC 2023)

  14. arXiv:2308.08303  [pdf, other]

    cs.CV

    Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos

    Authors: Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue

    Abstract: Objects are crucial for understanding human-object interactions. By identifying the relevant objects, one can also predict potential future interactions or actions that may occur with these objects. In this paper, we study the problem of Short-Term Object interaction anticipation (STA) and propose NAOGAT (Next-Active-Object Guided Anticipation Transformer), a multi-modal end-to-end transformer net…

    Submitted 5 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted in WACV'24

  15. arXiv:2308.05410  [pdf, other]

    cs.CV cs.GR cs.RO

    SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data

    Authors: Mohammad Zohaib, Alessio Del Bue

    Abstract: This paper proposes a new method to infer keypoints from arbitrary object categories in practical scenarios where point cloud data (PCD) are noisy, down-sampled and arbitrarily rotated. Our proposed model adheres to the following principles: i) keypoints inference is fully unsupervised (no annotation given), ii) keypoints position error should be low and resilient to PCD perturbations (robustness)…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted in International Conference on Computer Vision (ICCV) 2023. For code and data, please refer to the following GitHub page: https://github.com/IITPAVIS/SC3K

  16. arXiv:2308.04402  [pdf]

    cs.CV

    Person Re-Identification without Identification via Event Anonymization

    Authors: Shafiq Ahmad, Pietro Morerio, Alessio Del Bue

    Abstract: Wide-scale use of visual surveillance in public spaces puts individual privacy at stake while increasing resource consumption (energy, bandwidth, and computation). Neuromorphic vision sensors (event-cameras) have been recently considered a valid solution to the privacy issue because they do not capture detailed RGB visual information of the subjects in the scene. However, recent deep learning arch…

    Submitted 17 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at International Conference on Computer Vision (ICCV), 2023

  17. arXiv:2305.16066  [pdf, other]

    cs.CV

    Guided Attention for Next Active Object @ EGO4D STA Challenge

    Authors: Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue

    Abstract: In this technical report, we describe the Guided-Attention mechanism based solution for the short-term anticipation (STA) challenge for the EGO4D challenge. It combines the object detections, and the spatiotemporal features extracted from video clips, enhancing the motion and contextual information, and further decoding the object-centric and motion-centric information to address the problem of ST…

    Submitted 4 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Winner of CVPR@2023 Ego4D STA challenge. arXiv admin note: substantial text overlap with arXiv:2305.12953

  18. arXiv:2305.12953  [pdf, other]

    cs.CV

    Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention

    Authors: Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue

    Abstract: Short-term action anticipation (STA) in first-person videos is a challenging task that involves understanding the next active object interactions and predicting future actions. Existing action anticipation methods have primarily focused on utilizing features extracted from video clips, but often overlooked the importance of objects and their interactions. To this end, we propose a novel approach t…

    Submitted 23 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE ICIP 2023, see project page here : https://sanketsans.github.io/guided-attention-egocentric.html

  19. arXiv:2305.04628  [pdf, other]

    cs.CV

    Target-driven One-Shot Unsupervised Domain Adaptation

    Authors: Julio Ivan Davila Carrazco, Suvarna Kishorkumar Kadam, Pietro Morerio, Alessio Del Bue, Vittorio Murino

    Abstract: In this paper, we introduce a novel framework for the challenging problem of One-Shot Unsupervised Domain Adaptation (OSUDA), which aims to adapt to a target domain with only a single unlabeled target sample. Unlike existing approaches that rely on large labeled source and unlabeled target data, our Target-driven One-Shot UDA (TOS-UDA) approach employs a learnable augmentation strategy guided by t…

    Submitted 17 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to 22nd International Conference on IMAGE ANALYSIS AND PROCESSING (ICIAP) 2023

    Journal ref: 22nd International Conference on IMAGE ANALYSIS AND PROCESSING (ICIAP) 2023

  20. arXiv:2304.06373  [pdf, other]

    cs.CV

    3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset

    Authors: Matteo Toso, Matteo Taiana, Stuart James, Alessio Del Bue

    Abstract: Efficient visual localization is crucial to many applications, such as large-scale deployment of autonomous agents and augmented reality. Traditional visual localization, while achieving remarkable accuracy, relies on extensive 3D models of the scene or large collections of geolocalized images, which are often inefficient to store and to scale to novel environments. In contrast, humans orient them…

    Submitted 8 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  21. arXiv:2303.11120  [pdf, other]

    cs.CV

    Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models

    Authors: Francesco Giuliari, Gianluca Scarpellini, Stuart James, Yiming Wang, Alessio Del Bue

    Abstract: Positional reasoning is the process of ordering unsorted parts contained in a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the nois…

    Submitted 20 March, 2023; originally announced March 2023.

  22. arXiv:2303.03770  [pdf, other]

    cs.CV cs.AI cs.LG

    Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain Adaptation

    Authors: Mattia Litrico, Alessio Del Bue, Pietro Morerio

    Abstract: Standard Unsupervised Domain Adaptation (UDA) methods assume the availability of both source and target data during the adaptation. In this work, we investigate Source-free Unsupervised Domain Adaptation (SF-UDA), a specific case of UDA where a model is adapted to a target domain without access to source data. We propose a novel approach for the SF-UDA setting based on a loss reweighting strategy…

    Submitted 17 March, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: To be published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  23. arXiv:2303.03155  [pdf, other]

    cs.RO

    Unsupervised Active Visual Search with Monte Carlo planning under Uncertain Detections

    Authors: Francesco Taioli, Francesco Giuliari, Yiming Wang, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

    Abstract: We propose a solution for Active Visual Search of objects in an environment, whose 2D floor map is the only known information. Our solution has three key features that make it more plausible and robust to detector failures compared to state-of-the-art methods: (i) it is unsupervised as it does not need any training sessions. (ii) During the exploration, a probability distribution on the 2D floor m…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 12 pages, 8 figures. Submitted for review at IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: text overlap with arXiv:2009.08140

  24. arXiv:2302.10624  [pdf, other]

    cs.CV

    Self-improving object detection via disagreement reconciliation

    Authors: Gianluca Scarpellini, Stefano Rosa, Pietro Morerio, Lorenzo Natale, Alessio Del Bue

    Abstract: Object detectors often experience a drop in performance when new environmental conditions are insufficiently represented in the training data. This paper studies how to automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment without relying on human intervention, i.e., in a self-supervised fashion. In our setting, an agent initially explores…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: This article is a conference paper related to arXiv:2302.03566 and is currently under review

  25. arXiv:2302.06358  [pdf, other]

    cs.CV

    Anticipating Next Active Objects for Egocentric Videos

    Authors: Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue

    Abstract: This paper addresses the problem of anticipating the next-active-object location in the future, for a given egocentric video clip where the contact might happen, before any action takes place. The problem is considerably hard, as we aim at estimating the position of such objects in a scenario where the observed clip and the action segment are separated by the so-called ``time to contact'' (TTC) se…

    Submitted 1 May, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE ACCESS, this paper carries the Manuscript DOI: 10.1109/ACCESS.2024.3395282. The complete peer-reviewed version is available via this DOI, while the arXiv version is a post-author manuscript without peer-review

  26. arXiv:2302.03566  [pdf, other]

    cs.CV

    Look Around and Learn: Self-Training Object Detection by Exploration

    Authors: Gianluca Scarpellini, Stefano Rosa, Pietro Morerio, Lorenzo Natale, Alessio Del Bue

    Abstract: When an object detector is deployed in a novel setting it often experiences a drop in performance. This paper studies how an embodied agent can automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment without relying on human intervention, i.e., a fully self-supervised approach. In our setting, an agent initially learns to explore the environ…

    Submitted 12 July, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Paper accepted at ECCV2024

  27. arXiv:2301.00866  [pdf, other]

    cs.RO cs.AI

    3DSGrasp: 3D Shape-Completion for Robotic Grasp

    Authors: Seyed S. Mohammadi, Nuno F. Duarte, Dimitris Dimou, Yiming Wang, Matteo Taiana, Pietro Morerio, Atabak Dehban, Plinio Moreno, Alexandre Bernardino, Alessio Del Bue, Jose Santos-Victor

    Abstract: Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry fr…

    Submitted 2 January, 2023; originally announced January 2023.

  28. arXiv:2211.00562  [pdf, other]

    cs.CV

    Leveraging commonsense for object localisation in partial scenes

    Authors: Francesco Giuliari, Geri Skenderi, Marco Cristani, Alessio Del Bue, Yiming Wang

    Abstract: We propose an end-to-end solution to address the problem of object localisation in partial scenes, where we aim to estimate the position of an object in an unknown area given only a partial 3D scan of the scene. We propose a novel scene representation to facilitate the geometric reasoning, Directed Spatial Commonsense Graph (D-SCG), a spatial scene graph that is enriched with additional concept no…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.05380

  29. arXiv:2210.04214  [pdf, other]

    cs.CV cs.GR

    VM-NeRF: Tackling Sparsity in NeRF with View Morphing

    Authors: Matteo Bortolon, Alessio Del Bue, Fabio Poiesi

    Abstract: NeRF aims to learn a continuous neural scene representation by using a finite set of input images taken from various viewpoints. A well-known limitation of NeRF methods is their reliance on data: the fewer the viewpoints, the higher the likelihood of overfitting. This paper addresses this issue by introducing a novel method to generate geometrically consistent image transitions between viewpoints…

    Submitted 16 August, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: ICIAP 2023

  30. arXiv:2209.03638  [pdf, other]

    cs.LG cs.CL cs.SI

    Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding

    Authors: Hebatallah A. Mohamed, Sebastiano Vascon, Feliks Hibraj, Stuart James, Diego Pilutti, Alessio Del Bue, Marcello Pelillo

    Abstract: Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we f…

    Submitted 8 September, 2022; originally announced September 2022.

  31. arXiv:2208.10238  [pdf, other]

    cs.CV

    Learning Branched Fusion and Orthogonal Projection for Face-Voice Association

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Sajid Javed, Muhammad Haroon Yousaf, Alessio Del Bue

    Abstract: Recent years have seen an increased interest in establishing association between faces and voices of celebrities leveraging audio-visual information from YouTube. Prior works adopt metric learning methods to learn an embedding space that is amenable for associated matching and verification tasks. Albeit showing some progress, such formulations are, however, restrictive due to dependency on distanc…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: Submitted: IEEE Transactions on Multimedia. arXiv admin note: substantial text overlap with arXiv:2112.10483

  32. arXiv:2207.10574  [pdf, other]

    cs.HC cs.AI cs.CV cs.LG cs.MM

    Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey

    Authors: Cigdem Beyan, Alessandro Vinciarelli, Alessio Del Bue

    Abstract: Automated co-located human-human interaction analysis has been addressed by the use of nonverbal communication as measurable evidence of social and psychological phenomena. We survey the computing studies (since 2010) detecting phenomena related to social traits (e.g., leadership, dominance, personality traits), social roles/relations, and interaction dynamics (e.g., group cohesion, engagement, ra…

    Submitted 4 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, https://doi.org/10.1145/3626516

  33. arXiv:2207.09445  [pdf, other]

    cs.CV

    PoserNet: Refining Relative Camera Poses Exploiting Object Detections

    Authors: Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

    Abstract: The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet) a light-weight Graph Neural Network to refine the approximate pai…

    Submitted 21 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022

  34. arXiv:2207.05634  [pdf, other]

    cs.CV

    GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image

    Authors: Davide Talon, Alessio Del Bue, Stuart James

    Abstract: Puzzle solving is a combinatorial challenge due to the difficulty of matching adjacent pieces. Instead, we infer a mental image from all pieces, which a given piece can then be matched against avoiding the combinatorial explosion. Exploiting advancements in Generative Adversarial methods, we learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint emb…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted at International Conference of Image Processing (ICIP22)

  35. arXiv:2204.10312  [pdf, other]

    cs.CV

    Unsupervised Human Action Recognition with Skeletal Graph Laplacian and Self-Supervised Viewpoints Invariance

    Authors: Giancarlo Paoletti, Jacopo Cavazza, Cigdem Beyan, Alessio Del Bue

    Abstract: This paper presents a novel end-to-end method for the problem of skeleton-based unsupervised human action recognition. We propose a new architecture with a convolutional autoencoder that uses graph Laplacian regularization to model the skeletal geometry across the temporal dynamics of actions. Our approach is robust towards viewpoint variations by including a self-supervised gradient reverse layer…

    Submitted 21 April, 2022; originally announced April 2022.

    Journal ref: The 32nd British Machine Vision Conference (BMVC) 2021

  36. arXiv:2203.05380  [pdf, other]

    cs.CV

    Spatial Commonsense Graph for Object Localisation in Partial Scenes

    Authors: Francesco Giuliari, Geri Skenderi, Marco Cristani, Yiming Wang, Alessio Del Bue

    Abstract: We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) given a partial 3D scan of a scene. The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them, enriched by concept nodes and relationships from a co…

    Submitted 14 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022, project website: http://fgiuliari.github.io/projects/SpatialCommonsenseGraph/

  37. arXiv:2201.00577  [pdf, other]

    cs.CV

    Semantically Grounded Visual Embeddings for Zero-Shot Learning

    Authors: Shah Nawaz, Jacopo Cavazza, Alessio Del Bue

    Abstract: Zero-shot learning methods rely on fixed visual and semantic embeddings, extracted from independent vision and language models, both pre-trained for other large-scale tasks. This is a weakness of current zero-shot learning frameworks as such disjoint embeddings fail to adequately associate visual and textual information to their shared semantic content. Therefore, we propose to learn semantically…

    Submitted 10 April, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

    Comments: Accepted at CVPRW

  38. arXiv:2112.10483  [pdf, other]

    cs.CV

    Fusion and Orthogonal Projection for Improved Face-Voice Association

    Authors: Muhammad Saad Saeed, Muhammad Haris Khan, Shah Nawaz, Muhammad Haroon Yousaf, Alessio Del Bue

    Abstract: We study the problem of learning association between face and voice, which is gaining interest in the computer vision community lately. Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable for associated matching and verification tasks. Albeit showing some progress, such loss formulations are, however, restrictive due to dependency on distance-dependent marg…

    Submitted 20 December, 2021; originally announced December 2021.

  39. arXiv:2108.03257  [pdf, other]

    cs.CV

    (Just) A Spoonful of Refinements Helps the Registration Error Go Down

    Authors: Sérgio Agostinho, Aljoša Ošep, Alessio Del Bue, Laura Leal-Taixé

    Abstract: We tackle data-driven 3D point cloud registration. Given point correspondences, the standard Kabsch algorithm provides an optimal rotation estimate. This allows to train registration models in an end-to-end manner by differentiating the SVD operation. However, given the initial rotation estimate supplied by Kabsch, we show we can improve point correspondence learning during model training by exten…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 (Oral)

  40. arXiv:2107.00914  [pdf, other]

    cs.RO

    POMP++: Pomcp-based Active Visual Search in unknown indoor environments

    Authors: Francesco Giuliari, Alberto Castellini, Riccardo Berra, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti, Yiming Wang

    Abstract: In this paper we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework, to allow training-free online policy learning in unknown environments. We present a new belie…

    Submitted 5 November, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: Accepted at 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  41. arXiv:2104.10609  [pdf, other]

    cs.CV

    Lifting Monocular Events to 3D Human Poses

    Authors: Gianluca Scarpellini, Pietro Morerio, Alessio Del Bue

    Abstract: This paper presents a novel 3D human pose estimation approach using a single stream of asynchronous events as input. Most of the state-of-the-art approaches solve this task with RGB cameras, however struggling when subjects are moving fast. On the other hand, event-based 3D pose estimation benefits from the advantages of event-cameras, especially their efficiency and robustness to appearance chang…

    Submitted 21 April, 2021; originally announced April 2021.

  42. arXiv:2101.10772  [pdf, other]

    cs.CV

    LIGHTS: LIGHT Specularity Dataset for specular detection in Multi-view

    Authors: Mohamed Dahy Elkhouly, Theodore Tsesmelis, Alessio Del Bue, Stuart James

    Abstract: Specular highlights are commonplace in images, however, methods for detecting them and in turn removing the phenomenon are particularly challenging. A reason for this, is due to the difficulty of creating a dataset for training or evaluation, as in the real-world we lack the necessary control over the environment. Therefore, we propose a novel physically-based rendered LIGHT Specularity (LIGHTS) D…

    Submitted 26 January, 2021; originally announced January 2021.

  43. arXiv:2101.10734  [pdf, other]

    cs.CV

    Consistent Mesh Colors for Multi-View Reconstructed 3D Scenes

    Authors: Mohamed Dahy Elkhouly, Alessio Del Bue, Stuart James

    Abstract: We address the issue of creating consistent mesh texture maps captured from scenes without color calibration. We find that the method for aggregation of the multiple views is crucial for creating spatially consistent meshes without the need to explicitly optimize for spatial consistency. We compute a color prior from the cross-correlation of observable view faces and the faces per view to identify…

    Submitted 26 January, 2021; originally announced January 2021.

  44. arXiv:2012.06531  [pdf, other]

    eess.IV cs.CV cs.LG

    AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study

    Authors: Paolo Soda, Natascha Claudia D'Amico, Jacopo Tessadori, Giovanni Valbusa, Valerio Guarrasi, Chandra Bortolotto, Muhammad Usman Akbar, Rosa Sicilia, Ermanno Cordelli, Deborah Fazzini, Michaela Cellina, Giancarlo Oliva, Giovanni Callea, Silvia Panella, Maurizio Cariati, Diletta Cozzi, Vittorio Miele, Elvira Stellato, Gian Paolo Carrafiello, Giulia Castorani, Annalisa Simeone, Lorenzo Preda, Giulio Iannello, Alessio Del Bue, Fabio Tedoldi, et al. (3 additional authors not shown)

    Abstract: Recent epidemiological data report that worldwide more than 53 million people have been infected by SARS-CoV-2, resulting in 1.3 million deaths. The disease has been spreading very rapidly and few months after the identification of the first infected, shortage of hospital resources quickly became a problem. In this work we investigate whether chest X-ray (CXR) can be used as a possible tool for th…

    Submitted 11 December, 2020; originally announced December 2020.

  45. arXiv:2011.14669  [pdf, other]

    cs.CV cs.RO

    Where to Explore Next? ExHistCNN for History-aware Autonomous 3D Exploration

    Authors: Yiming Wang, Alessio Del Bue

    Abstract: In this work we address the problem of autonomous 3D exploration of an unknown indoor environment using a depth camera. We cast the problem as the estimation of the Next Best View (NBV) that maximises the coverage of the unknown area. We do this by re-formulating NBV estimation as a classification problem and we propose a novel learning-based metric that encodes both, the current 3D observation (a…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: Published at the European Conference on Computer Vision, 2020

  46. arXiv:2011.02018  [pdf, other]

    cs.CV

    Single Image Human Proxemics Estimation for Visual Social Distancing

    Authors: Maya Aghaei, Matteo Bustreo, Yiming Wang, Gianluca Bailo, Pietro Morerio, Alessio Del Bue

    Abstract: In this work, we address the problem of estimating the so-called "Social Distancing" given a single uncalibrated image in unconstrained scenarios. Our approach proposes a semi-automatic solution to approximate the homography matrix between the scene ground and image plane. With the estimated homography, we then leverage an off-the-shelf pose detector to detect body poses on the image and to reason…

    Submitted 5 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Paper accepted at WACV 2021 conference

  47. arXiv:2010.09557  [pdf, other]

    cs.CV

    A Versatile Crack Inspection Portable System based on Classifier Ensemble and Controlled Illumination

    Authors: Milind G. Padalkar, Carlos Beltrán-González, Matteo Bustreo, Alessio Del Bue, Vittorio Murino

    Abstract: This paper presents a novel setup for automatic visual inspection of cracks in ceramic tile as well as studies the effect of various classifiers and height-varying illumination conditions for this task. The intuition behind this setup is that cracks can be better visualized under specific lighting conditions than others. Our setup, which is designed for field work with constraints in its maximum d…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted in ICPR 2020

  48. arXiv:2010.08428  [pdf, other]

    cs.SD eess.AS

    Are Multiple Cross-Correlation Identities better than just Two? Improving the Estimate of Time Differences-of-Arrivals from Blind Audio Signals

    Authors: Danilo Greco, Jacopo Cavazza, Alessio Del Bue

    Abstract: Given an unknown audio source, the estimation of time differences-of-arrivals (TDOAs) can be efficiently and robustly solved using blind channel identification and exploiting the cross-correlation identity (CCI). Prior "blind" works have improved the estimate of TDOAs by means of different algorithmic solutions and optimization strategies, while always sticking to the case N = 2 microphones. But w…

    Submitted 16 October, 2020; originally announced October 2020.

  49. arXiv:2009.08140  [pdf, other]

    cs.RO cs.CV

    POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments

    Authors: Yiming Wang, Francesco Giuliari, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

    Abstract: In this paper we focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup. Our POMP method uses as input the current pose of an agent (e.g. a robot) and a RGB-D frame. The task is to plan the next move that brings the agent closer to the target object. We model this problem as a Partially Observable Markov Decisi…

    Submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted at BMVC2020

  50. Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

    Authors: Giancarlo Paoletti, Jacopo Cavazza, Cigdem Beyan, Alessio Del Bue

    Abstract: This paper tackles the problem of human action recognition, defined as classifying which action is displayed in a trimmed sequence, from skeletal data. Albeit state-of-the-art approaches designed for this application are all supervised, in this paper we pursue a more challenging direction: Solving the problem with unsupervised learning. To this end, we propose a novel subspace clustering method, w…

    Submitted 21 June, 2020; originally announced June 2020.

    Journal ref: 25th International Conference on Pattern Recognition (ICPR) 2020