Showing 1–21 of 21 results for author: Conti, A

Searching in archive cs.
  1. arXiv:2406.12321  [pdf, other]

    cs.AI cs.CL cs.CV

    Automatic benchmarking of large multimodal models via iterative experiment programming

    Authors: Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci

    Abstract: Assessing the capabilities of large multimodal models (LMMs) often requires the creation of ad-hoc evaluations. Currently, building new benchmarks requires tremendous amounts of manual work for each specific analysis. This makes the evaluation process tedious and costly. In this paper, we present APEx, Automatic Programming of Experiments, the first framework for automatic benchmarking of LMMs. Gi…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 31 pages, 6 figures, code is available at https://github.com/altndrr/apex

  2. arXiv:2406.04345  [pdf, other]

    cs.CV

    Stereo-Depth Fusion through Virtual Pattern Projection

    Authors: Luca Bartolomei, Matteo Poggi, Fabio Tosi, Andrea Conti, Stefano Mattoccia

    Abstract: This paper presents a novel general-purpose stereo and depth data fusion paradigm that mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor. It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera, using the sparse hints obtained from a depth sensor, t…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: extended version of ICCV 2023: "Active Stereo Without Pattern Projector"

  3. arXiv:2404.10864  [pdf, other]

    cs.CV

    Vocabulary-free Image Classification and Semantic Segmentation

    Authors: Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Large vision-language models revolutionized image classification and semantic segmentation paradigms. However, they typically assume a pre-defined set of categories, or vocabulary, at test time for composing textual prompts. This assumption is impractical in scenarios with unknown or evolving semantic context. Here, we address this issue and introduce the Vocabulary-free Image Classification (VIC)…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Under review, 22 pages, 10 figures, code is available at https://github.com/altndrr/vicss. arXiv admin note: text overlap with arXiv:2306.00917

  4. arXiv:2404.07560  [pdf, other]

    cs.RO cs.AI

    Socially Pertinent Robots in Gerontological Healthcare

    Authors: Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, et al. (19 additional authors not shown)

    Abstract: Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilitie…

    Submitted 11 April, 2024; originally announced April 2024.

  5. arXiv:2404.05426  [pdf, other]

    cs.CV

    Test-Time Zero-Shot Temporal Action Localization

    Authors: Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Zero-Shot Temporal Action Localization (ZS-TAL) seeks to identify and locate actions in untrimmed videos unseen during training. Existing ZS-TAL methods involve fine-tuning a model on a large amount of annotated training data. While effective, training-based ZS-TAL approaches assume the availability of labeled data for supervised learning, which can be impractical in some applications. Furthermore…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  6. arXiv:2401.14401  [pdf, other]

    cs.CV

    Range-Agnostic Multi-View Depth Estimation With Keyframe Selection

    Authors: Andrea Conti, Matteo Poggi, Valerio Cambareri, Stefano Mattoccia

    Abstract: Methods for 3D reconstruction from posed frames require prior knowledge about the scene metric range, usually to recover matching cues along the epipolar lines and narrow the search range. However, such prior might not be directly available or estimated inaccurately in real scenarios -- e.g., outdoor 3D reconstruction from video sequences -- therefore heavily hampering performance. In this paper,…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 3DV 2024. Project page: https://andreaconti.github.io/projects/range_agnostic_multi_view_depth - GitHub page: https://github.com/andreaconti/ramdepth.git

  7. arXiv:2312.09254  [pdf, other]

    cs.CV

    Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization

    Authors: Luca Bartolomei, Matteo Poggi, Andrea Conti, Fabio Tosi, Stefano Mattoccia

    Abstract: This paper proposes a new framework for depth completion robust against domain-shifting issues. It exploits the generalization capability of modern stereo networks to face depth completion, by processing fictitious stereo pairs obtained through a virtual pattern projection paradigm. Any stereo network or traditional stereo matcher can be seamlessly plugged into our framework, allowing for the depl…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 3DV 2024. Code: https://github.com/bartn8/vppdc - Project page: https://vppdc.github.io/

  8. arXiv:2309.12315  [pdf, other]

    cs.CV

    Active Stereo Without Pattern Projector

    Authors: Luca Bartolomei, Matteo Poggi, Fabio Tosi, Andrea Conti, Stefano Mattoccia

    Abstract: This paper proposes a novel framework integrating the principles of active stereo in standard passive camera systems without a physical pattern projector. We virtually project a pattern over the left and right images according to the sparse measurements obtained from a depth sensor. Any such devices can be seamlessly plugged into our framework, allowing for the deployment of a virtual active stere…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Code: https://github.com/bartn8/vppstereo - Project page: https://vppstereo.github.io

  9. arXiv:2308.09139  [pdf, other]

    cs.CV

    The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

    Authors: Giacomo Zara, Alessandro Conti, Subhankar Roy, Stéphane Lathuilière, Paolo Rota, Elisa Ricci

    Abstract: Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists in adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset, without accessing the actual source data. The previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this wo…

    Submitted 22 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV2023, 14 pages, 7 figures, code is available at https://github.com/giaczara/dallv

  10. arXiv:2306.00917  [pdf, other]

    cs.CV

    Vocabulary-free Image Classification

    Authors: Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Recent advances in large vision-language models have revolutionized the image classification paradigm. Despite showing impressive zero-shot capabilities, a pre-defined set of categories, a.k.a. the vocabulary, is assumed at test time for composing the textual prompts. However, such assumption can be impractical when the semantic context is unknown and evolving. We thus formalize a novel task, term…

    Submitted 12 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS2023, 19 pages, 8 figures, code is available at https://github.com/altndrr/vic

  11. arXiv:2212.00790  [pdf, other]

    cs.CV

    Sparsity Agnostic Depth Completion

    Authors: Andrea Conti, Matteo Poggi, Stefano Mattoccia

    Abstract: We present a novel depth completion approach agnostic to the sparsity of depth points, that is very likely to vary in many practical applications. State-of-the-art approaches yield accurate results only when processing a specific density and distribution of input points, i.e. the one observed during training, narrowing their deployment in real use cases. On the contrary, our solution is robust to…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted for publication at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, 2023

  12. arXiv:2210.11467  [pdf, other]

    cs.CV

    Multi-View Guided Multi-View Stereo

    Authors: Matteo Poggi, Andrea Conti, Stefano Mattoccia

    Abstract: This paper introduces a novel deep framework for dense 3D reconstruction from multiple image frames, leveraging a sparse set of depth measurements gathered jointly with image acquisition. Given a deep multi-view stereo network, our framework uses sparse depth hints to guide the neural network by modulating the plane-sweep cost volume built during the forward step, enabling us to infer constantly m…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: IROS 2022. First two authors contributed equally. Project page: https://github.com/andreaconti/multi-view-guided-multi-view-stereo

  13. arXiv:2210.05246  [pdf, other]

    cs.CV cs.AI

    Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition

    Authors: Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Automatically understanding emotions from visual data is a fundamental task for human behaviour understanding. While models devised for Facial Expression Recognition (FER) have demonstrated excellent performances on many datasets, they often suffer from severe performance degradation when trained and tested on different datasets due to domain shift. In addition, as face images are considered highl…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted at BMVC2022, 13 pages, 4 figures, code is available at https://github.com/altndrr/clup

  14. arXiv:2210.03118  [pdf, other]

    cs.CV

    Unsupervised confidence for LiDAR depth maps and applications

    Authors: Andrea Conti, Matteo Poggi, Filippo Aleotti, Stefano Mattoccia

    Abstract: Depth perception is pivotal in many fields, such as robotics and autonomous driving, to name a few. Consequently, depth sensors such as LiDARs rapidly spread in many applications. The 3D point clouds generated by these sensors must often be coupled with an RGB camera to understand the framed scene semantically. Usually, the former is projected over the camera image plane, leading to a sparse depth…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: IROS 2022. Code available at https://github.com/andreaconti/lidar-confidence

  15. arXiv:2208.08860  [pdf]

    eess.SP cs.LG physics.med-ph

    An intertwined neural network model for EEG classification in brain-computer interfaces

    Authors: Andrea Duggento, Mario De Lorenzo, Stefano Bargione, Allegra Conti, Vincenzo Catrambone, Gaetano Valenza, Nicola Toschi

    Abstract: The brain computer interface (BCI) is a nonstimulatory direct and occasionally bidirectional communication link between the brain and a computer or an external device. Classically, EEG-based BCI algorithms have relied on models such as support vector machines and linear discriminant analysis or multiclass common spatial patterns. During the last decade, however, more sophisticated machine learning…

    Submitted 4 August, 2022; originally announced August 2022.

  16. arXiv:2207.11482  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss

    Authors: Riccardo Franceschini, Enrico Fini, Cigdem Beyan, Alessandro Conti, Federica Arrigoni, Elisa Ricci

    Abstract: Emotion recognition is involved in several real-world applications. With an increase in available modalities, automatic understanding of emotions is being performed more accurately. The success in Multimodal Emotion Recognition (MER) primarily relies on the supervised learning paradigm. However, data annotation is expensive, time-consuming, and as emotion expression and perception depends on seve…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: Accepted to 26th International Conference on Pattern Recognition (ICPR) 2022

  17. arXiv:2204.01693  [pdf, other]

    cs.CV

    Monitoring social distancing with single image depth estimation

    Authors: Alessio Mingozzi, Andrea Conti, Filippo Aleotti, Matteo Poggi, Stefano Mattoccia

    Abstract: The recent pandemic emergency raised many challenges regarding the countermeasures aimed at containing the virus spread, and constraining the minimum distance between people resulted in one of the most effective strategies. Thus, the implementation of autonomous systems capable of monitoring the so-called social distance gained much interest. In this paper, we aim to address this task leveraging a…

    Submitted 29 April, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)

  18. arXiv:2001.11494  [pdf, other]

    cs.IT eess.SP

    Peregrine: Network Localization and Navigation with Scalable Inference and Efficient Operation

    Authors: Bryan Teague, Zhenyu Liu, Florian Meyer, Andrea Conti, Moe Z. Win

    Abstract: Location-aware networks will enable new services and applications in fields such as autonomous driving, smart cities, and the Internet-of-Things. One promising solution for ubiquitous localization is network localization and navigation (NLN), where devices form a network that cooperatively localizes itself, reducing the infrastructure needed for accurate localization. This paper introduces a real-…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: 16 pages, 8 figures

  19. Log-concavity property of the error probability with application to local bounds for wireless communications

    Authors: Andrea Conti, Dmitry Panchenko, Sergiy Sidenko, Velio Tralli

    Abstract: A clear understanding of the behavior of the error probability (EP) as a function of signal-to-noise ratio (SNR) and other system parameters is fundamental for assessing the design of digital wireless communication systems. We propose an analytical framework based on the log-concavity property of the EP which we prove for a wide family of multidimensional modulation formats in the presence of Gaussi…

    Submitted 20 February, 2009; v1 submitted 6 October, 2007; originally announced October 2007.

    Journal ref: IEEE Trans. Inform. Theory, 2009, vol. 55, no. 6, 2766-2775.

  20. arXiv:0704.0282  [pdf, other]

    cs.IT cs.CC

    On Punctured Pragmatic Space-Time Codes in Block Fading Channel

    Authors: Samuele Bandi, Luca Stabellini, Andrea Conti, Velio Tralli

    Abstract: This paper considers the use of punctured convolutional codes to obtain pragmatic space-time trellis codes over block-fading channel. We show that good performance can be achieved even when puncturation is adopted and that we can still employ the same Viterbi decoder of the convolutional mother code by using approximated metrics without increasing the complexity of the decoding operations.

    Submitted 2 April, 2007; originally announced April 2007.

  21. arXiv:cs/0703142  [pdf, ps, other]

    cs.IT

    Pragmatic Space-Time Trellis Codes for Block Fading Channels

    Authors: Marco Chiani, Andrea Conti, Velio Tralli

    Abstract: A pragmatic approach for the construction of space-time codes over block fading channels is investigated. The approach consists in using common convolutional encoders and Viterbi decoders with suitable generators and rates, thus greatly simplifying the implementation of space-time codes. For the design of pragmatic space-time codes a methodology is proposed and applied, based on the extension of…

    Submitted 28 March, 2007; originally announced March 2007.