Showing 1–50 of 145 results for author: Perez, P

Searching in archive cs.

  1. arXiv:2406.04706  [pdf, other]

    cs.LG cs.NE eess.SP math.PR stat.ML

    Winner-takes-all learners are geometry-aware conditional density estimators

    Authors: Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gael Richard, Patrick Pérez

    Abstract: Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing that, once trained, hypotheses should quantize optimally the shape of the conditional distribution to predict. However, the best use of these hypothe…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning, Jul 2024, Vienna, Austria

  2. arXiv:2405.15508  [pdf, other]

    hep-ex cs.LG

    Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments

    Authors: Olivia Jullian Parra, Julián García Pardiñas, Lorenzo Del Pianta Pérez, Maximilian Janisch, Suzanne Klaver, Thomas Lehéricy, Nicola Serra

    Abstract: Data Quality Monitoring (DQM) is a crucial task in large particle physics experiments, since detector malfunctioning can compromise the data. DQM is currently performed by human shifters, which is costly and results in limited accuracy. In this work, we provide a proof-of-concept for applying human-in-the-loop Reinforcement Learning (RL) to automate the DQM process while adapting to operating cond…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2404.14027  [pdf, other]

    cs.CV cs.LG

    OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

    Authors: Sophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonin Vobecky, Patrick Pérez, Renaud Marlet

    Abstract: We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to th…

    Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

  4. arXiv:2403.18111  [pdf, other]

    cs.HC

    Scrolly2Reel: Retargeting Graphics for Social Media Using Narrative Beats

    Authors: Duy K. Nguyen, Jenny Ma, Pedro Alejandro Perez, Lydia B. Chilton

    Abstract: Content retargeting is crucial for social media creators. Once great content is created, it is important to reach as broad an audience as possible. This is particularly important in journalism where younger audiences are shifting away from print and towards short-video platforms. Many newspapers already create rich graphics for the web that they want to be able to reuse for social media. One examp…

    Submitted 19 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 9 pages, 3 figures

  5. arXiv:2401.16434  [pdf]

    eess.SY cs.LG eess.SP

    A novel ANROA based control approach for grid-tied multi-functional solar energy conversion system

    Authors: Dinanath Prasad, Narendra Kumar, Rakhi Sharma, Hasmat Malik, Fausto Pedro García Márquez, Jesús María Pinar Pérez

    Abstract: An adaptive control approach for a three-phase grid-interfaced solar photovoltaic system based on the new Neuro-Fuzzy Inference System with Rain Optimization Algorithm (ANROA) methodology is proposed and discussed in this manuscript. This method incorporates an Adaptive Neuro-fuzzy Inference System (ANFIS) with a Rain Optimization Algorithm (ROA). The ANFIS controller has excellent maximum trackin…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: The paper was published in Energy Reports journal (ELSEVIER). Cite as: Prasad, D., Kumar, N., Sharma, R., Malik, H., Márquez, F. P. G., & Pinar-Pérez, J. M. (2023). A novel ANROA based control approach for grid-tied multi-functional solar energy conversion system. Energy Reports, 9, 2044-2057

    Journal ref: Energy Reports (2023) Elsevier

  6. arXiv:2401.09413  [pdf, other]

    cs.CV

    POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images

    Authors: Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

    Abstract: We describe an approach to predict open-vocabulary 3D semantic voxel occupancy map from input 2D images with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: accepted to NeurIPS 2023

  7. arXiv:2401.08251  [pdf]

    cs.GT econ.GN eess.SY

    A techno-economic model for avoiding conflicts of interest between owners of offshore wind farms and maintenance suppliers

    Authors: Alberto Pliego Marugán, Fausto Pedro García Márquez, Jesús María Pinar Pérez

    Abstract: Currently, wind energy is one of the most important sources of renewable energy. Offshore locations for wind turbines are increasingly exploited because of their numerous advantages. However, offshore wind farms require high investment in maintenance service. Due to its complexity and special requirements, maintenance service is usually outsourced by wind farm owners. In this paper, we propose a n…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Published in Renewable and Sustainable Energy Reviews (ELSEVIER) 10 July 2022. DOI: https://doi.org/10.1016/j.rser.2022.112753 Cite as: Marugán, A. P., Márquez, F. P. G., & Pérez, J. M. P. (2022). A techno-economic model for avoiding conflicts of interest between owners of offshore wind farms and maintenance suppliers. Renewable and Sustainable Energy Reviews, 168, 112753

  8. Automatic UAV-based Airport Pavement Inspection Using Mixed Real and Virtual Scenarios

    Authors: Pablo Alonso, Jon Ander Iñiguez de Gordoa, Juan Diego Ortega, Sara García, Francisco Javier Iriarte, Marcos Nieto

    Abstract: Runway and taxiway pavements are exposed to high stress during their projected lifetime, which inevitably leads to a decrease in their condition over time. To make sure airport pavement condition ensure uninterrupted and resilient operations, it is of utmost importance to monitor their condition and conduct regular inspections. UAV-based inspection is recently gaining importance due to its wide ra…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, published in proceedings of 15th International Conference on Machine Vision (ICMV)

    Journal ref: Proc. SPIE 12701, Fifteenth International Conference on Machine Vision (ICMV 2022), 1270118

  9. arXiv:2312.13863  [pdf, other]

    cs.LG cs.CR cs.RO

    Manipulating Trajectory Prediction with Backdoors

    Authors: Kaouther Messaoud, Kathrin Grosse, Mickael Chen, Matthieu Cord, Patrick Pérez, Alexandre Alahi

    Abstract: Autonomous vehicles ought to predict the surrounding agents' trajectories to allow safe maneuvers in uncertain and complex traffic situations. As companies increasingly apply trajectory prediction in the real world, security becomes a relevant concern. In this paper, we focus on backdoors - a security threat acknowledged in other fields but so far overlooked for trajectory prediction. To this end,…

    Submitted 3 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 9 pages, 7 figures

  10. arXiv:2312.12487  [pdf, other]

    cs.LG cs.AI

    Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

    Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet

    Abstract: This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search fr…

    Submitted 19 December, 2023; originally announced December 2023.

  11. arXiv:2312.12359  [pdf, other]

    cs.CV

    CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation

    Authors: Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez

    Abstract: The popular CLIP model displays impressive zero-shot capabilities thanks to its seamless interaction with arbitrary text prompts. However, its lack of spatial awareness makes it unsuitable for dense computer vision tasks, e.g., semantic segmentation, without an additional fine-tuning step that often uses annotations and can potentially suppress its original open-vocabulary properties. Meanwhile, s…

    Submitted 27 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  12. arXiv:2312.09231  [pdf, other]

    cs.CV cs.LG

    Reliability in Semantic Segmentation: Can We Use Synthetic Data?

    Authors: Thibaut Loiseau, Tuan-Hung Vu, Mickael Chen, Patrick Pérez, Matthieu Cord

    Abstract: Assessing the reliability of perception models to covariate shifts and out-of-distribution (OOD) detection is crucial for safety-critical applications such as autonomous vehicles. By nature of the task, however, the relevant data is difficult to collect and annotate. In this paper, we challenge cutting-edge generative models to automatically synthesize data for assessing reliability in semantic se…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Project Page: https://valeoai.github.io/blog/publications/GenVal

  13. arXiv:2312.08879  [pdf, other]

    cs.CV

    Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

    Authors: Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomas Svoboda

    Abstract: Learning without supervision how to predict 3D scene flows from point clouds is essential to many perception systems. We propose a novel learning framework for this task which improves the necessary regularization. Relying on the assumption that scene elements are mostly rigid, current smoothness losses are built on the definition of "rigid clusters" in the input point clouds. The definition of t…

    Submitted 26 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  14. arXiv:2312.06386  [pdf, other]

    cs.CV cs.AI cs.LG

    ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation

    Authors: Cédric Rommel, Victor Letzelter, Nermin Samet, Renaud Marlet, Matthieu Cord, Patrick Pérez, Eduardo Valle

    Abstract: Monocular 3D human pose estimation (3D-HPE) is an inherently ambiguous task, as a 2D pose in an image might originate from different possible 3D poses. Yet, most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs. In this work, we provide theoretical and empirical evidence that, because of this ambiguity, common regression models are bound to pre…

    Submitted 11 December, 2023; originally announced December 2023.

  15. arXiv:2312.00703  [pdf, other]

    cs.CV

    PointBeV: A Sparse Approach to BeV Predictions

    Authors: Loick Chambon, Eloi Zablocki, Mickael Chen, Florent Bartoccioni, Patrick Perez, Matthieu Cord

    Abstract: Bird's-eye View (BeV) representations have emerged as the de-facto shared space in driving applications, offering a unified space for sensor data fusion and supporting various downstream tasks. However, conventional models use grids with fixed resolution and range and face computational inefficiencies due to the uniform allocation of resources across all cells. To address this, we propose PointBeV…

    Submitted 23 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: https://github.com/valeoai/PointBeV

  16. arXiv:2311.17922  [pdf, other]

    cs.CV

    A Simple Recipe for Language-guided Domain Generalized Segmentation

    Authors: Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette

    Abstract: Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques either necessitate external images for augmentation, and/or aim at learning invariant representations by imposing various alignment constraints. Large-scale pretraining has recently shown promising generalization c…

    Submitted 2 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  17. arXiv:2311.14542  [pdf, other]

    cs.CV

    ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

    Authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny

    Abstract: Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stage…

    Submitted 24 November, 2023; originally announced November 2023.

  18. arXiv:2311.07229  [pdf, other]

    cs.IR

    Understanding the Influence of Data Characteristics on the Performance of Point-of-Interest Recommendation Algorithms

    Authors: Linus W. Dietz, Pablo Sánchez, Alejandro Bellogín

    Abstract: The performance of recommendation algorithms is closely tied to key characteristics of the data sets they use, such as sparsity, popularity bias, and preference distributions. In this paper, we conduct a comprehensive explanatory analysis to shed light on the impact of a broad range of data characteristics within the point-of-interest (POI) recommendation domain. To accomplish this, we extend prio…

    Submitted 13 November, 2023; originally announced November 2023.

  19. arXiv:2311.01052  [pdf, other]

    stat.ML cs.LG

    Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

    Authors: Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard

    Abstract: We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existi…

    Submitted 16 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Journal ref: Advances in neural information processing systems, Dec 2023, New Orleans, United States

  20. arXiv:2310.17504  [pdf, other]

    cs.CV

    Three Pillars improving Vision Foundation Model Distillation for Lidar

    Authors: Gilles Puy, Spyros Gidaris, Alexandre Boulch, Oriane Siméoni, Corentin Sautier, Patrick Pérez, Andrei Bursuc, Renaud Marlet

    Abstract: Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show prom…

    Submitted 19 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: The code is available at https://github.com/valeoai/ScaLR

  21. arXiv:2310.12904  [pdf, other]

    cs.CV

    Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey

    Authors: Oriane Siméoni, Éloi Zablocki, Spyros Gidaris, Gilles Puy, Patrick Pérez

    Abstract: The recent enthusiasm for open-world vision systems show the high interest of the community to perform perception tasks outside of the closed-vocabulary benchmark setups which have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how to find objects without knowing anything about the…

    Submitted 19 October, 2023; originally announced October 2023.

  22. arXiv:2310.07173  [pdf]

    quant-ph cs.ET

    Unleashing quantum algorithms with Qinterpreter: bridging the gap between theory and practice across leading quantum computing platforms

    Authors: Wilmer Contreras Sepúlveda, Ángel David Torres-Palencia, José Javier Sánchez Mondragón, Braulio Misael Villegas-Martínez, J. Jesús Escobedo-Alatorre, Sandra Gesing, Néstor Lozano-Crisóstomo, Julio César García-Melgarejo, Juan Carlos Sánchez Pérez, Eddie Nelson Palacios-Pérez, Omar PalilleroSandoval

    Abstract: Quantum computing is a rapidly emerging and promising field that has the potential to revolutionize numerous research domains, including drug design, network technologies and sustainable energy. Due to the inherent complexity and divergence from classical computing, several major quantum computing libraries have been developed to implement quantum algorithms, namely IBM Qiskit, Amazon Braket, Cirq…

    Submitted 13 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  23. arXiv:2309.17224  [pdf, other]

    cs.LG cs.AR cs.CL cs.ET cs.PF

    Training and inference of large language models using 8-bit floating point

    Authors: Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon

    Abstract: FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect h…

    Submitted 29 September, 2023; originally announced September 2023.

    ACM Class: I.2.7; B.2.4

  24. arXiv:2309.16670  [pdf, other]

    cs.CV cs.GR cs.HC

    Decaf: Monocular Deformation Capture for Face and Hand Interactions

    Authors: Soshi Shimada, Vladislav Golyanik, Patrick Pérez, Christian Theobalt

    Abstract: Existing methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects. Modelling dense non-rigid object deformations in this setting remained largely unaddressed so far, although such effects can improve the realism of the downstream applications such as AR/VR and avatar communications. This is due to the severe ill-posedness of the monocular view setting…

    Submitted 13 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

  25. arXiv:2309.08302  [pdf, other]

    cs.CV cs.RO

    T-UDA: Temporal Unsupervised Domain Adaptation in Sequential Point Clouds

    Authors: Awet Haileslassie Gebrehiwot, David Hurych, Karel Zimmermann, Patrick Pérez, Tomáš Svoboda

    Abstract: Deep perception models have to reliably cope with an open-world setting of domain shifts induced by different geographic regions, sensor properties, mounting positions, and several other reasons. Since covering all domains with annotated data is technically intractable due to the endless possible variations, researchers focus on unsupervised domain adaptation (UDA) methods that adapt models traine…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Will appear at IEEE/RSJ International Conference on Intelligent Robots and Systems 2023 (IROS 2023)

  26. arXiv:2309.01575  [pdf, other]

    cs.CV cs.LG

    DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion

    Authors: Cédric Rommel, Eduardo Valle, Mickaël Chen, Souhaiel Khalfaoui, Renaud Marlet, Matthieu Cord, Patrick Pérez

    Abstract: We present an innovative approach to 3D Human Pose Estimation (3D-HPE) by integrating cutting-edge diffusion models, which have revolutionized diverse fields, but are relatively unexplored in 3D-HPE. We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations. We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE, and demonstra…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted to 2023 International Conference on Computer Vision Workshop (Analysis and Modeling of Faces and Gestures)

  27. arXiv:2307.09361  [pdf, other]

    cs.CV cs.AI cs.LG

    MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

    Authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez

    Abstract: Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose…

    Submitted 18 July, 2023; originally announced July 2023.

  28. arXiv:2306.09281  [pdf, other]

    cs.RO cs.CV

    Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?

    Authors: Yihong Xu, Loïck Chambon, Éloi Zablocki, Mickaël Chen, Alexandre Alahi, Matthieu Cord, Patrick Pérez

    Abstract: Motion forecasting is crucial in enabling autonomous vehicles to anticipate the future trajectories of surrounding agents. To do so, it requires solving mapping, detection, tracking, and then forecasting problems, in a multi-step pipeline. In this complex system, advances in conventional forecasting methods have been made using curated data, i.e., with the assumption of perfect maps, detection, an…

    Submitted 5 March, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted to ICRA 2024

  29. arXiv:2303.07442  [pdf, other]

    eess.AS cs.SD

    A processing framework to access large quantities of whispered speech found in ASMR

    Authors: Pablo Perez Zarazaga, Gustav Eje Henter, Zofia Malisz

    Abstract: Whispering is a ubiquitous mode of communication that humans use daily. Despite this, whispered speech has been poorly served by existing speech technology due to a shortage of resources and processing methodology. To remedy this, this paper provides a processing framework that enables access to large and unique data of high-quality whispered speech. We obtain the data from recordings submitted to…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023, 5 pages, 2 figures, 2 tables

  30. arXiv:2303.04895  [pdf, ps, other]

    cs.AI cs.LO

    Morpho-logic from a Topos Perspective: Application to symbolic AI

    Authors: Marc Aiguier, Isabelle Bloch, Salim Nibouche, Ramon Pino Perez

    Abstract: Modal logics have proved useful for many reasoning tasks in symbolic artificial intelligence (AI), such as belief revision, spatial reasoning, among others. On the other hand, mathematical morphology (MM) is a theory for non-linear analysis of structures, that was widely developed and applied in image analysis. Its mathematical bases rely on algebra, complete lattices, topology. Strong links have…

    Submitted 8 March, 2023; originally announced March 2023.

  31. arXiv:2302.03462  [pdf, other]

    cs.LG

    Diverse Probabilistic Trajectory Forecasting with Admissibility Constraints

    Authors: Laura Calem, Hedi Ben-Younes, Patrick Pérez, Nicolas Thome

    Abstract: Predicting multiple trajectories for road users is important for automated driving systems: ego-vehicle motion planning indeed requires a clear view of the possible motions of the surrounding agents. However, the generative models used for multiple-trajectory forecasting suffer from a lack of diversity in their proposals. To avoid this form of collapse, we propose a novel method for structured pre…

    Submitted 7 February, 2023; originally announced February 2023.

    Journal ref: International Conference on Pattern Recognition (ICPR) 2022

  32. arXiv:2301.11217  [pdf, other]

    cs.NI

    An eXtended Reality Offloading IP Traffic Dataset and Models

    Authors: Diego Gonzalez Morin, Daniele Medda, Athanasios Iossifides, Periklis Chatzimisios, Ana Garcia Armada, Alvaro Villegas, Pablo Perez

    Abstract: In recent years, advances in immersive multimedia technologies, such as extended reality (XR) technologies, have led to more realistic and user-friendly devices. However, these devices are often bulky and uncomfortable, still requiring tether connectivity for demanding applications. The deployment of the fifth generation of telecommunications technologies (5G) has set the basis for XR offloading s…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Submitted to IEEE Transactions on Mobile Computing

  33. arXiv:2212.07834  [pdf, other]

    cs.CV

    Unsupervised Object Localization: Observing the Background to Discover Objects

    Authors: Oriane Siméoni, Chloé Sekkat, Gilles Puy, Antonin Vobecky, Éloi Zablocki, Patrick Pérez

    Abstract: Recent advances in self-supervised visual representation learning have paved the way for unsupervised methods tackling tasks such as object discovery and instance segmentation. However, discovering objects in an image with no supervision is a very hard task; what are the desired objects, when to separate them into parts, how many are there, and of what classes? The answers to these questions depen…

    Submitted 29 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  34. arXiv:2212.03241  [pdf, other]

    cs.CV cs.LG

    PØDA: Prompt-driven Zero-shot Domain Adaptation

    Authors: Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette

    Abstract: Domain adaptation has been vastly investigated in computer vision but still requires access to target images at train time, which might be intractable in some uncommon conditions. In this paper, we propose the task of 'Prompt-driven Zero-shot Domain Adaptation', where we adapt a model trained on a source domain using only a general description in natural language of the target domain, i.e., a prom…

    Submitted 19 August, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted to ICCV 2023, Project Page: https://astra-vision.github.io/PODA/

  35. arXiv:2211.12380  [pdf, other]

    cs.CV cs.AI

    OCTET: Object-aware Counterfactual Explanations

    Authors: Mehdi Zemni, Mickaël Chen, Éloi Zablocki, Hédi Ben-Younes, Patrick Pérez, Matthieu Cord

    Abstract: Nowadays, deep vision models are being widely deployed in safety-critical applications, e.g., autonomous driving, and explainability of such models is becoming a pressing concern. Among explanation methods, counterfactual explanations aim to find minimal and interpretable changes to the input image that would also change the output of the model to be explained. Such explanations point end-users at…

    Submitted 24 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  36. arXiv:2208.12639  [pdf, other]

    cs.CV

    Full Body Video-Based Self-Avatars for Mixed Reality: from E2E System to User Study

    Authors: Diego Gonzalez Morin, Ester Gonzalez-Sosa, Pablo Perez, Alvaro Villegas

    Abstract: In this work we explore the creation of self-avatars through video pass-through in Mixed Reality (MR) applications. We present our end-to-end system, including: custom MR video pass-through implementation on a commercial head mounted display (HMD), our deep learning-based real-time egocentric body segmentation algorithm, and our optimized offloading architecture, to communicate the segmentation se…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Diego Gonzalez-Morin and Ester Gonzalez-Sosa contribute equally

  37. arXiv:2208.12625  [pdf, other]

    cs.LG cs.CV

    Take One Gram of Neural Features, Get Enhanced Group Robustness

    Authors: Simon Roburin, Charles Corbière, Gilles Puy, Nicolas Thome, Matthieu Aubry, Renaud Marlet, Patrick Pérez

    Abstract: Predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. The presence of spurious correlations in training datasets leads ERM-trained models to display high loss when evaluated on minority groups not presenting such correlations. Extensive attempts have been made to develop methods improving worst-group rob…

    Submitted 7 February, 2023; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: Long version (Previous version: OOD-CV Workshop @ ECCV 2022)

  38. arXiv:2208.00789  [pdf, other]

    cs.CV cs.AI

    Self-supervised learning with rotation-invariant kernels

    Authors: Léon Zheng, Gilles Puy, Elisa Riccietti, Patrick Pérez, Rémi Gribonval

    Abstract: We introduce a regularization loss based on kernel mean embeddings with rotation-invariant kernels on the hypersphere (also known as dot-product kernels) for self-supervised learning of image representations. Besides being fully competitive with the state of the art, our method significantly reduces time and memory complexity for self-supervised training, making it implementable for very large emb…

    Submitted 8 March, 2023; v1 submitted 28 July, 2022; originally announced August 2022.

    Journal ref: The Eleventh International Conference on Learning Representations, May 2023, Kigali, Rwanda

  39. arXiv:2207.12112  [pdf, other]

    cs.CV

    Active Learning Strategies for Weakly-supervised Object Detection

    Authors: Huy V. Vo, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Jean Ponce

    Abstract: Object detectors trained with weak annotations are affordable alternatives to fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2022. Contains 27 pages, 9 tables and 6 figures

  40. Teachers in concordance for pseudo-labeling of 3D sequential data

    Authors: Awet Haileslassie Gebrehiwot, Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomáš Svoboda

    Abstract: Automatic pseudo-labeling is a powerful tool to tap into large amounts of sequential unlabeled data. It is especially appealing in safety-critical applications of autonomous driving, where performance requirements are extreme, datasets are large, and manual labeling is very challenging. We propose to leverage sequences of point clouds to boost the pseudo-labeling technique in a teacher-student setup…

    Submitted 5 July, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: This work has been submitted to the IEEE for publication

    MSC Class: 68T07

    ACM Class: I.4.6; I.4.8

    Journal ref: in IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 536-543, Feb. 2023

  41. arXiv:2207.01296  [pdf, other]

    cs.CV

    Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality

    Authors: Ester Gonzalez-Sosa, Andrija Gajic, Diego Gonzalez-Morin, Guillermo Robledo, Pablo Perez, Alvaro Villegas

    Abstract: In this work we present our real-time egocentric body segmentation algorithm. Our algorithm achieves a frame rate of 66 fps for an input resolution of 640x480, thanks to our shallow network inspired by Thundernet's architecture. Besides, we put a strong emphasis on the variability of the training data. More concretely, we describe the creation process of our Egocentric Bodies (EgoBodies) dataset,…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 9 pages, 9 figures

  42. arXiv:2206.13294  [pdf, other]

    cs.CV cs.AI cs.RO

    LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation

    Authors: Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, Karteek Alahari

    Abstract: Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common topview grid. This is usually done with error-prone geometric operations (e.g., homography or back-project…

    Submitted 26 November, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    MSC Class: 68T45

    Journal ref: CoRL 2022 https://openreview.net/forum?id=abd_D-iVjk0

  43. Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practises for QoE Assessment

    Authors: Pablo Pérez, Ester Gonzalez-Sosa, Jesús Gutiérrez, Narciso García

    Abstract: Several technological and scientific advances have been achieved recently in the fields of immersive systems, which are offering new possibilities to applications and services in different communication domains, such as entertainment, virtual conferencing, working meetings, social relations, healthcare, and industry. Users of these immersive technologies can explore and experience the stimuli in a…

    Submitted 1 September, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: Frontiers in Signal Processing

    Journal ref: Front. Signal Process. (2022)

  44. arXiv:2205.05677  [pdf, other]

    cs.CV cs.GR cs.HC

    HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance

    Authors: Soshi Shimada, Vladislav Golyanik, Zhi Li, Patrick Pérez, Weipeng Xu, Christian Theobalt

    Abstract: Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation. Due to the inherent depth ambiguity of monocular settings, 3D motions captured with existing methods often contain severe artefacts such as incorrect body-scene inter-penetrations, jitter and body floating. To tackle th…

    Submitted 26 July, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

  45. arXiv:2204.11667  [pdf, other]

    cs.CV

    Multi-Head Distillation for Continual Unsupervised Domain Adaptation in Semantic Segmentation

    Authors: Antoine Saporta, Arthur Douillard, Tuan-Hung Vu, Patrick Pérez, Matthieu Cord

    Abstract: Unsupervised Domain Adaptation (UDA) is a transfer learning task which aims at training on an unlabeled target domain by leveraging a labeled source domain. Beyond the traditional scope of UDA with a single source domain and a single target domain, real-world perception systems face a variety of scenarios to handle, from varying lighting conditions to many cities around the world. In this context,…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: Published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022 Workshop on Continual Learning

  46. arXiv:2204.04290  [pdf]

    cs.NI

    FikoRE: 5G and Beyond RAN Emulator for Application Level Experimentation and Prototyping

    Authors: Diego Gonzalez Morin, Manuel J. López Morales, Pablo Pérez, Ana García Armada, Alvaro Villegas

    Abstract: Novel and cutting-edge use cases have arisen since the first deployments of the fifth generation of telecommunication networks (5G). There are plenty of well-thought-out, optimally designed 5G simulators and emulators which allow telecommunication technologies engineers and researchers to thoroughly study and test the network. However, the 5G ecosystem is not only limited to the network itself: a fast dev…

    Submitted 8 April, 2022; originally announced April 2022.

  47. arXiv:2203.11160  [pdf, other]

    cs.CV

    Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

    Authors: Antonin Vobecky, David Hurych, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

    Abstract: This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars which, equipped with cameras and LiDAR sensors, drive around a city. Our contributions are threefold. First, we propose a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronize…

    Submitted 21 February, 2024; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: v2: improved quality of images. See the project webpage https://vobecant.github.io/DriveAndSegment/ for the code and more

  48. Generalised Score Distribution: A Two-Parameter Discrete Distribution Accurately Describing Responses from Quality of Experience Subjective Experiments

    Authors: Jakub Nawała, Lucjan Janowski, Bogdan Ćmiel, Krzysztof Rusek, Pablo Pérez

    Abstract: Subjective responses from Multimedia Quality Assessment (MQA) experiments are conventionally analysed with methods not suitable for the data type these responses represent. Furthermore, obtaining subjective responses is resource intensive. A method allowing reuse of existing responses would thus be beneficial. Applying improper data analysis methods leads to difficult-to-interpret results. This en…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: 15 pages, 6 figures. Under review in IEEE Transactions on Multimedia

    Journal ref: IEEE Transactions on Multimedia, 2022

  49. arXiv:2112.10646  [pdf, other]

    cs.CV eess.IV

    Raw High-Definition Radar for Multi-Task Learning

    Authors: Julien Rebut, Arthur Ouaknine, Waqas Malik, Patrick Pérez

    Abstract: With their robustness to adverse weather conditions and ability to measure speeds, radar sensors have been part of the automotive landscape for more than two decades. Recent progress toward High Definition (HD) Imaging radar has driven the angular resolution below the degree, thus approaching laser scanning performance. However, the amount of data an HD radar delivers and the computational cost to…

    Submitted 13 April, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: 12 pages, 7 figures, 6 tables

    Journal ref: CVPR 2022

  50. arXiv:2112.03252  [pdf, other]

    cs.CV

    CSG0: Continual Urban Scene Generation with Zero Forgetting

    Authors: Himalaya Jain, Tuan-Hung Vu, Patrick Pérez, Matthieu Cord

    Abstract: With the rapid advances in generative adversarial networks (GANs), the visual quality of synthesised scenes keeps improving, including for complex urban scenes with applications to automated driving. We address in this work a continual scene generation setup in which GANs are trained on a stream of distinct domains; ideally, the learned models should eventually be able to generate new scenes in al…

    Submitted 2 May, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022 Workshop on Continual Learning