Search | arXiv e-print repository

On Lepton and Baryon Numbers as Local Gauge Symmetries

Abstract: A simple theory where the total lepton number is a local gauge symmetry is proposed. In this context, the gauge anomalies are cancelled with the minimal number of extra fermionic fields and one predicts that the neutrinos are Majorana fermions. The properties of the neutrino sector are discussed showing that this theory predicts a $3+2$ light neutrino sector. We show that using the same fermionic fields one can gauge the baryon number and define a simple theory where the lepton and baryon numbers can be spontaneously broken at the low scale in agreement with experiments.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04706 [pdf, other]

Winner-takes-all learners are geometry-aware conditional density estimators

Authors: Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gael Richard, Patrick Pérez

Abstract: Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing that, once trained, hypotheses should quantize optimally the shape of the conditional distribution to predict. However, the best use of these hypotheses for uncertainty quantification is still an open question. In this work, we show how to leverage the appealing geometric properties of the Winner-takes-all learners for conditional density estimation, without modifying its original training scheme. We theoretically establish the advantages of our novel estimator both in terms of quantization and density estimation, and we demonstrate its competitiveness on synthetic and real-world datasets, including audio data.

Submitted 7 June, 2024; originally announced June 2024.

Comments: International Conference on Machine Learning, Jul 2024, Vienna, Austria

arXiv:2406.02568 [pdf, other]

Optimising microstructural characterisation of white-matter phantoms: impact of gradient waveform modulation on Non-uniform Oscillating Gradient Spin-Echo sequences

Authors: Melisa L Gimenez, Pablo Jimenez, Leonardo A Pedraza Perez, Diana Betancourth, Analia Zwick, Gonzalo A Alvarez

Abstract: Changes in the nervous system due to neurological diseases take place at very small spatial scales, on the order of the micro and nanometers. Developing non-invasive imaging methods for obtaining this microscopic information as quantitative biomarkers is therefore crucial for improved medical diagnosis. In this context, diffusion-weighted magnetic resonance imaging has shown significant advances in revealing tissue microstructural features by probing molecular diffusion processes. Implementing modulated gradient spin-echo sequences allows monitoring time-dependent diffusion processes to reveal such detailed information. In particular, one of those sequences termed Non-uniform Oscillating Gradient Spin-Echo (NOGSE), has shown to selectively characterise microstructure sizes by generating an image contrast based on a signal decay-shift rather than on the conventionally used signal decay rate. In this work, we prove that such decay-shift is more pronounced with instantaneous switches of the magnetic field gradient strength sign. As fast gradient ramps need to be avoided in clinical settings, due to potential patient discomfort and artefacts in imaging, we evaluate the method's efficacy for estimating microstructure sizes using both idealised, sharp gradient modulations and more realistic, smooth modulations. In this more realistic scenario, we find that the signal decay shift might be lost as the diffusion time increases, likely hindering the accurate estimation of microstructural characteristics.
We demonstrate, by a combination of numerical simulations, information theory analysis and proof-of-principle experiments with white-matter phantoms, that optimal sequence design to estimate microstructure size distributions can be achieved using either sharp or smooth gradient spin-echo modulations. This approach simplifies the translation of the NOGSE method for its use in clinical settings.

Submitted 9 May, 2024; originally announced June 2024.

Comments: 18 pages, 8 figures

arXiv:2405.15508 [pdf, other]

Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments

Authors: Olivia Jullian Parra, Julián García Pardiñas, Lorenzo Del Pianta Pérez, Maximilian Janisch, Suzanne Klaver, Thomas Lehéricy, Nicola Serra

Abstract: Data Quality Monitoring (DQM) is a crucial task in large particle physics experiments, since detector malfunctioning can compromise the data. DQM is currently performed by human shifters, which is costly and results in limited accuracy. In this work, we provide a proof-of-concept for applying human-in-the-loop Reinforcement Learning (RL) to automate the DQM process while adapting to operating conditions that change over time. We implement a prototype based on the Proximal Policy Optimization (PPO) algorithm and validate it on a simplified synthetic dataset. We demonstrate how a multi-agent system can be trained for continuous automated monitoring during data collection, with human intervention actively requested only when relevant. We show that random, unbiased noise in human classification can be reduced, leading to an improved accuracy over the baseline. Additionally, we propose data augmentation techniques to deal with scarce data and to accelerate the learning process. Finally, we discuss further steps needed to implement the approach in the real world, including protocols for periodic control of the algorithm's outputs.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.03749 [pdf, other]

Dark Matter from Anomaly Cancellation at the LHC

Authors: Jon Butterworth, Hridoy Debnath, Pavel Fileviez Perez, Yoran Yeh

Abstract: We discuss a class of theories that predict a fermionic dark matter candidate from gauge anomaly cancellation. As an explicit example, we study the predictions in theories where the global symmetry associated with baryon number is promoted to a local gauge symmetry. In this context the symmetry-breaking scale has to be below the multi-TeV scale in order to be in agreement with the cosmological constraints on the dark matter relic density. The new physical "Cucuyo" Higgs boson in the theory has very interesting properties, decaying mainly into two photons in the low mass region, and mainly into dark matter in the intermediate mass region. We study the most important signatures at the Large Hadron Collider, evaluating the experimental bounds. We discuss the correlation between the dark matter relic density, direct detection and collider constraints. We find that these theories are still viable, and are susceptible to being probed in current, and future high-luminosity, running.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19682 [pdf, other]

General Relativity: New insights from a Geometric Algebra approach

Authors: Pablo Banon Perez, Maarten DeKieviet

Abstract: Geometric algebra (GA) is a formalism capable of describing all fields of physics with elegance, simplifying mathematics and providing new physical insights. Due to the coordinate dependence of tensor formalism, General Relativity (GR) is a challenging subject known for its complicated calculations and interpretation, which has not been thoroughly translated into GA until now. In this paper, we introduce GR with GA, emphasizing the physical interpretation of quantities and providing a step-by-step guide on performing calculations. In doing so, we show how GA provides insightful information on the physical meaning of the connection coefficients, the Riemann tensor and other physical quantities.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 29 pages, 4 figures

arXiv:2404.14027 [pdf, other]

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

Authors: Sophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonin Vobecky, Patrick Pérez, Renaud Marlet

Abstract: We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach. Repository: https://github.com/valeoai/Occfeat

Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

arXiv:2403.18111 [pdf, other]

Scrolly2Reel: Retargeting Graphics for Social Media Using Narrative Beats

Authors: Duy K. Nguyen, Jenny Ma, Pedro Alejandro Perez, Lydia B. Chilton

Abstract: Content retargeting is crucial for social media creators. Once great content is created, it is important to reach as broad an audience as possible. This is particularly important in journalism where younger audiences are shifting away from print and towards short-video platforms. Many newspapers already create rich graphics for the web that they want to be able to reuse for social media. One example is scrollytelling sequences or "scrollies" -- immersive articles with graphics like animation, charts, and 3D visualizations that appear as a user scrolls. We present a system that helps transform scrollies into social media videos. By using the scriptwriting concept of narrative beats to extract fundamental storytelling units, we can create videos that are more aligned with narration, and allow for better pacing and stylistic changes. Narrative beats are thus an important primitive to retargeting content that matches the style of a new medium while maintaining the cohesiveness of the original content.

Submitted 19 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures

arXiv:2403.17144 [pdf, other]

doi 10.1103/PhysRevD.109.115030

Majorana Neutrinos and Dark Matter from Anomaly Cancellation

Authors: Hridoy Debnath, Pavel Fileviez Perez, Kevin Gonzalez-Quesada

Abstract: We discuss a simple theory for neutrino masses where the total lepton number is a local gauge symmetry spontaneously broken below the multi-TeV scale. In this context, the neutrino masses are generated through the canonical seesaw mechanism and a Majorana dark matter candidate is predicted from anomaly cancellation. We discuss in great detail the dark matter annihilation channels and find out the upper bound on the symmetry breaking scale using the cosmological bounds on the relic density. Since in this context the dark matter candidate has suppressed couplings to the Standard Model quarks, one can satisfy the direct detection bounds even if the dark matter mass is close to the electroweak scale. This theory predicts a light pseudo-Nambu-Goldstone boson (the Majoron) associated to the mechanism of neutrino mass. We discuss briefly the properties of the Majoron and the impact of the Big Bang Nucleosynthesis bounds.

Submitted 6 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: v2: a few corrections, new appendix, conclusions unchanged, version to appear in Physical Review D

Journal ref: Physical Review D 109, 115030 (2024)

arXiv:2403.03212 [pdf, other]

Performance of a modular ton-scale pixel-readout liquid argon time projection chamber

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, et al. (1340 additional authors not shown)

Abstract: The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 47 pages, 41 figures

Report number: FERMILAB-PUB-24-0073-LBNF

arXiv:2402.16680 [pdf, other]

Friedmann-Robertson-Walker spacetimes from the perspective of geometric algebra

Authors: Pablo Banon Perez, Bjoern Malte Schaefer, Maarten DeKieviet

Abstract: The intention of our paper is to provide a pedagogical application of geometric algebra to a particularly well-investigated system: We formulate the geometric and dynamical properties of Friedmann-Robertson-Walker spacetimes within the language of geometric algebra and re-derive the Friedmann-equations as the central cosmological equations. Through the geometric algebra-variant of the Raychaudhuri equations, we comment on the evolution of spacetime volumes, before illustrating conformal flatness as a central property of Friedmann-cosmologies. An important aspect of spacetime symmetries are the associated conservation laws, for which we provide a geometric algebra formulation of the Lie-derivatives, of the Killing equation and of conserved quantities in Friedmann-Robertson-Walker spacetimes. Finally, we discuss the gravitational dynamics of scalar fields, with their particular relevance in cosmology, for cosmic inflation, and for dark energy.

Submitted 7 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 31 pages, 4 figures

arXiv:2402.13486 [pdf, other]

The 10 antipodal pairings of strongly involutive polyhedra

Authors: Javier Bracho, Eric Paulí Pérez, Luis Montejano, Jorge Luis Ramírez-Alfonsín

Abstract: It is known that strongly involutive polyhedra are closely related to self-dual maps where the antipodal function acts as duality isomorphism. Such a family of polyhedra appears in different combinatorial, topological and geometric contexts, and is thus attractive to be studied. In this note, we determine the 10 antipodal pairings among the classification of the 24 self-dual pairings $Dual(G)\rhd Aut(G)$ of self-dual maps G. We also present the orbifold associated to each antipodal pairing and describe explicitly the corresponding fundamental regions. We finally explain how to construct two infinite families of strongly involutive polyhedra (one of them new) by using their doodles and the action of the corresponding orbifolds.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 16 pages, 21 figures, 1 table

arXiv:2402.13262 [pdf, other]

Development of crystal optics for Multi-Projection X-ray Imaging for synchrotron and XFEL sources

Authors: Valerio Bellucci, Sarlota Birnsteinova, Tokushi Sato, Romain Letrun, Jayanath C. P. Koliyadu, Chan Kim, Gabriele Giovanetti, Carsten Deiter, Liubov Samoylova, Ilia Petrov, Luis Lopez Morillo, Rita Graceffa, Luigi Adriano, Helge Huelsen, Heiko Kollmann, Thu Nhi Tran Calliste, Dusan Korytar, Zdenko Zaprazny, Andrea Mazzolari, Marco Romagnoni, Eleni Myrto Asimakopoulou, Zisheng Yao, Yuhe Zhang, Jozef Ulicny, Alke Meents, et al. (5 additional authors not shown)

Abstract: X-ray Multi-Projection Imaging (XMPI) is an emerging technology that allows for the acquisition of millions of 3D images per second in samples opaque to visible light. This breakthrough capability enables volumetric observation of fast stochastic phenomena, which were inaccessible due to the lack of a volumetric X-ray imaging probe with kHz to MHz repetition rate. These include phenomena of industrial and societal relevance such as fractures in solids, propagation of shock waves, laser-based 3D printing, or even fast processes in the biological domain. Indeed, the speed of traditional tomography is limited by the shear forces caused by rotation, to a maximum of 1000 Hz in state-of-the-art tomography. Moreover, the shear forces can disturb the phenomena in observation, in particular with soft samples or sensitive phenomena such as fluid dynamics. XMPI is based on splitting an X-ray beam to generate multiple simultaneous views of the sample, therefore eliminating the need for rotation. The achievable performances depend on the characteristics of the X-ray source, the detection system, and the X-ray optics used to generate the multiple views. The increase in power density of the X-ray sources around the world now enables 3D imaging with sampling speeds in the kilohertz range at synchrotrons and megahertz range at X-ray Free-Electron Lasers (XFELs). Fast detection systems are already available, and 2D MHz imaging was already demonstrated at synchrotron and XFEL.
In this work, we explore the properties of X-ray splitter optics and XMPI schemes that are compatible with synchrotron insertion devices and XFEL X-ray beams. We describe two possible schemes designed to permit large samples and complex sample environments. Then, we present experimental proof of the feasibility of MHz-rate XMPI at the European XFEL.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 47 pages, 17 figures

arXiv:2402.12082 [pdf, other]

X-ray multibeam ptychography at up to 20 keV: nano-lithography enhances X-ray nano-imaging

Authors: Tang Li, Maik Kahnt, Thomas L. Sheppard, Runqing Yang, Ken Vidar Falch, Roman Zvagelsky, Pablo Villanueva-Perez, Martin Wegener, Mikhail Lyubomirskiy

Abstract: Non-destructive nano-imaging of the internal structure of solid matter is only feasible using hard X-rays due to their high penetration. The highest resolution images are achieved at synchrotron radiation sources (SRF), offering superior spectral brightness and enabling methods such as X-ray ptychography delivering single-digit nm resolution. However the resolution or field of view is ultimately constrained by the available coherent flux. To address this, the beam's incoherent fraction can be exploited using multiple parallel beams in an approach known as X-ray multibeam ptychography (MBP). This expands the domain of X-ray ptychography to larger samples or more rapid measurements. Both qualities favor the study of complex composite or functional samples, such as catalysts, energy materials, or electronic devices. The challenges of performing ptychography at high energy and with many parallel beams must be overcome to extract the full advantages for extended samples while minimizing beam attenuation. Here, we report the application of MBP with up to 12 beams and at photon energies of 13 and 20 keV. We demonstrate performance for various samples: a Siemens star test pattern, a porous Ni/\ce{Al2O3} catalyst, a microchip, and gold nano-crystal clusters, exceeding the measurement limits of conventional hard X-ray ptychography without compromising image quality.

Submitted 20 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.05820 [pdf, other]

doi 10.1109/TNSM.2020.2980752

XLR (piXel Loss Rate): a Lightweight Indicator to Measure Video QoE in IP Networks

Authors: César Díaz, Pablo Pérez, Julián Cabrera, Jaime Ruiz, Narciso García

Abstract: A novel Key Quality Indicator for video delivery applications, XLR (piXel Loss Rate), is defined, characterized, and evaluated. The proposed indicator is an objective measure that captures the effects of transmission errors in the received video, has a good correlation with subjective Mean Opinion Scores, and provides comparable results with state-of-the-art Full-Reference metrics. Moreover, XLR can be estimated using only a lightweight analysis on the compressed bitstream, thus allowing a No-Reference operational method. Therefore, XLR can be used for measuring the quality of experience without latency at any network location. Thus, it is a relevant tool for network planning, especially in new high-demanding scenarios. The experiments carried out show the outstanding performance of its linear-dimension score and the reliability of the bitstream-based estimation.

Submitted 8 February, 2024; originally announced February 2024.

Journal ref: IEEE Transactions on Network and Service Management, vol. 17, no. 2, pp. 1096-1109, Jun. 2020

arXiv:2402.01568 [pdf, other]

Doping Liquid Argon with Xenon in ProtoDUNE Single-Phase: Effects on Scintillation Light

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar Es-sghir, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos, et al. (1300 additional authors not shown)

Abstract: Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUNE-SP) at CERN, featuring 770 t of total liquid argon mass with 410 t of fiducial mass. The goal of the run was to measure the light and charge response of the detector to the addition of xenon, up to a concentration of 18.8 ppm. The main purpose was to test the possibility for reduction of non-uniformities in light collection, caused by deployment of photon detectors only within the anode planes. Light collection was analysed as a function of the xenon concentration, by using the pre-existing photon detection system (PDS) of ProtoDUNE-SP and an additional smaller set-up installed specifically for this run. In this paper we first summarize our current understanding of the argon-xenon energy transfer process and the impact of the presence of nitrogen in argon with and without xenon dopant. We then describe the key elements of ProtoDUNE-SP and the injection method deployed. Two dedicated photon detectors were able to collect the light produced by xenon and the total light. The ratio of these components was measured to be about 0.65 as 18.8 ppm of xenon were injected.
We performed studies of the collection efficiency as a function of the distance between tracks and light detectors, demonstrating enhanced uniformity of response for the anode-mounted PDS. We also show that xenon doping can substantially recover light losses due to contamination of the liquid argon by nitrogen.

Submitted 9 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 35 pages, 20 figures

Report number: CERN-EP-2024-024; FERMILAB-PUB-23-0819-LBNF

arXiv:2401.16434 [pdf]

doi 10.1016/j.egyr.2023.01.039

A novel ANROA based control approach for grid-tied multi-functional solar energy conversion system

Authors: Dinanath Prasad, Narendra Kumar, Rakhi Sharma, Hasmat Malik, Fausto Pedro García Márquez, Jesús María Pinar Pérez

Abstract: An adaptive control approach for a three-phase grid-interfaced solar photovoltaic system based on the new Neuro-Fuzzy Inference System with Rain Optimization Algorithm (ANROA) methodology is proposed and discussed in this manuscript. This method incorporates an Adaptive Neuro-fuzzy Inference System (ANFIS) with a Rain Optimization Algorithm (ROA). The ANFIS controller has excellent maximum tracking capability because it includes features of both neural and fuzzy techniques. The ROA technique is in charge of controlling the voltage source converter switching. Avoiding power quality problems including voltage fluctuations, harmonics, and flickers as well as unbalanced loads and reactive power usage is the major goal. Besides, the proposed method performs at zero voltage regulation and unity power factor modes. The suggested control approach has been modeled and simulated, and its performance has been assessed using existing alternative methods. A statistical analysis of proposed and existing techniques has been also presented and discussed. The results of the simulations demonstrate that, when compared to alternative approaches, the suggested strategy may properly and effectively identify the best global solutions. Furthermore, the system's robustness has been studied by using MATLAB/SIMULINK environment and experimentally by Field Programmable Gate Arrays Controller (FPGA)-based Hardware-in-Loop (HLL).

Submitted 26 January, 2024; originally announced January 2024.

Comments: The paper was published in Energy Reports journal (ELSEVIER). Cite as: Prasad, D., Kumar, N., Sharma, R., Malik, H., Márquez, F. P. G., & Pinar-Pérez, J. M. (2023). A novel ANROA based control approach for grid-tied multi-functional solar energy conversion system. Energy Reports, 9, 2044-2057

Journal ref: Energy Reports (2023) Elsevier

arXiv:2401.09508 [pdf, other]

4D-ONIX: A deep learning approach for reconstructing 3D movies from sparse X-ray projections

Authors: Yuhe Zhang, Zisheng Yao, Robert Klöfkorn, Tobias Ritschel, Pablo Villanueva-Perez

Abstract: The X-ray flux provided by X-ray free-electron lasers and storage rings offers new spatiotemporal possibilities to study in-situ and operando dynamics, even using single pulses of such facilities. X-ray Multi-Projection Imaging (XMPI) is a novel technique that enables volumetric information using single pulses of such facilities and avoids centrifugal forces induced by state-of-the-art time-resolved 3D methods such as time-resolved tomography. As a result, XMPI can acquire 3D movies (4D) at least three orders of magnitude faster than current methods. However, it is exceptionally challenging to reconstruct 4D from highly sparse projections as acquired by XMPI with current algorithms. Here, we present 4D-ONIX, a Deep Learning (DL)-based approach that learns to reconstruct 3D movies (4D) from an extremely limited number of projections. It combines the computational physical model of X-ray interaction with matter and state-of-the-art DL methods. We demonstrate the potential of 4D-ONIX to generate high-quality 4D by generalizing over multiple experiments with only two projections per timestamp for binary droplet collisions. We envision that 4D-ONIX will become an enabling tool for 4D analysis, offering new spatiotemporal resolutions to study processes not possible before.

Submitted 2 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.09413 [pdf, other]

POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images

Authors: Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

Abstract: We describe an approach to predict an open-vocabulary 3D semantic voxel occupancy map from input 2D images with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of this work are three-fold. First, we design a new model architecture for open-vocabulary 3D semantic occupancy prediction. The architecture consists of a 2D-3D encoder together with occupancy prediction and 3D-language heads. The output is a dense voxel map of 3D grounded language embeddings enabling a range of open-vocabulary tasks. Second, we develop a tri-modal self-supervised learning algorithm that leverages three modalities: (i) images, (ii) language and (iii) LiDAR point clouds, and enables training the proposed architecture using a strong pre-trained vision-language model without the need for any 3D manual language annotations. Finally, we demonstrate quantitatively the strengths of the proposed model on several open-vocabulary tasks: zero-shot 3D semantic segmentation using existing datasets; 3D grounding and retrieval of free-form language queries, using a small dataset that we propose as an extension of nuScenes. You can find the project page here: https://vobecant.github.io/POP3D.

Submitted 17 January, 2024; originally announced January 2024.

Comments: accepted to NeurIPS 2023

arXiv:2401.08251 [pdf]

doi 10.1016/j.rser.2022.112753

A techno-economic model for avoiding conflicts of interest between owners of offshore wind farms and maintenance suppliers

Authors: Alberto Pliego Marugán, Fausto Pedro García Márquez, Jesús María Pinar Pérez

Abstract: Currently, wind energy is one of the most important sources of renewable energy. Offshore locations for wind turbines are increasingly exploited because of their numerous advantages. However, offshore wind farms require high investment in maintenance service. Due to its complexity and special requirements, maintenance service is usually outsourced by wind farm owners. In this paper, we propose a novel approach to determine, quantify, and reduce the possible conflicts of interest between owners and maintenance suppliers. We created a complete techno-economic model to address this problem from an impartial point of view. An iterative process was developed to obtain statistical results that can help stakeholders negotiate the terms of the contract, in which the availability of the wind farm is the reference parameter by which to determine penalisations and incentives. Moreover, a multi-objective programming problem was addressed that maximises the profits of both parties without losing the alignment of their interests. The main scientific contribution of this paper is the maintenance analysis of offshore wind farms from two perspectives: that of the owner and the maintenance supplier. This analysis evaluates the conflicts of interest of both parties. In addition, we demonstrate that proper adjustment of some parameters, such as penalisation, incentives, and resources, and adequate control of availability can help reduce this conflict of interests.

Submitted 16 January, 2024; originally announced January 2024.

Comments: Published in Renewable and Sustainable Energy Reviews (ELSEVIER) 10 July 2022. DOI: https://doi.org/10.1016/j.rser.2022.112753 Cite as: Marugán, A. P., Márquez, F. P. G., & Pérez, J. M. P. (2022). A techno-economic model for avoiding conflicts of interest between owners of offshore wind farms and maintenance suppliers. Renewable and Sustainable Energy Reviews, 168, 112753

arXiv:2401.06019 [pdf, other]

doi 10.1117/12.2679734

Automatic UAV-based Airport Pavement Inspection Using Mixed Real and Virtual Scenarios

Authors: Pablo Alonso, Jon Ander Iñiguez de Gordoa, Juan Diego Ortega, Sara García, Francisco Javier Iriarte, Marcos Nieto

Abstract: Runway and taxiway pavements are exposed to high stress during their projected lifetime, which inevitably leads to a decrease in their condition over time. To ensure that airport pavement condition supports uninterrupted and resilient operations, it is of utmost importance to monitor pavement condition and conduct regular inspections. UAV-based inspection has recently been gaining importance due to its wide-range monitoring capabilities and reduced cost. In this work, we propose a vision-based approach to automatically identify pavement distress using images captured by UAVs. The proposed method is based on Deep Learning (DL) to segment defects in the image. The DL architecture leverages the low computational capacities of embedded systems in UAVs by using an optimised implementation of EfficientNet feature extraction and Feature Pyramid Network segmentation. To deal with the lack of annotated data for training, we have developed a synthetic dataset generation methodology to extend available distress datasets. We demonstrate that the use of a mixed dataset composed of synthetic and real training images yields better results when testing the trained models in real application scenarios.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 12 pages, 6 figures, published in proceedings of 15th International Conference on Machine Vision (ICMV)

Journal ref: Proc. SPIE 12701, Fifteenth International Conference on Machine Vision (ICMV 2022), 1270118

arXiv:2312.13863 [pdf, other]

Manipulating Trajectory Prediction with Backdoors

Authors: Kaouther Messaoud, Kathrin Grosse, Mickael Chen, Matthieu Cord, Patrick Pérez, Alexandre Alahi

Abstract: Autonomous vehicles ought to predict the surrounding agents' trajectories to allow safe maneuvers in uncertain and complex traffic situations. As companies increasingly apply trajectory prediction in the real world, security becomes a relevant concern. In this paper, we focus on backdoors - a security threat acknowledged in other fields but so far overlooked for trajectory prediction. To this end, we describe and investigate four triggers that could affect trajectory prediction. We then show that these triggers (for example, a braking vehicle), when correlated with a desired output (for example, a curve) during training, cause the desired output of a state-of-the-art trajectory prediction model. In other words, the model has good benign performance but is vulnerable to backdoors. This is the case even if the trigger maneuver is performed by a non-causal agent behind the target vehicle. As a side-effect, our analysis reveals interesting limitations within trajectory prediction models. Finally, we evaluate a range of defenses against backdoors. While some, like simple offroad checks, do not enable detection for all triggers, clustering is a promising candidate to support manual inspection to find backdoors.

Submitted 3 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 9 pages, 7 figures

arXiv:2312.12487 [pdf, other]

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet

Abstract: This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search framework. Our findings suggest that the denoising steps proposed by CFG become increasingly aligned with simple conditional steps, which renders the extra neural network evaluation of CFG redundant, especially in the second half of the denoising process. Building upon this insight, we propose "Adaptive Guidance" (AG), an efficient variant of CFG, that adaptively omits network evaluations when the denoising process displays convergence. Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter while being training-free and retaining the capacity to handle negative prompts. Finally, we uncover further redundancies of CFG in the first half of the diffusion process, showing that entire neural function evaluations can be replaced by simple affine transformations of past score estimates. This method, termed LinearAG, offers even cheaper inference at the cost of deviating from the baseline model. Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.12359 [pdf, other]

CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation

Authors: Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez

Abstract: The popular CLIP model displays impressive zero-shot capabilities thanks to its seamless interaction with arbitrary text prompts. However, its lack of spatial awareness makes it unsuitable for dense computer vision tasks, e.g., semantic segmentation, without an additional fine-tuning step that often uses annotations and can potentially suppress its original open-vocabulary properties. Meanwhile, self-supervised representation methods have demonstrated good localization properties without human-made annotations or explicit supervision. In this work, we take the best of both worlds and propose an open-vocabulary semantic segmentation method, which does not require any annotations. We propose to locally improve dense MaskCLIP features, which are computed with a simple modification of CLIP's last pooling layer, by integrating localization priors extracted from self-supervised features. By doing so, we greatly improve the performance of MaskCLIP and produce smooth outputs. Moreover, we show that the self-supervised feature properties we use can be learnt directly from CLIP features. Our method CLIP-DINOiser needs only a single forward pass of CLIP and two light convolutional layers at inference, with no extra supervision or extra memory, and reaches state-of-the-art results on challenging and fine-grained benchmarks such as COCO, Pascal Context, Cityscapes and ADE20k. The code to reproduce our results is available at https://github.com/wysoczanska/clip_dinoiser.

Submitted 27 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.09231 [pdf, other]

Reliability in Semantic Segmentation: Can We Use Synthetic Data?

Authors: Thibaut Loiseau, Tuan-Hung Vu, Mickael Chen, Patrick Pérez, Matthieu Cord

Abstract: Assessing the reliability of perception models to covariate shifts and out-of-distribution (OOD) detection is crucial for safety-critical applications such as autonomous vehicles. By nature of the task, however, the relevant data is difficult to collect and annotate. In this paper, we challenge cutting-edge generative models to automatically synthesize data for assessing reliability in semantic segmentation. By fine-tuning Stable Diffusion, we perform zero-shot generation of synthetic data in OOD domains or inpainted with OOD objects. Synthetic data is employed to provide an initial assessment of pretrained segmenters, thereby offering insights into their performance when confronted with real edge cases. Through extensive experiments, we demonstrate a high correlation between the performance on synthetic data and the performance on real OOD data, showing the validity of the approach. Furthermore, we illustrate how synthetic data can be utilized to enhance the calibration and OOD detection capabilities of segmenters.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Project Page: https://valeoai.github.io/blog/publications/GenVal

arXiv:2312.08879 [pdf, other]

Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

Authors: Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomas Svoboda

Abstract: Learning without supervision how to predict 3D scene flows from point clouds is essential to many perception systems. We propose a novel learning framework for this task which improves the necessary regularization. Relying on the assumption that scene elements are mostly rigid, current smoothness losses are built on the definition of "rigid clusters" in the input point clouds. The definition of these clusters is challenging and has a significant impact on the quality of predicted flows. We introduce two new consistency losses that enlarge clusters while preventing them from spreading over distinct objects. In particular, we enforce temporal consistency with a forward-backward cyclic loss and spatial consistency by considering surface orientation similarity in addition to spatial proximity. The proposed losses are model-independent and can thus be used in a plug-and-play fashion to significantly improve the performance of existing models, as demonstrated on two of the most widely used architectures. We also showcase the effectiveness and generalization capability of our framework on four standard sensor-unique driving datasets, achieving state-of-the-art performance in 3D scene flow estimation. Our code is available at https://github.com/ctu-vras/sac-flow.

Submitted 26 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06386 [pdf, other]

ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation

Authors: Cédric Rommel, Victor Letzelter, Nermin Samet, Renaud Marlet, Matthieu Cord, Patrick Pérez, Eduardo Valle

Abstract: Monocular 3D human pose estimation (3D-HPE) is an inherently ambiguous task, as a 2D pose in an image might originate from different possible 3D poses. Yet, most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs. In this work, we provide theoretical and empirical evidence that, because of this ambiguity, common regression models are bound to predict topologically inconsistent poses, and that traditional evaluation metrics, such as the MPJPE, P-MPJPE and PCK, are insufficient to assess this aspect. As a solution, we propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input, together with their corresponding plausibility. Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models, thus greatly facilitating its training and usage. Furthermore, by constraining our model to lie within the human pose manifold, we can guarantee the consistency of all hypothetical poses predicted with our approach, which was not possible in previous works. We illustrate the usefulness of ManiPose in a synthetic 1D-to-2D lifting setting and demonstrate on real-world datasets that it outperforms state-of-the-art models in pose consistency by a large margin, while still reaching competitive MPJPE performance.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.03130 [pdf, other]

The DUNE Far Detector Vertical Drift Technology, Technical Design Report

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos , et al. (1304 additional authors not shown)

Abstract: DUNE is an international experiment dedicated to addressing some of the questions at the forefront of particle physics and astrophysics, including the mystifying preponderance of matter over antimatter in the early universe. The dual-site experiment will employ an intense neutrino beam focused on a near and a far detector as it aims to determine the neutrino mass hierarchy and to make high-precision measurements of the PMNS matrix parameters, including the CP-violating phase. It will also stand ready to observe supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model. The DUNE far detector implements liquid argon time-projection chamber (LArTPC) technology, and combines the many tens-of-kiloton fiducial mass necessary for rare event searches with the sub-centimeter spatial resolution required to image those events with high precision. The addition of a photon detection system enhances physics capabilities for all DUNE physics drivers and opens prospects for further physics explorations. Given its size, the far detector will be implemented as a set of modules, with LArTPC designs that differ from one another as newer technologies arise. In the vertical drift LArTPC design, a horizontal cathode bisects the detector, creating two stacked drift volumes in which ionization charges drift towards anodes at either the top or bottom. The anodes are composed of perforated PCB layers with conductive strips, enabling reconstruction in 3D. Light-trap-style photon detection modules are placed both on the cryostat's side walls and on the central cathode where they are optically powered. This Technical Design Report describes in detail the technical implementations of each subsystem of this LArTPC that, together with the other far detector modules and the near detector, will enable DUNE to achieve its physics goals.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 425 pages; 281 figures. Central editing team: A. Heavey, S. Kettell, A. Marchionni, S. Palestini, S. Rajogopalan, R. J. Wilson

Report number: Fermilab Report no: TM-2813-LBNF

arXiv:2312.00703 [pdf, other]

PointBeV: A Sparse Approach to BeV Predictions

Authors: Loick Chambon, Eloi Zablocki, Mickael Chen, Florent Bartoccioni, Patrick Perez, Matthieu Cord

Abstract: Bird's-eye View (BeV) representations have emerged as the de-facto shared space in driving applications, offering a unified space for sensor data fusion and supporting various downstream tasks. However, conventional models use grids with fixed resolution and range and face computational inefficiencies due to the uniform allocation of resources across all cells. To address this, we propose PointBeV, a novel sparse BeV segmentation model operating on sparse BeV cells instead of dense grids. This approach offers precise control over memory usage, enabling the use of long temporal contexts and accommodating memory-constrained platforms. PointBeV employs an efficient two-pass strategy for training, enabling focused computation on regions of interest. At inference time, it can be used with various memory/performance trade-offs and flexibly adjusts to new specific use cases. PointBeV achieves state-of-the-art results on the nuScenes dataset for vehicle, pedestrian, and lane segmentation, showcasing superior performance in static and temporal settings despite being trained solely with sparse signals. We will release our code along with two new efficient modules used in the architecture: Sparse Feature Pulling, designed for the effective extraction of features from images to BeV, and Submanifold Attention, which enables efficient temporal modeling. Our code is available at https://github.com/valeoai/PointBeV.

Submitted 23 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: https://github.com/valeoai/PointBeV

arXiv:2311.17922 [pdf, other]

A Simple Recipe for Language-guided Domain Generalized Segmentation

Authors: Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette

Abstract: Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques either necessitate external images for augmentation, and/or aim at learning invariant representations by imposing various alignment constraints. Large-scale pretraining has recently shown promising generalization capabilities, along with the potential of binding different modalities. For instance, the advent of vision-language models like CLIP has opened the doorway for vision models to exploit the textual modality. In this paper, we introduce a simple framework for generalizing semantic segmentation networks by employing language as the source of randomization. Our recipe comprises three key ingredients: (i) the preservation of the intrinsic CLIP robustness through minimal fine-tuning, (ii) language-driven local style augmentation, and (iii) randomization by locally mixing the source and augmented styles during training. Extensive experiments report state-of-the-art results on various generalization benchmarks. Code is accessible at https://github.com/astra-vision/FAMix.

Submitted 2 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: CVPR 2024

arXiv:2311.16149 [pdf, other]

doi 10.1364/OE.510800

Development towards high-resolution kHz-speed rotation-free volumetric imaging

Authors: Eleni Myrto Asimakopoulou, Valerio Bellucci, Sarlota Birnsteinova, Zisheng Yao, Yuhe Zhang, Ilia Petrov, Carsten Deiter, Andrea Mazzolari, Marco Romagnoni, Dusan Korytar, Zdenko Zaprazny, Zuzana Kuglerova, Libor Juha, Bratislav Lukic, Alexander Rack, Liubov Samoylova, Francisco Garcia Moreno, Stephen A Hall, Tillmann Neu, Xiaoyu Liang, Patrik Vagovic, Pablo Villanueva-Perez

Abstract: X-ray multi-projection imaging (XMPI) provides rotation-free 3D movies of optically opaque samples. The absence of rotation enables superior imaging speed and preserves fragile sample dynamics by avoiding the shear forces introduced by conventional rotary tomography. Here, we present our XMPI observations at the ID19 beamline (ESRF, France) of 3D dynamics in melted aluminum with 1000 frames per second and 8 $μ$m resolution per projection using the full dynamical range of our detectors. Since XMPI is a method under development, we also provide different tests for the instrumentation of up to 3000 frames per second. As the flux of X-ray sources grows globally, XMPI is a promising technique for current and future X-ray imaging instruments.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 12 pages, 7 figures

Journal ref: Opt. Express 32, (2024), 4413-4426

arXiv:2311.16036 [pdf, other]

A Tunable Transition Metal Dichalcogenide Entangled Photon-Pair Source

Authors: Maximilian A. Weissflog, Anna Fedotova, Yilin Tang, Elkin A. Santos, Benjamin Laudert, Saniya Shinde, Fatemeh Abtahi, Mina Afsharnia, Inmaculada Pérez Pérez, Sebastian Ritter, Hao Qin, Jiri Janousek, Sai Shradha, Isabelle Staude, Sina Saravi, Thomas Pertsch, Frank Setzpfandt, Yuerui Lu, Falk Eilenberger

Abstract: Entangled photon-pair sources are at the core of quantum applications like quantum key distribution, sensing, and imaging. Operation in space-limited and adverse environments such as in satellite-based and mobile communication requires robust entanglement sources with minimal size and weight requirements. Here, we meet this challenge by realizing a cubic micrometer scale entangled photon-pair source in a 3R-stacked transition metal dichalcogenide crystal. Its crystal symmetry enables the generation of polarization-entangled Bell states without additional components and provides tunability by simple control of the pump polarization. Remarkably, generation rate and state tuning are decoupled, leading to equal generation efficiency and no loss of entanglement. Combining transition metal dichalcogenides with monolithic cavities and integrated photonic circuitry or using quasi-phasematching opens the gate towards ultrasmall and scalable quantum devices.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.14542 [pdf, other]

ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

Authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny

Abstract: Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image. This not only enhances overall performance but also enables robust editing and interaction capabilities. Each stage is meticulously formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM). Extensive experiments on datasets like LSUN-Churches and COCO validate our approach, consistently outperforming existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating three times faster with a 3.76 times smaller architecture. Our source code is provided in the supplementary material and will be publicly accessible.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.07229 [pdf, other]

Understanding the Influence of Data Characteristics on the Performance of Point-of-Interest Recommendation Algorithms

Authors: Linus W. Dietz, Pablo Sánchez, Alejandro Bellogín

Abstract: The performance of recommendation algorithms is closely tied to key characteristics of the data sets they use, such as sparsity, popularity bias, and preference distributions. In this paper, we conduct a comprehensive explanatory analysis to shed light on the impact of a broad range of data characteristics within the point-of-interest (POI) recommendation domain. To accomplish this, we extend prior methodologies used to characterize traditional recommendation problems by introducing new explanatory variables specifically relevant to POI recommendation. We subdivide a POI recommendation data set on New York City into domain-driven subsamples to measure the effect of varying these characteristics on different state-of-the-art POI recommendation algorithms in terms of accuracy, novelty, and item exposure. Our findings, obtained through the application of an explanatory framework employing multiple-regression models, reveal that the relevant independent variables encompass all categories of data characteristics and account for as much as $R^2 =$ 85-90\% of the accuracy and item exposure achieved by the algorithms. Our study reaffirms the pivotal role of prominent data characteristics, such as density, popularity bias, and the distribution of check-ins in POI recommendation. Additionally, we unveil novel factors, such as the proximity of user activity to the city center and the duration of user activity. In summary, our work reveals why certain POI recommendation algorithms excel in specific recommendation problems and, conversely, offers practical insights into which data characteristics should be modified (or explicitly recognized) to achieve better performance.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.01052 [pdf, other]

Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

Authors: Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard

Abstract: We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existing MCL variants focus on merging the hypotheses, thereby eventually sacrificing the diversity of the predictions. In contrast, our method relies on a novel learned scoring scheme underpinned by a mathematical framework based on Voronoi tessellations of the output space, from which we can derive a probabilistic interpretation. After empirically validating rMCL with experiments on synthetic data, we further assess its merits on the sound source localization problem, demonstrating its practical usefulness and the relevance of its interpretation.

Submitted 16 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Journal ref: Advances in neural information processing systems, Dec 2023, New Orleans, United States

arXiv:2310.17504 [pdf, other]

Three Pillars improving Vision Foundation Model Distillation for Lidar

Authors: Gilles Puy, Spyros Gidaris, Alexandre Boulch, Oriane Siméoni, Corentin Sautier, Patrick Pérez, Andrei Bursuc, Renaud Marlet

Abstract: Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show promising results, obtained thanks to distillation methods that keep improving. Yet, we still notice a large performance gap when measuring the quality of distilled and fully supervised features by linear probing. In this work, instead of focusing only on the distillation method, we study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset. In particular, thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality. This allows us to significantly reduce the gap between the quality of distilled and fully-supervised 3D features, and to improve the robustness of the pretrained backbones to domain gaps and perturbations.

Submitted 19 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: The code is available at https://github.com/valeoai/ScaLR

arXiv:2310.12904 [pdf, other]

Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey

Authors: Oriane Siméoni, Éloi Zablocki, Spyros Gidaris, Gilles Puy, Patrick Pérez

Abstract: The recent enthusiasm for open-world vision systems shows the high interest of the community to perform perception tasks outside of the closed-vocabulary benchmark setups which have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how to find objects without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation in the era of self-supervised ViTs. We gather links of discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.07173 [pdf]

Unleashing quantum algorithms with Qinterpreter: bridging the gap between theory and practice across leading quantum computing platforms

Authors: Wilmer Contreras Sepúlveda, Ángel David Torres-Palencia, José Javier Sánchez Mondragón, Braulio Misael Villegas-Martínez, J. Jesús Escobedo-Alatorre, Sandra Gesing, Néstor Lozano-Crisóstomo, Julio César García-Melgarejo, Juan Carlos Sánchez Pérez, Eddie Nelson Palacios-Pérez, Omar PalilleroSandoval

Abstract: Quantum computing is a rapidly emerging and promising field that has the potential to revolutionize numerous research domains, including drug design, network technologies and sustainable energy. Due to the inherent complexity and divergence from classical computing, several major quantum computing libraries have been developed to implement quantum algorithms, namely IBM Qiskit, Amazon Braket, Cirq, PyQuil, and PennyLane. These libraries allow for quantum simulations on classical computers and facilitate program execution on corresponding quantum hardware, e.g., Qiskit programs on IBM quantum computers. While all platforms have some differences, the main concepts are the same. QInterpreter is a tool embedded in the Quantum Science Gateway QubitHub using Jupyter Notebooks that seamlessly translates programs from one library to the other and visualizes the results. It combines the five well-known quantum libraries into a unified framework. Designed as an educational tool for beginners, Qinterpreter enables the development and execution of quantum circuits across various platforms in a straightforward way. The work highlights the versatility and accessibility of Qinterpreter in quantum programming and underscores our ultimate goal of pervading Quantum Computing through younger, less specialized, and diverse cultural and national communities.

Submitted 13 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.17224 [pdf, other]

Training and inference of large language models using 8-bit floating point

Authors: Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon

Abstract: FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect has yet to be addressed for FP8. This paper presents a methodology to select the scalings for FP8 linear layers, based on dynamically updating per-tensor scales for the weights, gradients and activations. We apply this methodology to train and validate large language models of the type of GPT and Llama 2 using FP8, for model sizes ranging from 111M to 70B. To facilitate the understanding of the FP8 dynamics, our results are accompanied by plots of the per-tensor scale distribution for weights, activations and gradients during both training and inference.

Submitted 29 September, 2023; originally announced September 2023.

ACM Class: I.2.7; B.2.4

arXiv:2309.16670 [pdf, other]

Decaf: Monocular Deformation Capture for Face and Hand Interactions

Authors: Soshi Shimada, Vladislav Golyanik, Patrick Pérez, Christian Theobalt

Abstract: Existing methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects. Modelling dense non-rigid object deformations in this setting remained largely unaddressed so far, although such effects can improve the realism of the downstream applications such as AR/VR and avatar communications. This is due to the severe ill-posedness of the monocular view setting and the associated challenges. While it is possible to naively track multiple non-rigid objects independently using 3D templates or parametric 3D models, such an approach would suffer from multiple artefacts in the resulting 3D estimates such as depth ambiguity, unnatural intra-object collisions and missing or implausible deformations. Hence, this paper introduces the first method that addresses the fundamental challenges depicted above and that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos. We model hands as articulated objects inducing non-rigid face deformations during an active interaction. Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system. As a pivotal step in its creation, we process the reconstructed raw 3D shapes with position-based dynamics and an approach for non-uniform stiffness estimation of the head tissues, which results in plausible annotations of the surface deformations, hand-face contact regions and head-hand positions. At the core of our neural approach are a variational auto-encoder supplying the hand-face depth prior and modules that guide the 3D tracking by estimating the contacts and the deformations. Our final 3D hand and face reconstructions are realistic and more plausible compared to several baselines applicable in our setting, both quantitatively and qualitatively. https://vcai.mpi-inf.mpg.de/projects/Decaf

Submitted 13 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.10027 [pdf, other]

doi 10.1103/PhysRevD.109.095014

Custodial Symmetry Breaking and Higgs Signatures at the LHC

Authors: Jon Butterworth, Hridoy Debnath, Pavel Fileviez Perez, Francis Mitchell

Abstract: We discuss the simplest model that predicts a tree level modification of the $ρ$ parameter from a shift in the $W$-mass without changing the prediction for the $Z$ mass. This model predicts a new neutral Higgs and two charged Higgses, with very similar masses and suppressed couplings to the Standard Model fermions. We discuss the decay properties of these new scalar bosons, and the main signatures at the Large Hadron Collider. Comparing these signatures for the first time to the latest measurements, we show that while masses around 200 GeV are excluded for some scenarios, over a wide range of model parameter space the new bosons can have a mass close to the electroweak scale without violating existing limits from experimental searches or destroying the agreement with measurements. We investigate the scenario where the new neutral Higgs is fermiophobic and has a large branching ratio into $W$ gauge bosons and/or photons, and show that this could lead to a signal in the diphoton mass spectrum at low Higgs masses. We discuss the different signatures that can motivate new measurements and searches at the Large Hadron Collider.

Submitted 23 April, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: some corrections, extra discussion, references added, conclusions unchanged, version to appear in Physical Review D

Report number: MCNET-23-06

Journal ref: Physical Review D 109, 095014 (2024)

arXiv:2309.08302 [pdf, other]

T-UDA: Temporal Unsupervised Domain Adaptation in Sequential Point Clouds

Authors: Awet Haileslassie Gebrehiwot, David Hurych, Karel Zimmermann, Patrick Pérez, Tomáš Svoboda

Abstract: Deep perception models have to reliably cope with an open-world setting of domain shifts induced by different geographic regions, sensor properties, mounting positions, and several other reasons. Since covering all domains with annotated data is technically intractable due to the endless possible variations, researchers focus on unsupervised domain adaptation (UDA) methods that adapt models trained on one (source) domain with annotations available to another (target) domain for which only unannotated data are available. Current predominant methods either leverage semi-supervised approaches, e.g., teacher-student setup, or exploit privileged data, such as other sensor modalities or temporal data consistency. We introduce a novel domain adaptation method that leverages the best of both trends. Our approach combines input data's temporal and cross-sensor geometric consistency with the mean teacher method. Dubbed T-UDA for "temporal UDA", such a combination yields massive performance gains for the task of 3D semantic segmentation of driving scenes. Experiments are conducted on Waymo Open Dataset, nuScenes and SemanticKITTI, for two popular 3D point cloud architectures, Cylinder3D and MinkowskiNet. Our codes are publicly available at https://github.com/ctu-vras/T-UDA.

Submitted 15 September, 2023; originally announced September 2023.

Comments: Will appear at IEEE/RSJ International Conference on Intelligent Robots and Systems 2023 (IROS 2023)

arXiv:2309.01575 [pdf, other]

DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion

Authors: Cédric Rommel, Eduardo Valle, Mickaël Chen, Souhaiel Khalfaoui, Renaud Marlet, Matthieu Cord, Patrick Pérez

Abstract: We present an innovative approach to 3D Human Pose Estimation (3D-HPE) by integrating cutting-edge diffusion models, which have revolutionized diverse fields, but are relatively unexplored in 3D-HPE. We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations. We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE, and demonstrate its ability to refine standard supervised 3D-HPE. We also show how diffusion models lead to more robust estimations in the face of occlusions, and improve the time-coherence and the sagittal symmetry of predictions. Using the Human3.6M dataset, we illustrate the effectiveness of our approach and its superiority over existing models, even under adverse situations where the occlusion patterns in training do not match those in inference. Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models, opening exciting new avenues for 3D-HPE research.

Submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted to 2023 International Conference on Computer Vision Workshop (Analysis and Modeling of Faces and Gestures)

arXiv:2308.07367 [pdf, ps, other]

doi 10.1103/PhysRevD.109.015011

Finite Naturalness and Quark-Lepton Unification

Authors: Pavel Fileviez Perez, Clara Murgui, Samuel Patrone, Adriano Testa, Mark B. Wise

Abstract: We study the implications of finite naturalness in Pati-Salam models where $SU(3)_C$ is embedded in $SU(4)$. For the minimal realization at low-scale of quark-lepton unification, which employs the inverse seesaw for neutrino masses, we find that radiative corrections to the Higgs boson mass are at least $δm_h^2 / m_h^2 \sim {\cal O}(10^4)$. The one-loop contributions to the Higgs mass are suppressed by four powers of the hypercharge gauge coupling. We find that for the vector leptoquarks the naively leading part of the two-loop corrections cancels. We assume the Dirac Yukawa couplings for neutrinos are equal to the up-type quark Yukawa couplings as predicted in the minimal theory for quark-lepton unification. Despite these findings, the two-loop corrections still dominate the finite naturalness bound. We mention a way to relax the lower bound on the vector leptoquark mass and have $δm_h^2 / m_h^2 \sim {\cal O}(10^2)$.

Submitted 8 January, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 7 pages, v2: several corrections, version to appear in Physical Review D

Report number: CALT-TH/2023-025

Journal ref: Physical Review D 109, 015011 (2024)

arXiv:2307.15392 [pdf, other]

Electronic structure and lattice dynamics of 1T-VSe$_2$: origin of the 3D-CDW

Authors: Josu Diego, D. Subires, A. H. Said, D. A. Chaney, A. Korshunov, G. Garbarino, F. Diekmann, K. Mahatha, V. Pardo, J. Strempfer, Pablo J. Bereciartua Perez, S. Francoual, C. Popescu, M. Tallarida, J. Dai, Raffaello Bianco, Lorenzo Monacelli, Matteo Calandra, A. Bosak, Francesco Mauri, K. Rossnagel, Adolfo O. Fumega, Ion Errea, S. Blanco-Canosa

Abstract: In order to characterize in detail the charge density wave (CDW) transition of 1$T$-VSe$_2$, its electronic structure and lattice dynamics are comprehensively studied by means of x-ray diffraction, angle resolved photoemission (ARPES), diffuse and inelastic x-ray scattering (IXS), and state-of-the-art first principles density functional theory calculations. Resonant elastic x-ray scattering (REXS) does not show any resonant enhancement at either V or Se K-edges, indicating that the CDW peak describes a purely structural modulation of the electronic ordering. ARPES identifies (i) a pseudogap at T$>$T$_{CDW}$, which leads to a depletion of the density of states in the $ML-M'L'$ plane at T$<$T$_{CDW}$, and (ii) anomalies in the electronic dispersion reflecting a sizable impact of phonons on it. A diffuse scattering precursor, characteristic of soft phonons, is observed at room temperature (RT) and leads to the full collapse of the low-energy phonon ($ω_1$) with propagation vector (0.25 0 -0.3) r.l.u. We show that the frequency and linewidth of this mode are anisotropic in momentum space, reflecting the momentum dependence of the electron-phonon interaction (EPI), hence demonstrating that the origin of the CDW is, to a much larger extent, due to the momentum-dependent EPI with a small contribution from nesting. The pressure dependence of the $ω_1$ soft mode remains nearly constant up to 13 GPa at RT, with only a modest softening before the transition to the high-pressure monoclinic $C2/m$ phase. The wide set of experimental data is well captured by our state-of-the-art first-principles anharmonic calculations with the inclusion of van der Waals (vdW) corrections in the exchange-correlation functional. The description of the electronics and dynamics of VSe$_2$ reported here adds important pieces of information to the understanding of the electronic modulations of TMDs.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.09361 [pdf, other]

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

Authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez

Abstract: Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods.

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2307.03646 [pdf, other]

doi 10.1103/PhysRevD.108.075009

Low Scale Seesaw with Local Lepton Number

Authors: Hridoy Debnath, Pavel Fileviez Perez

Abstract: We discuss a class of theories for Majorana neutrinos where the total lepton number is a local gauge symmetry. These theories predict a dark matter candidate from anomaly cancellation. We discuss the properties of the dark matter candidate and using the cosmological bounds, we obtain the upper bound on the lepton number symmetry breaking scale. The dark matter candidate has unique annihilation channels due to the fact that the theory predicts a light pseudo-Goldstone boson, the Majoron, and one can obtain the correct relic density in a large fraction of the parameter space. In this context, the seesaw scale is below the ${\cal{O}}(10^2)$ TeV scale and one can hope to test the origin of neutrino masses at current or future colliders. We discuss the lepton number violating Higgs decays and the possibility to observe lepton number violation at the Large Hadron Collider.

Submitted 7 September, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: v2: extra discussion, a few corrections, new references, version to appear in Physical Review D

Journal ref: Phys. Rev. D 108, 075009 (2023)

arXiv:2307.03168 [pdf, other]

Recovering implicit pitch contours from formants in whispered speech

Authors: Pablo Pérez Zarazaga, Zofia Malisz

Abstract: Whispered speech is characterised by a noise-like excitation that results in the lack of fundamental frequency. Considering that prosodic phenomena such as intonation are perceived through f0 variation, the perception of whispered prosody is relatively difficult. At the same time, studies have shown that speakers do attempt to produce intonation when whispering and that prosodic variability is being transmitted, suggesting that intonation "survives" in whispered formant structure. In this paper, we aim to estimate the way in which formant contours correlate with an "implicit" pitch contour in whisper, using a machine learning model. We propose a two-step method: using a parallel corpus, we first transform the whispered formants into their phonated equivalents using a denoising autoencoder. We then analyse the formant contours to predict phonated pitch contour variation. We observe that our method is effective in establishing a relationship between whispered and phonated formants and in uncovering implicit pitch contours in whisper.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 5 pages, 3 figures, 2 tables, Accepted at ICPhS 2023

arXiv:2307.00820 [pdf, other]

Butterfly factorization by algorithmic identification of rank-one blocks

Authors: Léon Zheng, Gilles Puy, Elisa Riccietti, Patrick Pérez, Rémi Gribonval

Abstract: Many matrices associated with fast transforms possess a certain low-rank property characterized by the existence of several block partitionings of the matrix, where each block is of low rank. Provided that these partitionings are known, there exist algorithms, called butterfly factorization algorithms, that approximate the matrix into a product of sparse factors, thus enabling a rapid evaluation of the associated linear operator. This paper proposes a new method to identify algebraically these block partitionings for a matrix admitting a butterfly factorization, without any analytical assumption on its entries.

Submitted 3 July, 2023; originally announced July 2023.

Comments: in French language. XXIXème Colloque Francophone de Traitement du Signal et des Images, Aug 2023, Grenoble, France

arXiv:2306.15801 [pdf, other]

doi 10.1140/epjc/s10052-023-12137-y

Production of antihydrogen atoms by 6 keV antiprotons through a positronium cloud

Authors: P. Adrich, P. Blumer, G. Caratsch, M. Chung, P. Cladé, P. Comini, P. Crivelli, O. Dalkarov, P. Debu, A. Douillet, D. Drapier, P. Froelich, N. Garroum, S. Guellati-Khelifa, J. Guyomard, P-A. Hervieux, L. Hilico, P. Indelicato, S. Jonsell, J-P. Karr, B. Kim, S. Kim, E-S. Kim, Y. J. Ko, T. Kosinski, et al. (39 additional authors not shown)

Abstract: We report on the first production of an antihydrogen beam by charge exchange of 6.1 keV antiprotons with a cloud of positronium in the GBAR experiment at CERN. The antiproton beam was delivered by the AD/ELENA facility. The positronium target was produced from a positron beam itself obtained from an electron linear accelerator. We observe an excess over background indicating antihydrogen production with a significance of 3-4 standard deviations.

Submitted 3 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Journal ref: European Physical Journal C 83, 1004 (2023)

Showing 1–50 of 775 results for author: Perez, P