-
Handling new target classes in semantic segmentation with domain adaptation
Authors:
Maxime Bucher,
Tuan-Hung Vu,
Matthieu Cord,
Patrick Pérez
Abstract:
In this work, we define and address a novel domain adaptation (DA) problem in semantic scene segmentation, where the target domain not only exhibits a data distribution shift w.r.t. the source domain, but also includes novel classes that do not exist in the latter. Different to "open-set" and "universal domain adaptation", which both regard all objects from new classes as "unknown", we aim at explicit test-time prediction for these new classes. To reach this goal, we propose a framework that leverages domain adaptation and zero-shot learning techniques to enable "boundless" adaptation in the target domain. It relies on a novel architecture, along with a dedicated learning scheme, to bridge the source-target domain gap while learning how to map new classes' labels to relevant visual representations. The performance is further improved using self-training on target-domain pseudo-labels. For validation, we consider different domain adaptation set-ups, namely synthetic-2-real, country-2-country and dataset-2-dataset. Our framework outperforms the baselines by significant margins, setting competitive standards on all benchmarks for the new task. Code and models are available at https://github.com/valeoai/buda.
Submitted 16 February, 2021; v1 submitted 2 April, 2020;
originally announced April 2020.
-
StyleRig: Rigging StyleGAN for 3D Control over Portrait Images
Authors:
Ayush Tewari,
Mohamed Elgharib,
Gaurav Bharaj,
Florian Bernard,
Hans-Peter Seidel,
Patrick Pérez,
Michael Zollhöfer,
Christian Theobalt
Abstract:
StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination. Three-dimensional morphable face models (3DMMs) on the other hand offer control over the semantic parameters, but lack photorealism when rendered and only model the face interior, not other parts of a portrait image (hair, mouth interior, background). We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM. A new rigging network, RigNet is trained between the 3DMM's semantic parameters and StyleGAN's input. The network is trained in a self-supervised manner, without the need for manual annotations. At test time, our method generates portrait images with the photorealism of StyleGAN and provides explicit control over the 3D semantic parameters of the face.
Submitted 13 June, 2020; v1 submitted 31 March, 2020;
originally announced April 2020.
-
Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation and Database with Automatic Labelling
Authors:
Ester Gonzalez-Sosa,
Pablo Perez,
Ruben Tolosana,
Redouane Kachach,
Alvaro Villegas
Abstract:
In this study, we focus on the egocentric segmentation of arms to improve self-perception in Augmented Virtuality (AV). The main contributions of this work are: i) a comprehensive survey of segmentation algorithms for AV; ii) an Egocentric Arm Segmentation Dataset, composed of more than 10,000 images, comprising variations of skin color, and gender, among others. We provide all details required for the automated generation of groundtruth and semi-synthetic images; iii) the use of deep learning for the first time for segmenting arms in AV; iv) to showcase the usefulness of this database, we report results on different real egocentric hand datasets, including GTEA Gaze+, EDSH, EgoHands, Ego Youtube Hands, THU-Read, TEgO, FPAB, and Ego Gesture, which allow for direct comparisons with existing approaches utilizing color or depth. Results confirm the suitability of the EgoArm dataset for this task, achieving improvement up to 40% with respect to the original network, depending on the particular dataset. Results also suggest that, while approaches based on color or depth can work in controlled conditions (lack of occlusion, uniform lighting, only objects of interest in the near range, controlled background, etc.), egocentric segmentation based on deep learning is more robust in real AV applications.
Submitted 27 March, 2020;
originally announced March 2020.
-
The Higgs and Leptophobic Force at the LHC
Authors:
Pavel Fileviez Perez,
Elliot Golias,
Clara Murgui,
Alexis D. Plascencia
Abstract:
The Higgs boson could provide the key to discover new physics at the Large Hadron Collider. We investigate novel decays of the Standard Model (SM) Higgs boson into leptophobic gauge bosons which can be light in agreement with all experimental constraints. We study the associated production of the SM Higgs and the leptophobic gauge boson that could be crucial to test the existence of a leptophobic force. Our results demonstrate that it is possible to have a simple gauge extension of the SM at the low scale, without assuming very small couplings and in agreement with all the experimental bounds that can be probed at the LHC.
Submitted 11 June, 2020; v1 submitted 20 March, 2020;
originally announced March 2020.
-
Learning Representations by Predicting Bags of Visual Words
Authors:
Spyros Gidaris,
Andrei Bursuc,
Nikos Komodakis,
Patrick Pérez,
Matthieu Cord
Abstract:
Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. To build such discrete representations, we quantize the feature maps of a first pre-trained self-supervised convnet, over a k-means based vocabulary. Then, as a self-supervised task, we train another convnet to predict the histogram of visual words of an image (i.e., its Bag-of-Words representation) given as input a perturbed version of that image. The proposed task forces the convnet to learn perturbation-invariant and context-aware image features, useful for downstream image understanding tasks. We extensively evaluate our method and demonstrate very strong empirical results, e.g., our pre-trained self-supervised representations transfer better on detection task and similarly on classification over classes "unseen" during pre-training, when compared to the supervised case.
This also shows that the process of image discretization into visual words can provide the basis for very powerful self-supervised approaches in the image domain, thus allowing further connections to be made to related methods from the NLP domain that have been extremely successful so far.
Submitted 27 February, 2020;
originally announced February 2020.
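As a rough illustration of the prediction target described in the abstract above, the following minimal sketch assigns each spatial feature vector to its nearest vocabulary entry and builds the resulting Bag-of-Words histogram. The function names, the toy two-dimensional features, and the normalization of the histogram are assumptions made for illustration; in the paper the vocabulary would come from k-means over the feature maps of a pre-trained convnet, and a second convnet would be trained to predict this histogram from a perturbed image.

```python
from collections import Counter
from math import dist

def nearest_word(feature, vocabulary):
    """Return the index of the vocabulary centroid closest to `feature` (Euclidean)."""
    return min(range(len(vocabulary)), key=lambda k: dist(feature, vocabulary[k]))

def bow_histogram(feature_map, vocabulary):
    """Normalized histogram of visual words over all spatial locations.

    `feature_map` is a flat list of feature vectors (one per location);
    normalizing by the number of locations is an assumption of this sketch.
    """
    counts = Counter(nearest_word(f, vocabulary) for f in feature_map)
    total = len(feature_map)
    return [counts[k] / total for k in range(len(vocabulary))]

# Toy vocabulary of three "visual words" (hypothetical centroids).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
# Toy feature map with four spatial locations (hypothetical values).
feature_map = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]

target = bow_histogram(feature_map, vocabulary)
print(target)
```

Only the construction of the target is shown here; the image perturbation and the training of the predicting network are outside the scope of the sketch.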
-
Deep Reinforcement Learning for Autonomous Driving: A Survey
Authors:
B Ravi Kiran,
Ibrahim Sobh,
Victor Talpaert,
Patrick Mannion,
Ahmad A. Al Sallab,
Senthil Yogamani,
Patrick Pérez
Abstract:
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.
Submitted 23 January, 2021; v1 submitted 2 February, 2020;
originally announced February 2020.
-
Scattering Features for Multimodal Gait Recognition
Authors:
Srđan Kitić,
Gilles Puy,
Patrick Pérez,
Philippe Gilberton
Abstract:
We consider the problem of identifying people on the basis of their walk (gait) pattern. Classical approaches to tackle this problem are based on, e.g., video recordings or piezoelectric sensors embedded in the floor. In this work, we rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively. The contribution of this work is twofold. First, we propose a feature extraction method based on an (untrained) shallow scattering network, specially tailored for the gait signals. Second, we demonstrate that fusing the two modalities improves identification in the practically relevant open set scenario.
Submitted 23 January, 2020;
originally announced January 2020.
-
A survey for variable young stars with small telescopes: II -- Mapping a protoplanetary disk with stable structures at 0.15 AU
Authors:
Jack J. Evitts,
Dirk Froebrich,
Aleks Scholz,
Jochen Eislöffel,
Justyn Campbell-White,
Will Furnell,
Thomas Urtly,
Roger Pickard,
Klaas Wiersema,
Pavol A. Dubovský,
Igor Kudzej,
Ramon Naves,
Mario Morales Aimar,
Rafael Castillo García,
Tonny Vanmunster,
Erik Schwendeman,
Francisco C. Soldán Alfaro,
Stephen Johnstone,
Rafael Gonzalez Farfán,
Thomas Killestein,
Jesús Delgado Casal,
Faustino García de la Cuesta,
Dean Roberts,
Ulrich Kolb,
Luís Montoro
, et al. (35 additional authors not shown)
Abstract:
The HOYS citizen science project conducts long term, multifilter, high cadence monitoring of large YSO samples with a wide variety of professional and amateur telescopes. We present the analysis of the light curve of V1490Cyg in the Pelican Nebula. We show that colour terms in the diverse photometric data can be calibrated out to achieve a median photometric accuracy of 0.02mag in broadband filters, allowing detailed investigations into a variety of variability amplitudes over timescales from hours to several years. Using GaiaDR2 we estimate the distance to the Pelican Nebula to be 870$^{+70}_{-55}$pc. V1490Cyg is a quasi-periodic dipper with a period of 31.447$\pm$0.011d. The obscuring dust has homogeneous properties, and grains larger than those typical in the ISM. Larger variability on short timescales is observed in U and R$_c-$H$α$, with U-amplitudes reaching 3mag on timescales of hours, indicating the source is accreting. The H$α$ equivalent width and NIR/MIR colours place V1490Cyg between CTTS/WTTS and transition disk objects. The material responsible for the dipping is located in a warped inner disk, about 0.15AU from the star. This mass reservoir can be filled and emptied on time scales shorter than the period at a rate of up to 10$^{-10}$M$_\odot$/yr, consistent with low levels of accretion in other T Tauri stars. Most likely the warp at this separation from the star is induced by a protoplanet in the inner accretion disk. However, we cannot fully rule out the possibility of an AA Tau-like warp, or occultations by the Hill sphere around a forming planet.
Submitted 17 January, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
QUEST: Quantized embedding space for transferring knowledge
Authors:
Himalaya Jain,
Spyros Gidaris,
Nikos Komodakis,
Patrick Pérez,
Matthieu Cord
Abstract:
Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow the teacher by matching the teacher's output, feature maps or their distribution. In this work, we propose a novel way to achieve this goal: by distilling the knowledge through a quantized space. According to our method, the teacher's feature maps are quantized to represent the main visual concepts encompassed in the feature maps. The student is then asked to predict the quantized representation, which thus forms the task that the student uses to learn from the teacher. Despite its simplicity, we show that our approach is able to yield results that improve the state of the art on knowledge distillation. To that end, we provide an extensive evaluation across several network architectures and most commonly used benchmark datasets.
Submitted 17 July, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
-
xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation
Authors:
Maximilian Jaritz,
Tuan-Hung Vu,
Raoul de Charette,
Émilie Wirbel,
Patrick Pérez
Abstract:
Unsupervised Domain Adaptation (UDA) is crucial to tackle the lack of annotations in a new domain. There are many multi-modal datasets, but most UDA approaches are uni-modal. In this work, we explore how to learn from multi-modality and propose cross-modal UDA (xMUDA) where we assume the presence of 2D images and 3D point clouds for 3D semantic segmentation. This is challenging as the two input spaces are heterogeneous and can be impacted differently by domain shift. In xMUDA, modalities learn from each other through mutual mimicking, disentangled from the segmentation objective, to prevent the stronger modality from adopting false predictions from the weaker one. We evaluate on new UDA scenarios including day-to-night, country-to-country and dataset-to-dataset, leveraging recent autonomous driving datasets. xMUDA brings large improvements over uni-modal UDA on all tested scenarios, and is complementary to state-of-the-art UDA techniques. Code is available at https://github.com/valeoai/xmuda.
Submitted 30 March, 2020; v1 submitted 28 November, 2019;
originally announced November 2019.
-
Biquaternionic Reformulation of a Fractional Monochromatic Maxwell System
Authors:
Yudier Peña Pérez,
Ricardo Abreu Blaya,
Martín Patricio Árciga Alejandre,
Juan Bory Reyes
Abstract:
In this work we propose a biquaternionic reformulation of a fractional monochromatic Maxwell system. Additionally, some examples are given to illustrate how the quaternionic fractional approach emerges in linear hydrodynamic and elasticity.
Submitted 20 November, 2019;
originally announced November 2019.
-
Axion Dark Matter, Proton Decay and Unification
Authors:
Pavel Fileviez Perez,
Clara Murgui,
Alexis D. Plascencia
Abstract:
We discuss the possibility to predict the QCD axion mass in the context of grand unified theories. We investigate the implementation of the DFSZ mechanism in the context of renormalizable SU(5) theories. In the simplest theory, the axion mass can be predicted with good precision in the range $m_a = (2-16)$ neV, and there is a strong correlation between the predictions for the axion mass and proton decay rates. In this context, we predict an upper bound for the proton decay channels with antineutrinos, $τ(p\to K^+ \barν) \lesssim 4 \times 10^{37} \text{ yr}$ and $τ(p \to π^+ \barν) \lesssim 2 \times 10^{36}\text{ yr}$. This theory can be considered as the minimal realistic grand unified theory with the DFSZ mechanism and it can be fully tested by proton decay and axion experiments.
Submitted 20 December, 2019; v1 submitted 13 November, 2019;
originally announced November 2019.
-
This dataset does not exist: training models from generated images
Authors:
Victor Besnier,
Himalaya Jain,
Andrei Bursuc,
Matthieu Cord,
Patrick Pérez
Abstract:
Current generative networks are increasingly proficient in generating high-resolution realistic images. These generative networks, especially the conditional ones, can potentially become a great tool for providing new image datasets. This naturally brings the question: Can we train a classifier only on the generated data? This potential availability of nearly unlimited amounts of training data challenges standard practices for training machine learning models, which have been crafted across the years for limited and fixed size datasets. In this work we investigate this question and its related challenges. We identify ways to improve significantly the performance over naive training on randomly generated images with regular heuristics. We propose three standalone techniques that can be applied at different stages of the pipeline, i.e., data generation, training on generated data, and deploying on real data. We evaluate our proposed approaches on a subset of the ImageNet dataset and show encouraging results compared to classifiers trained on real images.
Submitted 7 November, 2019;
originally announced November 2019.
-
Belief revision and 3-valued logics: Characterization of 19,683 belief change operators
Authors:
Nerio Borges,
Ramón Pino Pérez
Abstract:
In most classical models of belief change, epistemic states are represented by theories (AGM) or formulas (Katsuno-Mendelzon) and the new pieces of information by formulas. The Representation Theorem for revision operators says that operators are represented by total preorders. This important representation is exploited by Darwiche and Pearl to shift the notion of epistemic state to a more abstract one, where the paradigm of epistemic state is indeed that of a total preorder over interpretations. In this work, we introduce a 3-valued logic where the formulas can be identified with a generalisation of total preorders of three levels: a ranking function mapping interpretations into the truth values. Then we analyse certain kinds of changes in this kind of structure and give syntactical characterizations of them.
Submitted 30 October, 2019;
originally announced October 2019.
-
A Finite-Volume Method for Fluctuating Dynamical Density Functional Theory
Authors:
Antonio Russo,
Sergio P. Perez,
Miguel A. Durán-Olivencia,
Peter Yatsyshin,
José A. Carrillo,
Serafim Kalliadasis
Abstract:
We introduce a finite-volume numerical scheme for solving stochastic gradient-flow equations. Such equations are of crucial importance within the framework of fluctuating hydrodynamics and dynamic density functional theory. Our proposed scheme deals with general free-energy functionals, including, for instance, external fields or interaction potentials. This allows us to simulate a range of physical phenomena where thermal fluctuations play a crucial role, such as nucleation and other energy-barrier crossing transitions. A positivity-preserving algorithm for the density is derived based on a hybrid space discretization of the deterministic and the stochastic terms and different implicit and explicit time integrators. We show through numerous applications that not only our scheme is able to accurately reproduce the statistical properties (structure factor and correlations) of the physical system, but, because of the multiplicative noise, it allows us to simulate energy barrier crossing dynamics, which cannot be captured by mean-field approaches.
Submitted 31 December, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Addressing Failure Prediction by Learning Model Confidence
Authors:
Charles Corbière,
Nicolas Thome,
Avner Bar-Hen,
Matthieu Cord,
Patrick Pérez
Abstract:
Assessing reliably the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted for validating the relevance of the proposed approach. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
Submitted 26 October, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
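The difference between the two confidence criteria named in the abstract above can be seen on a single softmax output. The sketch below is only an illustration with hypothetical function names and made-up probabilities; it computes both criteria directly from a known label, whereas the paper proposes to learn the TCP criterion on the training set because the true class is unknown at test time.

```python
def mcp(probs):
    """Maximum Class Probability: confidence of the predicted (argmax) class."""
    return max(probs)

def tcp(probs, true_class):
    """True Class Probability: probability assigned to the ground-truth class."""
    return probs[true_class]

# Hypothetical softmax output over three classes; the model predicts class 1.
probs = [0.1, 0.6, 0.3]

# Correct prediction (true class is 1): both criteria agree.
correct_scores = (mcp(probs), tcp(probs, 1))

# Misclassification (true class is 2): MCP stays high, TCP drops.
failure_scores = (mcp(probs), tcp(probs, 2))

print(correct_scores, failure_scores)
```

In this toy case MCP is identical for the correct and the failed prediction, while TCP is lower for the failure, which is the kind of separation a failure-prediction criterion needs.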
-
The combinatorics of plane curve singularities. How Newton polygons blossom into lotuses
Authors:
Evelia R. García Barroso,
Pedro D. González Pérez,
Patrick Popescu-Pampu
Abstract:
This survey may be seen as an introduction to the use of toric and tropical geometry in the analysis of plane curve singularities, which are germs $(C,o)$ of complex analytic curves contained in a smooth complex analytic surface $S$. The embedded topological type of such a pair $(S, C)$ is usually defined to be that of the oriented link obtained by intersecting $C$ with a sufficiently small oriented Euclidean sphere centered at the point $o$, defined once a system of local coordinates $(x,y)$ was chosen on the germ $(S,o)$. If one works more generally over an arbitrary algebraically closed field of characteristic zero, one speaks instead of the combinatorial type of $(S, C)$. One may define it by looking either at the Newton-Puiseux series associated to $C$ relative to a generic local coordinate system $(x,y)$, or at the set of infinitely near points which have to be blown up in order to get the minimal embedded resolution of the germ $(C,o)$ or, thirdly, at the preimage of this germ by the resolution. Each point of view leads to a different encoding of the combinatorial type by a decorated tree: an Eggers-Wall tree, an Enriques diagram, or a weighted dual graph. The three trees contain the same information, which in the complex setting is equivalent to the knowledge of the embedded topological type. There are known algorithms for transforming one tree into another. In this paper we explain how a special type of two-dimensional simplicial complex called a lotus allows to think geometrically about the relations between the three types of trees. Namely, all of them embed in a natural lotus, their numerical decorations appearing as invariants of it. This lotus is constructed from the finite set of Newton polygons created during any process of resolution of $(C,o)$ by successive toric modifications.
Submitted 22 June, 2020; v1 submitted 15 September, 2019;
originally announced September 2019.
-
The QCD Axion and Unification
Authors:
Pavel Fileviez Perez,
Clara Murgui,
Alexis D. Plascencia
Abstract:
The QCD axion is one of the most appealing candidates for the dark matter in the Universe. In this article, we discuss the possibility to predict the axion mass in the context of a simple renormalizable grand unified theory where the Peccei-Quinn scale is determined by the unification scale. In this framework, the axion mass is predicted to be in the range $m_a \simeq (3 - 13) \times 10^{-9} \ \rm{eV}$. We study the axion phenomenology and find that the ABRACADABRA and CASPEr-Electric experiments will be able to fully probe this mass window.
Submitted 4 November, 2019; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Boosting Few-Shot Visual Learning with Self-Supervision
Authors:
Spyros Gidaris,
Andrei Bursuc,
Nikos Komodakis,
Patrick Pérez,
Matthieu Cord
Abstract:
Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low data regime. Self-supervised learning focuses instead on unlabeled data and looks into it for the supervisory signal to feed high capacity deep neural networks. In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. We use self-supervision as an auxiliary task in a few-shot learning pipeline, enabling feature extractors to learn richer and more transferable visual representations while still using few annotated samples. Through self-supervision, our approach can be naturally extended towards using diverse unlabeled data from other datasets in the few-shot setting. We report consistent improvements across an array of architectures, datasets and self-supervision techniques.
Submitted 12 June, 2019;
originally announced June 2019.
-
Zero-Shot Semantic Segmentation
Authors:
Maxime Bucher,
Tuan-Hung Vu,
Matthieu Cord,
Patrick Pérez
Abstract:
Semantic segmentation models are limited in their ability to scale to large numbers of object classes. In this paper, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples. To this end, we present a novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings. In this way, ZS3Net addresses pixel classification tasks where both seen and unseen categories are faced at test time (so-called "generalized" zero-shot classification). Performance is further improved by a self-training step that relies on automatic pseudo-labeling of pixels from unseen classes. On the two standard segmentation datasets, Pascal-VOC and Pascal-Context, we propose zero-shot benchmarks and set competitive baselines. For complex scenes such as those in the Pascal-Context dataset, we extend our approach by using a graph-context encoding to fully leverage spatial context priors coming from class-wise segmentation maps.
Submitted 18 November, 2019; v1 submitted 3 June, 2019;
originally announced June 2019.
-
Electrical conductivity in extremely disordered molybdenum oxynitrides thin films
Authors:
J. A. Hofer,
S. Bengio,
G. Rozas,
P. D. Pérez,
M. Sirena,
S. Suárez,
N. Haberkorn
Abstract:
We report on the influence of the chemical composition on the electronic properties of molybdenum oxynitride thin films grown by reactive sputtering on Si (100) substrates at room temperature. The partial pressure of Ar was fixed at 90 %, and the remaining 10 % was adjusted with mixtures N$_2$:O$_2$ (varying from pure N$_2$ to pure O$_2$). The crystalline and electronic structures and the electrical transport of the films depend on the chemical composition. Thin films grown using oxygen mixtures up to 2 % have gamma-Mo$_2$N phase and display superconductivity. The superconducting critical temperature T$_c$ reduces from ~ 6.8 K to below 3.0 K as the oxygen increases. On the other hand, films grown using oxygen mixtures richer than 2 % are mostly amorphous. The electrical transport shows a semiconductor-like behavior with variable-range hopping conduction at low temperatures. The analysis of the optical properties reveals that the samples do not have a defined semiconductor band gap, which can be related to the high structural disorder and the excitation of electrons in a wide range of energies.
Submitted 20 June, 2019; v1 submitted 28 May, 2019;
originally announced May 2019.
-
Neutrino-Dark Matter Connections in Gauge Theories
Authors:
Pavel Fileviez Perez,
Clara Murgui,
Alexis D. Plascencia
Abstract:
We discuss the connection between the origin of neutrino masses and the properties of dark matter candidates in the context of gauge extensions of the Standard Model. We investigate minimal gauge theories for neutrino masses where the neutrinos are predicted to be Dirac or Majorana fermions. We find that the upper bound on the effective number of relativistic species provides a strong constraint in the scenarios with Dirac neutrinos. In the context of theories where the lepton number is a local gauge symmetry spontaneously broken at the low scale, the existence of dark matter is predicted from the condition of anomaly cancellation. Applying the cosmological bound on the dark matter relic density, we find an upper bound on the symmetry breaking scale in the multi-TeV region. These results imply we could hope to test simple gauge theories for neutrino masses at current or future experiments.
Submitted 23 August, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Methodology for accurately assessing the quality perceived by users on 360VR contents
Authors:
Lara Muñoz,
César Díaz,
Marta Orduna,
José Ignacio Ronda,
Pablo Pérez,
Ignacio Benito,
Narciso García
Abstract:
To properly evaluate the performance of 360VR-specific encoding and transmission schemes, and particularly of the solutions based on viewport adaptation, it is necessary to consider not only the bandwidth saved, but also the quality of the portion of the scene actually seen by users over time. With this motivation, we propose a robust, yet flexible methodology for accurately assessing the quality within the viewport throughout the visualization session. This procedure is based on a complete analysis of the geometric relations involved. Moreover, the designed methodology allows for both offline and online usage thanks to the use of different approximations. In this way, our methodology can be used, regardless of the approach, to properly evaluate the implemented strategies, yielding a fairer comparison between them.
Submitted 9 May, 2019;
originally announced May 2019.
-
The Missing Data Encoder: Cross-Channel Image Completion with Hide-And-Seek Adversarial Network
Authors:
Arnaud Dapogny,
Matthieu Cord,
Patrick Perez
Abstract:
Image completion is the problem of generating whole images from fragments only. It encompasses inpainting (generating a patch given its surrounding), reverse inpainting/extrapolation (generating the periphery given the central patch) as well as colorization (generating one or several channels given other ones). In this paper, we employ a deep network to perform image completion, with adversarial training as well as perceptual and completion losses, and call it the "missing data encoder" (MDE). We consider several configurations based on how the seed fragments are chosen. We show that training MDE for "random extrapolation and colorization" (MDE-REC), i.e. using random channel-independent fragments, allows a better capture of the image semantics and geometry. MDE training makes use of a novel "hide-and-seek" adversarial loss, where the discriminator seeks the original non-masked regions, while the generator tries to hide them. We validate our models both qualitatively and quantitatively on several datasets, showing their interest for image completion, unsupervised representation learning as well as face occlusion handling.
Submitted 6 May, 2019;
originally announced May 2019.
-
WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving
Authors:
Senthil Yogamani,
Ciaran Hughes,
Jonathan Horgan,
Ganesh Sistu,
Padraig Varley,
Derek O'Dea,
Michal Uricar,
Stefan Milz,
Martin Simon,
Karl Amende,
Christian Witt,
Hazem Rashed,
Sumanth Chennupati,
Sanjaya Nayak,
Saquib Mansoor,
Xavier Perroton,
Patrick Perez
Abstract:
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality and in particular automotive applications. In spite of their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood who invented the fisheye camera in 1906. WoodScape comprises four surround view cameras and nine tasks including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images and annotations for other tasks are provided for over 100,000 images. With WoodScape, we would like to encourage the community to adapt computer vision models for fisheye cameras instead of using naive rectification.
Submitted 2 July, 2021; v1 submitted 4 May, 2019;
originally announced May 2019.
-
SoDeep: a Sorting Deep net to learn ranking loss surrogates
Authors:
Martin Engilberge,
Louis Chevallier,
Patrick Pérez,
Matthieu Cord
Abstract:
Several tasks in machine learning are evaluated using non-differentiable metrics such as mean average precision or Spearman correlation. However, their non-differentiability prevents their use as objective functions in a learning framework. Surrogate and relaxation methods exist but tend to be specific to a given metric.
In the present work, we introduce a new method to learn approximations of such non-differentiable objective functions. Our approach is based on a deep architecture that approximates the sorting of arbitrary sets of scores. It is trained virtually for free using synthetic data. This sorting deep (SoDeep) net can then be combined in a plug-and-play manner with existing deep architectures. We demonstrate the interest of our approach in three different tasks that require ranking: Cross-modal text-image retrieval, multi-label image classification and visual memorability ranking. Our approach yields very competitive results on these three tasks, which validates the merit and the flexibility of SoDeep as a proxy for sorting operation in ranking-based losses.
Submitted 8 April, 2019;
originally announced April 2019.
-
Unsupervised Image Matching and Object Discovery as Optimization
Authors:
Huy V. Vo,
Francis Bach,
Minsu Cho,
Kai Han,
Yann LeCun,
Patrick Perez,
Jean Ponce
Abstract:
Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an important field of research. In computer vision, unsupervised learning comes in various guises. We focus here on the unsupervised discovery and matching of object categories among images in a collection, following the work of Cho et al. 2015. We show that the original approach can be reformulated and solved as a proper optimization problem. Experiments on several benchmarks establish the merit of our approach.
Submitted 5 April, 2019;
originally announced April 2019.
-
DADA: Depth-aware Domain Adaptation in Semantic Segmentation
Authors:
Tuan-Hung Vu,
Himalaya Jain,
Maxime Bucher,
Matthieu Cord,
Patrick Pérez
Abstract:
Unsupervised domain adaptation (UDA) is important for applications where large scale annotation of representative data is challenging. For semantic segmentation in particular, it helps deploy on real "target domain" data models that are trained on annotated images from a different "source domain", notably a virtual environment. To this end, most previous works consider semantic segmentation as the only mode of supervision for source domain data, while ignoring other, possibly available, information like depth. In this work, we aim to make the best use of such privileged information while training the UDA model. We propose a unified depth-aware UDA framework that leverages in several complementary ways the knowledge of dense depth in the source domain. As a result, the performance of the trained semantic segmentation model on the target domain is boosted. Our novel approach indeed achieves state-of-the-art performance on different challenging synthetic-2-real benchmarks.
Submitted 19 August, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
On Anomaly-Free Dark Matter Models
Authors:
Pavel Fileviez Perez,
Elliot Golias,
Rui-Hao Li,
Clara Murgui,
Alexis D. Plascencia
Abstract:
We investigate the predictions of anomaly-free dark matter models for direct and indirect detection experiments. We focus on gauge theories where the existence of a fermionic dark matter candidate is predicted by anomaly cancellation, its mass is defined by the new symmetry breaking scale, and its stability is guaranteed by a remnant symmetry after the breaking of the gauge symmetry. We find an upper bound on the symmetry breaking scale by applying the relic density and perturbative constraints. The anomaly-free property of the theories allows us to perform a full study of the gamma lines from dark matter annihilation. We investigate the correlation between predictions for final radiation processes and gamma lines. Furthermore, we demonstrate that the latter can be distinguished from the continuum gamma ray spectrum.
Submitted 13 June, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
LHCb VELO Timepix3 Telescope
Authors:
Kazu Akiba,
Martin van Beuzekom,
Henk Boterenbrood,
Emma Buchanan,
Jan Buytaert,
Wiktor Byczynski,
Xabier Cid Vidal,
Paula Collins,
Elena Dall'Occo,
Alvaro Dosil Suarez,
Raphael Dumps,
Tim Evans,
Vinicius Franco Lima,
Abraham Gallas Torreira,
Julian Garcia Pardinas,
Bas van der Heijden,
Christoph Hombach,
Malcolm John,
Szymon Kulis,
Xavi Llopart Cudie,
Franciole Marinho,
Eugenia Price,
Sophie Richards,
Pablo Rodriguez Perez,
Daniel Saunders
, et al. (8 additional authors not shown)
Abstract:
The LHCb VELO Timepix3 telescope is a silicon pixel tracking system constructed initially to evaluate the performance of LHCb VELO Upgrade prototypes. The telescope consists of eight hybrid pixel silicon sensor planes equipped with the Timepix3 ASIC. The planes provide excellent charge measurement, timestamping and spatial resolution and the system can function at high track rates. This paper describes the construction of the telescope and its data acquisition system and offline reconstruction software. A timing resolution of 350 ps was obtained for reconstructed tracks. A pointing resolution of better than 2 µm was determined for the 180 GeV/c mixed hadron beam at the CERN SPS. The telescope has been shown to operate at a rate of 5 million particles s$^{-1}$ cm$^{-2}$ without a loss in efficiency.
Submitted 26 February, 2019;
originally announced February 2019.
-
Gamma Lines from the Hidden Sector
Authors:
Pavel Fileviez Perez,
Clara Murgui
Abstract:
We discuss the visibility of gamma lines from dark matter annihilation. We point out a class of theories for dark matter which predict the existence of gamma lines with striking features. In these theories, the final state radiation processes are highly suppressed and one could distinguish easily the gamma lines from the continuum spectrum. We discuss the main experimental bounds and show that one could test the predictions for gamma lines in the near future in the context of simple gauge theories for dark matter.
Submitted 26 November, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Video Multimethod Assessment Fusion (VMAF) on 360VR contents
Authors:
Marta Orduna,
César Díaz,
Lara Muñoz,
Pablo Pérez,
Ignacio Benito,
Narciso García
Abstract:
This paper describes the subjective experiments and subsequent analysis carried out to validate the application of one of the most robust and influential video quality metrics, Video Multimethod Assessment Fusion (VMAF), to 360VR contents. VMAF is a full reference metric initially designed to work with traditional 2D contents. Hence, at first, it cannot be assumed to be compatible with the particularities of the scenario where omnidirectional content is visualized using a Head-Mounted Display (HMD). Therefore, through a complete set of tests, we prove that this metric can be successfully used without any specific training or adjustments to obtain the quality of 360VR sequences actually perceived by users.
Submitted 18 January, 2019;
originally announced January 2019.
-
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Authors:
Victor Talpaert,
Ibrahim Sobh,
B Ravi Kiran,
Patrick Mannion,
Senthil Yogamani,
Ahmad El-Sallab,
Patrick Perez
Abstract:
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind's AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye's path planning system. However, a vast majority of work on DRL is focused on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA. In general, DRL is still in its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.
Submitted 16 January, 2019; v1 submitted 6 January, 2019;
originally announced January 2019.
-
FML: Face Model Learning from Videos
Authors:
Ayush Tewari,
Florian Bernard,
Pablo Garrido,
Gaurav Bharaj,
Mohamed Elgharib,
Hans-Peter Seidel,
Patrick Pérez,
Michael Zollhöfer,
Christian Theobalt
Abstract:
Monocular image-based 3D reconstruction of faces is a long-standing problem in computer vision. Since image data is a 2D projection of a 3D face, the resulting depth ambiguity makes the problem ill-posed. Most existing methods rely on data-driven priors that are built from limited 3D face scans. In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. Our face model is learned using only corpora of in-the-wild video clips collected from the Internet. This virtually endless source of training data enables learning of a highly general 3D face model. In order to achieve this, we propose a novel multi-frame consistency loss that ensures consistent shape and appearance across multiple frames of a subject's face, thus minimizing depth ambiguity. At test time we can use an arbitrary number of frames, so that we can perform both monocular as well as multi-frame reconstruction.
Submitted 9 April, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Well-balanced finite volume schemes for hydrodynamic equations with general free energy
Authors:
José A. Carrillo,
Serafim Kalliadasis,
Sergio P. Perez,
Chi-Wang Shu
Abstract:
Well balanced and free energy dissipative first- and second-order accurate finite volume schemes are proposed for a general class of hydrodynamic systems with linear and nonlinear damping. The natural Liapunov functional of the system, given by its free energy, allows for a characterization of the stationary states by its variation. An analog property at the discrete level enables us to preserve stationary states at machine precision while keeping the dissipation of the discrete free energy. These schemes allow for analysing accurately the stability properties of stationary states in challenging problems such as: phase transitions in collective behavior, generalized Euler-Poisson systems in chemotaxis and astrophysics, and models in dynamic density functional theories; having done a careful validation in a battery of relevant test cases.
Submitted 4 November, 2020; v1 submitted 3 December, 2018;
originally announced December 2018.
-
ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
Authors:
Tuan-Hung Vu,
Himalaya Jain,
Maxime Bucher,
Matthieu Cord,
Patrick Pérez
Abstract:
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" set-ups and show that the approach can also be used for detection.
Submitted 17 April, 2019; v1 submitted 30 November, 2018;
originally announced November 2018.
-
Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision
Authors:
Sanjeel Parekh,
Alexey Ozerov,
Slim Essid,
Ngoc Duong,
Patrick Pérez,
Gaël Richard
Abstract:
We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.
Submitted 9 November, 2018;
originally announced November 2018.
-
Perfectly nested circuits
Authors:
María Carrasco,
Zenaida Castillo,
Nerio Borges,
Ramón Pino Pérez
Abstract:
Nested graphs have been used in different applications, for example to represent knowledge in semantic networks. On the other hand, graphs with cycles are important in surface reconstruction, periodic scheduling and network analysis. Also of particular interest are cycle bases, which arise in mathematical and algorithmic problems. In this work we develop the concept of perfectly nested Eulerian circuits, exploring some of their properties. The main result establishes an order isomorphism between some sets of perfectly nested circuits and equivalence classes over finite binary sequences.
Submitted 7 November, 2018;
originally announced November 2018.
-
Leptophobic Dark Matter and the Baryon Number Violation Scale
Authors:
Pavel Fileviez Perez,
Elliot Golias,
Rui-Hao Li,
Clara Murgui
Abstract:
We discuss the possible connection between the scale for baryon number violation and the cosmological bound on the dark matter relic density. A simple gauge theory for baryon number which predicts the existence of a leptophobic cold dark matter particle candidate is investigated. In this context, the dark matter candidate is a Dirac fermion with mass defined by the new symmetry breaking scale. Using the cosmological bounds on the dark matter relic density, we find an upper bound on the symmetry breaking scale of around 200 TeV. The properties of the leptophobic dark matter candidate are investigated in great detail and we show the prospects to test this theory at current and future experiments. We discuss the main implications for the mechanisms to explain the matter and antimatter asymmetry in the Universe.
Submitted 24 January, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
-
A novel approach for venue recommendation using cross-domain techniques
Authors:
Pablo Sánchez,
Alejandro Bellogín
Abstract:
Finding the next venue to be visited by a user in a specific city is an interesting, but challenging, problem. Different techniques have been proposed, combining collaborative, content, social, and geographical signals; however it is not trivial to decide which technique works best, since this may depend on the data density or the amount of activity logged for each user or item. At the same time, cross-domain strategies have been exploited in the recommender systems literature when dealing with (very) sparse situations, such as those inherently arising when recommendations are produced based on information from a single city.
In this paper, we address the problem of venue recommendation from a novel perspective: applying cross-domain recommendation techniques considering each city as a different domain. We perform an experimental comparison of several recommendation techniques in a temporal split under two conditions: single-domain (only information from the target city is considered) and cross-domain (information from many other cities is incorporated into the recommendation algorithm). For the latter, we have explored two strategies to transfer knowledge from one domain to another: testing the target city and training a model with information of the k cities with more ratings or only using the k closest cities.
Our results show that, in general, applying cross-domain by proximity increases the performance of the majority of the recommenders in terms of relevance. This is the first work, to the best of our knowledge, where so many domains (eight) are combined in the tourism context where a temporal split is used, and thus we expect these results could provide readers with an overall picture of what can be achieved in a real-world environment.
Submitted 26 September, 2018;
originally announced September 2018.
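The two source-city selection strategies named in the abstract above (the k cities with the most ratings, or the k closest cities) can be illustrated with a minimal sketch. It is not the authors' code: the city names, coordinates, rating counts, and the use of great-circle distance as the notion of "closest" are all assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def k_closest_cities(target, coords, k):
    """Source cities for the target, ordered by distance (assumed: haversine)."""
    others = [c for c in coords if c != target]
    return sorted(others, key=lambda c: haversine_km(coords[target], coords[c]))[:k]


def k_most_rated_cities(target, rating_counts, k):
    """Source cities for the target, ordered by number of logged ratings."""
    others = [c for c in rating_counts if c != target]
    return sorted(others, key=lambda c: rating_counts[c], reverse=True)[:k]


# Hypothetical data: four cities "A".."D" with made-up coordinates and rating counts.
CITY_COORDS = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 5.0), "D": (10.0, 10.0)}
RATING_COUNTS = {"A": 100, "B": 50, "C": 900, "D": 400}

closest = k_closest_cities("A", CITY_COORDS, 2)
most_rated = k_most_rated_cities("A", RATING_COUNTS, 2)
```

In either case the recommender would be trained on the target city's training split plus the selected source cities, and evaluated on the target city only.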
-
Hard X-ray multi-projection imaging for single-shot approaches
Authors:
P. Villanueva-Perez,
B. Pedrini,
R. Mokso,
P. Vagovic,
V. Guzenko,
S. Leake,
P. R. Willmott,
C. David,
H. N. Chapman,
M. Stampanoni
Abstract:
Obtaining 3D information from a single X-ray exposure at high-brilliance sources, such as X-ray free-electron lasers (XFELs) [1] or diffraction-limited storage rings [2], allows the study of fast dynamical processes in their native environment. However, current X-ray 3D methodologies are either not compatible with single-shot approaches because they rely on multiple exposures, such as confocal microscopy [3, 4] and tomography [5, 6]; or they record a single projection per pulse [7] and are therefore restricted to approximately two-dimensional objects [8]. Here we propose and verify experimentally a novel imaging approach named X-ray multi-projection imaging (XMPI), which simultaneously acquires several projections without rotating the sample at significant tomographic angles. When implemented at high-brilliance sources it can provide volumetric information using a single pulse. Moreover, XMPI at MHz repetition XFELs could allow a way to record 3D movies of deterministic or stochastic natural processes in the micrometer to nanometer resolution range, and at time scales from microseconds down to femtoseconds.
Submitted 16 August, 2018;
originally announced August 2018.
-
The valuative tree is the projective limit of Eggers-Wall trees
Authors:
Evelia R. García Barroso,
Pedro D. González Pérez,
Patrick Popescu-Pampu
Abstract:
Consider a germ $C$ of reduced curve on a smooth germ $S$ of complex analytic surface. Assume that $C$ contains a smooth branch $L$. Using the Newton-Puiseux series of $C$ relative to any coordinate system $(x,y)$ on $S$ such that $L$ is the $y$-axis, one may define the Eggers-Wall tree $Θ_L(C)$ of $C$ relative to $L$. Its ends are labeled by the branches of $C$ and it is endowed with three natural functions measuring the characteristic exponents of the previous Newton-Puiseux series, their denominators and contact orders. The main objective of this paper is to embed canonically $Θ_L(C)$ into Favre and Jonsson's valuative tree $\mathbb{P}(\mathcal{V})$ of real-valued semivaluations of $S$ up to scalar multiplication, and to show that this embedding identifies the three natural functions on $Θ_L(C)$ as pullbacks of other naturally defined functions on $\mathbb{P}(\mathcal{V})$. As a consequence, we prove an inversion theorem generalizing the well-known Abhyankar-Zariski inversion theorem concerning one branch: if $L'$ is a second smooth branch of $C$, then the valuative embeddings of the Eggers-Wall trees $Θ_{L'}(C)$ and $Θ_L(C)$ identify them canonically, their associated triples of functions being easily expressible in terms of each other. We prove also that the space $\mathbb{P}(\mathcal{V})$ is the projective limit of Eggers-Wall trees over all choices of curves $C$. As a supplementary result, we explain how to pass from $Θ_L(C)$ to an associated splice diagram.
Submitted 8 July, 2018;
originally announced July 2018.
-
A Flexible Convolutional Solver with Application to Photorealistic Style Transfer
Authors:
Gilles Puy,
Patrick Pérez
Abstract:
We propose a new flexible deep convolutional neural network (convnet) to perform fast visual style transfer. In contrast to existing convnets that address the same task, our architecture derives directly from the structure of the gradient descent originally used to solve the style transfer problem [Gatys et al., 2016]. Like existing convnets, ours approximately solves the original problem much faster than the gradient descent. However, our network is uniquely flexible by design: it can be manipulated at runtime to enforce new constraints on the final solution. In particular, we show how to modify it to obtain a photorealistic result with no retraining. We study the modifications made by [Luan et al., 2017] to the original cost function of [Gatys et al., 2016] to achieve photorealistic style transfer. These modifications directly affect the gradient descent and can be carried over on the fly to our network. This is possible because the proposed architecture stems from unrolling the gradient descent.
Submitted 13 June, 2018;
originally announced June 2018.
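The idea of unrolling gradient descent, and of adding a new constraint at runtime by adding its gradient to every step, can be shown on a toy problem. The sketch below is not the authors' network: the quadratic terms stand in for the content and style losses, the step size is fixed rather than learned, and the names `unrolled_descent` and `quadratic_grad` are hypothetical.

```python
def unrolled_descent(x0, grad_terms, step=0.1, n_steps=50):
    """Run a fixed number of gradient steps; each step sums the gradients of all terms.

    Because every step only sees the list of gradient functions, a new term
    can be appended at call time without changing anything else.
    """
    x = list(x0)
    for _ in range(n_steps):
        g = [0.0] * len(x)
        for grad in grad_terms:
            for i, gi in enumerate(grad(x)):
                g[i] += gi
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic_grad(target, weight=1.0):
    """Gradient of the toy cost 0.5 * weight * ||x - target||^2."""
    return lambda x: [weight * (xi - ti) for xi, ti in zip(x, target)]


# Toy stand-ins (assumptions): two quadratic terms play the role of the original cost.
content = quadratic_grad([1.0, 1.0])
style = quadratic_grad([3.0, 3.0])
# An extra term added at runtime, playing the role of a new constraint.
extra = quadratic_grad([0.0, 0.0], weight=2.0)

base = unrolled_descent([0.0, 0.0], [content, style])
constrained = unrolled_descent([0.0, 0.0], [content, style, extra])
```

With these toy terms, the unconstrained run settles near the midpoint of the two targets, while the run with the extra term is pulled toward the origin, without touching the descent loop itself.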
-
Deep Video Portraits
Authors:
Hyeongwoo Kim,
Pablo Garrido,
Ayush Tewari,
Weipeng Xu,
Justus Thies,
Matthias Nießner,
Patrick Pérez,
Christian Richardt,
Michael Zollhöfer,
Christian Theobalt
Abstract:
We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network -- thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.
Submitted 29 May, 2018;
originally announced May 2018.
-
Resource allocation under uncertainty: an algebraic and qualitative treatment
Authors:
Franklin Camacho,
Gerardo Chacón,
Ramón Pino Peréz
Abstract:
We use an algebraic viewpoint, namely a matrix framework, to deal with the problem of resource allocation under uncertainty in the context of a qualitative approach. Our basic qualitative data are a plausibility relation over the resources, a hierarchical relation over the agents and, of course, the preferences that the agents have over the resources. With this data we propose a qualitative binary relation $\unrhd$ between allocations such that $\mathcal{F}\unrhd \mathcal{G}$ has the following intended meaning: the allocation $\mathcal{F}$ produces at least as much social welfare as the allocation $\mathcal{G}$. We prove that there is a family of allocations which are maximal with respect to $\unrhd$. We also prove that there is a notion of simple deal such that optimal allocations can be reached by sequences of simple deals. Finally, we introduce some mechanism for discriminating optimal allocations.
Submitted 17 May, 2018;
originally announced May 2018.
-
Seesaw Scale, Unification, and Proton Decay
Authors:
Pavel Fileviez Perez,
Axel Gross,
Clara Murgui
Abstract:
We investigate a simple realistic grand unified theory based on the $SU(5)$ gauge symmetry which predicts an upper bound on the proton decay lifetime for the channels $p \to K^+ \barν$ and $p \to π^+ \barν$, i.e. $τ(p \to K^+ \barν) \lesssim 3.4 \times 10^{35}$ and $τ(p \to π^+ \barν) \lesssim 1.7 \times 10^{34}$ years, respectively. In this context, the neutrino masses are generated through the type I and type III seesaw mechanisms, and one predicts that the field responsible for type III seesaw must be light with a mass below 500 TeV. We discuss the testability of this theory at current and future proton decay experiments.
Submitted 9 August, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events
Authors:
Sanjeel Parekh,
Slim Essid,
Alexey Ozerov,
Ngoc Q. K. Duong,
Patrick Pérez,
Gaël Richard
Abstract:
Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance learning. We show that the learnt representations are useful for classifying events and localizing their characteristic audio-visual elements. The system is trained using only video-level event labels without any timing information. An important feature of our method is its capacity to learn from unsynchronized audio-visual events. We achieve state-of-the-art results on a large-scale dataset of weakly-labeled audio event videos. Visualizations of localized visual regions and audio segments substantiate our system's efficacy, especially when dealing with noisy situations where modality-specific cues appear asynchronously.
Submitted 9 July, 2018; v1 submitted 19 April, 2018;
originally announced April 2018.
-
Learning how to be robust: Deep polynomial regression
Authors:
Juan-Manuel Perez-Rua,
Tomas Crivelli,
Patrick Bouthemy,
Patrick Perez
Abstract:
Polynomial regression is a recurrent problem with a large number of applications. In computer vision it often appears in motion analysis. Whatever the application, standard methods for regression of polynomial models tend to deliver biased results when the input data is heavily contaminated by outliers. Moreover, the problem is even harder when outliers have strong structure. Departing from problem-tailored heuristics for robust estimation of parametric models, we explore deep convolutional neural networks. Our work aims to find a generic approach for training deep regression models without the explicit need for supervised annotation. We bypass the need for a tailored loss function on the regression parameters by attaching to our model a differentiable hard-wired decoder corresponding to the polynomial operation at hand. We demonstrate the value of our findings by comparing with standard robust regression methods. Furthermore, we demonstrate how to use such models for a real computer vision problem, i.e., video stabilization. The qualitative and quantitative experiments show that neural networks are able to learn robustness for general polynomial regression, with results that clearly surpass the scores of traditional robust estimation methods.
Submitted 23 May, 2018; v1 submitted 17 April, 2018;
originally announced April 2018.
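To make concrete what a "standard robust regression method" of the kind mentioned in the abstract above can look like, here is a minimal sketch of a classical baseline: a Huber M-estimator fitted by iteratively reweighted least squares. It is not the paper's deep model. The restriction to a degree-1 polynomial, the tuning constant 1.345, the MAD-based scale, and the toy data are all assumptions chosen to keep the example short.

```python
from statistics import median


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = slope * x + intercept."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    intercept = (sy - slope * sx) / sw
    return slope, intercept


def huber_line_fit(xs, ys, k=1.345, n_iter=50):
    """Huber M-estimate via IRLS: large residuals are down-weighted, not discarded."""
    slope, intercept = weighted_line_fit(xs, ys, [1.0] * len(xs))
    for _ in range(n_iter):
        res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Robust residual scale from the median absolute residual.
        scale = 1.4826 * median(abs(r) for r in res)
        if scale < 1e-12:  # inliers are fitted exactly; nothing left to reweight
            break
        thr = k * scale
        ws = [1.0 if abs(r) <= thr else thr / abs(r) for r in res]
        slope, intercept = weighted_line_fit(xs, ys, ws)
    return slope, intercept


# Toy data (assumption): points on y = 2x + 1 with one gross outlier at x = 9.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0

ols_slope, ols_intercept = weighted_line_fit(xs, ys, [1.0] * len(xs))
robust_slope, robust_intercept = huber_line_fit(xs, ys)
```

The ordinary least-squares slope is dragged far from 2 by the single outlier, while the reweighted fit moves back toward the inlier trend. Structured outliers, as the abstract notes, are harder for such estimators.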
-
Finding beans in burgers: Deep semantic-visual embedding with localization
Authors:
Martin Engilberge,
Louis Chevallier,
Patrick Pérez,
Matthieu Cord
Abstract:
Several works have proposed to learn a two-path neural network that maps images and texts, respectively, to the same shared Euclidean space where geometry captures useful semantic relationships. Such a multi-modal embedding can be trained and used for various tasks, notably image captioning. In the present work, we introduce a new architecture of this type, with a visual path that leverages recent space-aware pooling mechanisms. Combined with a textual path which is jointly trained from scratch, our semantic-visual embedding offers a versatile model. Once trained under the supervision of captioned images, it yields new state-of-the-art performance on cross-modal retrieval. It also allows the localization of new concepts from the embedding space into any input image, delivering state-of-the-art results on the visual grounding of phrases.
Submitted 6 April, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Structural inpainting
Authors:
Huy V. Vo,
Ngoc Q. K. Duong,
Patrick Perez
Abstract:
Scene-agnostic visual inpainting remains very challenging despite progress in patch-based methods. Recently, Pathak et al. 2016 introduced convolutional "context encoders" (CEs) for unsupervised feature learning through image completion tasks. With the additional help of adversarial training, CEs turned out to be a promising tool to complete complex structures in real inpainting problems. In the present paper we propose to push this key ability further by relying on perceptual reconstruction losses at training time. We show on a wide variety of visual scenes the merit of the approach for structural inpainting, and confirm it through a user study. Combined with the optimization-based refinement of Yang et al. 2016 with neural patches, our context encoder opens up new opportunities for prior-free visual inpainting.
Submitted 27 March, 2018;
originally announced March 2018.