Search | arXiv e-print repository

Gravitational reheating in Starobinsky inflation

Authors: Gláuber C. Dorsch, Luiz Miranda, Nelson Yokomizo

Abstract: We investigate the possibility of achieving post-inflationary reheating exclusively through the gravitational interaction in Starobinsky inflation, which itself assumes nothing but gravity. We consider the possibility that the reheating sector couples to gravity via a non-minimal coupling. Our analysis is performed both in a perturbative and in a non-perturbative approach, where particle production is computed from Bogoliubov coefficients. Our findings indicate that, for sufficiently large non-minimal coupling ($\xi\gtrsim~\text{few}\times 10$), non-perturbative particle production allows for temperatures of order $10^{12}$ GeV to be reached while also ensuring that the Universe becomes radiation-dominated. This shows that the gravitational interaction could be solely responsible for reheating the Universe after inflation, without the need to assume other ad hoc inflaton interactions.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 20 pages, 11 figures

arXiv:2404.15988 [pdf, other]

Symmetries in particle physics: from nuclear isospin to the quark model

Authors: Bruno Berganholi, Gláuber C. Dorsch, Beatriz M. D. Sena, Giovanna F. do Valle

Abstract: We present a concise pedagogic introduction to group representation theory motivated by the historical developments surrounding the advent of the Eightfold Way. Abstract definitions of groups and representations are avoided in favour of the physical intuition of symmetries of the nuclear interaction. The concept of nuclear isospin is used as a physical motivation to introduce SU(2) and discuss the main techniques of representation theory. The discovery of strange particles motivates extending the symmetry group to SU(3), at first in the context of the Sakata model. We highlight the successes in fitting mesons in the SU(3) octet, discuss the drawbacks of the Sakata model for baryonic classifications, and how the Eightfold Way finally led to the quark model. This approach has two major advantages: (i) the main concepts of the theory of Lie groups are introduced and discussed without ever losing touch with its applications in particle physics; (ii) it allows the beginner to study group theory while also becoming acquainted with the historical developments of particle physics that led to the concept of quarks. In particular, in this pedagogical path the quarks appear as yet another class of particles predicted from symmetry principles, rather than being introduced ad hoc for postulating an SU(3) symmetry, as usually done in the literature.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 12 figures, 24 pages

arXiv:2402.00847 [pdf, other]

BootsTAP: Bootstrapped Training for Tracking-Any-Point

Authors: Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, Andrew Zisserman

Abstract: To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We demonstrate state-of-the-art performance on the TAP-Vid benchmark surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/

Submitted 23 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00155 [pdf, other]

doi 10.1140/epjc/s10052-024-12840-4

Vacuum Stability in the one-loop approximation of a 331 Model

Authors: G. C. Dorsch, A. A. Louzi, B. L. Sánchez-Vega, A. Viglioni

Abstract: In this study, we analyze the vacuum stability of the economical 331 model at the one-loop level using the renormalization group equations and a single-scale renormalization method. By integrating these equations, we determine stability conditions up to the Planck scale, incorporating constraints from recent experimental data on new Higgs-like bosons, charged scalars, and charged and neutral gauge bosons. Our analysis uncovers intriguing relations between the mass of the heaviest scalar and the masses of exotic quarks, in order to ensure stability of the model up to the Planck scale. For the 331 energy scale used in this work, $18$ TeV, we find an upper bound on the heaviest quark mass of the model, which is not so distant from future LHC runs, serving as bounds to be searched. Additionally, we explore relations between the scalar couplings coming from stability and perturbativity conditions. These impose unprecedented constraints on the economical 331 model.

Submitted 31 January, 2024; originally announced February 2024.

Journal ref: Eur. Phys. J. C 84, 471 (2024)

arXiv:2312.02354 [pdf, other]

Bubble wall velocities with an extended fluid Ansatz

Authors: Glauber C. Dorsch, Daniel A. Pinto

Abstract: We compute the terminal bubble wall velocity during a cosmological phase transition by modelling non-equilibrium effects in the plasma with the so-called "extended fluid Ansatz". A $φ^6$ operator is included in the Standard Model effective potential to mimic effects of new physics. Hydrodynamical heating of the plasma ahead of the bubble is taken into account. We find that the inclusion of higher order terms in the fluid Ansatz is typically relevant, and may even turn detonation solutions into deflagrations. Our results also corroborate recent findings in the literature that, for a Standard Model particle content in the plasma, only deflagration solutions are viable. However, we also show that this outcome may be altered in a theory with a different particle content.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 22 pages, 8 figures

arXiv:2312.00598 [pdf, other]

Learning from One Continuous Video Stream

Authors: João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

Abstract: We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between consecutive video frames and there is very little prior work on it. Our framework allows us to do a first deep dive into the topic and includes a collection of streams and tasks composed from two existing video datasets, plus methodology for performance evaluation that considers both adaptation and generalization. We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation as well as between arbitrary tasks, without ever requiring changes to models and always using the same pixel loss. Equipped with this framework we obtained large single-stream learning gains from pre-training with a novel family of future prediction tasks, found that momentum hurts, and that the pace of weight updates matters. The combination of these insights leads to matching the performance of IID learning with batch size 1, when using the same architecture and without costly replay buffers.

Submitted 28 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: CVPR camera ready version

arXiv:2308.15975 [pdf, other]

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

Authors: Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz

Abstract: For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.

Submitted 31 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Project website: https://robotap.github.io

arXiv:2307.06376 [pdf, other]

Probing a Dark Sector with Collider Physics, Direct Detection, and Gravitational Waves

Authors: Giorgio Arcadi, Glauber C. Dorsch, Jacinto P. Neto, Farinaldo S. Queiroz, Y. M. Oviedo-Torres

Abstract: We assess the complementarity between colliders, direct detection searches, and gravitational wave interferometry in probing a scenario of dark matter in the early universe. The model under consideration contains a B-L gauge symmetry and a vector-like fermion which acts as the dark matter candidate. The fermion induces a large dark matter-nucleon scattering rate, and the Z' field produces clear dilepton events at colliders. Thus, direct detection experiments and colliders severely constrain the parameter space in which the correct relic density is found in agreement with the data. Nevertheless, little is known about the new scalar responsible for breaking the B-L symmetry. If this breaking occurs via a first-order phase transition at a TeV scale, it could lead to gravitational waves in the mHz frequency range detectable by LISA, DECIGO, and BBO instruments. The spectrum is highly sensitive to properties of the scalar sector and gauge coupling. We show that a possible GW detection, together with information from colliders and direct detection experiments, can simultaneously pinpoint the scalar self-coupling, and narrow down the dark matter mass where a thermal relic is viable.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 9 pages, 6 figures. Comments are welcome

arXiv:2306.08637 [pdf, other]

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Authors: Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

Abstract: We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found on our project webpage.

Submitted 30 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: Published at ICCV 2023

arXiv:2305.13786 [pdf, other]

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

Authors: Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira

Abstract: We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a substantial gap in performance (91.4% vs 46.2%), suggesting that there is significant room for improvement in multimodal video understanding.
Dataset, baseline code, and challenge server are available at https://github.com/deepmind/perception_test

Submitted 30 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

arXiv:2303.10296 [pdf, other]

doi 10.1590/1806-9126-RBEF-2023-0067

Particle Physics in High School Part II: Nuclear Physics

Authors: Thaisa C. da C. Guio, Gláuber C. Dorsch

Abstract: We present the second part of a series of papers proposing a novel teaching sequence for Particle Physics in high school. The topic of the present work is Nuclear Physics. The goal of the sequence is to approach the subject in such a way as to stimulate scientific literacy, from a perspective involving Science, Technology, Society and Environment (STSE). We evaluate the potentialities and effectiveness of the material proposed here, allied to a dialogical approach by the teacher, by analyzing the presence of scientific literacy and engagement indicators during interventions applied to a public school in Espírito Santo, Brazil.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 31 figures, in Portuguese

Journal ref: Rev. Bras. Ens. Física vol. 45, e20230067 (2023)

arXiv:2211.03726 [pdf, other]

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Authors: Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Abstract: Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.

Submitted 31 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: Published in NeurIPS Datasets and Benchmarks track, 2022

arXiv:2204.05434 [pdf, other]

Cosmology with the Laser Interferometer Space Antenna

Authors: Pierre Auclair, David Bacon, Tessa Baker, Tiago Barreiro, Nicola Bartolo, Enis Belgacem, Nicola Bellomo, Ido Ben-Dayan, Daniele Bertacca, Marc Besancon, Jose J. Blanco-Pillado, Diego Blas, Guillaume Boileau, Gianluca Calcagni, Robert Caldwell, Chiara Caprini, Carmelita Carbone, Chia-Feng Chang, Hsin-Yu Chen, Nelson Christensen, Sebastien Clesse, Denis Comelli, Giuseppe Congedo, Carlo Contaldi, Marco Crisostomi, et al. (155 additional authors not shown)

Abstract: The Laser Interferometer Space Antenna (LISA) has two scientific objectives of cosmological focus: to probe the expansion rate of the universe, and to understand stochastic gravitational-wave backgrounds and their implications for early universe and particle physics, from the MeV to the Planck scale. However, the range of potential cosmological applications of gravitational wave observations extends well beyond these two objectives. This publication presents a summary of the state of the art in LISA cosmology, theory and methods, and identifies new opportunities to use gravitational wave observations by LISA to probe the universe.

Submitted 11 April, 2022; originally announced April 2022.

Report number: LISA CosWG-22-03

arXiv:2203.03570 [pdf, other]

Kubric: A scalable dataset generator

Authors: Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, et al. (10 additional authors not shown)

Abstract: Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control over data and 4) can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, and generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the used assets, all of the generation code, as well as the rendered datasets for reuse and modification.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 21 pages, CVPR2022

arXiv:2201.06722 [pdf, ps, other]

doi 10.1088/1361-6404/ac4645

An introduction to gravitational waves through electrodynamics: a quadrupole comparison

Authors: Glauber C. Dorsch, Lucas E. A. Porto

Abstract: We present a pedagogical introduction to some key computations in gravitational waves via a side-by-side comparison with the quadrupole contribution of electromagnetic radiation. Subtleties involving gauge choices and projections over transverse modes in the tensorial theory are made clearer by direct analogy with the vectorial counterpart. The power emitted by the quadrupole moment in both theories is computed, and the similarities as well as the origins of eventual discrepancies are discussed. Finally, we analyze the stability of bound systems under radiation emission, and discuss how the strength of the interactions can be established this way. We use the results to impose an anthropic bound on Newton's constant of order $G\lesssim 3\times 10^4\, G_\text{obs}$, which is on par with similar constraints from stellar formation.

Submitted 28 January, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: 26 pages, 4 figures

arXiv:2112.12548 [pdf, other]

doi 10.1088/1475-7516/2022/04/010

A sonic boom in bubble wall friction

Authors: Glauber C. Dorsch, Stephan J. Huber, Thomas Konstandin

Abstract: We revisit the computation of bubble wall friction during a cosmological first-order phase transition, using an extended fluid Ansatz to solve the linearized Boltzmann equation. A singularity is found in the fluctuations of background species as the wall approaches the speed of sound. Using hydrodynamics, we argue that a discontinuity across the speed of sound is expected on general grounds, which manifests itself as the singularity in the solution of the linearized system. We discuss this result in comparison with alternative approaches proposed recently, which find a regular behaviour of the friction for all velocities.

Submitted 15 February, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

Report number: DESY 21-224

arXiv:2112.03243 [pdf, other]

Input-level Inductive Biases for 3D Reconstruction

Authors: Wang Yifan, Carl Doersch, Relja Arandjelović, João Carreira, Andrew Zisserman

Abstract: Much of the recent progress in 3D vision has been driven by the development of specialized architectures that incorporate geometrical inductive biases. In this paper we tackle 3D reconstruction using a domain agnostic architecture and study how instead to inject the same type of inductive biases directly as extra inputs to the model. This approach makes it possible to apply existing general models, such as Perceivers, on this rich domain, without the need for architectural changes, while simultaneously maintaining data efficiency of bespoke models. In particular we study how to encode cameras, projective ray incidence and epipolar geometry as model inputs, and demonstrate competitive multi-view depth estimation performance on multiple benchmarks.

Submitted 19 March, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: CVPR 2022, including supplemental material

arXiv:2107.14795 [pdf, other]

Perceiver IO: A General Architecture for Structured Inputs & Outputs

Authors: Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joāo Carreira

Abstract: A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.

Submitted 15 March, 2022; v1 submitted 30 July, 2021; originally announced July 2021.

Comments: ICLR 2022 camera ready. Code: https://dpmd.ai/perceiver-code

arXiv:2106.14108 [pdf, other]

Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs

Authors: Dan Rosenbaum, Marta Garnelo, Michal Zielinski, Charlie Beattie, Ellen Clancy, Andrea Huber, Pushmeet Kohli, Andrew W. Senior, John Jumper, Carl Doersch, S. M. Ali Eslami, Olaf Ronneberger, Jonas Adler

Abstract: Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to play a key role in protein function. Recent methods for capturing conformational heterogeneity in cryo-EM data model it in volume space, making recovery of continuous atomic structures challenging. Here we present a fully deep-learning-based approach using variational auto-encoders (VAEs) to recover a continuous distribution of atomic protein structures and poses directly from picked particle images and demonstrate its efficacy on realistic simulated data. We hope that methods built on this work will allow incorporation of stronger prior information about protein structure and enable better understanding of non-rigid protein structures.

Submitted 26 June, 2021; originally announced June 2021.

arXiv:2106.06547 [pdf, other]

doi 10.1088/1475-7516/2021/07/020

On the wall velocity dependence of electroweak baryogenesis

Authors: Glauber C. Dorsch, Stephan J. Huber, Thomas Konstandin

Abstract: We re-evaluate the status of supersonic electroweak baryogenesis using a generalized fluid Ansatz for the non-equilibrium distribution functions. Instead of truncating the expansion to first order in momentum, we allow for higher order terms as well, including up to 21 fluctuations. The collision terms are computed analytically at leading-log accuracy. We also point out inconsistencies in the standard treatments of transport in electroweak baryogenesis, arguing that one cannot do without specifying an Ansatz for the distribution function. We present the first analysis of baryogenesis using the fluid approximation to higher orders. Our results support the recent findings that baryogenesis may indeed be possible even in the presence of supersonic wall velocities.

Submitted 3 December, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

Report number: DESY 21-089

arXiv:2103.04946 [pdf, other]

doi 10.1590/1806-9126-RBEF-2021-0083

Particle Physics in High School Part I: Quantum Electrodynamics

Authors: Glauber Carvalho Dorsch, Thaisa Carneiro da Cunha Guio

Abstract: This is the first paper of a series aiming to present a new teaching sequence for Particle Physics in high school. We propose a systematic discussion of the subject, covering not only the understanding of its key concepts, but also of the very nature of Science, of the factors that surround its practice and of its relations to technology, society and the environment, in a framework towards scientific literacy. In this work we present the first part of this teaching sequence, focused on the quantum theory of electromagnetism: Quantum Electrodynamics. We analyze the potentialities of the material proposed here, when applied in a dialogic manner by the teacher, by evaluating the presence of scientific literacy and engagement indicators throughout interventions at a full-time public school in Espirito Santo, Brazil.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 32 figures, in Portuguese

Journal ref: Rev. Bras. Ens. Física vol. 43, e20210083 (2021)

arXiv:2007.11498 [pdf, other]

CrossTransformers: spatially-aware few-shot transfer

Authors: Carl Doersch, Ankush Gupta, Andrew Zisserman

Abstract: Given new tasks with very little data$-$such as new classes in a classification problem or a domain shift in the input$-$performance of modern vision systems degrades remarkably quickly. In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains. We then propose two methods to mitigate this problem. First, we employ self-supervised learning to encourage general-purpose features that transfer better. Second, we propose a novel Transformer based neural network architecture called CrossTransformers, which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled images, and then infer class membership by computing distances between spatially-corresponding features. The result is a classifier that is more robust to task and domain shift, which we demonstrate via state-of-the-art performance on Meta-Dataset, a recent dataset for evaluating transfer from ImageNet to many other vision datasets.

Submitted 17 February, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: Published at NeurIPS 2020. Code/checkpoints: https://github.com/google-research/meta-dataset

arXiv:2006.07733 [pdf, other]

Bootstrap your own latent: A new approach to self-supervised Learning

Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches $74.3\%$ top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and $79.6\%$ with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.

Submitted 10 September, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

arXiv:1910.13125 [pdf, other]

doi 10.1088/1475-7516/2020/03/024

Detecting gravitational waves from cosmological phase transitions with LISA: an update

Authors: Chiara Caprini, Mikael Chala, Glauber C. Dorsch, Mark Hindmarsh, Stephan J. Huber, Thomas Konstandin, Jonathan Kozaczuk, Germano Nardini, Jose Miguel No, Kari Rummukainen, Pedro Schwaller, Geraldine Servant, Anders Tranberg, David J. Weir

Abstract: We investigate the potential for observing gravitational waves from cosmological phase transitions with LISA in light of recent theoretical and experimental developments. Our analysis is based on current state-of-the-art simulations of sound waves in the cosmic fluid after the phase transition completes. We discuss the various sources of gravitational radiation, the underlying parameters describing the phase transition and a variety of viable particle physics models in this context, clarifying common misconceptions that appear in the literature and identifying open questions requiring future study. We also present a web-based tool, PTPlot, that allows users to obtain up-to-date detection prospects for a given set of phase transition parameters at LISA.

Submitted 12 January, 2021; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: 68 pages, 16 figures, some comments added, typos fixed

arXiv:1907.02499 [pdf, other]

Sim2real transfer learning for 3D human pose estimation: motion to the rescue

Authors: Carl Doersch, Andrew Zisserman

Abstract: Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset.

Submitted 14 November, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

Comments: Accepted at NeurIPS 2019

arXiv:1905.09272 [pdf, other]

Data-Efficient Image Recognition with Contrastive Predictive Coding

Authors: Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord

Abstract: Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge. We hypothesize that data-efficient recognition is enabled by representations which make the variability in natural signals more predictable. We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. This new implementation produces features which support state-of-the-art linear classification accuracy on the ImageNet dataset. When used as input for non-linear classification with deep neural networks, this representation allows us to use 2-5x less labels than classifiers trained directly on image pixels. Finally, this unsupervised representation substantially improves transfer learning to object detection on the PASCAL VOC dataset, surpassing fully supervised pre-trained ImageNet classifiers.

Submitted 1 July, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

arXiv:1905.04266 [pdf, other]

Exploiting temporal context for 3D human pose estimation in the wild

Authors: Anurag Arnab, Carl Doersch, Andrew Zisserman

Abstract: We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Unlike previous algorithms which operate on single frames, we show that reconstructing a person over an entire sequence gives extra constraints that can resolve ambiguities. This is because videos often give multiple views of a person, yet the overall body shape does not change and 3D positions vary slowly. Our method improves not only on standard mocap-based datasets like Human 3.6M -- where we show quantitative improvements -- but also on challenging in-the-wild datasets such as Kinetics. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. We show that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data by evaluating on the 3DPW and HumanEVA datasets.

Submitted 10 May, 2019; originally announced May 2019.

Comments: CVPR 2019

arXiv:1904.03177 [pdf, other]

Structured agents for physical construction

Authors: Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick

Abstract: Physical construction---the ability to compose objects, subject to physical dynamics, to serve some function---is fundamental to human intelligence. We introduce a suite of challenging physical construction tasks inspired by how children play with blocks, such as matching a target configuration, stacking blocks to connect objects together, and creating shelter-like structures over target objects. We examine how a range of deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes. Model-based agents which use Monte-Carlo Tree Search also outperform strictly model-free agents in our most challenging construction problems. We conclude that approaches which combine structured representations and reasoning with powerful learning are a key path toward agents that possess rich intuitive physics, scene understanding, and planning.

Submitted 13 May, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

Comments: ICML 2019

arXiv:1812.02707 [pdf, other]

Video Action Transformer Network

Authors: Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman

Abstract: We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others. Additionally its attention mechanism learns to emphasize hands and faces, which are often crucial to discriminate an action - all without explicit supervision other than boxes and class labels. We train and test our Action Transformer network on the Atomic Visual Actions (AVA) dataset, outperforming the state-of-the-art by a significant margin using only raw RGB frames as input.

Submitted 17 May, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: CVPR 2019

arXiv:1809.04907 [pdf, other]

doi 10.1088/1475-7516/2018/12/034

Bubble wall velocities in the Standard Model and beyond

Authors: Glauber C. Dorsch, Stephan J. Huber, Thomas Konstandin

Abstract: We present results for the bubble wall velocity and bubble wall thickness during a cosmological first-order phase transition in a condensed form. Our results are for minimal extensions of the Standard Model but in principle are applicable to a much broader class of settings. Our first assumption about the model is that only the electroweak Higgs is obtaining a vacuum expectation value during the phase transition. The second is that most of the friction is produced by electroweak gauge bosons and top quarks. Under these assumptions the bubble wall velocity and thickness can be deduced as a function of two equilibrium properties of the plasma: the strength of the phase transition and the pressure difference along the bubble wall.

Submitted 13 September, 2018; originally announced September 2018.

Comments: 16 pages, 7 figures

Report number: DESY 18-162

arXiv:1809.04482 [pdf, other]

The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

Authors: Mateusz Malinowski, Carl Doersch

Abstract: Visual QA is a pivotal challenge for higher-level reasoning, requiring understanding language, vision, and relationships between many objects in a scene. Although datasets like CLEVR are designed to be unsolvable without such complex relational reasoning, some surprisingly simple feed-forward, "holistic" models have recently shown strong performance on this dataset. These models lack any kind of explicit iterative, symbolic reasoning procedure, which are hypothesized to be necessary for counting objects, narrowing down the set of relevant objects based on several attributes, etc. The reason for this strong performance is poorly understood. Hence, our work analyzes such models, and finds that minor architectural elements are crucial to performance. In particular, we find that \textit{early fusion} of language and vision provides large performance improvements. This contrasts with the late fusion approaches popular at the dawn of Visual QA. We propose a simple module we call Multimodal Core, which we hypothesize performs the fundamental operations for multimodal tasks. We believe that understanding why these elements are so important to complex question answering will aid the design of better-performing algorithms for Visual QA while minimizing hand-engineering effort.

Submitted 11 September, 2018; originally announced September 2018.

Comments: Presented at ECCV'18 Workshop on Shortcomings in Vision and Language

arXiv:1808.00300 [pdf, other]

Learning Visual Question Answering by Bootstrapping Hard Attention

Authors: Mateusz Malinowski, Carl Doersch, Adam Santoro, Peter Battaglia

Abstract: Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing which would be prohibitive to perform on all sensory inputs. In computer vision, however, there has been relatively little exploration of hard attention, where some information is selectively ignored, in spite of the success of soft attention, where information is re-weighted and aggregated, but never filtered out. Here, we introduce a new approach for hard attention and find it achieves very competitive performance on a recently-released visual question answering datasets, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. Even though the hard attention mechanism is thought to be non-differentiable, we found that the feature magnitudes correlate with semantic relevance, and provide a useful signal for our mechanism's attentional selection criterion. Because hard attention selects important features of the input information, it can also be more efficient than analogous soft attention mechanisms. This is especially important for recent approaches that use non-local pairwise operations, whereby computational and memory costs are quadratic in the size of the set of features.

Submitted 1 August, 2018; originally announced August 2018.

Comments: ECCV 2018

arXiv:1807.10066 [pdf, other]

A Better Baseline for AVA

Authors: Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman

Abstract: We introduce a simple baseline for action localization on the AVA dataset. The model builds upon the Faster R-CNN bounding box detection framework, adapted to operate on pure spatiotemporal features - in our case produced exclusively by an I3D model pretrained on Kinetics. This model obtains 21.9% average AP on the validation set of AVA v2.1, up from 14.5% for the best RGB spatiotemporal model used in the original AVA paper (which was pretrained on Kinetics and ImageNet), and up from 11.3 of the publicly available baseline using a ResNet101 image feature extractor, that was pretrained on ImageNet. Our final model obtains 22.8%/21.9% mAP on the val/test sets and outperforms all submissions to the AVA challenge at CVPR 2018.

Submitted 26 July, 2018; originally announced July 2018.

Comments: ActivityNet Workshop (AVA Challenge), CVPR 2018

arXiv:1803.03835 [pdf, other]

Kickstarting Deep Reinforcement Learning

Authors: Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami

Abstract: We present a method for using previously-trained 'teacher' agents to kickstart the training of a new 'student' agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design. We also show that the same kickstarting pipeline can allow a single student agent to leverage multiple 'expert' teachers which specialize on individual tasks. In this setting kickstarting yields surprisingly large gains, with the kickstarted agent matching the performance of an agent trained from scratch in almost 10x fewer steps, and surpassing its final performance by 42 percent. Kickstarting is conceptually simple and can easily be incorporated into reinforcement learning experiments.

Submitted 10 March, 2018; originally announced March 2018.

arXiv:1708.07860 [pdf, other]

Multi-task Self-Supervised Visual Learning

Authors: Carl Doersch, Andrew Zisserman

Abstract: We investigate methods for combining multiple self-supervised tasks--i.e., supervised tasks where data can be collected without manual labeling--in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks--even via a naive multi-head architecture--always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction.

Submitted 25 August, 2017; originally announced August 2017.

Comments: Published at ICCV 2017

arXiv:1705.09186 [pdf, other]

doi 10.1007/JHEP12(2017)086

The Higgs Vacuum Uplifted: Revisiting the Electroweak Phase Transition with a Second Higgs Doublet

Authors: G. C. Dorsch, S. J. Huber, K. Mimasu, J. M. No

Abstract: The existence of a second Higgs doublet in Nature could lead to a cosmological first order electroweak phase transition and explain the origin of the matter-antimatter asymmetry in the Universe. We explore the parameter space of such a two-Higgs-doublet-model and show that a first order electroweak phase transition strongly correlates with a significant uplifting of the Higgs vacuum w.r.t. its Standard Model value. We then obtain the spectrum and properties of the new scalars $H_0$, $A_0$ and $H^{\pm}$ that signal such a phase transition, showing that the decay $A_0 \rightarrow H_0 Z$ at the LHC and a sizable deviation in the Higgs self-coupling $λ_{hhh}$ from its SM value are sensitive indicators of a strongly first order electroweak phase transition in the 2HDM.

Submitted 25 May, 2017; originally announced May 2017.

Report number: CP3-17-15, DESY 17-076, KCL-PH-TH/2017-27

arXiv:1611.05874 [pdf, other]

doi 10.1088/1475-7516/2017/05/052

A Second Higgs Doublet in the Early Universe: Baryogenesis and Gravitational Waves

Authors: G. C. Dorsch, S. J. Huber, T. Konstandin, J. M. No

Abstract: We show that simple Two Higgs Doublet models still provide a viable explanation for the matter-antimatter asymmetry of the Universe via electroweak baryogenesis, even after taking into account the recent order-of-magnitude improvement on the electron-EDM experimental bound by the ACME Collaboration. Moreover we show that, in the region of parameter space where baryogenesis is possible, the gravitational wave spectrum generated at the end of the electroweak phase transition is within the sensitivity reach of the future space-based interferometer LISA.

Submitted 17 November, 2016; originally announced November 2016.

Comments: 20 pages, 4 figures

arXiv:1606.07873 [pdf, other]

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders

Authors: Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert

Abstract: In a given scene, humans can often easily predict a set of immediate future events that might happen. However, generalized pixel-level anticipation in computer vision systems is difficult because machine learning struggles with the ambiguity inherent in predicting the future. In this paper, we focus on predicting the dense trajectory of pixels in a scene, specifically what will move in the scene, where it will travel, and how it will deform over the course of one second. We propose a conditional variational autoencoder as a solution to this problem. In this framework, direct inference from the image shapes the distribution of possible trajectories, while latent variables encode any necessary information that is not available in the image. We show that our method is able to successfully predict events in a wide variety of scenes and can produce multiple different predictions when the future is ambiguous. Our algorithm is trained on thousands of diverse, realistic videos and requires absolutely no human labeling. In addition to non-semantic action prediction, we find that our method learns a representation that is applicable to semantic vision tasks.

Submitted 25 June, 2016; originally announced June 2016.

arXiv:1606.05908 [pdf, other]

Tutorial on Variational Autoencoders

Authors: Carl Doersch

Abstract: In just three years, Variational Autoencoders (VAEs) have emerged as one of the most popular approaches to unsupervised learning of complicated distributions. VAEs are appealing because they are built on top of standard function approximators (neural networks), and can be trained with stochastic gradient descent. VAEs have already shown promise in generating many kinds of complicated data, including handwritten digits, faces, house numbers, CIFAR images, physical models of scenes, segmentation, and predicting the future from static images. This tutorial introduces the intuitions behind VAEs, explains the mathematics behind them, and describes some empirical behavior. No prior knowledge of variational Bayesian methods is assumed.

Submitted 3 January, 2021; v1 submitted 19 June, 2016; originally announced June 2016.

arXiv:1601.04545 [pdf, other]

doi 10.1103/PhysRevD.93.115033

Hierarchical vs Degenerate 2HDM: The LHC Run 1 Legacy at the Onset of Run 2

Authors: G. C. Dorsch, S. J. Huber, K. Mimasu, J. M. No

Abstract: Current discussions of the allowed two-Higgs-doublet model (2HDM) parameter space after LHC Run 1 and the prospects for Run 2 are commonly phrased in the context of a quasi-degenerate spectrum for the new scalars. Here we discuss the generic situation of a 2HDM with a non-degenerate spectrum for the new scalars. This is highly motivated from a cosmological perspective since it naturally leads to a strongly first order electroweak phase transition that could explain the matter-antimatter asymmetry in the Universe. While constraints from measurements of Higgs signal strengths do not change, those from searches of new scalar states get modified dramatically once a non-degenerate spectrum is considered.

Submitted 18 January, 2016; originally announced January 2016.

Comments: 16 pages, 10 figures

Report number: DESY 15-240

Journal ref: Phys. Rev. D 93, 115033 (2016)

arXiv:1511.06856 [pdf, other]

Data-dependent Initializations of Convolutional Neural Networks

Authors: Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell

Abstract: Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable. Despite this, few researchers dare to train their models from scratch. Most work builds on one of a handful of ImageNet pre-trained models, and fine-tunes or adapts these for specific tasks. This is in large part due to the difficulty of properly initializing these networks from scratch. A small miscalibration of the initial weights leads to vanishing or exploding gradients, as well as poor convergence properties. In this work we present a fast and simple data-dependent initialization procedure, that sets the weights of a network such that all units in the network train at roughly the same rate, avoiding vanishing or exploding gradients. Our initialization matches the current state-of-the-art unsupervised or self-supervised pre-training methods on standard computer vision tasks, such as image classification and object detection, while being roughly three orders of magnitude faster. When combined with pre-training methods, our initialization significantly outperforms prior work, narrowing the gap between supervised and unsupervised pre-training.

Submitted 22 September, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

Comments: ICLR 2016

arXiv:1511.01689 [pdf, other]

More Higgses at the LHC and the Electroweak Phase Transition

Authors: G. C. Dorsch, S. J. Huber, K. Mimasu, J. M. No

Abstract: A cosmological first order electroweak phase transition could explain the origin of the cosmic matter-antimatter asymmetry. While it does not occur in the Standard Model, it becomes possible in the presence of a second Higgs doublet. In this context, we obtain the properties of the new scalars $H_0$, $A_0$ and $H^{\pm}$ leading to such a phase transition, showing that its key LHC signature would be the decay $A_0 \rightarrow H_0 Z$, and we analyze the promising LHC search prospects for this decay in the $\ell \ell b\bar{b}$ and $\ell \ell W^{+} W^{-}$ final states. Finally, we comment on the impact of the $A_0 \rightarrow H_0 Z$ decay on current LHC searches for $A_0$ decaying into SM particles.

Submitted 5 November, 2015; originally announced November 2015.

Comments: 4 pages, 3 figures, talk presented at the XXVIIth Rencontres de Blois, France, 31 May - 5 June 2015

arXiv:1505.05192 [pdf, other]

Unsupervised Visual Representation Learning by Context Prediction

Authors: Carl Doersch, Abhinav Gupta, Alexei A. Efros

Abstract: This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the R-CNN framework and provides a significant boost over a randomly-initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations.

Submitted 16 January, 2016; v1 submitted 19 May, 2015; originally announced May 2015.

Comments: Oral paper at ICCV 2015

arXiv:1504.07284 [pdf, other]

Mid-level Elements for Object Detection

Authors: Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Gupta

Abstract: Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection which performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data). Through extensive experiments and ablation analysis, we show how our approach effectively improves upon the HOG-based pipelines by adding an intermediate mid-level representation for the task of object detection. This representation is easily interpretable and allows us to visualize what our object detector "sees". We also discuss the insights our approach shares with CNN-based methods, such as the observation that sharing representations between categories helps.

Submitted 27 April, 2015; originally announced April 2015.

arXiv:1405.5537 [pdf, ps, other]

doi 10.1103/PhysRevLett.113.211802

Echoes of the Electroweak Phase Transition: Discovering a second Higgs doublet through $A_0 \rightarrow H_0 Z$

Authors: G. C. Dorsch, S. Huber, K. Mimasu, J. M. No

Abstract: The existence of a second Higgs doublet in Nature could lead to a cosmological first order electroweak phase transition and explain the origin of the matter-antimatter asymmetry in the Universe. We obtain the spectrum and properties of the new scalars $H_0$, $A_0$ and $H^{\pm}$ that signal such a phase transition, and show that the observation of the decay $A_0 \rightarrow H_0 Z$ at LHC would be a `smoking gun' signature of these scenarios. We analyze the LHC search prospects for this decay in the $\ell \ell b\bar{b}$ and $\ell \ell W^{+} W^{-}$ final states, arguing that current data may be sensitive to this signature in the former channel as well as there being great potential for a discovery in either one at the very early stages of the 14 TeV run.

Submitted 21 May, 2014; originally announced May 2014.

Comments: 6 pages, 5 figures

Journal ref: Phys. Rev. Lett. 113, 211802 (2014)

arXiv:1403.5583 [pdf, ps, other]

doi 10.1103/PhysRevLett.113.121801

Cosmological Signatures of a UV-Conformal Standard Model

Authors: Glauber C. Dorsch, Stephan J. Huber, Jose Miguel No

Abstract: Quantum scale invariance in the UV has been recently advocated as an attractive way of solving the gauge hierarchy problem arising in the Standard Model. We explore the cosmological signatures at the electroweak scale when the breaking of scale invariance originates from a hidden sector and is mediated to the Standard Model by gauge interactions (Gauge Mediation). These scenarios, while being hard to distinguish from the Standard Model at LHC, can give rise to a strong electroweak phase transition leading to the generation of a large stochastic gravitational wave background in possible reach of future space-based detectors such as eLISA and BBO. This relic would be the cosmological imprint of the breaking of scale invariance in Nature.

Submitted 21 March, 2014; originally announced March 2014.

Journal ref: Phys. Rev. Lett. 113, 121801 (2014)

arXiv:1305.6610 [pdf, ps, other]

doi 10.1007/JHEP10(2013)029

A strong electroweak phase transition in the 2HDM after LHC8

Authors: G. C. Dorsch, S. J. Huber, J. M. No

Abstract: The nature of the electroweak phase transition in two-Higgs-doublet models is revisited in light of the recent LHC results. A scan over an extensive region of their parameter space is performed, showing that a strongly first-order phase transition favours a light neutral scalar with SM-like properties, together with a heavy pseudo-scalar ($m_{A^0} > 400$ GeV) and a mass hierarchy in the scalar sector, $m_{H^+} < m_{H^0} < m_{A^0}$. We also investigate the $h^0 \rightarrow \gamma\gamma$ decay channel and find that an enhancement in the branching ratio is allowed, and in some cases even preferred, when a strongly first-order phase transition is required.

Submitted 29 October, 2013; v1 submitted 28 May, 2013; originally announced May 2013.

Showing 1–47 of 47 results for author: Doersch, C