-
A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems
Authors:
Alexandre Duval,
Simon V. Mathis,
Chaitanya K. Joshi,
Victor Schmidt,
Santiago Miret,
Fragkiskos D. Malliaros,
Taco Cohen,
Pietro Liò,
Yoshua Bengio,
Michael Bronstein
Abstract:
Recent advances in computational modelling of atomic systems, spanning molecules, proteins, and materials, represent them as geometric graphs with atoms embedded as nodes in 3D Euclidean space. In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations.…
▽ More
Recent advances in computational modelling of atomic systems, spanning molecules, proteins, and materials, represent them as geometric graphs with atoms embedded as nodes in 3D Euclidean space. In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations. In recent years, Geometric Graph Neural Networks have emerged as the preferred machine learning architecture powering applications ranging from protein structure prediction to molecular simulations and material generation. Their specificity lies in the inductive biases they leverage - such as physical symmetries and chemical properties - to learn informative representations of these geometric graphs.
In this opinionated paper, we provide a comprehensive and self-contained overview of the field of Geometric GNNs for 3D atomic systems. We cover fundamental background material and introduce a pedagogical taxonomy of Geometric GNN architectures: (1) invariant networks, (2) equivariant networks in Cartesian basis, (3) equivariant networks in spherical basis, and (4) unconstrained networks. Additionally, we outline key datasets and application areas and suggest future research directions. The objective of this work is to present a structured perspective on the field, making it accessible to newcomers and aiding practitioners in gaining an intuition for its mathematical abstractions.
△ Less
Submitted 13 March, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Determination of droplet size from wide-angle light scattering image data using convolutional neural networks
Authors:
Tom Kirstein,
Simon Aßmann,
Orkun Furat,
Stefan Will,
Volker Schmidt
Abstract:
Wide-angle light scattering (WALS) offers the possibility of a highly temporally and spatially resolved measurement of droplets in spray-based methods for nanoparticle synthesis. The size of these droplets is a critical variable affecting the final properties of synthesized materials such as hetero-aggregates. However, conventional methods for determining droplet sizes from WALS image data are lab…
▽ More
Wide-angle light scattering (WALS) offers the possibility of a highly temporally and spatially resolved measurement of droplets in spray-based methods for nanoparticle synthesis. The size of these droplets is a critical variable affecting the final properties of synthesized materials such as hetero-aggregates. However, conventional methods for determining droplet sizes from WALS image data are labor-intensive and may introduce biases, particularly when applied to complex systems like spray flame synthesis (SFS). To address these challenges, we introduce a fully automatic machine learning-based approach that employs convolutional neural networks (CNNs) in order to streamline the droplet sizing process. This CNN-based methodology offers further advantages: it requires few manual labels and can utilize transfer learning, making it a promising alternative to conventional methods, specifically with respect to efficiency. To evaluate the performance of our machine learning models, we consider WALS data from an ethanol spray flame process at various heights above the burner surface (HABs), where the models are trained and cross-validated on a large dataset comprising nearly 35000 WALS images.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
Using convolutional neural networks for stereological characterization of 3D hetero-aggregates based on synthetic STEM data
Authors:
Lukas Fuchs,
Tom Kirstein,
Christoph Mahr,
Orkun Furat,
Valentin Baric,
Andreas Rosenauer,
Lutz Maedler,
Volker Schmidt
Abstract:
The structural characterization of hetero-aggregates in 3D is of great interest, e.g., for deriving process-structure or structure-property relationships. However, since 3D imaging techniques are often difficult to perform as well as time and cost intensive, a characterization of hetero-aggregates based on 2D image data is desirable, but often non-trivial. To overcome the issues of characterizing…
▽ More
The structural characterization of hetero-aggregates in 3D is of great interest, e.g., for deriving process-structure or structure-property relationships. However, since 3D imaging techniques are often difficult to perform as well as time and cost intensive, a characterization of hetero-aggregates based on 2D image data is desirable, but often non-trivial. To overcome the issues of characterizing 3D structures from 2D measurements, a method is presented that relies on machine learning combined with methods of spatial stochastic modeling, where the latter are utilized for the generation of synthetic training data. This kind of training data has the advantage that time-consuming experiments for the synthesis of differently structured materials followed by their 3D imaging can be avoided. More precisely, a parametric stochastic 3D model is presented, from which a wide spectrum of virtual hetero-aggregates can be generated. Additionally, the virtual structures are passed to a physics-based simulation tool in order to generate virtual scanning transmission electron microscopy (STEM) images. The preset parameters of the 3D model together with the simulated STEM images serve as a database for the training of convolutional neural networks, which can be used to determine the parameters of the underlying 3D model and, consequently, to predict 3D structures of hetero-aggregates from 2D STEM images. Furthermore, an error analysis is performed to evaluate the prediction power of the trained neural networks with respect to structural descriptors, e.g. the hetero-coordination number.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions
Authors:
Alvaro Carbonero,
Alexandre Duval,
Victor Schmidt,
Santiago Miret,
Alex Hernandez-Garcia,
Yoshua Bengio,
David Rolnick
Abstract:
The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g.~when evaluating the potentially unknown binding of adsorbates to catalyst. In this paper, we investigate whether it is possible to predic…
▽ More
The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g.~when evaluating the potentially unknown binding of adsorbates to catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Crystal-GFN: sampling crystals with desirable properties and constraints
Authors:
Mila AI4Science,
Alex Hernandez-Garcia,
Alexandre Duval,
Alexandra Volokhova,
Yoshua Bengio,
Divya Sharma,
Pierre Luc Carrier,
Yasmine Benabed,
Michał Koziarski,
Victor Schmidt
Abstract:
Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal struct…
▽ More
Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.
△ Less
Submitted 13 December, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
-
torchgfn: A PyTorch GFlowNet library
Authors:
Salem Lahlou,
Joseph D. Viviano,
Victor Schmidt,
Yoshua Bengio
Abstract:
The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library which facilitates the testing of new features such as training losses that can be easily compared to standard benchmark implementations, or on a set of common environments. torchgfn is a PyTorch library that aims to address this n…
▽ More
The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library which facilitates the testing of new features such as training losses that can be easily compared to standard benchmark implementations, or on a set of common environments. torchgfn is a PyTorch library that aims to address this need. It provides users with a simple API for environments and useful abstractions for samplers and losses. Multiple examples are provided, replicating and unifying published results. The code is available in https://github.com/saleml/torchgfn.
△ Less
Submitted 29 August, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
FAENet: Frame Averaging Equivariant GNN for Materials Modeling
Authors:
Alexandre Duval,
Victor Schmidt,
Alex Hernandez Garcia,
Santiago Miret,
Fragkiskos D. Malliaros,
Yoshua Bengio,
David Rolnick
Abstract:
Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to specific symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible f…
▽ More
Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to specific symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible framework relying on stochastic frame-averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations. (2) FAENet: a simple, fast and expressive GNN, optimized for SFA, that processes geometric information without any symmetrypreserving design constraints. We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X). A package implementation is available at https://faenet.readthedocs.io.
△ Less
Submitted 28 April, 2023;
originally announced May 2023.
-
PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design
Authors:
Alexandre Duval,
Victor Schmidt,
Santiago Miret,
Yoshua Bengio,
Alex Hernández-García,
David Rolnick
Abstract:
Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electroch…
▽ More
Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST improves energy MAE by 4 to 42$\%$ while dividing compute time by 3 to 8$\times$ depending on the targeted task/model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{https://phast.readthedocs.io}.
△ Less
Submitted 11 March, 2024; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Testing Causality in Scientific Modelling Software
Authors:
Andrew G. Clark,
Michael Foster,
Benedikt Prifling,
Neil Walkinshaw,
Robert M. Hierons,
Volker Schmidt,
Robert D. Turner
Abstract:
From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in develo** scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously diffic…
▽ More
From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in develo** scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical. In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse data instead of costly experiments. This paper introduces the Causal Testing Framework: a framework that uses Causal Inference techniques to establish causal effects from existing data, enabling users to conduct software testing activities concerning the effect of a change, such as Metamorphic Testing, a posteriori. We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes from reused, confounded test data to provide an efficient solution for testing scientific modelling software.
△ Less
Submitted 30 June, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Classification of FIB/SEM-tomography images for highly porous multiphase materials using random forest classifiers
Authors:
Markus Osenberg,
André Hilger,
Matthias Neumann,
Amalia Wagner,
Nicole Bohn,
Joachim R. Binder,
Volker Schmidt,
John Banhart,
Ingo Manke
Abstract:
FIB/SEM tomography represents an indispensable tool for the characterization of three-dimensional nanostructures in battery research and many other fields. However, contrast and 3D classification/reconstruction problems occur in many cases, which strongly limits the applicability of the technique especially on porous materials, like those used for electrode materials in batteries or fuel cells. Di…
▽ More
FIB/SEM tomography represents an indispensable tool for the characterization of three-dimensional nanostructures in battery research and many other fields. However, contrast and 3D classification/reconstruction problems occur in many cases, which strongly limits the applicability of the technique especially on porous materials, like those used for electrode materials in batteries or fuel cells. Distinguishing the different components like active Li storage particles and carbon/binder materials is difficult and often prevents a reliable quantitative analysis of image data, or may even lead to wrong conclusions about structure-property relationships. In this contribution, we present a novel approach for data classification in three-dimensional image data obtained by FIB/SEM tomography and its applications to NMC battery electrode materials. We use two different image signals, namely the signal of the angled SE2 chamber detector and the Inlens detector signal, combine both signals and train a random forest, i.e. a particular machine learning algorithm. We demonstrate that this approach can overcome current limitations of existing techniques suitable for multi-phase measurements and that it allows for quantitative data reconstruction even where current state-of the art techniques fail, or demand for large training sets. This approach may yield as guideline for future research using FIB/SEM tomography.
△ Less
Submitted 28 July, 2022;
originally announced July 2022.
-
ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods
Authors:
Victor Schmidt,
Alexandra Sasha Luccioni,
Mélisande Teng,
Tianyu Zhang,
Alexia Reynaud,
Sunand Raghupathi,
Gautier Cosne,
Adrien Juraver,
Vahe Vardanyan,
Alex Hernandez-Garcia,
Yoshua Bengio
Abstract:
Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places…
▽ More
Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, we present our solution to simulate photo-realistic floods on authentic images. To address this complex task in the absence of suitable training data, we propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In this paper, we describe the details of our framework, thoroughly evaluate components of our architecture and demonstrate that our model is capable of robustly generating photo-realistic flooding.
△ Less
Submitted 6 October, 2021;
originally announced October 2021.
-
COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing
Authors:
Prateek Gupta,
Tegan Maharaj,
Martin Weiss,
Nasim Rahaman,
Hannah Alsdurf,
Abhinav Sharma,
Nanor Minoyan,
Soren Harnois-Leblanc,
Victor Schmidt,
Pierre-Luc St. Charles,
Tristan Deleu,
Andrew Williams,
Akshay Patel,
Meng Qu,
Olexa Bilaniuk,
Gaétan Marceau Caron,
Pierre Luc Carrier,
Satya Ortiz-Gagné,
Marc-Andre Rousseau,
David Buckeridge,
Joumana Ghosn,
Yang Zhang,
Bernhard Schölkopf,
Jian Tang,
Irina Rish
, et al. (4 additional authors not shown)
Abstract:
The rapid global spread of COVID-19 has led to an unprecedented demand for effective methods to mitigate the spread of the disease, and various digital contact tracing (DCT) methods have emerged as a component of the solution. In order to make informed public health choices, there is a need for tools which allow evaluation and comparison of DCT methods. We introduce an agent-based compartmental si…
▽ More
The rapid global spread of COVID-19 has led to an unprecedented demand for effective methods to mitigate the spread of the disease, and various digital contact tracing (DCT) methods have emerged as a component of the solution. In order to make informed public health choices, there is a need for tools which allow evaluation and comparison of DCT methods. We introduce an agent-based compartmental simulator we call COVI-AgentSim, integrating detailed consideration of virology, disease progression, social contact networks, and mobility patterns, based on parameters derived from empirical research. We verify by comparing to real data that COVI-AgentSim is able to reproduce realistic COVID-19 spread dynamics, and perform a sensitivity analysis to verify that the relative performance of contact tracing methods are consistent across a range of settings. We use COVI-AgentSim to perform cost-benefit analyses comparing no DCT to: 1) standard binary contact tracing (BCT) that assigns binary recommendations based on binary test results; and 2) a rule-based method for feature-based contact tracing (FCT) that assigns a graded level of recommendation based on diverse individual features. We find all DCT methods consistently reduce the spread of the disease, and that the advantage of FCT over BCT is maintained over a wide range of adoption rates. Feature-based methods of contact tracing avert more disability-adjusted life years (DALYs) per socioeconomic cost (measured by productive hours lost). Our results suggest any DCT method can help save lives, support re-opening of economies, and prevent second-wave outbreaks, and that FCT methods are a promising direction for enriching BCT using self-reported symptoms, yielding earlier warning signals and a significantly reduced spread of the virus per socioeconomic cost.
△ Less
Submitted 29 October, 2020;
originally announced October 2020.
-
Predicting Infectiousness for Proactive Contact Tracing
Authors:
Yoshua Bengio,
Prateek Gupta,
Tegan Maharaj,
Nasim Rahaman,
Martin Weiss,
Tristan Deleu,
Eilif Muller,
Meng Qu,
Victor Schmidt,
Pierre-Luc St-Charles,
Hannah Alsdurf,
Olexa Bilanuik,
David Buckeridge,
Gáetan Marceau Caron,
Pierre-Luc Carrier,
Joumana Ghosn,
Satya Ortiz-Gagne,
Chris Pal,
Irina Rish,
Bernhard Schölkopf,
Abhinav Sharma,
Jian Tang,
Andrew Williams
Abstract:
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between pri…
▽ More
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
△ Less
Submitted 23 October, 2020;
originally announced October 2020.
-
Image-to-image Map** with Many Domains by Sparse Attribute Transfer
Authors:
Matthew Amodio,
Rim Assouel,
Victor Schmidt,
Tristan Sylvain,
Smita Krishnaswamy,
Yoshua Bengio
Abstract:
Unsupervised image-to-image translation consists of learning a pair of map**s between two domains without known pairwise correspondences between points. The current convention is to approach this task with cycle-consistent GANs: using a discriminator to encourage the generator to change the image to match the target domain, while training the generator to be inverted with another map**. While…
▽ More
Unsupervised image-to-image translation consists of learning a pair of map**s between two domains without known pairwise correspondences between points. The current convention is to approach this task with cycle-consistent GANs: using a discriminator to encourage the generator to change the image to match the target domain, while training the generator to be inverted with another map**. While ending up with paired inverse functions may be a good end result, enforcing this restriction at all times during training can be a hindrance to effective modeling. We propose an alternate approach that directly restricts the generator to performing a simple sparse transformation in a latent layer, motivated by recent work from cognitive neuroscience suggesting an architectural prior on representations corresponding to consciousness. Our biologically motivated approach leads to representations more amenable to transformation by disentangling high-level abstract concepts in the latent space. We demonstrate that image-to-image domain translation with many different domains can be learned more effectively with our architecturally constrained, simple transformation than with previous unconstrained architectures that rely on a cycle-consistency loss.
△ Less
Submitted 23 June, 2020;
originally announced June 2020.
-
Towards Lifelong Self-Supervision For Unpaired Image-to-Image Translation
Authors:
Victor Schmidt,
Makesh Narsimhan Sreedhar,
Mostafa ElAraby,
Irina Rish
Abstract:
Unpaired Image-to-Image Translation (I2IT) tasks often suffer from lack of data, a problem which self-supervised learning (SSL) has recently been very popular and successful at tackling. Leveraging auxiliary tasks such as rotation prediction or generative colorization, SSL can produce better and more robust representations in a low data regime. Training such tasks along an I2IT task is however com…
▽ More
Unpaired Image-to-Image Translation (I2IT) tasks often suffer from lack of data, a problem which self-supervised learning (SSL) has recently been very popular and successful at tackling. Leveraging auxiliary tasks such as rotation prediction or generative colorization, SSL can produce better and more robust representations in a low data regime. Training such tasks along an I2IT task is however computationally intractable as model size and the number of task grow. On the other hand, learning sequentially could incur catastrophic forgetting of previously learned tasks. To alleviate this, we introduce Lifelong Self-Supervision (LiSS) as a way to pre-train an I2IT model (e.g., CycleGAN) on a set of self-supervised auxiliary tasks. By kee** an exponential moving average of past encoders and distilling the accumulated knowledge, we are able to maintain the network's validation performance on a number of tasks without any form of replay, parameter isolation or retraining techniques typically used in continual learning. We show that models trained with LiSS perform better on past tasks, while also being more robust than the CycleGAN baseline to color bias and entity entanglement (when two entities are very close).
△ Less
Submitted 31 March, 2020;
originally announced April 2020.
-
Using Simulated Data to Generate Images of Climate Change
Authors:
Gautier Cosne,
Adrien Juraver,
Mélisande Teng,
Victor Schmidt,
Vahe Vardanyan,
Alexandra Luccioni,
Yoshua Bengio
Abstract:
Generative adversarial networks (GANs) used in domain adaptation tasks have the ability to generate images that are both realistic and personalized, transforming an input image while maintaining its identifiable characteristics. However, they often require a large quantity of training data to produce high-quality images in a robust way, which limits their usability in cases when access to data is…
▽ More
Generative adversarial networks (GANs) used in domain adaptation tasks have the ability to generate images that are both realistic and personalized, transforming an input image while maintaining its identifiable characteristics. However, they often require a large quantity of training data to produce high-quality images in a robust way, which limits their usability in cases when access to data is limited. In our paper, we explore the potential of using images from a simulated 3D environment to improve a domain adaptation task carried out by the MUNIT architecture, aiming to use the resulting images to raise awareness of the potential future impacts of climate change.
△ Less
Submitted 26 January, 2020;
originally announced January 2020.
-
Quantifying the Carbon Emissions of Machine Learning
Authors:
Alexandre Lacoste,
Alexandra Luccioni,
Victor Schmidt,
Thomas Dandres
Abstract:
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate…
▽ More
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
△ Less
Submitted 4 November, 2019; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks
Authors:
Victor Schmidt,
Alexandra Luccioni,
S. Karthik Mukkavilli,
Narmada Balasooriya,
Kris Sankaran,
Jennifer Chayes,
Yoshua Bengio
Abstract:
We present a project that aims to generate images that depict accurate, vivid, and personalized outcomes of climate change using Cycle-Consistent Adversarial Networks (CycleGANs). By training our CycleGAN model on street-view images of houses before and after extreme weather events (e.g. floods, forest fires, etc.), we learn a map** that can then be applied to images of locations that have not y…
▽ More
We present a project that aims to generate images that depict accurate, vivid, and personalized outcomes of climate change using Cycle-Consistent Adversarial Networks (CycleGANs). By training our CycleGAN model on street-view images of houses before and after extreme weather events (e.g. floods, forest fires, etc.), we learn a map** that can then be applied to images of locations that have not yet experienced these events. This visual transformation is paired with climate model predictions to assess likelihood and type of climate-related events in the long term (50 years) in order to bring the future closer in the viewers mind. The eventual goal of our project is to enable individuals to make more informed choices about their climate future by creating a more visceral understanding of the effects of climate change, while maintaining scientific credibility by drawing on climate model projections.
△ Less
Submitted 2 May, 2019;
originally announced May 2019.