-
Information Retrieval and Classification of Real-Time Multi-Source Hurricane Evacuation Notices
Authors:
Tingting Zhao,
Shubo Tian,
Jordan Daly,
Melissa Geiger,
Minna Jia,
**feng Zhang
Abstract:
For an approaching disaster, the tracking of time-sensitive critical information such as hurricane evacuation notices is challenging in the United States. These notices are issued and distributed rapidly by numerous local authorities that may spread across multiple states. They often undergo frequent updates and are distributed through diverse online portals lacking standard formats. In this study, we developed an approach to detect and track locally issued hurricane evacuation notices in a timely manner. The text data were collected mainly with a spatially targeted web scraping method. They were manually labeled and then classified using natural language processing techniques with deep learning models. The classification of mandatory evacuation notices achieved a high accuracy (recall = 96%). We used Hurricane Ian (2022) to illustrate how real-time evacuation notices extracted from local government sources could be redistributed with a Web GIS system. Our method applied to future hurricanes provides live data for situation awareness to higher-level government agencies and news media. The archived data help scholars study government responses toward weather warnings and individual behaviors influenced by evacuation history. The framework may be applied to other types of disasters for rapid and targeted retrieval, classification, redistribution, and archiving of real-time government orders and notifications.
Submitted 7 January, 2024;
originally announced January 2024.
-
Multithreaded parallelism for heterogeneous clusters of QPUs
Authors:
Philipp Seitz,
Manuel Geiger,
Christian B. Mendl
Abstract:
In this work, we present MILQ, a quantum unrelated parallel machines scheduler and cutter. The setting of unrelated parallel machines considers independent hardware backends, each distinguished by differing setup and processing times. MILQ optimizes the total execution time of a batch of circuits scheduled on multiple quantum devices. It leverages state-of-the-art circuit-cutting techniques to fit circuits onto the devices and schedules them based on a mixed-integer linear program. Our results show a total improvement of up to 26 % compared to a baseline approach.
Submitted 29 November, 2023;
originally announced November 2023.
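The unrelated-parallel-machines setting described in the abstract above can be illustrated with a toy example. The sketch below is not MILQ and does not use a mixed-integer linear program or circuit cutting: it simply enumerates every job-to-machine assignment and keeps the one with the smallest makespan. The function name `best_assignment`, the cost model (one fixed setup cost per machine that receives at least one job), and all numbers are assumptions made for illustration only.

```python
from itertools import product


def best_assignment(proc_times, setup_times):
    """Brute-force the assignment with the smallest makespan.

    proc_times[j][m]: hypothetical run time of job j on machine m.
    setup_times[m]:   hypothetical one-off setup cost paid by machine m
                      if at least one job is assigned to it.
    Returns (makespan, assignment), where assignment[j] is the machine of job j.
    """
    n_jobs = len(proc_times)
    n_machines = len(setup_times)
    best = None
    # Enumerate all n_machines ** n_jobs assignments (only viable for tiny inputs).
    for assignment in product(range(n_machines), repeat=n_jobs):
        loads = [0.0] * n_machines
        for job, machine in enumerate(assignment):
            loads[machine] += proc_times[job][machine]
        for machine in set(assignment):
            loads[machine] += setup_times[machine]
        makespan = max(loads)
        if best is None or makespan < best[0]:
            best = (makespan, assignment)
    return best


# Three jobs, two machines with different speeds per job (made-up numbers).
proc_times = [[4, 6], [3, 2], [5, 7]]
setup_times = [1, 1]
makespan, assignment = best_assignment(proc_times, setup_times)
print(makespan, assignment)
```

Exhaustive search grows exponentially with the number of jobs, which is one reason a mixed-integer formulation like the one the abstract mentions is the usual choice for realistic batch sizes.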
-
Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for Molecule Generation
Authors:
Ameya Daigavane,
Song Kim,
Mario Geiger,
Tess Smidt
Abstract:
We present Symphony, an $E(3)$-equivariant autoregressive generative model for 3D molecular geometries that iteratively builds a molecule from molecular fragments. Existing autoregressive models such as G-SchNet and G-SphereNet for molecules utilize rotationally invariant features to respect the 3D symmetries of molecules. In contrast, Symphony uses message-passing with higher-degree $E(3)$-equivariant features. This allows a novel representation of probability distributions via spherical harmonic signals to efficiently model the 3D geometry of molecules. We show that Symphony is able to accurately generate small molecules from the QM9 dataset, outperforming existing autoregressive models and approaching the performance of diffusion models.
Submitted 16 March, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical Coarse-graining SO(3)-Equivariant Autoencoders
Authors:
Allan dos Santos Costa,
Ilan Mitnikov,
Mario Geiger,
Manvitha Ponnapati,
Tess Smidt,
Joseph Jacobson
Abstract:
Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Yet, traditional graph-based modeling of protein structures is often limited to operate within a single fine-grained resolution, and lacks hourglass neural architectures to learn those high-level building blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all-atom protein structures. Our model departs from current approaches that employ graph modeling, instead focusing on local convolutional coarsening to model sequence-motif interactions with efficient time complexity in protein length. We measure the reconstruction capabilities of Ophiuchus across different compression rates, and compare it to existing models. We examine the learned latent space and demonstrate its utility through conformational interpolation. Finally, we leverage denoising diffusion probabilistic models (DDPM) in the latent space to efficiently sample protein structures. Our experiments demonstrate Ophiuchus to be a scalable basis for efficient protein modeling and generation.
Submitted 26 December, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Integration of Quantum Accelerators with High Performance Computing -- A Review of Quantum Programming Tools
Authors:
Amr Elsharkawy,
Xiao-Ting Michelle To,
Philipp Seitz,
Yanbin Chen,
Yannick Stade,
Manuel Geiger,
Qunsheng Huang,
Xiaorang Guo,
Muhammad Arslan Ansari,
Christian B. Mendl,
Dieter Kranzlmüller,
Martin Schulz
Abstract:
Quantum computing (QC) introduces a novel mode of computation with the possibility of greater computational power that remains to be exploited - presenting exciting opportunities for high performance computing (HPC) applications. However, recent advancements in the field have made clear that QC does not supplant conventional HPC, but can rather be incorporated into current heterogeneous HPC infrastructures as an additional accelerator, thereby enabling the optimal utilization of both paradigms. The desire for such integration significantly affects the development of software for quantum computers, which in turn influences the necessary software infrastructure. To date, previous review papers have investigated various quantum programming tools (QPTs) (such as languages, libraries, frameworks) in their ability to program, compile, and execute quantum circuits. However, the integration effort with classical HPC frameworks or systems has not been addressed. This study aims to characterize existing QPTs from an HPC perspective, investigating if existing QPTs have the potential to be efficiently integrated with classical computing models and determining where work is still required. This work structures a set of criteria into an analysis blueprint that enables HPC scientists to assess whether a QPT is suitable for the quantum-accelerated classical application at hand.
Submitted 18 September, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
A General Framework for Equivariant Neural Networks on Reductive Lie Groups
Authors:
Ilyes Batatia,
Mario Geiger,
Jose Munoz,
Tess Smidt,
Lior Silberman,
Christoph Ortner
Abstract:
Reductive Lie Groups, such as the orthogonal groups, the Lorentz group, or the unitary groups, play essential roles across scientific fields as diverse as high energy physics, quantum mechanics, quantum chromodynamics, molecular dynamics, computer vision, and imaging. In this paper, we present a general Equivariant Neural Network architecture capable of respecting the symmetries of the finite-dimensional representations of any reductive Lie Group G. Our approach generalizes the successful ACE and MACE architectures for atomistic point clouds to any data equivariant to a reductive Lie group action. We also introduce the lie-nn software library, which provides all the necessary tools to develop and implement such general G-equivariant neural networks. It implements routines for the reduction of generic tensor products of representations into irreducible representations, making it easy to apply our architecture to a wide range of problems and groups. The generality and performance of our approach are demonstrated by applying it to the tasks of top quark decay tagging (Lorentz group) and shape recognition (orthogonal group).
Submitted 31 May, 2023;
originally announced June 2023.
-
Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data
Authors:
Ivan Diaz,
Mario Geiger,
Richard Iain McKinley
Abstract:
Convolutional neural networks (CNNs) allow for parameter sharing and translational equivariance by using convolutional kernels in their linear layers. By restricting these kernels to be SO(3)-steerable, CNNs can further improve parameter sharing. These rotationally-equivariant convolutional layers have several advantages over standard convolutional layers, including increased robustness to unseen poses, smaller network size, and improved sample efficiency. Despite this, most segmentation networks used in medical image analysis continue to rely on standard convolutional kernels. In this paper, we present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training. In addition, we demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks, with enhanced robustness to reduced amounts of training data and improved parameter efficiency. Code to reproduce our results, and to implement the equivariant segmentation networks for other tasks is available at http://github.com/SCAN-NRAD/e3nn_Unet
Submitted 17 May, 2024; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Authors:
Antonio Sclocchi,
Mario Geiger,
Matthieu Wyart
Abstract:
Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $α$ are varied. For gradient descent, $α$ is a key parameter that controls if the network is `lazy' ($α\gg1$) or instead learns features ($α\ll1$). For classification of MNIST and CIFAR10 images, our central results are: (i) obtaining phase diagrams for performance in the $(α,T)$ plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing $T$ or decreasing $α$ both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that the characteristic temperature $T_c$ where the noise of SGD starts affecting the trained model (and eventually performance) is a power law of $P$. We relate this finding to the observation that key dynamical quantities, such as the total variation of weights during training, depend on both $T$ and $P$ as power laws. These results indicate that a key effect of SGD noise occurs late in training by affecting the stopping process whereby all data are fitted. Indeed, we argue that due to SGD noise, nets must develop a stronger `signal', i.e. larger informative weights, to fit the data, leading to a longer training time. A stronger signal and a longer training time are also required when the size of the training set $P$ increases. We confirm these views in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.
Submitted 30 May, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Deep Domain Adaptation for Detecting Bomb Craters in Aerial Images
Authors:
Marco Geiger,
Dominik Martin,
Niklas Kühl
Abstract:
The aftermath of air raids can still be seen for decades after the devastating events. Unexploded ordnance (UXO) is an immense danger to human life and the environment. Through the assessment of wartime images, experts can infer the occurrence of a dud. The current manual analysis process is expensive and time-consuming, thus automated detection of bomb craters by using deep learning is a promising way to improve the UXO disposal process. However, these methods require a large amount of manually labeled training data. This work leverages domain adaptation with moon surface images to address the problem of automated bomb crater detection with deep learning under the constraint of limited training data. This paper contributes to both academia and practice (1) by providing a solution approach for automated bomb crater detection with limited training data and (2) by demonstrating the usability and associated challenges of using synthetic images for domain adaptation.
Submitted 22 September, 2022;
originally announced September 2022.
-
e3nn: Euclidean Neural Networks
Authors:
Mario Geiger,
Tess Smidt
Abstract:
We present e3nn, a generalized framework for creating E(3) equivariant trainable functions, also known as Euclidean neural networks. e3nn naturally operates on geometry and geometric tensors that describe systems in 3D and transform predictably under a change of coordinate system. The core of e3nn are equivariant operations such as the TensorProduct class or the spherical harmonics functions that can be composed to create more complex modules such as convolutions and attention mechanisms. These core operations of e3nn can be used to efficiently articulate Tensor Field Networks, 3D Steerable CNNs, Clebsch-Gordan Networks, SE(3) Transformers and other E(3) equivariant networks.
Submitted 18 July, 2022;
originally announced July 2022.
-
How memory architecture affects learning in a simple POMDP: the two-hypothesis testing problem
Authors:
Mario Geiger,
Christophe Eloy,
Matthieu Wyart
Abstract:
Reinforcement learning is generally difficult for partially observable Markov decision processes (POMDPs), which arise when the agent's observation is partial or noisy. To seek good performance in POMDPs, one strategy is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance for random initialization. The performance can be empirically improved by constraining the memory architecture, thus sacrificing optimality to facilitate training. Here we study this trade-off in a two-hypothesis testing problem, akin to the two-arm bandit problem. We compare two extreme cases: (i) the random access memory where any transitions between $M$ memory states are allowed and (ii) a fixed memory where the agent can access its last $m$ actions and rewards. For (i), the probability $q$ to play the worst arm is known to be exponentially small in $M$ for the optimal policy. Our main result is to show that similar performance can be reached for (ii) as well, despite the simplicity of the memory architecture: using a conjecture on Gray-ordered binary necklaces, we find policies for which $q$ is exponentially small in $2^m$, i.e. $q\simα^{2^m}$ with $α< 1$. In addition, we observe empirically that training from random initialization leads to very poor results for (i), and significantly better results for (ii) thanks to the constraints on the memory architecture.
Submitted 18 November, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Relative stability toward diffeomorphisms indicates performance in deep nets
Authors:
Leonardo Petrini,
Alessandro Favero,
Mario Geiger,
Matthieu Wyart
Abstract:
Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements suggest that this is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms, which allows us to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not strongly correlate with performance on benchmark data sets of images. By contrast, we find that the stability toward diffeomorphisms relative to that of generic transformations $R_f$ correlates remarkably with the test error $ε_t$. It is of order unity at initialization but decreases by several decades during training for state-of-the-art architectures. For CIFAR10 and 15 known architectures, we find $ε_t\approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance. We study how $R_f$ depends on the size of the training set and compare it to a simple model of invariant learning.
Submitted 4 November, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials
Authors:
Simon Batzner,
Albert Musaelian,
Lixin Sun,
Mario Geiger,
Jonathan P. Mailoa,
Mordechai Kornbluth,
Nicola Molinari,
Tess E. Smidt,
Boris Kozinsky
Abstract:
This work presents Neural Equivariant Interatomic Potentials (NequIP), an E(3)-equivariant neural network approach for learning interatomic potentials from ab-initio calculations for molecular dynamics simulations. While most contemporary symmetry-aware models use invariant convolutions and only act on scalars, NequIP employs E(3)-equivariant convolutions for interactions of geometric tensors, resulting in a more information-rich and faithful representation of atomic environments. The method achieves state-of-the-art accuracy on a challenging and diverse set of molecules and materials while exhibiting remarkable data efficiency. NequIP outperforms existing models with up to three orders of magnitude fewer training data, challenging the widely held belief that deep neural networks require massive training sets. The high data efficiency of the method allows for the construction of accurate potentials using high-order quantum chemical level of theory as reference and enables high-fidelity molecular dynamics simulations over long time scales.
Submitted 16 December, 2021; v1 submitted 8 January, 2021;
originally announced January 2021.
-
Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training
Authors:
Mario Geiger,
Leonardo Petrini,
Matthieu Wyart
Abstract:
Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck in bad minima. (ii) Deep learning predicting power increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the $(h,α)$ plane where $h$ is the network width and $α$ the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR 10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically.
Submitted 30 December, 2020;
originally announced December 2020.
-
Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Authors:
Benjamin Kurt Miller,
Mario Geiger,
Tess E. Smidt,
Frank Noé
Abstract:
Equivariant neural networks (ENNs) are graph neural networks embedded in $\mathbb{R}^3$ and are well suited for predicting molecular properties. The ENN library e3nn has customizable convolutions, which can be designed to depend only on distances between points, or also on angular features, making them rotationally invariant, or equivariant, respectively. This paper studies the practical value of including angular dependencies for molecular property prediction directly via an ablation study with e3nn and the QM9 data set. We find that, for fixed network depth and parameter count, adding angular features decreased test error by an average of 23%. Meanwhile, increasing network depth decreased test error by only 4% on average, implying that rotationally equivariant layers are comparatively parameter efficient. We present an explanation of the accuracy improvement on the dipole moment, the target which benefited most from the introduction of angular features.
Submitted 24 November, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
-
Geometric compression of invariant manifolds in neural nets
Authors:
Jonas Paccolat,
Leonardo Petrini,
Mario Geiger,
Kevin Tyloo,
Matthieu Wyart
Abstract:
We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp=d-d_\parallel$ uninformative directions. These are effectively compressed by a factor $λ\sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $ε$. For large initialization of the weights (the lazy training regime), no compression occurs and for regular boundaries separating labels we find that $ε\sim p^{-β}$, with $β_\text{Lazy} = d / (3d-2)$. Compression improves the learning curves so that $β_\text{Feature} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $β_\text{Feature} = (d + d_\perp/2)/(3d-2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel=1$) as well as for a cylindrical boundary ($d_\parallel=2$). Next we show that compression shapes the Neural Tangent Kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer FC network trained on the stripe model and for a 16-layer CNN trained on MNIST, for which we also find $β_\text{Feature}>β_\text{Lazy}$.
Submitted 11 March, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
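The learning-curve exponents quoted in the abstract above, $β_\text{Lazy} = d/(3d-2)$ and the two cases of $β_\text{Feature}$, are easy to compare numerically. The sketch below only evaluates those closed-form expressions with exact rational arithmetic; the function names `beta_lazy` and `beta_feature` and the choice $d = 5$ are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def beta_lazy(d):
    """Exponent in the lazy regime: d / (3d - 2)."""
    return Fraction(d, 3 * d - 2)


def beta_feature(d, d_parallel):
    """Exponent in the feature regime, following the two cases in the abstract."""
    if not 1 <= d_parallel < d:
        raise ValueError("need 1 <= d_parallel < d")
    denom = 3 * d - 2
    if d_parallel == 1:
        # Stripe-like case: (2d - 1) / (3d - 2)
        return Fraction(2 * d - 1, denom)
    # General case: (d + d_perp / 2) / (3d - 2), with d_perp = d - d_parallel
    d_perp = d - d_parallel
    return (Fraction(d) + Fraction(d_perp, 2)) / denom


d = 5  # illustrative input dimension
print("lazy:", beta_lazy(d))
print("feature, d_parallel=1:", beta_feature(d, 1))
print("feature, d_parallel=2:", beta_feature(d, 2))
```

For $d = 5$ this gives $5/13$ in the lazy regime against $9/13$ (stripe case) and $1/2$ (cylinder case) in the feature regime, so both feature-regime exponents are larger, consistent with the abstract's claim that compression improves the learning curves.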
-
Finding Symmetry Breaking Order Parameters with Euclidean Neural Networks
Authors:
Tess E. Smidt,
Mario Geiger,
Benjamin Kurt Miller
Abstract:
Curie's principle states that "when effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them". We demonstrate that symmetry equivariant neural networks uphold Curie's principle and can be used to articulate many symmetry-relevant scientific questions into simple optimization problems. We prove these properties mathematically and demonstrate them numerically by training a Euclidean symmetry equivariant neural network to learn symmetry-breaking input to deform a square into a rectangle and to generate octahedra tilting patterns in perovskites.
Submitted 26 October, 2020; v1 submitted 4 July, 2020;
originally announced July 2020.
-
Disentangling feature and lazy training in deep neural networks
Authors:
Mario Geiger,
Stefano Spigler,
Arthur Jacot,
Matthieu Wyart
Abstract:
Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $Θ$. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the parameters associated with a neuron, that follows a partial differential equation. In this work we consider deep networks where the weights in the last layer scale as $αh^{-1/2}$ at initialization. By varying $α$ and $h$, we probe the crossover between the two limits. We observe the previously identified regimes of lazy training and feature training. In the lazy-training regime, the dynamics is almost linear and the NTK barely changes after initialization. The feature-training regime includes the mean-field formulation as a limiting case and is characterized by a kernel that evolves in time, and learns some features. We perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find that (i) The two regimes are separated by an $α^*$ that scales as $h^{-1/2}$. (ii) Network architecture and data structure play an important role in determining which regime is better: in our tests, fully-connected networks perform generally better in the lazy-training regime, unlike convolutional networks. (iii) In both regimes, the fluctuations $δF$ induced on the learned function by initial conditions decay as $δF\sim 1/\sqrt{h}$, leading to a performance that increases with $h$. The same improvement can also be obtained at an intermediate width by ensemble-averaging several networks. (iv) In the feature-training regime we identify a time scale $t_1\sim\sqrt{h}α$, such that for $t\ll t_1$ the dynamics is linear.
Submitted 4 October, 2020; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
Authors:
Stefano Spigler,
Mario Geiger,
Matthieu Wyart
Abstract:
How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive analytically $β$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $β$ depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than $n$. Using this idea we relate the exponent $β$ to an exponent $a$ describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract $a$ from real data by performing kernel PCA, leading to $β\approx0.36$ for MNIST and $β\approx0.07$ for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data.
Submitted 18 August, 2020; v1 submitted 26 May, 2019;
originally announced May 2019.
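The power-law learning curve mentioned in this abstract, error $\sim n^{-β}$, can be illustrated with a minimal sketch of how an exponent might be read off measured errors: a least-squares line fit in log-log space. The function name `fit_power_law_exponent` and the synthetic data are assumptions made for illustration; this is not the authors' measurement procedure.

```python
import math

def fit_power_law_exponent(ns, errors):
    """Least-squares fit of log(error) = log(c) - beta * log(n); returns (beta, c)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope in log-log space is -beta; the intercept is log(c).
    return -slope, math.exp(intercept)

# Synthetic, noise-free curve (illustrative values, not data from the paper).
ns = [100, 200, 400, 800, 1600]
errors = [2.0 * n ** -0.4 for n in ns]
beta, c = fit_power_law_exponent(ns, errors)
print(f"beta = {beta:.3f}, c = {c:.3f}")
```

On real learning curves the fit would normally be restricted to the large-$n$ range, where the asymptotic power law is expected to hold.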
-
WatchOut: A Road Safety Extension for Pedestrians on a Public Windshield Display
Authors:
Matthias Geiger,
Changkun Ou,
Cedric Quintes
Abstract:
We conducted a field study to investigate whether public windshield displays are applicable as an additional interactive digital road safety warning sign. We focused on investigating the acceptance and usability of our novel public windshield display and its potential use for future applications. The study has shown that users are open-minded to the idea of an extraverted windshield display regardless of the use case, whether it is used for safety purposes or different content. Contrary to our hypothesis, most people assumed they would mistrust the system if it were as well established as traffic lights and would primarily rely on their own perception.
Submitted 14 May, 2019;
originally announced May 2019.
-
Scaling description of generalization with number of parameters in deep learning
Authors:
Mario Geiger,
Arthur Jacot,
Stefano Spigler,
Franck Gabriel,
Levent Sagun,
Stéphane d'Ascoli,
Giulio Biroli,
Clément Hongler,
Matthieu Wyart
Abstract:
Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with $N$. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations $\|f_{N}-\bar{f}_{N}\|\sim N^{-1/4}$ of the neural net output function $f_{N}$ around its expectation $\bar{f}_{N}$. These affect the generalization error $ε_{N}$ for classification: under natural assumptions, it decays to a plateau value $ε_{\infty}$ in a power-law fashion $\sim N^{-1/2}$. This description breaks down at a so-called jamming transition $N=N^{*}$. At this threshold, we argue that $\|f_{N}\|$ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at $N^{*}$. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond $N^{*}$, and averaging their outputs.
Submitted 8 October, 2019; v1 submitted 6 January, 2019;
originally announced January 2019.
-
A General Theory of Equivariant CNNs on Homogeneous Spaces
Authors:
Taco Cohen,
Mario Geiger,
Maurice Weiler
Abstract:
We present a general theory of Group equivariant Convolutional Neural Networks (G-CNNs) on homogeneous spaces such as Euclidean space and the sphere. Feature maps in these networks represent fields on a homogeneous base space, and layers are equivariant maps between spaces of fields. The theory enables a systematic classification of all existing G-CNNs in terms of their symmetry group, base space, and field type. We also consider a fundamental question: what is the most general kind of equivariant linear map between feature spaces (fields) of given types? Following Mackey, we show that such maps correspond one-to-one with convolutions using equivariant kernels, and characterize the space of such kernels.
Submitted 9 January, 2020; v1 submitted 5 November, 2018;
originally announced November 2018.
-
A jamming transition from under- to over-parametrization affects loss landscape and generalization
Authors:
Stefano Spigler,
Mario Geiger,
Stéphane d'Ascoli,
Levent Sagun,
Giulio Biroli,
Matthieu Wyart
Abstract:
We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) initial decay, (ii) increase until the transition point --- where it displays a cusp --- and (iii) slow decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.
Submitted 18 June, 2019; v1 submitted 22 October, 2018;
originally announced October 2018.
-
The jamming transition as a paradigm to understand the loss landscape of deep neural networks
Authors:
Mario Geiger,
Stefano Spigler,
Stéphane d'Ascoli,
Levent Sagun,
Marco Baity-Jesi,
Giulio Biroli,
Matthieu Wyart
Abstract:
Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in FC networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime, and puts forward the surprising result that the ability of fully connected networks to fit random data is independent of their depth. Our observations suggest that this independence also holds for real data. We also study a quantity $Δ$ which characterizes how well ($Δ<0$) or badly ($Δ>0$) a datum is learned. At the critical point it is power-law distributed, $P_+(Δ)\simΔ^θ$ for $Δ>0$ and $P_-(Δ)\sim(-Δ)^{-γ}$ for $Δ<0$, with $θ\approx0.3$ and $γ\approx0.2$. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics is prone to avalanche-like dynamics, with abrupt changes in the set of patterns that are learned.
Submitted 17 June, 2019; v1 submitted 25 September, 2018;
originally announced September 2018.
-
Deep Learning in the Wild
Authors:
Thilo Stadelmann,
Mohammadreza Amirian,
Ismail Arabaci,
Marek Arnold,
Gilbert François Duivesteijn,
Ismail Elezi,
Melanie Geiger,
Stefan Lörwald,
Benjamin Bruno Meier,
Katharina Rombach,
Lukas Tuggener
Abstract:
Deep learning with neural networks is applied by an increasing number of people outside of classic research environments, due to the vast success of the methodology on a wide range of machine perception tasks. While this interest is fueled by beautiful success stories, practical work in deep learning on novel tasks without existing baselines remains challenging. This paper explores the specific challenges arising in the realm of real world tasks, based on case studies from research & development in conjunction with industry, and extracts lessons learned from them. It thus fills a gap between the publication of latest algorithmic and methodical developments, and the usually omitted nitty-gritty of how to make them work. Specifically, we give insight into deep learning projects on face matching, print media monitoring, industrial quality control, music scanning, strategy game playing, and automated machine learning, thereby providing best practices for deep learning in practice.
Submitted 13 July, 2018;
originally announced July 2018.
-
3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data
Authors:
Maurice Weiler,
Mario Geiger,
Max Welling,
Wouter Boomsma,
Taco Cohen
Abstract:
We present a convolutional network that is equivariant to rigid body motions. The model uses scalar-, vector-, and tensor fields over 3D Euclidean space to represent data, and equivariant convolutions to map between such representations. These SE(3)-equivariant convolutions utilize kernels which are parameterized as a linear combination of a complete steerable kernel basis, which is derived analytically in this paper. We prove that equivariant convolutions are the most general equivariant linear maps between fields over R^3. Our experimental results confirm the effectiveness of 3D Steerable CNNs for the problem of amino acid propensity prediction and protein structure classification, both of which have inherent SE(3) symmetry.
Submitted 27 October, 2018; v1 submitted 6 July, 2018;
originally announced July 2018.
-
Intertwiners between Induced Representations (with Applications to the Theory of Equivariant Neural Networks)
Authors:
Taco S. Cohen,
Mario Geiger,
Maurice Weiler
Abstract:
Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields ("feature channels"), whereas the steerable G-CNN can also use vector or tensor fields ("capsules") to represent data. In algebraic terms, the feature spaces in regular G-CNNs transform according to a regular representation of the group G, whereas the feature spaces in Steerable G-CNNs transform according to the more general induced representations of G. In order to make the network equivariant, each layer in a G-CNN is required to intertwine between the induced representations associated with its input and output space.
In this paper we present a general mathematical framework for G-CNNs on homogeneous spaces like Euclidean space or the sphere. We show, using elementary methods, that the layers of an equivariant network are convolutional if and only if the input and output feature spaces transform according to an induced representation. This result, which follows from G.W. Mackey's abstract theory on induced representations, establishes G-CNNs as a universal class of equivariant network architectures, and generalizes the important recent work of Kondor & Trivedi on the intertwiners between regular representations.
Submitted 30 March, 2018; v1 submitted 28 March, 2018;
originally announced March 2018.
-
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Authors:
M. Baity-Jesi,
L. Sagun,
M. Geiger,
S. Spigler,
G. Ben Arous,
C. Cammarota,
Y. LeCun,
M. Wyart,
G. Biroli
Abstract:
We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
Submitted 7 June, 2018; v1 submitted 19 March, 2018;
originally announced March 2018.
-
Spherical CNNs
Authors:
Taco S. Cohen,
Mario Geiger,
Jonas Koehler,
Max Welling
Abstract:
Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective.
In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.
Submitted 25 February, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
-
Convolutional Networks for Spherical Signals
Authors:
Taco Cohen,
Mario Geiger,
Jonas Köhler,
Max Welling
Abstract:
The success of convolutional networks in learning problems involving planar signals such as images is due to their ability to exploit the translation symmetry of the data distribution through weight sharing. Many areas of science and engineering deal with signals with other symmetries, such as rotation invariant data on the sphere. Examples include climate and weather science, astrophysics, and chemistry. In this paper we present spherical convolutional networks. These networks use convolutions on the sphere and rotation group, which results in rotational weight sharing and rotation equivariance. Using a synthetic spherical MNIST dataset, we show that spherical convolutional networks are very effective at dealing with rotationally invariant classification problems.
Submitted 15 September, 2017; v1 submitted 14 September, 2017;
originally announced September 2017.
-
Interactive Reference Point-Based Guided Local Search for the Bi-objective Inventory Routing Problem
Authors:
Sandra Huber,
Martin Josef Geiger,
Marc Sevaux
Abstract:
Eliciting preferences of a decision maker is a key factor to successfully combine search and decision making in an interactive method. Therefore, the progressive integration and simulation of the decision maker is a main concern in an application. We contribute in this direction by proposing an interactive method based on a reference point-based guided local search for the bi-objective Inventory Routing Problem. A local search metaheuristic, working on the delivery intervals, and the Clarke & Wright savings heuristic are employed for the subsequently obtained Vehicle Routing Problem. To elicit preferences, the decision maker selects a reference point to guide the search in interesting subregions. Additionally, the reference point is used as a reservation point to discard solutions outside the cone, introduced as a convergence criterion. Computational results of the reference point-based guided local search are reported and analyzed on benchmark data in order to show the applicability of the approach.
Submitted 22 May, 2014;
originally announced May 2014.
-
Iterated Variable Neighborhood Search for the resource constrained multi-mode multi-project scheduling problem
Authors:
Martin Josef Geiger
Abstract:
The resource constrained multi-mode multi-project scheduling problem (RCMMMPSP) is a notoriously difficult combinatorial optimization problem. For a given set of activities, feasible execution mode assignments and execution starting times must be found such that some optimization function, e.g. the makespan, is optimized. When determining an optimal (or at least feasible) assignment of decision variable values, a set of side constraints, such as resource availabilities, precedence constraints, etc., has to be respected.
In 2013, the MISTA 2013 Challenge stimulated research on the RCMMMPSP. Its goal was the solution of a given set of instances under running time restrictions. We have contributed to this challenge with the approach presented here.
Submitted 2 October, 2013;
originally announced October 2013.
-
Solution Representations and Local Search for the bi-objective Inventory Routing Problem
Authors:
Thibaut Barthélemy,
Martin Josef Geiger,
Marc Sevaux
Abstract:
The solution of the biobjective IRP is rather challenging, even for metaheuristics. We are still lacking a profound understanding of appropriate solution representations and effective neighborhood structures. Clearly, both the delivery volumes and the routing aspects of the alternatives need to be reflected in an encoding, and must be modified when searching by means of local search. Our work contributes to the better understanding of such solution representations. On the basis of an experimental investigation, the advantages and drawbacks of two encodings are studied and compared.
Submitted 18 April, 2012;
originally announced April 2012.
-
Neighborhood Selection in Variable Neighborhood Search
Authors:
Martin Josef Geiger,
Marc Sevaux,
Stefan Voss
Abstract:
Variable neighborhood search (VNS) is a metaheuristic for solving optimization problems based on a simple principle: systematic changes of neighborhoods within the search, both in the descent to local minima and in the escape from the valleys which contain them. Designing these neighborhoods and applying them in a meaningful fashion is not an easy task. Moreover, an appropriate order in which they are applied must be determined. In this paper we attempt to investigate this issue. Assuming that we are given an optimization problem that is intended to be solved by applying the VNS scheme, how many and which types of neighborhoods should be investigated, and what could be appropriate selection criteria for applying these neighborhoods? More specifically, does it pay to "look ahead" (see, e.g., in the context of VNS and GRASP) when attempting to switch from one neighborhood to another?
Submitted 15 September, 2011;
originally announced September 2011.
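The principle stated in this abstract, systematic neighborhood changes for both descent and escape, can be sketched as a basic VNS loop. Everything specific here is an assumption for illustration: the toy objective, the neighborhoods N_k (integer steps of size at most k), and the choice to cycle k back to 1 after the largest neighborhood fails. It is not the neighborhood selection scheme studied in the paper.

```python
import random

def objective(x):
    # Toy multimodal objective (hypothetical): every multiple of 5 is a local
    # minimum under +/-1 moves, and the global minimum is at x = 20.
    return abs(x - 20) + (0 if x % 5 == 0 else 6)

def local_search(x, f):
    """Best-improvement descent using the +/-1 neighborhood."""
    while True:
        best = min((x - 1, x + 1), key=f)
        if f(best) >= f(x):
            return x          # no neighbor is strictly better: local minimum
        x = best

def vns(x0, f, k_max=5, max_shakes=1000, seed=0):
    rng = random.Random(seed)
    x, k = x0, 1
    for _ in range(max_shakes):
        # Shaking: draw a random point from the k-th neighborhood of x.
        candidate = local_search(x + rng.randint(-k, k), f)
        if f(candidate) < f(x):
            x, k = candidate, 1            # improvement: return to the smallest neighborhood
        else:
            k = k + 1 if k < k_max else 1  # no improvement: widen, then cycle
    return x

best = vns(0, objective)
print(best, objective(best))
```

Plain descent started at 0 stays at 0, because both neighbors are worse. The shaking step in larger neighborhoods is what lets the search leave that valley.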
-
On the use of reference points for the biobjective Inventory Routing Problem
Authors:
Martin Josef Geiger,
Marc Sevaux
Abstract:
The article presents a study on the biobjective inventory routing problem. Contrary to most previous research, the problem is treated as a true multi-objective optimization problem, with the goal of identifying Pareto-optimal solutions. Due to the hardness of the problem at hand, a reference point based optimization approach is presented and implemented into an optimization and decision support system, which allows for the computation of a true subset of the optimal outcomes. Experimental investigations involving local search metaheuristics are conducted on benchmark data, and numerical results are reported and analyzed.
Submitted 14 September, 2011;
originally announced September 2011.
-
Practical inventory routing: A problem definition and an optimization method
Authors:
Martin Josef Geiger,
Marc Sevaux
Abstract:
The global objective of this work is to provide practical optimization methods to companies involved in inventory routing problems, taking into account this new type of data. Also, companies are sometimes not able to deal with changing plans every period and would like to adopt regular structures for serving customers.
Submitted 28 February, 2011;
originally announced February 2011.
-
On the comparison of plans: Proposition of an instability measure for dynamic machine scheduling
Authors:
Martin Josef Geiger
Abstract:
On the basis of an analysis of previous research, we present a generalized approach for measuring the difference of plans with an exemplary application to machine scheduling. Our work is motivated by the need for such measures, which are used in dynamic scheduling and planning situations. In this context, quantitative approaches are needed for the assessment of the robustness and stability of schedules. Obviously, any 'robustness' or 'stability' of plans has to be defined with respect to the particular situation and the requirements of the human decision maker. Besides the proposition of an instability measure, we therefore discuss possibilities of obtaining meaningful information from the decision maker for the implementation of the introduced approach.
Submitted 27 April, 2010;
originally announced April 2010.
-
Improvements for multi-objective flow shop scheduling by Pareto Iterated Local Search
Authors:
Martin Josef Geiger
Abstract:
The article describes the proposition and application of a local search metaheuristic for multi-objective optimization problems. It is based on two main principles of heuristic search, intensification through variable neighborhoods, and diversification through perturbations and successive iterations in favorable regions of the search space. The concept is successfully tested on permutation flow shop scheduling problems under multiple objectives and compared to other local search approaches. While the obtained results are encouraging in terms of their quality, another positive attribute of the approach is its simplicity as it does require the setting of only very few parameters.
Submitted 17 July, 2009;
originally announced July 2009.
-
The Single Machine Total Weighted Tardiness Problem - Is it (for Metaheuristics) a Solved Problem ?
Authors:
Martin Josef Geiger
Abstract:
The article presents a study of rather simple local search heuristics for the single machine total weighted tardiness problem (SMTWTP), namely hill climbing and Variable Neighborhood Search. In particular, we revisit these approaches for the SMTWTP as there appears to be a lack of appropriate, challenging benchmark instances in this case. The obtained results are impressive indeed. Only a few instances remain unsolved, and even those are approximated within 1% of the optimal or best known solutions. Our experiments support the claim that metaheuristics for the SMTWTP are very likely to lead to good results, and that, before refining search strategies, more work must be done with regard to the proposition of benchmark data. Some recommendations for the construction of such data sets are derived from our investigations.
Submitted 17 July, 2009;
originally announced July 2009.
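As a rough illustration of the objective and the hill-climbing idea named in the abstract above, the following minimal sketch evaluates the total weighted tardiness of a job sequence and improves it by pairwise swaps. The swap neighborhood, the first-improvement rule, and the three-job instance are assumptions made for this example only; they are not taken from the article.

```python
from itertools import combinations


def total_weighted_tardiness(sequence, proc, due, weight):
    """Sum of weight[j] * max(0, completion_time[j] - due[j]) over the sequence."""
    elapsed = 0
    total = 0
    for job in sequence:
        elapsed += proc[job]
        total += weight[job] * max(0, elapsed - due[job])
    return total


def hill_climb(sequence, proc, due, weight):
    """First-improvement hill climbing over pairwise swaps (assumed neighborhood)."""
    current = list(sequence)
    best = total_weighted_tardiness(current, proc, due, weight)
    improved = True
    while improved:
        improved = False
        for i, k in combinations(range(len(current)), 2):
            current[i], current[k] = current[k], current[i]
            cost = total_weighted_tardiness(current, proc, due, weight)
            if cost < best:
                best = cost
                improved = True
                break  # restart the scan from the improved sequence
            current[i], current[k] = current[k], current[i]  # undo the swap
    return current, best


# Hypothetical three-job instance, for illustration only.
PROC = [3, 2, 4]
DUE = [4, 2, 9]
WEIGHT = [1, 2, 1]

if __name__ == "__main__":
    print(total_weighted_tardiness([0, 1, 2], PROC, DUE, WEIGHT))
    print(hill_climb([0, 1, 2], PROC, DUE, WEIGHT))
```

The procedure stops at the first sequence that no single swap can improve, which is a local optimum with respect to this neighborhood and not necessarily a global one.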
-
Variable Neighborhood Search for the University Lecturer-Student Assignment Problem
Authors:
Martin Josef Geiger,
Wolf Wenger
Abstract:
The paper presents a study of local search heuristics in general and variable neighborhood search in particular for the resolution of an assignment problem studied in the practical work of universities. Here, students have to be assigned to scientific topics which are proposed and supported by members of staff. The problem involves the optimization under given preferences of students which may be expressed when applying for certain topics.
It is possible to observe that variable neighborhood search leads to superior results for the tested problem instances. One instance is taken from an actual case, while others have been generated based on the real-world data to support a deeper analysis.
An extension of the problem has been formulated by integrating a second objective function that simultaneously balances the workload of the members of staff while maximizing utility of the students. The algorithmic approach has been prototypically implemented in a computer system. One important aspect in this context is the application of the research work to problems of other scientific institutions, and therefore the provision of decision support functionalities.
Submitted 5 September, 2008;
originally announced September 2008.
-
MOOPPS: An Optimization System for Multi Objective Scheduling
Authors:
Martin Josef Geiger
Abstract:
In the current paper, we present an optimization system solving multi objective production scheduling problems (MOOPPS). The identification of Pareto optimal alternatives or at least a close approximation of them is possible by a set of implemented metaheuristics. Necessary control parameters can easily be adjusted by the decision maker as the whole software is fully menu driven. This allows the comparison of different metaheuristic algorithms for the considered problem instances. Results are visualized by a graphical user interface showing the distribution of solutions in outcome space as well as their corresponding Gantt chart representation.
The identification of a most preferred solution from the set of efficient solutions is supported by a module based on the aspiration interactive method (AIM). The decision maker successively defines aspiration levels until a single solution is chosen.
After successfully competing in the finals in Ronneby, Sweden, the MOOPPS software has been awarded the European Academic Software Award 2002 (http://www.bth.se/llab/easa_2002.nsf).
Submitted 5 September, 2008;
originally announced September 2008.
-
An application of the Threshold Accepting metaheuristic for curriculum based course timetabling
Authors:
Martin Josef Geiger
Abstract:
The article presents a local search approach for the solution of timetabling problems in general, with a particular implementation for competition track 3 of the International Timetabling Competition 2007 (ITC 2007). The heuristic search procedure is based on Threshold Accepting to overcome local optima. A stochastic neighborhood is proposed and implemented, randomly removing and reassigning events from the current solution.
The overall concept has been incrementally obtained from a series of experiments, which we describe in each (sub)section of the paper. As a result, we successfully derived a potential candidate solution approach for the finals of track 3 of the ITC 2007.
Submitted 4 September, 2008;
originally announced September 2008.
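The Threshold Accepting rule mentioned in the abstract above can be sketched generically as follows: a candidate is accepted whenever it is not worse than the current solution by more than the current threshold, and the thresholds shrink over time. This is a minimal sketch under stated assumptions, not the article's implementation. The toy clash-counting problem, the conflict pairs, the threshold schedule, and the single-event reassignment move are all hypothetical.

```python
import random


def threshold_accepting(initial, cost, neighbor, thresholds, steps_per_level, rng):
    """Generic Threshold Accepting: accept if the cost increase is <= threshold."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for threshold in thresholds:  # assumed to be a non-increasing schedule
        for _ in range(steps_per_level):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            if candidate_cost - current_cost <= threshold:
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical toy timetable: pairs of events that must not share a timeslot.
CONFLICTS = {(0, 1), (1, 2), (0, 2), (2, 3)}
N_SLOTS = 3


def clashes(assignment):
    """Number of conflicting event pairs placed in the same timeslot."""
    return sum(1 for a, b in CONFLICTS if assignment[a] == assignment[b])


def reassign_one(assignment, rng):
    """Stochastic move: remove one random event and reassign it to a random slot."""
    event = rng.randrange(len(assignment))
    moved = list(assignment)
    moved[event] = rng.randrange(N_SLOTS)
    return tuple(moved)


if __name__ == "__main__":
    rng = random.Random(0)
    print(threshold_accepting((0, 0, 0, 0), clashes, reassign_one, [2, 1, 0], 50, rng))
```

Unlike simulated annealing, the acceptance test here is deterministic given the candidate; randomness enters only through the neighborhood move.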
-
Bin Packing Under Multiple Objectives - a Heuristic Approximation Approach
Authors:
Martin Josef Geiger
Abstract:
The article proposes a heuristic approximation approach to the bin packing problem under multiple objectives. In addition to the traditional objective of minimizing the number of bins, the heterogeneousness of the elements in each bin is minimized, leading to a biobjective formulation of the problem with a tradeoff between the number of bins and their heterogeneousness. An extension of the Best-Fit approximation algorithm is presented to solve the problem. Experimental investigations have been carried out on benchmark instances of different size, ranging from 100 to 1000 items. Encouraging results have been obtained, showing the applicability of the heuristic approach to the described problem.
Submitted 4 September, 2008;
originally announced September 2008.
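For orientation, the classical single-objective Best-Fit rule that the abstract above extends can be sketched as follows. This shows only the textbook rule, which places each item into the feasible bin that would be left with the least remaining capacity; the article's biobjective extension for heterogeneousness is not reproduced here. The item sizes and the capacity in the usage example are hypothetical.

```python
def best_fit(items, capacity):
    """Classical Best-Fit: put each item into the fullest bin that still fits it."""
    bins = []
    for size in items:
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        best_index = None
        best_left = None
        for index, contents in enumerate(bins):
            left = capacity - sum(contents) - size  # space left after placing the item
            if left >= 0 and (best_left is None or left < best_left):
                best_index, best_left = index, left
        if best_index is None:
            bins.append([size])  # no existing bin fits, so open a new one
        else:
            bins[best_index].append(size)
    return bins


if __name__ == "__main__":
    # Hypothetical instance for illustration only.
    print(best_fit([4, 8, 1, 4, 2, 1], capacity=10))
```

Recomputing `sum(contents)` keeps the sketch short; for instances of the size mentioned in the abstract, tracking the remaining capacity per bin would avoid the repeated summation.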
-
Proposition of the Interactive Pareto Iterated Local Search Procedure - Elements and Initial Experiments
Authors:
Martin Josef Geiger
Abstract:
The article presents an approach to interactively solve multi-objective optimization problems. While the identification of efficient solutions is supported by computational intelligence techniques on the basis of local search, the search is directed by partial preference information obtained from the decision maker.
An application of the approach to biobjective portfolio optimization, modeled as the well-known knapsack problem, is described, and experimental results are reported for benchmark instances taken from the literature. In brief, we obtain encouraging results that show the applicability of the approach to the described problem.
Submitted 4 September, 2008;
originally announced September 2008.
-
Improving Local Search for Fuzzy Scheduling Problems
Authors:
Martin Josef Geiger,
Sanja Petrovic
Abstract:
The integration of fuzzy set theory and fuzzy logic into scheduling is a rather new aspect with growing importance for manufacturing applications, resulting in various unsolved aspects. In the current paper, we investigate an improved local search technique for fuzzy scheduling problems with fitness plateaus, using a multi-criteria formulation of the problem. We especially address the problem of changing job priorities over time as studied at the Sherwood Press Ltd, a Nottingham-based printing company, which is a collaborator on the project.
Submitted 3 September, 2008;
originally announced September 2008.
-
A framework for the interactive resolution of multi-objective vehicle routing problems
Authors:
Martin Josef Geiger,
Wolf Wenger
Abstract:
The article presents a framework for the resolution of rich vehicle routing problems which are difficult to address with standard optimization techniques. We use local search on the basis of variable neighborhood search for the construction of the solutions, but embed the techniques in a flexible framework that allows the consideration of complex side constraints of the problem such as time windows, multiple depots, heterogeneous fleets, and, in particular, multiple optimization criteria. In order to identify a compromise alternative that meets the requirements of the decision maker, an interactive procedure is integrated in the resolution of the problem, allowing the modification of the preference information articulated by the decision maker. The framework is prototypically implemented in a computer system. First results of test runs on multiple depot vehicle routing problems with time windows are reported.
Submitted 3 September, 2008;
originally announced September 2008.
-
Genetic Algorithms for multiple objective vehicle routing
Authors:
Martin Josef Geiger
Abstract:
The talk describes a general approach of a genetic algorithm for multiple objective optimization problems. A particular dominance relation between the individuals of the population is used to define a fitness operator, enabling the genetic algorithm to address even problems with efficient, but convex-dominated alternatives. The algorithm is implemented in a multilingual computer program, solving vehicle routing problems with time windows under multiple objectives. The graphical user interface of the program shows the progress of the genetic algorithm, and the main parameters of the approach can be easily modified. In addition, the program provides powerful decision support to the decision maker. The software has proved its excellence at the finals of the European Academic Software Award EASA, held at Keble College, University of Oxford, Great Britain.
Submitted 2 September, 2008;
originally announced September 2008.
-
A Computational Study of Genetic Crossover Operators for Multi-Objective Vehicle Routing Problem with Soft Time Windows
Authors:
Martin Josef Geiger
Abstract:
The article describes an investigation of the effectiveness of genetic algorithms for multi-objective combinatorial optimization (MOCO) by presenting an application for the vehicle routing problem with soft time windows. The work is motivated by the question of whether and how the problem structure influences the effectiveness of different configurations of the genetic algorithm. Computational results are presented for different classes of vehicle routing problems, varying in their coverage with time windows, time window size, distribution and number of customers. The results are compared with a simple but effective local search approach for multi-objective combinatorial optimization problems.
Submitted 2 September, 2008;
originally announced September 2008.
-
Foundations of the Pareto Iterated Local Search Metaheuristic
Authors:
Martin Josef Geiger
Abstract:
The paper describes the proposition and application of a local search metaheuristic for multi-objective optimization problems. It is based on two main principles of heuristic search: intensification through variable neighborhoods, and diversification through perturbations and successive iterations in favorable regions of the search space. The concept is successfully tested on permutation flow shop scheduling problems under multiple objectives. While the obtained results are encouraging in terms of their quality, another positive attribute of the approach is its simplicity, as it requires the setting of only very few parameters. The implementation of the Pareto Iterated Local Search metaheuristic is based on the MOOPPS computer system of local search heuristics for multi-objective scheduling, which has been awarded the European Academic Software Award 2002 in Ronneby, Sweden (http://www.easa-award.net/, http://www.bth.se/llab/easa_2002.nsf).
Submitted 2 September, 2008;
originally announced September 2008.
-
Randomised Variable Neighbourhood Search for Multi Objective Optimisation
Authors:
Martin Josef Geiger
Abstract:
Various local search approaches have recently been applied to machine scheduling problems under multiple objectives. Their foremost consideration is the identification of the set of Pareto optimal alternatives. An important aspect of successfully solving these problems lies in the definition of an appropriate neighbourhood structure. It remains unclear in this context how interdependencies within the fitness landscape affect the resolution of the problem.
The paper presents a study of neighbourhood search operators for multiple objective flow shop scheduling. Experiments have been carried out with twelve different combinations of criteria. To derive exact conclusions, small problem instances, for which the optimal solutions are known, have been chosen. Statistical tests show that no single neighbourhood operator is able to equally identify all Pareto optimal alternatives. Significant improvements however have been obtained by hybridising the solution algorithm using a randomised variable neighbourhood search technique.
Submitted 1 September, 2008;
originally announced September 2008.
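Several of the abstracts above refer to identifying the set of Pareto optimal alternatives. As a minimal sketch of that notion, assuming all objectives are to be minimized, the following filter keeps exactly the non-dominated objective vectors. The objective values in the usage example are hypothetical (for instance, makespan and total tardiness of four candidate schedules) and are not results from any of the listed papers.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical objective vectors for illustration only.
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1)]
    print(pareto_front(candidates))
```

The quadratic pairwise comparison is adequate for small archives; it is shown here for clarity and is not a tuned implementation.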