Information Retrieval and Classification of Real-Time Multi-Source Hurricane Evacuation Notices

Authors: Tingting Zhao, Shubo Tian, Jordan Daly, Melissa Geiger, Minna Jia, **feng Zhang

Abstract: For an approaching disaster, the tracking of time-sensitive critical information such as hurricane evacuation notices is challenging in the United States. These notices are issued and distributed rapidly by numerous local authorities that may spread across multiple states. They often undergo frequent updates and are distributed through diverse online portals lacking standard formats. In this study, we developed an approach for the timely detection and tracking of locally issued hurricane evacuation notices. The text data were collected mainly with a spatially targeted web scraping method. They were manually labeled and then classified using natural language processing techniques with deep learning models. The classification of mandatory evacuation notices achieved a high accuracy (recall = 96%). We used Hurricane Ian (2022) to illustrate how real-time evacuation notices extracted from local government sources could be redistributed with a Web GIS system. Our method applied to future hurricanes provides live data for situation awareness to higher-level government agencies and news media. The archived data helps scholars to study government responses toward weather warnings and individual behaviors influenced by evacuation history. The framework may be applied to other types of disasters for rapid and targeted retrieval, classification, redistribution, and archiving of real-time government orders and notifications.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2311.17490 [pdf, other]

Multithreaded parallelism for heterogeneous clusters of QPUs

Authors: Philipp Seitz, Manuel Geiger, Christian B. Mendl

Abstract: In this work, we present MILQ, a quantum unrelated parallel machines scheduler and cutter. The setting of unrelated parallel machines considers independent hardware backends, each distinguished by differing setup and processing times. MILQ optimizes the total execution time of a batch of circuits scheduled on multiple quantum devices. It leverages state-of-the-art circuit-cutting techniques to fit circuits onto the devices and schedules them based on a mixed-integer linear program. Our results show a total improvement of up to 26 % compared to a baseline approach.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 7 pages, 4 figures, 1 table, 1 algorithm

arXiv:2311.16199 [pdf, other]

Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for Molecule Generation

Authors: Ameya Daigavane, Song Kim, Mario Geiger, Tess Smidt

Abstract: We present Symphony, an $E(3)$-equivariant autoregressive generative model for 3D molecular geometries that iteratively builds a molecule from molecular fragments. Existing autoregressive models such as G-SchNet and G-SphereNet for molecules utilize rotationally invariant features to respect the 3D symmetries of molecules. In contrast, Symphony uses message-passing with higher-degree $E(3)$-equivariant features. This allows a novel representation of probability distributions via spherical harmonic signals to efficiently model the 3D geometry of molecules. We show that Symphony is able to accurately generate small molecules from the QM9 dataset, outperforming existing autoregressive models and approaching the performance of diffusion models.

Submitted 16 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted at ICLR 2024

arXiv:2310.02508 [pdf, other]

Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical Coarse-graining SO(3)-Equivariant Autoencoders

Authors: Allan dos Santos Costa, Ilan Mitnikov, Mario Geiger, Manvitha Ponnapati, Tess Smidt, Joseph Jacobson

Abstract: Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Yet, traditional graph-based modeling of protein structures is often limited to operate within a single fine-grained resolution, and lacks hourglass neural architectures to learn those high-level building blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all-atom protein structures. Our model departs from current approaches that employ graph modeling, instead focusing on local convolutional coarsening to model sequence-motif interactions with efficient time complexity in protein length. We measure the reconstruction capabilities of Ophiuchus across different compression rates, and compare it to existing models. We examine the learned latent space and demonstrate its utility through conformational interpolation. Finally, we leverage denoising diffusion probabilistic models (DDPM) in the latent space to efficiently sample protein structures. Our experiments demonstrate Ophiuchus to be a scalable basis for efficient protein modeling and generation.

Submitted 26 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.06167 [pdf, other]

Integration of Quantum Accelerators with High Performance Computing -- A Review of Quantum Programming Tools

Authors: Amr Elsharkawy, Xiao-Ting Michelle To, Philipp Seitz, Yanbin Chen, Yannick Stade, Manuel Geiger, Qunsheng Huang, Xiaorang Guo, Muhammad Arslan Ansari, Christian B. Mendl, Dieter Kranzlmüller, Martin Schulz

Abstract: Quantum computing (QC) introduces a novel mode of computation with the possibility of greater computational power that remains to be exploited - presenting exciting opportunities for high performance computing (HPC) applications. However, recent advancements in the field have made clear that QC does not supplant conventional HPC, but can rather be incorporated into current heterogeneous HPC infrastructures as an additional accelerator, thereby enabling the optimal utilization of both paradigms. The desire for such integration significantly affects the development of software for quantum computers, which in turn influences the necessary software infrastructure. To date, previous review papers have investigated various quantum programming tools (QPTs) (such as languages, libraries, frameworks) in their ability to program, compile, and execute quantum circuits. However, the integration effort with classical HPC frameworks or systems has not been addressed. This study aims to characterize existing QPTs from an HPC perspective, investigating if existing QPTs have the potential to be efficiently integrated with classical computing models and determining where work is still required. This work structures a set of criteria into an analysis blueprint that enables HPC scientists to assess whether a QPT is suitable for the quantum-accelerated classical application at hand.

Submitted 18 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 35 pages, 8 figures and 4 tables

arXiv:2306.00091 [pdf, other]

A General Framework for Equivariant Neural Networks on Reductive Lie Groups

Authors: Ilyes Batatia, Mario Geiger, Jose Munoz, Tess Smidt, Lior Silberman, Christoph Ortner

Abstract: Reductive Lie Groups, such as the orthogonal groups, the Lorentz group, or the unitary groups, play essential roles across scientific fields as diverse as high energy physics, quantum mechanics, quantum chromodynamics, molecular dynamics, computer vision, and imaging. In this paper, we present a general Equivariant Neural Network architecture capable of respecting the symmetries of the finite-dimensional representations of any reductive Lie Group G. Our approach generalizes the successful ACE and MACE architectures for atomistic point clouds to any data equivariant to a reductive Lie group action. We also introduce the lie-nn software library, which provides all the necessary tools to develop and implement such general G-equivariant neural networks. It implements routines for the reduction of generic tensor products of representations into irreducible representations, making it easy to apply our architecture to a wide range of problems and groups. The generality and performance of our approach are demonstrated by applying it to the tasks of top quark decay tagging (Lorentz group) and shape recognition (orthogonal group).

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2303.00351 [pdf, other]

doi 10.59275/j.melba.2024-7189

Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data

Authors: Ivan Diaz, Mario Geiger, Richard Iain McKinley

Abstract: Convolutional neural networks (CNNs) allow for parameter sharing and translational equivariance by using convolutional kernels in their linear layers. By restricting these kernels to be SO(3)-steerable, CNNs can further improve parameter sharing. These rotationally-equivariant convolutional layers have several advantages over standard convolutional layers, including increased robustness to unseen poses, smaller network size, and improved sample efficiency. Despite this, most segmentation networks used in medical image analysis continue to rely on standard convolutional kernels. In this paper, we present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training. In addition, we demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks, with enhanced robustness to reduced amounts of training data and improved parameter efficiency. Code to reproduce our results, and to implement the equivariant segmentation networks for other tasks is available at http://github.com/SCAN-NRAD/e3nn_Unet

Submitted 17 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:010

Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

arXiv:2301.13703 [pdf, other]

Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning

Authors: Antonio Sclocchi, Mario Geiger, Matthieu Wyart

Abstract: Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $α$ are varied. For gradient descent, $α$ is a key parameter that controls if the network is `lazy' ($α\gg1$) or instead learns features ($α\ll1$). For classification of MNIST and CIFAR10 images, our central results are: (i) obtaining phase diagrams for performance in the $(α,T)$ plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing $T$ or decreasing $α$ both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that the characteristic temperature $T_c$ where the noise of SGD starts affecting the trained model (and eventually performance) is a power law of $P$. We relate this finding with the observation that key dynamical quantities, such as the total variation of weights during training, depend on both $T$ and $P$ as power laws. These results indicate that a key effect of SGD noise occurs late in training by affecting the stopping process whereby all data are fitted. Indeed, we argue that due to SGD noise, nets must develop a stronger `signal', i.e. larger informative weights, to fit the data, leading to a longer training time. A stronger signal and a longer training time are also required when the size of the training set $P$ increases. We confirm these views in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.

Submitted 30 May, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: 25 pages, 21 figures, added analysis in feature-learning

arXiv:2209.11299 [pdf, other]

Deep Domain Adaptation for Detecting Bomb Craters in Aerial Images

Authors: Marco Geiger, Dominik Martin, Niklas Kühl

Abstract: The aftermath of air raids can still be seen for decades after the devastating events. Unexploded ordnance (UXO) is an immense danger to human life and the environment. Through the assessment of wartime images, experts can infer the occurrence of a dud. The current manual analysis process is expensive and time-consuming, thus automated detection of bomb craters by using deep learning is a promising way to improve the UXO disposal process. However, these methods require a large amount of manually labeled training data. This work leverages domain adaptation with moon surface images to address the problem of automated bomb crater detection with deep learning under the constraint of limited training data. This paper contributes to both academia and practice (1) by providing a solution approach for automated bomb crater detection with limited training data and (2) by demonstrating the usability and associated challenges of using synthetic images for domain adaptation.

Submitted 22 September, 2022; originally announced September 2022.

Comments: 56th Annual Hawaii International Conference on System Sciences (HICSS-56)

arXiv:2207.09453 [pdf, other]

e3nn: Euclidean Neural Networks

Authors: Mario Geiger, Tess Smidt

Abstract: We present e3nn, a generalized framework for creating E(3) equivariant trainable functions, also known as Euclidean neural networks. e3nn naturally operates on geometry and geometric tensors that describe systems in 3D and transform predictably under a change of coordinate system. The core of e3nn are equivariant operations such as the TensorProduct class or the spherical harmonics functions that can be composed to create more complex modules such as convolutions and attention mechanisms. These core operations of e3nn can be used to efficiently articulate Tensor Field Networks, 3D Steerable CNNs, Clebsch-Gordan Networks, SE(3) Transformers and other E(3) equivariant networks.

Submitted 18 July, 2022; originally announced July 2022.

Comments: draft

arXiv:2106.08849 [pdf, other]

How memory architecture affects learning in a simple POMDP: the two-hypothesis testing problem

Authors: Mario Geiger, Christophe Eloy, Matthieu Wyart

Abstract: Reinforcement learning is generally difficult for partially observable Markov decision processes (POMDPs), which occurs when the agent's observation is partial or noisy. To seek good performance in POMDPs, one strategy is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance for random initialization. The performance can be empirically improved by constraining the memory architecture, then sacrificing optimality to facilitate training. Here we study this trade-off in a two-hypothesis testing problem, akin to the two-arm bandit problem. We compare two extreme cases: (i) the random access memory where any transitions between $M$ memory states are allowed and (ii) a fixed memory where the agent can access its last $m$ actions and rewards. For (i), the probability $q$ to play the worst arm is known to be exponentially small in $M$ for the optimal policy. Our main result is to show that similar performance can be reached for (ii) as well, despite the simplicity of the memory architecture: using a conjecture on Gray-ordered binary necklaces, we find policies for which $q$ is exponentially small in $2^m$, i.e. $q\simα^{2^m}$ with $α< 1$. In addition, we observe empirically that training from random initialization leads to very poor results for (i), and significantly better results for (ii) thanks to the constraints on the memory architecture.

Submitted 18 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

arXiv:2105.02468 [pdf, other]

doi 10.1088/1742-5468/ac98ac

Relative stability toward diffeomorphisms indicates performance in deep nets

Authors: Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart

Abstract: Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements support that it is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms that allows one to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not strongly correlate with performance on benchmark data sets of images. By contrast, we find that the stability toward diffeomorphisms relative to that of generic transformations $R_f$ correlates remarkably with the test error $ε_t$. It is of order unity at initialization but decreases by several decades during training for state-of-the-art architectures. For CIFAR10 and 15 known architectures, we find $ε_t\approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance. We study how $R_f$ depends on the size of the training set and compare it to a simple model of invariant learning.

Submitted 4 November, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: NeurIPS 2021 Conference

arXiv:2101.03164 [pdf, other]

doi 10.1038/s41467-022-29939-5

E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials

Authors: Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, Boris Kozinsky

Abstract: This work presents Neural Equivariant Interatomic Potentials (NequIP), an E(3)-equivariant neural network approach for learning interatomic potentials from ab-initio calculations for molecular dynamics simulations. While most contemporary symmetry-aware models use invariant convolutions and only act on scalars, NequIP employs E(3)-equivariant convolutions for interactions of geometric tensors, resulting in a more information-rich and faithful representation of atomic environments. The method achieves state-of-the-art accuracy on a challenging and diverse set of molecules and materials while exhibiting remarkable data efficiency. NequIP outperforms existing models with up to three orders of magnitude fewer training data, challenging the widely held belief that deep neural networks require massive training sets. The high data efficiency of the method allows for the construction of accurate potentials using high-order quantum chemical level of theory as reference and enables high-fidelity molecular dynamics simulations over long time scales.

Submitted 16 December, 2021; v1 submitted 8 January, 2021; originally announced January 2021.

arXiv:2012.15110 [pdf, other]

Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

Authors: Mario Geiger, Leonardo Petrini, Matthieu Wyart

Abstract: Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck in bad minima. (ii) The predictive power of deep learning increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the $(h,α)$ plane where $h$ is the network width and $α$ the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR 10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically.

Submitted 30 December, 2020; originally announced December 2020.

arXiv:2008.08461 [pdf, other]

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties

Authors: Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé

Abstract: Equivariant neural networks (ENNs) are graph neural networks embedded in $\mathbb{R}^3$ and are well suited for predicting molecular properties. The ENN library e3nn has customizable convolutions, which can be designed to depend only on distances between points, or also on angular features, making them rotationally invariant, or equivariant, respectively. This paper studies the practical value of including angular dependencies for molecular property prediction directly via an ablation study with \texttt{e3nn} and the QM9 data set. We find that, for fixed network depth and parameter count, adding angular features decreased test error by an average of 23%. Meanwhile, increasing network depth decreased test error by only 4% on average, implying that rotationally equivariant layers are comparatively parameter efficient. We present an explanation of the accuracy improvement on the dipole moment, the target which benefited most from the introduction of angular features.

Submitted 24 November, 2020; v1 submitted 19 August, 2020; originally announced August 2020.

Comments: Machine Learning for Molecules Workshop at NeurIPS 2020, NeurIPS workshop on Interpretable Inductive Biases and Physically Structured Learning

arXiv:2007.11471 [pdf, other]

doi 10.1088/1742-5468/abf1f3

Geometric compression of invariant manifolds in neural nets

Authors: Jonas Paccolat, Leonardo Petrini, Mario Geiger, Kevin Tyloo, Matthieu Wyart

Abstract: We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp=d-d_\parallel$ uninformative directions. These are effectively compressed by a factor $λ\sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $ε$. For large initialization of the weights (the lazy training regime), no compression occurs and for regular boundaries separating labels we find that $ε\sim p^{-β}$, with $β_\text{Lazy} = d / (3d-2)$. Compression improves the learning curves so that $β_\text{Feature} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $β_\text{Feature} = (d + d_\perp/2)/(3d-2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel=1$) as well as for a cylindrical boundary ($d_\parallel=2$). Next we show that compression shapes the Neural Tangent Kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden layer FC network trained on the stripe model and for a 16-layers CNN trained on MNIST, for which we also find $β_\text{Feature}>β_\text{Lazy}$.

Submitted 11 March, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Journal ref: Journal of Statistical Mechanics: Theory and Experiment, Volume 2021, April 2021

arXiv:2007.02005 [pdf, other]

doi 10.1103/PhysRevResearch.3.L012002

Finding Symmetry Breaking Order Parameters with Euclidean Neural Networks

Authors: Tess E. Smidt, Mario Geiger, Benjamin Kurt Miller

Abstract: Curie's principle states that "when effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them". We demonstrate that symmetry equivariant neural networks uphold Curie's principle and can be used to articulate many symmetry-relevant scientific questions into simple optimization problems. We prove these properties mathematically and demonstrate them numerically by training a Euclidean symmetry equivariant neural network to learn symmetry-breaking input to deform a square into a rectangle and to generate octahedra tilting patterns in perovskites.

Submitted 26 October, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

Comments: 6 pages, 3 figures

Journal ref: Phys. Rev. Research 3, 012002 (2021)

arXiv:1906.08034 [pdf, other]

doi 10.1088/1742-5468/abc4de

Disentangling feature and lazy training in deep neural networks

Authors: Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

Abstract: Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $Θ$. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the parameters associated with a neuron, that follows a partial differential equation. In this work we consider deep networks where the weights in the last layer scale as $αh^{-1/2}$ at initialization. By varying $α$ and $h$, we probe the crossover between the two limits. We observe the previously identified regimes of lazy training and feature training. In the lazy-training regime, the dynamics is almost linear and the NTK barely changes after initialization. The feature-training regime includes the mean-field formulation as a limiting case and is characterized by a kernel that evolves in time, and learns some features. We perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find that (i) The two regimes are separated by an $α^*$ that scales as $h^{-1/2}$. (ii) Network architecture and data structure play an important role in determining which regime is better: in our tests, fully-connected networks perform generally better in the lazy-training regime, unlike convolutional networks. (iii) In both regimes, the fluctuations $δF$ induced on the learned function by initial conditions decay as $δF\sim 1/\sqrt{h}$, leading to a performance that increases with $h$. The same improvement can also be obtained at an intermediate width by ensemble-averaging several networks. (iv) In the feature-training regime we identify a time scale $t_1\sim\sqrt{h}α$, such that for $t\ll t_1$ the dynamics is linear.

Submitted 4 October, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: minor revisions

arXiv:1905.10843 [pdf, other]

doi 10.1088/1742-5468/abc61d

Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

Authors: Stefano Spigler, Mario Geiger, Matthieu Wyart

Abstract: How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive analytically $β$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $β$ depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than $n$. Using this idea we relate the exponent $β$ to an exponent $a$ describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract $a$ from real data by performing kernel PCA, leading to $β\approx0.36$ for MNIST and $β\approx0.07$ for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data.

Submitted 18 August, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

Comments: We added (i) the prediction of the exponent $β$ for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks")

arXiv:1905.05390 [pdf]

WatchOut: A Road Safety Extension for Pedestrians on a Public Windshield Display

Authors: Matthias Geiger, Changkun Ou, Cedric Quintes

Abstract: We conducted a field study to investigate whether public windshield displays are applicable as an additional interactive digital road safety warning sign. We focused on investigating the acceptance and usability of our novel public windshield display and its potential use for future applications. The study has shown that users are open-minded to the idea of an extraverted windshield display regardless of the use case, whether it is used for safety purposes or different content. Contrary to our hypothesis, most people assumed they would mistrust the system if it were as well established as traffic lights and primarily rely on their own perception.

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1901.01608 [pdf, other]

doi 10.1088/1742-5468/ab633c

Scaling description of generalization with number of parameters in deep learning

Authors: Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart

Abstract: Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with $N$. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations $\|f_{N}-\bar{f}_{N}\|\sim N^{-1/4}$ of the neural net output function $f_{N}$ around its expectation $\bar{f}_{N}$. These affect the generalization error $ε_{N}$ for classification: under natural assumptions, it decays to a plateau value $ε_{\infty}$ in a power-law fashion $\sim N^{-1/2}$. This description breaks down at a so-called jamming transition $N=N^{*}$. At this threshold, we argue that $\|f_{N}\|$ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at $N^{*}$. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond $N^{*}$, and averaging their outputs.

Submitted 8 October, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

Comments: The clarity of the text has been improved: the section "Related works" has been updated and the section "3.1 Regression task" has been added

arXiv:1811.02017 [pdf, other]

A General Theory of Equivariant CNNs on Homogeneous Spaces

Authors: Taco Cohen, Mario Geiger, Maurice Weiler

Abstract: We present a general theory of Group equivariant Convolutional Neural Networks (G-CNNs) on homogeneous spaces such as Euclidean space and the sphere. Feature maps in these networks represent fields on a homogeneous base space, and layers are equivariant maps between spaces of fields. The theory enables a systematic classification of all existing G-CNNs in terms of their symmetry group, base space, and field type. We also consider a fundamental question: what is the most general kind of equivariant linear map between feature spaces (fields) of given types? Following Mackey, we show that such maps correspond one-to-one with convolutions using equivariant kernels, and characterize the space of such kernels.

Submitted 9 January, 2020; v1 submitted 5 November, 2018; originally announced November 2018.

Journal ref: Advances in Neural Information Processing Systems 32 (NeurIPS 2019) 9142-9153

arXiv:1810.09665 [pdf, other]

doi 10.1088/1751-8121/ab4c8b

A jamming transition from under- to over-parametrization affects loss landscape and generalization

Authors: Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

Abstract: We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) initial decay, (ii) increase until the transition point --- where it displays a cusp --- and (iii) slow decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.

Submitted 18 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: arXiv admin note: text overlap with arXiv:1809.09349

arXiv:1809.09349 [pdf, other]

doi 10.1103/PhysRevE.100.012115

The jamming transition as a paradigm to understand the loss landscape of deep neural networks

Authors: Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart

Abstract: Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in FC networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime, and puts forward the surprising result that the ability of fully connected networks to fit random data is independent of their depth. Our observations suggest that this independence also holds for real data. We also study a quantity $Δ$ which characterizes how well ($Δ<0$) or badly ($Δ>0$) a datum is learned. At the critical point it is power-law distributed, $P_+(Δ)\simΔ^θ$ for $Δ>0$ and $P_-(Δ)\sim(-Δ)^{-γ}$ for $Δ<0$, with $θ\approx0.3$ and $γ\approx0.2$. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics is prone to avalanche-like dynamics, with abrupt changes in the set of patterns that are learned.

Submitted 17 June, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

Journal ref: Phys. Rev. E 100, 012115 (2019)

arXiv:1807.04950 [pdf, other]

Deep Learning in the Wild

Authors: Thilo Stadelmann, Mohammadreza Amirian, Ismail Arabaci, Marek Arnold, Gilbert François Duivesteijn, Ismail Elezi, Melanie Geiger, Stefan Lörwald, Benjamin Bruno Meier, Katharina Rombach, Lukas Tuggener

Abstract: Deep learning with neural networks is applied by an increasing number of people outside of classic research environments, due to the vast success of the methodology on a wide range of machine perception tasks. While this interest is fueled by beautiful success stories, practical work in deep learning on novel tasks without existing baselines remains challenging. This paper explores the specific challenges arising in the realm of real world tasks, based on case studies from research & development in conjunction with industry, and extracts lessons learned from them. It thus fills a gap between the publication of latest algorithmic and methodical developments, and the usually omitted nitty-gritty of how to make them work. Specifically, we give insight into deep learning projects on face matching, print media monitoring, industrial quality control, music scanning, strategy game playing, and automated machine learning, thereby providing best practices for deep learning in practice.

Submitted 13 July, 2018; originally announced July 2018.

Comments: Invited paper on ANNPR 2018

arXiv:1807.02547 [pdf, other]

3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

Authors: Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, Taco Cohen

Abstract: We present a convolutional network that is equivariant to rigid body motions. The model uses scalar-, vector-, and tensor fields over 3D Euclidean space to represent data, and equivariant convolutions to map between such representations. These SE(3)-equivariant convolutions utilize kernels which are parameterized as a linear combination of a complete steerable kernel basis, which is derived analytically in this paper. We prove that equivariant convolutions are the most general equivariant linear maps between fields over R^3. Our experimental results confirm the effectiveness of 3D Steerable CNNs for the problem of amino acid propensity prediction and protein structure classification, both of which have inherent SE(3) symmetry.

Submitted 27 October, 2018; v1 submitted 6 July, 2018; originally announced July 2018.

arXiv:1803.10743 [pdf, other]

Intertwiners between Induced Representations (with Applications to the Theory of Equivariant Neural Networks)

Authors: Taco S. Cohen, Mario Geiger, Maurice Weiler

Abstract: Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields ("feature channels"), whereas the steerable G-CNN can also use vector or tensor fields ("capsules") to represent data. In algebraic terms, the feature spaces in regular G-CNNs transform according to a regular representation of the group G, whereas the feature spaces in Steerable G-CNNs transform according to the more general induced representations of G. In order to make the network equivariant, each layer in a G-CNN is required to intertwine between the induced representations associated with its input and output space. In this paper we present a general mathematical framework for G-CNNs on homogeneous spaces like Euclidean space or the sphere. We show, using elementary methods, that the layers of an equivariant network are convolutional if and only if the input and output feature spaces transform according to an induced representation. This result, which follows from G.W. Mackey's abstract theory on induced representations, establishes G-CNNs as a universal class of equivariant network architectures, and generalizes the important recent work of Kondor & Trivedi on the intertwiners between regular representations.

Submitted 30 March, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

arXiv:1803.06969 [pdf, other]

doi 10.1088/1742-5468/ab3281

Comparing Dynamics: Deep Neural Networks versus Glassy Systems

Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.

Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: 10 pages, 5 figures. Version accepted at ICML 2018

Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013

arXiv:1801.10130 [pdf, other]

Spherical CNNs

Authors: Taco S. Cohen, Mario Geiger, Jonas Koehler, Max Welling

Abstract: Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective. In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

Submitted 25 February, 2018; v1 submitted 30 January, 2018; originally announced January 2018.

Comments: Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018

Journal ref: Proceedings of the International Conference on Learning Representations, 2018

arXiv:1709.04893 [pdf, other]

Convolutional Networks for Spherical Signals

Authors: Taco Cohen, Mario Geiger, Jonas Köhler, Max Welling

Abstract: The success of convolutional networks in learning problems involving planar signals such as images is due to their ability to exploit the translation symmetry of the data distribution through weight sharing. Many areas of science and engineering deal with signals with other symmetries, such as rotation invariant data on the sphere. Examples include climate and weather science, astrophysics, and chemistry. In this paper we present spherical convolutional networks. These networks use convolutions on the sphere and rotation group, which results in rotational weight sharing and rotation equivariance. Using a synthetic spherical MNIST dataset, we show that spherical convolutional networks are very effective at dealing with rotationally invariant classification problems.

Submitted 15 September, 2017; v1 submitted 14 September, 2017; originally announced September 2017.

Journal ref: Principled Approaches to Deep Learning Workshop, ICML 2017

arXiv:1405.5643 [pdf, ps, other]

Interactive Reference Point-Based Guided Local Search for the Bi-objective Inventory Routing Problem

Authors: Sandra Huber, Martin Josef Geiger, Marc Sevaux

Abstract: Eliciting preferences of a decision maker is a key factor to successfully combine search and decision making in an interactive method. Therefore, the progressive integration and simulation of the decision maker is a main concern in an application. We contribute in this direction by proposing an interactive method based on a reference point-based guided local search to the bi-objective Inventory Routing Problem. A local search metaheuristic, working on the delivery intervals, and the Clarke & Wright savings heuristic are employed for the subsequently obtained Vehicle Routing Problem. To elicit preferences, the decision maker selects a reference point to guide the search in interesting subregions. Additionally, the reference point is used as a reservation point to discard solutions outside the cone, introduced as a convergence criterion. Computational results of the reference point-based guided local search are reported and analyzed on benchmark data in order to show the applicability of the approach.

Submitted 22 May, 2014; originally announced May 2014.

Journal ref: Proceedings of the 10th Metaheuristics International Conference MIC 2013, August 5-8, 2013, Singapore, Pages 152-161

arXiv:1310.0602 [pdf, ps, other]

Iterated Variable Neighborhood Search for the resource constrained multi-mode multi-project scheduling problem

Authors: Martin Josef Geiger

Abstract: The resource constrained multi-mode multi-project scheduling problem (RCMMMPSP) is a notoriously difficult combinatorial optimization problem. For a given set of activities, feasible execution mode assignments and execution starting times must be found such that some optimization function, e.g. the makespan, is optimized. When determining an optimal (or at least feasible) assignment of decision variable values, a set of side constraints, such as resource availabilities, precedence constraints, etc., has to be respected. In 2013, the MISTA 2013 Challenge stimulated research in the RCMMMPSP. Its goal was the solution of a given set of instances under running time restrictions. We have contributed to this challenge with the approach presented here.

Submitted 2 October, 2013; originally announced October 2013.

Journal ref: In: Graham Kendall, Greet Vanden Berghe, and Barry McCollum (editors): Proceedings of the 6th Multidisciplinary International Conference on Scheduling: Theory and Applications, August 27-29, 2013, Gent, Belgium, pages 807-811

arXiv:1204.4051 [pdf, ps, other]

Solution Representations and Local Search for the bi-objective Inventory Routing Problem

Authors: Thibaut Barthélemy, Martin Josef Geiger, Marc Sevaux

Abstract: The solution of the biobjective IRP is rather challenging, even for metaheuristics. We are still lacking a profound understanding of appropriate solution representations and effective neighborhood structures. Clearly, both the delivery volumes and the routing aspects of the alternatives need to be reflected in an encoding, and must be modified when searching by means of local search. Our work contributes to the better understanding of such solution representations. On the basis of an experimental investigation, the advantages and drawbacks of two encodings are studied and compared.

Submitted 18 April, 2012; originally announced April 2012.

Comments: Proceedings of EU/ME 2012, Workshop on Metaheuristics for Global Challenges, May 10-11, 2012, Copenhagen, Denmark

arXiv:1109.3313 [pdf, ps, other]

Neigborhood Selection in Variable Neighborhood Search

Authors: Martin Josef Geiger, Marc Sevaux, Stefan Voss

Abstract: Variable neighborhood search (VNS) is a metaheuristic for solving optimization problems based on a simple principle: systematic changes of neighborhoods within the search, both in the descent to local minima and in the escape from the valleys which contain them. Designing these neighborhoods and applying them in a meaningful fashion is not an easy task. Moreover, an appropriate order in which they are applied must be determined. In this paper we attempt to investigate this issue. Assuming that we are given an optimization problem that is intended to be solved by applying the VNS scheme, how many and which types of neighborhoods should be investigated, and what could be appropriate selection criteria to apply these neighborhoods? More specifically, does it pay to "look ahead" (see, e.g., in the context of VNS and GRASP) when attempting to switch from one neighborhood to another?

Submitted 15 September, 2011; originally announced September 2011.

Comments: ISBN 978-88-900984-3-7

Journal ref: Proceedings of the 9th Metaheuristics International Conference MIC 2011, July 25-28, 2011, Udine, Italy, Pages 571-573

arXiv:1109.3094 [pdf, ps, other]

On the use of reference points for the biobjective Inventory Routing Problem

Authors: Martin Josef Geiger, Marc Sevaux

Abstract: The article presents a study on the biobjective inventory routing problem. Contrary to most previous research, the problem is treated as a true multi-objective optimization problem, with the goal of identifying Pareto-optimal solutions. Due to the hardness of the problem at hand, a reference point based optimization approach is presented and implemented into an optimization and decision support system, which allows for the computation of a true subset of the optimal outcomes. Experimental investigations involving local search metaheuristics are conducted on benchmark data, and numerical results are reported and analyzed.

Submitted 14 September, 2011; originally announced September 2011.

Journal ref: Proceedings of the 9th Metaheuristics International Conference MIC 2011, July 25-28, 2011, Udine, Italy, Pages 141-149

arXiv:1102.5635 [pdf, ps, other]

Practical inventory routing: A problem definition and an optimization method

Authors: Martin Josef Geiger, Marc Sevaux

Abstract: The global objective of this work is to provide practical optimization methods to companies involved in inventory routing problems, taking into account this new type of data. Also, companies are sometimes not able to deal with changing plans every period and would like to adopt regular structures for serving customers.

Submitted 28 February, 2011; originally announced February 2011.

Journal ref: Proceedings of the EU/MEeting 2011 - Workshop on Client-Centered Logistics and International Aid, February 21-22, 2011, pages 32-35

arXiv:1004.4734 [pdf, ps, other]

On the comparison of plans: Proposition of an instability measure for dynamic machine scheduling

Authors: Martin Josef Geiger

Abstract: On the basis of an analysis of previous research, we present a generalized approach for measuring the difference of plans with an exemplary application to machine scheduling. Our work is motivated by the need for such measures, which are used in dynamic scheduling and planning situations. In this context, quantitative approaches are needed for the assessment of the robustness and stability of schedules. Obviously, any 'robustness' or 'stability' of plans has to be defined with respect to the particular situation and the requirements of the human decision maker. Besides the proposition of an instability measure, we therefore discuss possibilities of obtaining meaningful information from the decision maker for the implementation of the introduced approach.

Submitted 27 April, 2010; originally announced April 2010.

Journal ref: Proceedings of the 25th Mini EURO Conference on Uncertainty and Robustness in Planning and Decision Making, April 15-17, 2010, Coimbra, Portugal. ISBN 978-989-95055-3-7.

arXiv:0907.2993 [pdf, ps, other]

Improvements for multi-objective flow shop scheduling by Pareto Iterated Local Search

Authors: Martin Josef Geiger

Abstract: The article describes the proposition and application of a local search metaheuristic for multi-objective optimization problems. It is based on two main principles of heuristic search, intensification through variable neighborhoods, and diversification through perturbations and successive iterations in favorable regions of the search space. The concept is successfully tested on permutation flow shop scheduling problems under multiple objectives and compared to other local search approaches. While the obtained results are encouraging in terms of their quality, another positive attribute of the approach is its simplicity, as it requires the setting of only very few parameters.

Submitted 17 July, 2009; originally announced July 2009.

Journal ref: Proceedings of the 8th Metaheuristics International Conference MIC 2009, July 13-16, 2009, Hamburg, Germany, pp 195.1-195.10

arXiv:0907.2990 [pdf, ps, other]

The Single Machine Total Weighted Tardiness Problem - Is it (for Metaheuristics) a Solved Problem ?

Authors: Martin Josef Geiger

Abstract: The article presents a study of rather simple local search heuristics for the single machine total weighted tardiness problem (SMTWTP), namely hillclimbing and Variable Neighborhood Search. In particular, we revisit these approaches for the SMTWTP as there appears to be a lack of appropriate/challenging benchmark instances in this case. The obtained results are impressive indeed. Only a few instances remain unsolved, and even those are approximated within 1% of the optimal/best known solutions. Our experiments support the claim that metaheuristics for the SMTWTP are very likely to lead to good results, and that, before refining search strategies, more work must be done with regard to the proposition of benchmark data. Some recommendations for the construction of such data sets are derived from our investigations.

Submitted 17 July, 2009; originally announced July 2009.

Journal ref: Proceedings of the 8th Metaheuristics International Conference MIC 2009, July 13-16, Hamburg, Germany, pp. 141.1-141.10

arXiv:0809.1077 [pdf, ps, other]

Variable Neighborhood Search for the University Lecturer-Student Assignment Problem

Authors: Martin Josef Geiger, Wolf Wenger

Abstract: The paper presents a study of local search heuristics in general and variable neighborhood search in particular for the resolution of an assignment problem studied in the practical work of universities. Here, students have to be assigned to scientific topics which are proposed and supported by members of staff. The problem involves the optimization under given preferences of students which may be expressed when applying for certain topics. It is possible to observe that variable neighborhood search leads to superior results for the tested problem instances. One instance is taken from an actual case, while others have been generated based on the real world data to support a deeper analysis. An extension of the problem has been formulated by integrating a second objective function that simultaneously balances the workload of the members of staff while maximizing utility of the students. The algorithmic approach has been prototypically implemented in a computer system. One important aspect in this context is the application of the research work to problems of other scientific institutions, and therefore the provision of decision support functionalities.

Submitted 5 September, 2008; originally announced September 2008.

Comments: Proceedings of the 18th Mini Euro Conference on Variable Neighborhood Search, November 23-25, 2005, Puerto de La Cruz, Tenerife, Spain, ISBN 84-689-5679-1

arXiv:0809.0961 [pdf, ps, other]

MOOPPS: An Optimization System for Multi Objective Scheduling

Authors: Martin Josef Geiger

Abstract: In the current paper, we present an optimization system solving multi objective production scheduling problems (MOOPPS). The identification of Pareto optimal alternatives or at least a close approximation of them is possible by a set of implemented metaheuristics. Necessary control parameters can easily be adjusted by the decision maker as the whole software is fully menu driven. This allows the comparison of different metaheuristic algorithms for the considered problem instances. Results are visualized by a graphical user interface showing the distribution of solutions in outcome space as well as their corresponding Gantt chart representation. The identification of a most preferred solution from the set of efficient solutions is supported by a module based on the aspiration interactive method (AIM). The decision maker successively defines aspiration levels until a single solution is chosen. After successfully competing in the finals in Ronneby, Sweden, the MOOPPS software has been awarded the European Academic Software Award 2002 (http://www.bth.se/llab/easa_2002.nsf)

Submitted 5 September, 2008; originally announced September 2008.

Journal ref: Proceedings of the Metaheuristics International Conference MIC 2005, Vienna, Austria, pp. 403-408

arXiv:0809.0757 [pdf, ps, other]

An application of the Threshold Accepting metaheuristic for curriculum based course timetabling

Authors: Martin Josef Geiger

Abstract: The article presents a local search approach for the solution of timetabling problems in general, with a particular implementation for competition track 3 of the International Timetabling Competition 2007 (ITC 2007). The heuristic search procedure is based on Threshold Accepting to overcome local optima. A stochastic neighborhood is proposed and implemented, randomly removing and reassigning events from the current solution. The overall concept has been incrementally obtained from a series of experiments, which we describe in each (sub)section of the paper. As a result, we successfully derived a potential candidate solution approach for the finals of track 3 of the ITC 2007.

Submitted 4 September, 2008; originally announced September 2008.

Journal ref: Proceedings of the 7th International Conference on the Practice and Theory of Automated Timetabling PATAT 2008, August 19-22, Montreal, Canada

arXiv:0809.0755 [pdf, ps, other]

Bin Packing Under Multiple Objectives - a Heuristic Approximation Approach

Authors: Martin Josef Geiger

Abstract: The article proposes a heuristic approximation approach to the bin packing problem under multiple objectives. In addition to the traditional objective of minimizing the number of bins, the heterogeneousness of the elements in each bin is minimized, leading to a biobjective formulation of the problem with a tradeoff between the number of bins and their heterogeneousness. An extension of the Best-Fit approximation algorithm is presented to solve the problem. Experimental investigations have been carried out on benchmark instances of different size, ranging from 100 to 1000 items. Encouraging results have been obtained, showing the applicability of the heuristic approach to the described problem.

Submitted 4 September, 2008; originally announced September 2008.

Journal ref: The Fourth International Conference on Evolutionary Multi-Criterion Optimization: Late Breaking Papers, Matsushima, Japan, March 2007, pp. 53-56

arXiv:0809.0753 [pdf, ps, other]

Proposition of the Interactive Pareto Iterated Local Search Procedure - Elements and Initial Experiments

Authors: Martin Josef Geiger

Abstract: The article presents an approach to interactively solve multi-objective optimization problems. While the identification of efficient solutions is supported by computational intelligence techniques on the basis of local search, the search is directed by partial preference information obtained from the decision maker. An application of the approach to biobjective portfolio optimization, modeled as the well-known knapsack problem, is reported, and experimental results are reported for benchmark instances taken from the literature. In brief, we obtain encouraging results that show the applicability of the approach to the described problem.

Submitted 4 September, 2008; originally announced September 2008.

Journal ref: The Fourth International Conference on Evolutionary Multi-Criterion Optimization: Late Breaking Papers, Matsushima, Japan, March 2007, pp. 19-23

arXiv:0809.0662 [pdf]

Improving Local Search for Fuzzy Scheduling Problems

Authors: Martin Josef Geiger, Sanja Petrovic

Abstract: The integration of fuzzy set theory and fuzzy logic into scheduling is a rather new aspect with growing importance for manufacturing applications, resulting in various unsolved aspects. In the current paper, we investigate an improved local search technique for fuzzy scheduling problems with fitness plateaus, using a multi criteria formulation of the problem. We especially address the problem of changing job priorities over time as studied at the Sherwood Press Ltd, a Nottingham based printing company, which is a collaborator on the project.

Submitted 3 September, 2008; originally announced September 2008.

Journal ref: Proceedings of the Post Graduate Research Conference in Electronics, Photonics, Communications & Networks and Computing Science PREP 2004, University of Hertfordshire, Great Britain, pp. 146-147

arXiv:0809.0610 [pdf, ps, other]

A framework for the interactive resolution of multi-objective vehicle routing problems

Authors: Martin Josef Geiger, Wolf Wenger

Abstract: The article presents a framework for the resolution of rich vehicle routing problems which are difficult to address with standard optimization techniques. We use local search on the basis of variable neighborhood search for the construction of the solutions, but embed the techniques in a flexible framework that allows the consideration of complex side constraints of the problem such as time windows, multiple depots, heterogeneous fleets, and, in particular, multiple optimization criteria. In order to identify a compromise alternative that meets the requirements of the decision maker, an interactive procedure is integrated in the resolution of the problem, allowing the modification of the preference information articulated by the decision maker. The framework is prototypically implemented in a computer system. First results of test runs on multiple depot vehicle routing problems with time windows are reported.

Submitted 3 September, 2008; originally announced September 2008.

Comments: Proceedings of the 7th EU/ME Workshop: Adaptive, Self-Adaptive, and Multi-Level Metaheuristics, Malaga, Spain, November 16-17, 2006

arXiv:0809.0416 [pdf, ps, other]

Genetic Algorithms for multiple objective vehicle routing

Authors: Martin Josef Geiger

Abstract: The talk describes a general approach of a genetic algorithm for multiple objective optimization problems. A particular dominance relation between the individuals of the population is used to define a fitness operator, enabling the genetic algorithm to address even problems with efficient, but convex-dominated alternatives. The algorithm is implemented in a multilingual computer program, solving vehicle routing problems with time windows under multiple objectives. The graphical user interface of the program shows the progress of the genetic algorithm and the main parameters of the approach can be easily modified. In addition to that, the program provides powerful decision support to the decision maker. The software has proved its excellence at the finals of the European Academic Software Award EASA, held at Keble College, University of Oxford, Great Britain.

Submitted 2 September, 2008; originally announced September 2008.

Journal ref: Proceedings of the Metaheuristics International Conference MIC'2001, Porto, Portugal, pp. 349-353

arXiv:0809.0410 [pdf, ps, other]

A Computational Study of Genetic Crossover Operators for Multi-Objective Vehicle Routing Problem with Soft Time Windows

Authors: Martin Josef Geiger

Abstract: The article describes an investigation of the effectiveness of genetic algorithms for multi-objective combinatorial optimization (MOCO) by presenting an application for the vehicle routing problem with soft time windows. The work is motivated by the question of whether and how the problem structure influences the effectiveness of different configurations of the genetic algorithm. Computational results are presented for different classes of vehicle routing problems, varying in their coverage with time windows, time window size, distribution and number of customers. The results are compared with a simple, but effective local search approach for multi-objective combinatorial optimization problems.

Submitted 2 September, 2008; originally announced September 2008.

Journal ref: Habenicht, W. et al. (eds.): Multi-Criteria- und Fuzzy Systeme in Theorie und Praxis-Loesungsansaetze fuer Entscheidungsprobleme mit komplexen Zielsystemen, 2003, ISBN 3-8244-7864-1, pp. 191-207

arXiv:0809.0406 [pdf, ps, other]

Foundations of the Pareto Iterated Local Search Metaheuristic

Authors: Martin Josef Geiger

Abstract: The paper describes the proposition and application of a local search metaheuristic for multi-objective optimization problems. It is based on two main principles of heuristic search, intensification through variable neighborhoods, and diversification through perturbations and successive iterations in favorable regions of the search space. The concept is successfully tested on permutation flow shop scheduling problems under multiple objectives. While the obtained results are encouraging in terms of their quality, another positive attribute of the approach is its simplicity, as it requires the setting of only very few parameters. The implementation of the Pareto Iterated Local Search metaheuristic is based on the MOOPPS computer system of local search heuristics for multi-objective scheduling which has been awarded the European Academic Software Award 2002 in Ronneby, Sweden (http://www.easa-award.net/, http://www.bth.se/llab/easa_2002.nsf)

Submitted 2 September, 2008; originally announced September 2008.

Comments: Proceedings of the 18th International Conference on Multiple Criteria Decision Making, Chania, Greece, June 19-23, 2006

arXiv:0809.0271 [pdf, ps, other]

Randomised Variable Neighbourhood Search for Multi Objective Optimisation

Authors: Martin Josef Geiger

Abstract: Various local search approaches have recently been applied to machine scheduling problems under multiple objectives. Their foremost consideration is the identification of the set of Pareto optimal alternatives. An important aspect of successfully solving these problems lies in the definition of an appropriate neighbourhood structure. It remains unclear in this context how interdependencies within the fitness landscape affect the resolution of the problem. The paper presents a study of neighbourhood search operators for multiple objective flow shop scheduling. Experiments have been carried out with twelve different combinations of criteria. To derive exact conclusions, small problem instances, for which the optimal solutions are known, have been chosen. Statistical tests show that no single neighbourhood operator is able to equally identify all Pareto optimal alternatives. Significant improvements, however, have been obtained by hybridising the solution algorithm using a randomised variable neighbourhood search technique.

Submitted 1 September, 2008; originally announced September 2008.

Journal ref: Proceedings of the 4th EU/ME Workshop: Design and Evaluation of Advanced Hybrid Meta-Heuristics, November 4-5, Nottingham, United Kingdom, pp. 34-42

Showing 1–50 of 50 results for author: Geiger, M