Multivector Neurons: Better and Faster O(n)-Equivariant Clifford Graph Neural Networks

Authors: Cong Liu, David Ruhe, Patrick Forré

Abstract: Most current deep learning models equivariant to $O(n)$ or $SO(n)$ either consider mostly scalar information such as distances and angles or have a very high computational complexity. In this work, we test a few novel message passing graph neural networks (GNNs) based on Clifford multivectors, structured similarly to other prevalent equivariant models in geometric deep learning. Our approach leverages efficient invariant scalar features while simultaneously performing expressive learning on multivector representations, particularly through the use of the equivariant geometric product operator. By integrating these elements, our methods outperform established efficient baseline models on an N-body simulation task and a protein denoising task while maintaining high efficiency. In particular, we push the state-of-the-art error on the N-body dataset to 0.0035 (averaged over 3 runs); an 8% improvement over recent methods. Our implementation is available on GitHub.

Submitted 10 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2402.14730

Clifford-Steerable Convolutional Neural Networks

Authors: Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, Patrick Forré

Abstract: We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of $\mathrm{E}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p,q}$. They cover, for instance, $\mathrm{E}(3)$-equivariance on $\mathbb{R}^3$ and Poincaré-equivariance on Minkowski spacetime $\mathbb{R}^{1,3}$. Our approach is based on an implicit parametrization of $\mathrm{O}(p,q)$-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.

Submitted 6 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: accepted to ICML 2024

arXiv:2402.10011

Clifford Group Equivariant Simplicial Message Passing Networks

Authors: Cong Liu, David Ruhe, Floor Eijkelboom, Patrick Forré

Abstract: We introduce Clifford Group Equivariant Simplicial Message Passing Networks, a method for steerable E(n)-equivariant message passing on simplicial complexes. Our method integrates the expressivity of Clifford group-equivariant layers with simplicial message passing, which is topologically more intricate than regular graph message passing. Clifford algebras include higher-order objects such as bivectors and trivectors, which express geometric features (e.g., areas, volumes) derived from vectors. Using this knowledge, we represent simplex features through geometric products of their vertices. To achieve efficient simplicial message passing, we share the parameters of the message network across different dimensions. Additionally, we restrict the final message to an aggregation of the incoming messages from different dimensions, leading to what we term shared simplicial message passing. Experimental results show that our method is able to outperform both equivariant and simplicial graph neural networks on a variety of geometric tasks.

Submitted 12 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.09470

Rolling Diffusion Models

Authors: David Ruhe, Jonathan Heek, Tim Salimans, Emiel Hoogeboom

Abstract: Diffusion models have recently been increasingly applied to temporal data such as video, fluid mechanics simulations, or climate data. These methods generally treat subsequent frames equally regarding the amount of noise in the diffusion process. This paper explores Rolling Diffusion: a new approach that uses a sliding window denoising process. It ensures that the diffusion process progressively corrupts through time by assigning more noise to frames that appear later in a sequence, reflecting greater uncertainty about the future as the generation process unfolds. Empirically, we show that when the temporal dynamics are complex, Rolling Diffusion is superior to standard diffusion. In particular, this result is demonstrated in a video prediction task using the Kinetics-600 video dataset and in a chaotic fluid dynamics forecasting experiment.
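
The idea of "assigning more noise to frames that appear later in a sequence" can be illustrated with a small sketch. The linear schedule below is an assumption chosen for illustration, not the schedule used in the paper; the function name `rolling_noise_levels` and its parameters are hypothetical.

```python
def rolling_noise_levels(window_size, t):
    """Per-frame noise level in [0, 1] for one sliding window.

    Assumed (illustrative) linear schedule: frame k, counted from 0,
    gets level (k + t) / window_size, so later frames are noisier and
    every frame becomes noisier as the global time t runs from 0 to 1.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [min(1.0, (k + t) / window_size) for k in range(window_size)]


# Noise levels half-way through the global diffusion time.
levels = rolling_noise_levels(4, 0.5)

# At t = 1 frame k is exactly as noisy as frame k + 1 was at t = 0,
# which is what lets the window slide forward by one frame.
end_of_step = rolling_noise_levels(4, 1.0)
start_of_step = rolling_noise_levels(4, 0.0)
```

Under this assumed schedule the last frame in the window reaches pure noise at t = 1, while the first frame is fully clean at t = 0 and can be emitted before the window shifts.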

Submitted 6 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2312.04237

A candidate coherent radio flash following a neutron star merger

Authors: A. Rowlinson, I. de Ruiter, R. L. C. Starling, K. M. Rajwade, A. Hennessy, R. A. M. J. Wijers, G. E. Anderson, M. Mevius, D. Ruhe, K. Gourdji, A. J. van der Horst, S. ter Veen, K. Wiersema

Abstract: In this paper, we present rapid follow-up observations of the short GRB 201006A, consistent with being a compact binary merger, using the LOw Frequency ARray (LOFAR). We have detected a candidate 5.6$σ$, short, coherent radio flash at 144 MHz at 76.6 mins post-GRB with a 3$σ$ duration of 38 seconds. This radio flash is 27 arcsec offset from the GRB location, which has a probability of occurring by chance of $\sim$0.05% (3.8$σ$) when accounting for measurement uncertainties. Despite the offset, we show that the probability of finding an unrelated transient within 40 arcsec of the GRB location is $<10^{-6}$ and conclude that this is a candidate radio counterpart to GRB 201006A. We performed image plane dedispersion and the radio flash is tentatively (2.4$σ$) shown to be highly dispersed, allowing a distance estimate, corresponding to a redshift of $0.58\pm0.06$. The corresponding luminosity of the event at this distance is $6.7^{+6.6}_{-4.4} \times 10^{32}$ erg s$^{-1}$ Hz$^{-1}$. If associated with GRB 201006A, this emission would indicate prolonged activity from the central engine that is consistent with being a newborn, supramassive, likely highly magnetised, millisecond spin neutron star (a magnetar).

Submitted 28 May, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: Submitted

arXiv:2311.07394

Transient study using LoTSS -- framework development and preliminary results

Authors: Iris de Ruiter, Zachary S. Meyers, Antonia Rowlinson, Timothy W. Shimwell, David Ruhe, Ralph A. M. J. Wijers

Abstract: We present a search for transient radio sources on time-scales of seconds to hours at 144 MHz using the LOFAR Two-metre Sky Survey (LoTSS). This search is conducted by examining short time-scale images derived from the LoTSS data. To allow imaging of LoTSS on short time-scales, a novel imaging and filtering strategy is introduced. This includes sky model source subtraction, no cleaning or primary beam correction, a simple source finder, fast filtering schemes and source catalogue matching. This new strategy is first tested by injecting simulated transients, with a range of flux densities and durations, into the data. We find the limiting sensitivity to be 113 and 6 mJy for 8 second and 1 hour transients respectively. The new imaging and filtering strategies are applied to 58 fields of the LoTSS survey, corresponding to LoTSS-DR1 (2% of the survey). One transient source is identified in the 8 second and 2 minute snapshot images. The source shows a one-minute flare in the 8 hour observation. Our method puts the most sensitive constraints on/estimates of the transient surface density at low frequencies at time-scales of seconds to hours; $<4.0\cdot 10^{-4} \; \text{deg}^{-2}$ at 1 hour at a sensitivity of 6.3 mJy; $5.7\cdot 10^{-7} \; \text{deg}^{-2}$ at 2 minutes at a sensitivity of 30 mJy; and $3.6\cdot 10^{-8} \; \text{deg}^{-2}$ at 8 seconds at a sensitivity of 113 mJy. In the future, we plan to apply the strategies presented in this paper to all LoTSS data.

Submitted 14 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: submitted to MNRAS

arXiv:2306.00608

On the Effectiveness of Hybrid Mutual Information Estimation

Authors: Marco Federici, David Ruhe, Patrick Forré

Abstract: Estimating the mutual information from samples from a joint distribution is a challenging problem in both science and engineering. In this work, we realize a variational bound that generalizes both discriminative and generative approaches. Using this bound, we propose a hybrid method to mitigate their respective shortcomings. Further, we propose Predictive Quantization (PQ): a simple generative method that can be easily combined with discriminative estimators for minimal computational overhead. Our propositions yield a tighter bound on the information thanks to the reduced variance of the estimator. We test our methods on a challenging task of correlated high-dimensional Gaussian distributions and a stochastic process involving a system of free particles subjected to a fixed energy landscape. Empirical results show that hybrid methods consistently improved mutual information estimates when compared to the corresponding discriminative counterpart.
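
Correlated Gaussians are a common test bed for mutual information estimators because the true value is known in closed form. The sketch below assumes a specific setup that the abstract does not state: `dim` independent pairs of jointly Gaussian variables, each pair with correlation `rho`. In that case the mutual information is $-\frac{d}{2}\log(1-\rho^2)$ nats. The function name is hypothetical.

```python
import math


def gaussian_mutual_information(rho, dim):
    """Exact mutual information (in nats) between X and Y when they consist of
    `dim` independent pairs (x_i, y_i), each bivariate Gaussian with
    correlation `rho`. This benchmark setup is an assumption for illustration.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if dim < 1:
        raise ValueError("dim must be at least 1")
    return -0.5 * dim * math.log(1.0 - rho ** 2)


# Ground-truth values an estimator could be compared against.
independent = gaussian_mutual_information(0.0, 20)
weak = gaussian_mutual_information(0.3, 20)
strong = gaussian_mutual_information(0.9, 20)
```

The value grows without bound as `rho` approaches 1, which is one reason high-correlation, high-dimensional settings are hard for sample-based estimators.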

Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.11141

Clifford Group Equivariant Neural Networks

Authors: David Ruhe, Johannes Brandstetter, Patrick Forré

Abstract: We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{O}(n)$- and $\mathrm{E}(n)$-equivariant models. We identify and study the $\textit{Clifford group}$, a subgroup inside the Clifford algebra tailored to achieve several favorable properties. Primarily, the group's action forms an orthogonal automorphism that extends beyond the typical vector space to the entire Clifford algebra while respecting the multivector grading. This leads to several non-equivalent subrepresentations corresponding to the multivector decomposition. Furthermore, we prove that the action respects not just the vector space structure of the Clifford algebra but also its multiplicative structure, i.e., the geometric product. These findings imply that every polynomial in multivectors, including their grade projections, constitutes an equivariant map with respect to the Clifford group, allowing us to parameterize equivariant neural network layers. An advantage worth mentioning is that we obtain expressive layers that can elegantly generalize to inner-product spaces of any dimension. We demonstrate, notably from a single core implementation, state-of-the-art performance on several distinct tasks, including a three-dimensional $n$-body experiment, a four-dimensional Lorentz-equivariant high-energy physics experiment, and a five-dimensional convex hull experiment.
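
The claim that the group action respects the geometric product can be checked numerically in the smallest non-trivial case. The sketch below is not the paper's implementation; it restricts to two vectors in the plane, where the geometric product has a scalar part (the inner product) and a bivector part (the wedge product), and it relies on the standard fact that under an orthogonal map the scalar part is invariant and the 2D bivector part is multiplied by the determinant. All function names are hypothetical.

```python
import math


def geometric_product(a, b):
    """Geometric product of two 2D vectors as (scalar part, e1e2 bivector part)."""
    scalar = a[0] * b[0] + a[1] * b[1]    # inner product a . b
    bivector = a[0] * b[1] - a[1] * b[0]  # wedge product coefficient on e1 e2
    return scalar, bivector


def apply(matrix, v):
    """Apply a 2x2 matrix (tuple of rows) to a 2D vector."""
    (m00, m01), (m10, m11) = matrix
    return (m00 * v[0] + m01 * v[1], m10 * v[0] + m11 * v[1])


def act_on_product(matrix, multivector):
    """Induced O(2) action on (scalar, bivector): the scalar is invariant,
    the bivector is scaled by the determinant of the orthogonal matrix."""
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return multivector[0], det * multivector[1]


def equivariance_gap(matrix, a, b):
    """Largest component-wise difference between transforming the inputs first
    and transforming the product afterwards; zero means equivariance holds."""
    lhs = geometric_product(apply(matrix, a), apply(matrix, b))
    rhs = act_on_product(matrix, geometric_product(a, b))
    return max(abs(l - r) for l, r in zip(lhs, rhs))


theta = 0.7
rotation = ((math.cos(theta), -math.sin(theta)),
            (math.sin(theta), math.cos(theta)))
reflection = ((1.0, 0.0), (0.0, -1.0))
a, b = (1.0, 2.0), (-0.5, 3.0)

gap_rotation = equivariance_gap(rotation, a, b)
gap_reflection = equivariance_gap(reflection, a, b)
```

Both gaps vanish up to floating-point error. The reflection case is the informative one: the bivector flips sign, so the product is equivariant rather than invariant.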

Submitted 22 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Published at NeurIPS 2023 (Oral)

arXiv:2302.06594

Geometric Clifford Algebra Networks

Authors: David Ruhe, Jayesh K. Gupta, Steven de Keninck, Max Welling, Johannes Brandstetter

Abstract: We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems. GCANs are based on symmetry group transformations using geometric (Clifford) algebras. We first review the quintessence of modern (plane-based) geometric algebra, which builds on isometries encoded as elements of the $\mathrm{Pin}(p,q,r)$ group. We then propose the concept of group action layers, which linearly combine object transformations using pre-specified group actions. Together with a new activation and normalization scheme, these layers serve as adjustable $\textit{geometric templates}$ that can be refined via gradient descent. Theoretical advantages are strongly reflected in the modeling of three-dimensional rigid body transformations as well as large-scale fluid dynamics simulations, showing significantly improved performance over traditional methods.

Submitted 29 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

arXiv:2211.09008

Normalizing Flows for Hierarchical Bayesian Analysis: A Gravitational Wave Population Study

Authors: David Ruhe, Kaze Wong, Miles Cranmer, Patrick Forré

Abstract: We propose parameterizing the population distribution of the gravitational wave population modeling framework (Hierarchical Bayesian Analysis) with a normalizing flow. We first demonstrate the merit of this method on illustrative experiments and then analyze four parameters of the latest LIGO/Virgo data release: primary mass, secondary mass, redshift, and effective spin. Our results show that despite the small and notoriously noisy dataset, the posterior predictive distributions (assuming a prior over the parameters of the flow) of the observed gravitational wave population recover structure that agrees with robust previous phenomenological modeling results while being less susceptible to biases introduced by less flexible models. Therefore, the method forms a promising flexible, reliable replacement for population inference distributions, even when data is highly noisy.

Submitted 29 December, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:2107.13349

Self-Supervised Inference in State-Space Models

Authors: David Ruhe, Patrick Forré

Abstract: We perform approximate inference in state-space models with nonlinear state transitions. Without parameterizing a generative model, we apply Bayesian update formulas using a local linearity approximation parameterized by neural networks. This comes accompanied by a maximum likelihood objective that requires no supervision via uncorrupted observations or ground truth latent states. The optimization backpropagates through a recursion similar to the classical Kalman filter and smoother. Additionally, using an approximate conditional independence, we can perform smoothing without having to parameterize a separate model. In scientific applications, domain knowledge can give a linear approximation of the latent transition maps, which we can easily incorporate into our model. Usage of such domain knowledge is reflected in excellent results (despite our model's simplicity) on the chaotic Lorenz system compared to fully supervised and variational inference methods. Finally, we show competitive results on an audio denoising experiment.

Submitted 25 January, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

arXiv:2103.15418

Detecting Dispersed Radio Transients in Real Time Using Convolutional Neural Networks

Authors: David Ruhe, Mark Kuiack, Antonia Rowlinson, Ralph Wijers, Patrick Forré

Abstract: We present a methodology for automated real-time analysis of a radio image data stream with the goal of finding transient sources. Contrary to previous works, the transients we are interested in occur on a time-scale where dispersion starts to play a role, so we must search a higher-dimensional data space and yet work fast enough to keep up with the data stream in real time. The approach consists of five main steps: quality control, source detection, association, flux measurement, and physical parameter inference. We present parallelized methods based on convolutions and filters that can be accelerated on a GPU, allowing the pipeline to run in real-time. In the parameter inference step, we apply a convolutional neural network to dynamic spectra that were obtained from the preceding steps. It infers physical parameters, including the dispersion measure of the transient candidate. Based on critical values of these parameters, an alert can be sent out and data will be saved for further investigation. Experimentally, the pipeline is applied to simulated data and images from AARTFAAC (Amsterdam-ASTRON Radio Transients Facility And Analysis Centre), a transients facility based on the Low-Frequency Array (LOFAR). Results on simulated data show the efficacy of the pipeline, and from real data it discovered dispersed pulses. The current work targets transients on time scales that are longer than the fast transients of beam-formed search, but shorter than slow transients in which dispersion matters less. This fills a methodological gap that is relevant for the upcoming Square-Kilometer Array (SKA). Additionally, since real-time analysis can be performed, only data with promising detections can be saved to disk, providing a solution to the big-data problem that modern astronomy is dealing with.
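
For context on why dispersion enlarges the search space: a pulse travelling through cold plasma arrives later at lower frequencies, with a delay proportional to the dispersion measure (DM). The sketch below uses the standard cold-plasma delay relation and is not taken from the paper's pipeline; the function name and the example band and DM are hypothetical.

```python
# Standard cold-plasma dispersion constant, in ms GHz^2 cm^3 pc^-1.
DISPERSION_CONSTANT_MS = 4.148808


def dispersion_delay_ms(dm, freq_lo_mhz, freq_hi_mhz):
    """Arrival-time delay (ms) of the low-frequency edge of a pulse relative to
    the high-frequency edge, for a dispersion measure `dm` in pc cm^-3."""
    if freq_lo_mhz <= 0 or freq_hi_mhz <= 0:
        raise ValueError("frequencies must be positive")
    lo_ghz = freq_lo_mhz / 1000.0
    hi_ghz = freq_hi_mhz / 1000.0
    return DISPERSION_CONSTANT_MS * dm * (lo_ghz ** -2 - hi_ghz ** -2)


# Hypothetical example: sweep across an assumed 58-62 MHz band for DM = 30 pc cm^-3.
sweep_ms = dispersion_delay_ms(30.0, 58.0, 62.0)
```

At such low frequencies the sweep across even a narrow band spans seconds rather than milliseconds, so a dispersed pulse appears in different image frames at different frequencies and the DM becomes an extra search dimension.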

Submitted 6 August, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

arXiv:1906.08619

Bayesian Modelling in Practice: Using Uncertainty to Improve Trustworthiness in Medical Applications

Authors: David Ruhe, Giovanni Cinà, Michele Tonutti, Daan de Bruin, Paul Elbers

Abstract: The Intensive Care Unit (ICU) is a hospital department where machine learning has the potential to provide valuable assistance in clinical decision making. Classical machine learning models usually only provide point-estimates and no uncertainty of predictions. In practice, uncertain predictions should be presented to doctors with extra care in order to prevent potentially catastrophic treatment decisions. In this work we show how Bayesian modelling and the predictive uncertainty that it provides can be used to mitigate risk of misguided prediction and to detect out-of-domain examples in a medical setting. We derive analytically a bound on the prediction loss with respect to predictive uncertainty. The bound shows that uncertainty can mitigate loss. Furthermore, we apply a Bayesian Neural Network to the MIMIC-III dataset, predicting risk of mortality of ICU patients. Our empirical results show that uncertainty can indeed prevent potential errors and reliably identifies out-of-domain patients. These results suggest that Bayesian predictive uncertainty can greatly improve trustworthiness of machine learning models in high-risk settings such as the ICU.

Submitted 20 June, 2019; originally announced June 2019.

Comments: Presented at AISG @ ICML2019: https://aiforsocialgood.github.io/icml2019/index.htm

Showing 1–13 of 13 results for author: Ruhe, D