Search | arXiv e-print repository

Dynamical Measure Transport and Neural PDE Solvers for Sampling

Authors: Jingtong Sun, Julius Berner, Lorenz Richter, Marius Zeinhofer, Johannes Müller, Kamyar Azizzadenesheli, Anima Anandkumar

Abstract: The task of sampling from a probability density can be approached as transporting a tractable density function to the target, known as dynamical measure transport. In this work, we tackle it through a principled unified framework using deterministic or stochastic evolutions described by partial differential equations (PDEs). This framework incorporates prior trajectory-based sampling methods, such as diffusion models or Schrödinger bridges, without relying on the concept of time-reversals. Moreover, it allows us to propose novel numerical methods for solving the transport task and thus sampling from complicated targets without the need for the normalization constant or data samples. We employ physics-informed neural networks (PINNs) to approximate the respective PDE solutions, implying both conceptual and computational advantages. In particular, PINNs allow for simulation- and discretization-free optimization and can be trained very efficiently, leading to significantly better mode coverage in the sampling task compared to alternative methods. Moreover, they can readily be fine-tuned with Gauss-Newton methods to achieve high accuracy in sampling.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2406.11292 [pdf, other]

Daedalus 2: Autorotation Entry, Descent and Landing Experiment on REXUS29

Authors: Philip Bergmann, Clemens Riegler, Zuri Klaschka, Tobias Herbst, Jan M. Wolf, Maximilian Reigl, Niels Koch, Sarah Menninger, Jan von Pichowski, Cedric Bös, Bence Barthó, Frederik Dunschen, Johanna Mehringer, Ludwig Richter, Lennart Werner

Abstract: In recent years, interplanetary exploration has gained significant momentum, leading to a focus on the development of launch vehicles. However, the critical technology of entry, descent and landing (EDL) mechanisms has not received the same level of attention and remains less mature and capable. To address this gap, we took advantage of the REXUS program to develop a pioneering EDL mechanism. We propose an alternative to conventional, parachute-based landing vehicles by utilizing autorotation. Our approach enables future additions such as steerability, controllability, and the possibility of a soft landing. To validate the technique and our specific implementation, we conducted a sounding rocket experiment on REXUS29. The system's design is outlined with relevant design decisions and constraints, covering software, mechanics, electronics and control systems. Furthermore, emphasis is placed on the organization and setup of the team, which was entirely made up of and run by students. The flight results on REXUS itself are presented, including the most important outcomes and possible reasons for mission failure. We have not achieved an autorotation-based landing, but provide a reliable way of building and operating such vehicles. Ultimately, future works and possibilities for improvements are outlined. The research presented in this paper highlights the need for continued exploration and development of EDL mechanisms for future interplanetary missions. By discussing our results, we hope to inspire further research in this area and contribute to the advancement of space exploration technology.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 12 pages, 9 figures

arXiv:2406.04940 [pdf, other]

CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

Authors: Matthew Fortier, Mats L. Richter, Oliver Sonnentag, Chris Pal

Abstract: Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote comparisons between models. To address this gap, we present CarbonSense, the first machine learning-ready dataset for DDCFM. CarbonSense integrates measured carbon fluxes, meteorological predictors, and satellite imagery from 385 locations across the globe, offering comprehensive coverage and facilitating robust model training. Additionally, we provide a baseline model using a current state-of-the-art DDCFM approach and a novel transformer-based model. Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain. By providing these resources, we aim to lower the barrier to entry for other deep learning researchers to develop new models and drive new advances in carbon flux modelling.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 9 content pages, 11 reference pages, 9 appendix pages

arXiv:2405.03549 [pdf, other]

Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models

Authors: Ludwig Winkler, Lorenz Richter, Manfred Opper

Abstract: Generative modeling via stochastic processes has led to remarkable empirical results as well as to recent advances in their theoretical understanding. In principle, both space and time of the processes can be discrete or continuous. In this work, we study time-continuous Markov jump processes on discrete state spaces and investigate their correspondence to state-continuous diffusion processes given by SDEs. In particular, we revisit the $\textit{Ehrenfest process}$, which converges to an Ornstein-Uhlenbeck process in the infinite state space limit. Likewise, we can show that the time-reversal of the Ehrenfest process converges to the time-reversed Ornstein-Uhlenbeck process. This observation bridges discrete and continuous state spaces and allows one to carry over methods from one setting to the other. Additionally, we suggest an algorithm for training the time-reversal of Markov jump processes which relies on conditional expectations and can thus be directly related to denoising score matching. We demonstrate our methods in multiple convincing numerical experiments.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2403.15881 [pdf, other]

Fast and Unified Path Gradient Estimators for Normalizing Flows

Authors: Lorenz Vaitl, Ludwig Winkler, Lorenz Richter, Pan Kessel

Abstract: Recent work shows that path gradient estimators for normalizing flows have lower variance compared to standard estimators for variational inference, resulting in improved training. However, they are often prohibitively more expensive from a computational point of view and cannot be applied to maximum likelihood training in a scalable manner, which severely hinders their widespread adoption. In this work, we overcome these crucial limitations. Specifically, we propose a fast path gradient estimator which improves computational efficiency significantly and works for all normalizing flow architectures of practical relevance. We then show that this estimator can also be applied to maximum likelihood training for which it has a regularizing effect as it can take the form of a given target energy function into account. We empirically establish its superior performance and reduced variance for several natural sciences applications.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.08763 [pdf, other]

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Authors: Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English$\rightarrow$English) and a stronger distribution shift (English$\rightarrow$German) at the $405$M parameter model scale with large dataset sizes (hundreds of billions of tokens). Selecting the weak but realistic shift for larger-scale experiments, we also find that our continual learning strategies match the re-training baseline for a 10B parameter LLM. Our results demonstrate that LLMs can be successfully updated via simple and scalable continual learning strategies, matching the re-training baseline using only a fraction of the compute. Finally, inspired by previous work, we propose alternatives to the cosine learning rate schedule that help circumvent forgetting induced by LR re-warming and that are not bound to a fixed token budget.

Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2312.07341 [pdf, other]

Unconventional crystal structure of the high-pressure superconductor La$_3$Ni$_2$O$_7$

Authors: Pascal Puphal, Pascal Reiss, Niklas Enderlein, Yu-Mi Wu, Giniyat Khaliullin, Vignesh Sundaramurthy, Tim Priessnitz, Manuel Knauft, Lea Richter, Masahiko Isobe, Peter A. van Aken, Hidenori Takagi, Bernhard Keimer, Y. Eren Suyolcu, Björn Wehinger, Philipp Hansmann, Matthias Hepting

Abstract: The discovery of high-temperature superconductivity in La$_3$Ni$_2$O$_7$ at pressures above 14 GPa has spurred extensive research efforts. Yet, fundamental aspects of the superconducting phase, including the possibility of a filamentary character, are currently subjects of controversial debates. Conversely, a crystal structure with NiO$_6$ octahedral bilayers stacked along the $c$-axis direction was consistently posited in initial studies on La$_3$Ni$_2$O$_7$. Here we reassess this structure in optical floating zone-grown La$_3$Ni$_2$O$_7$ single crystals that show signs of filamentary superconductivity. Employing scanning transmission electron microscopy and single-crystal x-ray diffraction under high pressures, we observe multiple crystallographic phases in these crystals, with the majority phase exhibiting alternating monolayers and trilayers of NiO$_6$ octahedra, signifying a profound deviation from the previously suggested bilayer structure. Using density functional theory, we disentangle the individual contributions of the monolayer and trilayer structural units to the electronic band structure of La$_3$Ni$_2$O$_7$, providing a firm basis for advanced theoretical modeling and future evaluations of the potential of the monolayer-trilayer structure for hosting superconductivity.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.07275 [pdf, other]

The SARAO MeerKAT 1.3 GHz Galactic Plane Survey

Authors: S. Goedhart, W. D. Cotton, F. Camilo, M. A. Thompson, G. Umana, M. Bietenholz, P. A. Woudt, L. D. Anderson, C. Bordiu, D. A. H. Buckley, C. S. Buemi, F. Bufano, F. Cavallaro, H. Chen, J. O. Chibueze, D. Egbo, B. S. Frank, M. G. Hoare, A. Ingallinera, T. Irabor, R. C. Kraan-Korteweg, S. Kurapati, P. Leto, S. Loru, M. Mutale, et al. (105 additional authors not shown)

Abstract: We present the SARAO MeerKAT Galactic Plane Survey (SMGPS), a 1.3 GHz continuum survey of almost half of the Galactic Plane ($251° \le l \le 358°$ and $2° \le l \le 61°$ at $|b| \le 1.5°$). SMGPS is the largest, most sensitive and highest angular resolution 1 GHz survey of the Plane yet carried out, with an angular resolution of 8" and a broadband RMS sensitivity of $\sim$10--20 $\mu$Jy/beam. Here we describe the first publicly available data release from SMGPS which comprises data cubes of frequency-resolved images over 908--1656 MHz, power law fits to the images, and broadband zeroth moment integrated intensity images. A thorough assessment of the data quality and guidance for future usage of the data products are given. Finally, we discuss the tremendous potential of SMGPS by showcasing highlights of the Galactic and extragalactic science that it permits. These highlights include the discovery of a new population of non-thermal radio filaments; identification of new candidate supernova remnants, pulsar wind nebulae and planetary nebulae; improved radio/mid-IR classification of rare Luminous Blue Variables and discovery of associated extended radio nebulae; new radio stars identified by Bayesian cross-matching techniques; the realisation that many of the largest radio-quiet WISE HII region candidates are not true HII regions; and a large sample of previously undiscovered background HI galaxies in the Zone of Avoidance.

Submitted 2 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted for publication in MNRAS. The data release is live and links can be found in the Data Availability Statement in the paper

arXiv:2311.04100 [pdf, other]

Graph-controlled Permutation Mixers in QAOA for the Flexible Job-Shop Problem

Authors: Lilly Palackal, Leonhard Richter, Maximilian Hess

Abstract: One of the most promising attempts towards solving optimization problems with quantum computers in the noisy intermediate scale era of quantum computing is variational quantum algorithms. The Quantum Alternating Operator Ansatz provides an algorithmic framework for constrained, combinatorial optimization problems. As opposed to the better known standard QAOA protocol, the constraints of the optimization problem are built into the mixing layers of the ansatz circuit, thereby limiting the search to the much smaller Hilbert space of feasible solutions. In this work we develop mixing operators for a wide range of scheduling problems including the flexible job shop problem. These mixing operators are based on a special control scheme defined by a constraint graph model. After describing an explicit construction of those mixing operators, we prove that they preserve feasibility and explore the feasible subspace.

Submitted 7 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the HICSS 2024 for possible publication

arXiv:2308.04014 [pdf, other]

Continual Pre-Training of Large Language Models: How to (re)warm your model?

Authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch, even for a large downstream dataset.

Submitted 6 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.15496 [pdf, other]

From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs

Authors: Lorenz Richter, Leon Sallandt, Nikolas Nüsken

Abstract: The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that tensor trains provide an appealing framework for parabolic PDEs: The combination of reformulations in terms of backward stochastic differential equations and regression-type methods holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation. Emphasizing a continuous-time viewpoint, we develop iterative schemes, which differ in terms of computational efficiency and robustness. We demonstrate both theoretically and numerically that our methods can achieve a favorable trade-off between accuracy and computational efficiency. While previous methods have been either accurate or fast, we have identified a novel numerical strategy that can often combine both of these aspects.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.02454 [pdf, other]

Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness

Authors: Carsten Hartmann, Lorenz Richter

Abstract: The recent advances in machine learning in various fields of applications can be largely attributed to the rise of deep learning (DL) methods and architectures. Despite being a key technology behind autonomous cars, image processing, speech recognition, etc., a notorious problem remains the lack of theoretical understanding of DL and related interpretability and (adversarial) robustness issues. Understanding the specifics of DL, as compared to, say, other forms of nonlinear regression methods or statistical learning, is interesting from a mathematical perspective, but at the same time it is of crucial importance in practice: treating neural networks as mere black boxes might be sufficient in certain cases, but many applications require waterproof performance guarantees and a deeper understanding of what could go wrong and why it could go wrong. It is probably fair to say that, despite being mathematically well founded as a method to approximate complicated functions, DL is mostly still more like modern alchemy that is firmly in the hands of engineers and computer scientists. Nevertheless, it is evident that certain specifics of DL that could explain its success in applications demand systematic mathematical approaches. In this work, we review robustness issues of DL and particularly bridge concerns and attempts from approximation theory to statistical learning theory. Further, we review Bayesian Deep Learning as a means for uncertainty quantification and rigorous explainability.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2307.01198 [pdf, other]

Improved sampling via learned diffusions

Authors: Lorenz Richter, Julius Berner

Abstract: Recently, a series of papers proposed deep learning-based approaches to sample from target distributions using controlled diffusion processes, being trained only on the unnormalized target densities without access to samples. Building on previous work, we identify these approaches as special cases of a generalized Schrödinger bridge problem, seeking a stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode collapse. In particular, we propose the so-called log-variance loss, which exhibits favorable numerical properties and leads to significantly improved performance across all considered approaches.

Submitted 23 May, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted at ICLR 2024

Journal ref: International Conference on Learning Representations, 2024

arXiv:2306.00637 [pdf, other]

Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

Authors: Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville

Abstract: We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language and this significantly reduces the computational requirements to achieve state-of-the-art results. Our approach also improves the quality of text-conditioned image generation based on our user preference study. The training requirements of our approach consist of 24,602 A100-GPU hours - compared to Stable Diffusion 2.1's 200,000 GPU hours. Our approach also requires less training data to achieve these results. Furthermore, our compact latent representations allow us to perform inference over twice as fast, slashing the usual costs and carbon footprint of a state-of-the-art (SOTA) diffusion model significantly, without compromising the end performance. In a broader comparison against SOTA models our approach is substantially more efficient and compares favorably in terms of image quality. We believe that this work motivates more emphasis on the prioritization of both performance and computational accessibility.

Submitted 29 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Corresponding to "Würstchen v2"

Journal ref: The Twelfth International Conference on Learning Representations (ICLR), 2024

arXiv:2301.06684 [pdf, other]

Co-analytic Counterexamples to Marstrand's Projection Theorem

Authors: Linus Richter

Abstract: Assuming $V=L$, we construct a plane set $E$ of Hausdorff dimension $1$ whose every orthogonal projection onto straight lines through the origin has Hausdorff dimension $0$. This is a counterexample to J. M. Marstrand's seminal projection theorem. While counterexamples had already been constructed decades ago, initially by R. O. Davies, the novelty of our result lies in the fact that $E$ is co-analytic. Following Marstrand's original proof (and R. Kaufman's newer, and now standard, approach based on capacities), a counterexample to the projection theorem cannot be analytic, hence our counterexample is optimal. We then extend the result in a strong way: we show that for each $ε\in (0, 1)$ there exists a co-analytic set $E_ε$ of dimension $1 + ε$, each of whose orthogonal projections onto straight lines through the origin has Hausdorff dimension $ε$. The constructions of $E$ and $E_ε$ are by induction on the countable ordinals, applying a theorem by Z. Vidnyanszky.

Submitted 16 January, 2023; originally announced January 2023.

Comments: 35 pages, 3 figures

MSC Class: 03D32 (Primary) 28A75; 28A80 (Secondary)

arXiv:2212.12364 [pdf, other]

The new APD-Based Readout of the Crystal Barrel Calorimeter -- An Overview

Authors: CBELSA/TAPS Collaboration: C. Honisch, P. Klassen, J. Müllers, M. Urban, F. Afzal, J. Bieling, S. Ciupka, J. Hartmann, P. Hoffmeister, M. Lang, D. Schaab, C. Schmidt, M. Steinacher, D. Walther, R. Beck, K. -T. Brinkmann, V. Crede, H. Dutz, D. Elsner, W. Erni, E. Fix, F. Frommberger, M. Grüner, et al. (26 additional authors not shown)

Abstract: The Crystal Barrel is an electromagnetic calorimeter consisting of 1380 CsI(Tl) scintillators, and is currently installed at the CBELSA/TAPS experiment where it is used to detect decay products from photoproduction of mesons. The readout of the Crystal Barrel has been upgraded in order to integrate the detector into the first level of the trigger and to increase its sensitivity for neutral final states. The new readout uses avalanche photodiodes in the front-end and a dual back-end with branches optimized for energy and time measurement, respectively. An FPGA-based cluster finder processes the whole hit pattern within less than 100 ns. The important downside of APDs -- the temperature dependence of their gain -- is handled with a temperature stabilization and a compensating bias voltage supply. Additionally, a light pulser system allows the APDs' gains to be measured during beamtimes.

Submitted 16 January, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

arXiv:2211.14487 [pdf, other]

Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

Authors: Mats L. Richter, Christopher Pal

Abstract: Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer), can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experiments. By further developing and formalizing the analysis of receptive field expansion in convolutional neural networks, we can predict unproductive layers in an automated manner before ever training a model. This allows us to optimize the parameter-efficiency of a given architecture at low cost. Our method is computationally simple and can be done in an automated manner or even manually with minimal effort for most common architectures. We demonstrate the effectiveness of this approach by increasing parameter efficiency across past and current top-performing CNN-architectures. Specifically, our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes, including: VGG Nets, MobileNetV1, MobileNetV3, NASNet A (mobile), MnasNet, EfficientNet, and ConvNeXt - leading to a new SOTA result for each model class.

Submitted 26 November, 2022; originally announced November 2022.

arXiv:2211.11183 [pdf, other]

Causal Fairness Assessment of Treatment Allocation with Electronic Health Records

Authors: Linying Zhang, Lauren R. Richter, Yixin Wang, Anna Ostropolets, Noemie Elhadad, David M. Blei, George Hripcsak

Abstract: Healthcare continues to grapple with the persistent issue of treatment disparities, sparking concerns regarding the equitable allocation of treatments in clinical practice. While various fairness metrics have emerged to assess fairness in decision-making processes, a growing focus has been on causality-based fairness concepts due to their capacity to mitigate confounding effects and reason about bias. However, the application of causal fairness notions in evaluating the fairness of clinical decision-making with electronic health record (EHR) data remains an understudied domain. This study aims to address the methodological gap in assessing causal fairness of treatment allocation with electronic health records data. We propose a causal fairness algorithm to assess fairness in clinical decision-making. Our algorithm accounts for the heterogeneity of patient populations and identifies potential unfairness in treatment allocation by conditioning on patients who have the same likelihood to benefit from the treatment. We apply this framework to a patient cohort with coronary artery disease derived from an EHR database to evaluate the fairness of treatment decisions. In addition, we investigate the impact of social determinants of health on the assessment of causal fairness of treatment allocation.

Submitted 7 January, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.01517 [pdf]

doi 10.1021/jacs.2c11468

Hydration of a side-chain-free n-type semiconducting ladder polymer driven by electrochemical doping

Authors: Jiajie Guo, Lucas Q. Flagg, Duyen K. Tran, Shinya E. Chen, Ruipeng Li, Nagesh B. Kolhe, Rajiv Giridharagopal, Samson A. Jenekhe, Lee J. Richter, David S. Ginger

Abstract: We study the organic electrochemical transistors (OECTs) performance of the ladder polymer, poly(benzimidazobenzophenanthroline) (BBL) in an attempt to better understand how an apparently hydrophobic side-chain-free polymer is able to operate as an OECT with favorable redox kinetics in an aqueous environment. We examine two BBLs of different molecular masses from different sources. Both BBLs show significant film swelling during the initial reduction step. By combining electrochemical quartz crystal microbalance (eQCM) gravimetry, in-operando atomic force microscopy (AFM), and both ex-situ and in-operando grazing incidence wide-angle x-ray scattering (GIWAXS), we provide a detailed structural picture of the electrochemical charge injection process in BBL in the absence of any hydrophilic side-chains. Compared with ex-situ measurements, in-operando GIWAXS shows both more swelling upon electrochemical doping than has previously been recognized, and less contraction upon dedoping. The data show that BBL films undergo an irreversible hydration driven by the initial electrochemical doping cycle with significant water retention and lamellar expansion that persists across subsequent oxidation/reduction cycles. This swelling creates a hydrophilic environment that facilitates the subsequent fast hydrated ion transport in the absence of the hydrophilic side-chains used in many other polymer systems.
Due to its rigid ladder backbone and absence of hydrophilic side-chains, the primary BBL water uptake does not significantly degrade the crystalline order, and the original dehydrated, unswelled state can be recovered after drying. The combination of doping induced hydrophilicity and robust crystalline order leads to efficient ionic transport and good stability.

Submitted 2 November, 2022; originally announced November 2022.

Comments: 24 pages, 5 figures

arXiv:2211.01364 [pdf, other]

An optimal control perspective on diffusion-based generative modeling

Authors: Julius Berner, Lorenz Richter, Karen Ullrich

Abstract: We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we can formulate diffusion-based generative modeling as a minimization of the Kullback-Leibler divergence between suitable measures in path space. Finally, we develop a novel diffusion-based method for sampling from unnormalized densities -- a problem frequently occurring in statistics and computational sciences. We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples.

Submitted 26 March, 2024; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted for oral presentation at NeurIPS 2022 Workshop on Score-Based Methods

Journal ref: Transactions on Machine Learning Research, 2024

arXiv:2206.10588 [pdf, other]

Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning

Authors: Lorenz Richter, Julius Berner

Abstract: The combination of Monte Carlo methods and deep learning has recently led to efficient algorithms for solving partial differential equations (PDEs) in high dimensions. Related learning problems are often stated as variational formulations based on associated stochastic differential equations (SDEs), which allow the minimization of corresponding losses using gradient-based optimization methods. In respective numerical implementations it is therefore crucial to rely on adequate gradient estimators that exhibit low variance in order to reach convergence accurately and swiftly. In this article, we rigorously investigate corresponding numerical aspects that appear in the context of linear Kolmogorov PDEs. In particular, we systematically compare existing deep learning approaches and provide theoretical explanations for their performances. Subsequently, we suggest novel methods that can be shown to be more robust both theoretically and numerically, leading to substantial performance improvements.

Submitted 5 August, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted at ICML 2022

Journal ref: Proceedings of the 39th International Conference on Machine Learning, 2022, pp. 18649-18666

arXiv:2206.06628 [pdf, other]

doi 10.1137/22M1503464

Improving control based importance sampling strategies for metastable diffusions via adapted metadynamics

Authors: Enric Ribera Borrell, Jannes Quer, Lorenz Richter, Christof Schütte

Abstract: Sampling rare events in metastable dynamical systems is often a computationally expensive task and one needs to resort to enhanced sampling methods such as importance sampling. Since we can formulate the problem of finding optimal importance sampling controls as a stochastic optimization problem, this then brings additional numerical challenges and the convergence of corresponding algorithms might as well suffer from metastability. In this article, we address this issue by combining systematic control approaches with the heuristic adaptive metadynamics method. Crucially, we approximate the importance sampling control by a neural network, which makes the algorithm in principle feasible for high-dimensional applications. We can numerically demonstrate in relevant metastable problems that our algorithm is more effective than previous attempts and that only the combination of the two approaches leads to a satisfying convergence and therefore to an efficient sampling in certain metastable settings.

Submitted 3 October, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

MSC Class: 49-XX; 62-XX; 68-XX

arXiv:2112.03749 [pdf, other]

Interpolating between BSDEs and PINNs: deep learning for elliptic and parabolic boundary value problems

Authors: Nikolas Nüsken, Lorenz Richter

Abstract: Solving high-dimensional partial differential equations is a recurrent challenge in economics, science and engineering. In recent years, a great number of computational approaches have been developed, most of them relying on a combination of Monte Carlo sampling and deep learning based approximation. For elliptic and parabolic problems, existing methods can broadly be classified into those resting on reformulations in terms of $\textit{backward stochastic differential equations}$ (BSDEs) and those aiming to minimize a regression-type $L^2$-error ($\textit{physics-informed neural networks}$, PINNs). In this paper, we review the literature and suggest a methodology based on the novel $\textit{diffusion loss}$ that interpolates between BSDEs and PINNs. Our contribution opens the door towards a unified understanding of numerical approaches for high-dimensional PDEs, as well as for implementations that combine the strengths of BSDEs and PINNs. The diffusion loss furthermore bears close similarities to $\textit{(least squares) temporal difference}$ objectives found in reinforcement learning. We also discuss eigenvalue problems and perform extensive numerical studies, including calculations of the ground state for nonlinear Schrödinger operators and committor functions relevant in molecular dynamics.

Submitted 29 January, 2023; v1 submitted 7 December, 2021; originally announced December 2021.

arXiv:2110.05155 [pdf, other]

doi 10.1140/epja/s10050-021-00634-1

Observation of a structure in the M$_{pη}$ invariant mass distribution near 1700 MeV/$c^2$ in the $\mathbf{γp \rightarrow p π^0 η} $ reaction

Authors: V. Metag, M. Nanova, J. Hartmann, P. Mahlberg, F. Afzal, C. Bartels, D. Bayadilov, R. Beck, M. Becker, E. Blanke, K. -T. Brinkmann, S. Ciupka, V. Crede, M. Dieterle, H. Dutz, D. Elsner, F. Frommberger, A. Gridnev, M. Gottschall, M. Grüner, Ch. Hammann, J. Hannappel, W. Hillert, J. Hoff, Ph. Hoffmeister, et al. (52 additional authors not shown)

Abstract: The reaction $γp \rightarrow p π^0 η$ has been studied with the CBELSA/TAPS detector at the electron stretcher accelerator ELSA in Bonn for incident photon energies from threshold up to 3.1 GeV. This paper has been motivated by the recently claimed observation of a narrow structure in the M$_{Nη}$ invariant mass distribution at a mass of 1678 MeV/$c^2$. The existence of this structure cannot be confirmed in the present work. Instead, for E$_γ$ = 1400 - 1500 MeV and the cut M$_{pπ^0} \le 1190 $ MeV/$c^2$ a statistically significant structure in the M$_{pη}$ invariant mass distribution near 1700 MeV/$c^2$ is observed with a width of $Γ\approx 35$ MeV/$c^2$ while the mass resolution is $σ_{res}$ = 5 MeV/$c^2$. Increasing the incident photon energy from 1420 to 1540 MeV this structure shifts in mass from $\approx$ 1700 MeV/$c^2$ to $\approx$ 1725 MeV/$c^2$; the width increases to about 50 MeV/$c^2$ and decreases thereafter. The cross section associated with this structure reaches a maximum of $\approx$ 100 nb around E$_γ \approx$ 1490 MeV (W $\approx $ 1920 MeV), which coincides with the $p a_0$ threshold. Three scenarios are discussed which might be the origin of this structure in the M$_{pη}$ invariant mass distribution. The most likely interpretation is that it is due to a triangular singularity in the $γp \rightarrow p a_0 \rightarrow p π^0 η$ reaction.

Submitted 18 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 16 pages, 18 figures

arXiv:2106.12307 [pdf, other]

doi 10.1109/ICMLA52953.2021.00159

Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

Authors: Mats L. Richter, Julius Schöning, Anna Wiedenroth, Ulf Krumnack

Abstract: When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs. By implication, these CNNs are unnecessarily resource intensive to train and deploy, with diminishing beneficial effects on the predictive performance. The features a convolutional layer can process are strictly limited by its receptive field. By layer-wise analyzing the size of the receptive fields, we can reliably predict sequences of layers that will not contribute qualitatively to the test accuracy in the given CNN architecture. Based on this analysis, we propose design strategies based on a so-called border layer. This layer allows to identify unproductive convolutional layers and hence to resolve these inefficiencies, optimize the explainability and the computational performance of CNNs. Since neither the strategies nor the analysis requires training of the actual model, these insights allow for a very efficient design process of CNN architectures, which might be automated in the future.

Submitted 5 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: Preprint

arXiv:2106.09526 [pdf, other]

Exploring the Properties and Evolution of Neural Network Eigenspaces during Training

Authors: Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, Ulf Krumnack

Abstract: In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given task. We further show that the observed effects are independent from previously reported pathological patterns like the ``tail pattern'' described in \cite{featurespace_saturation}. Finally we are able to show that saturation patterns converge early during training, allowing for a quicker cycle time during analysis.

Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

arXiv:2102.11830 [pdf, other]

Solving high-dimensional parabolic PDEs using the tensor train format

Authors: Lorenz Richter, Leon Sallandt, Nikolas Nüsken

Abstract: High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science and engineering. However, their numerical treatment poses formidable challenges since traditional grid-based methods tend to be frustrated by the curse of dimensionality. In this paper, we argue that tensor trains provide an appealing approximation framework for parabolic PDEs: the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor format holds the promise of leveraging latent low-rank structures enabling both compression and efficient computation. Following this paradigm, we develop novel iterative schemes, involving either explicit and fast or implicit and accurate updates. We demonstrate in a number of examples that our methods achieve a favorable trade-off between accuracy and computational efficiency in comparison with state-of-the-art neural network based approaches.

Submitted 17 July, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

arXiv:2102.09606 [pdf, other]

Nonasymptotic bounds for suboptimal importance sampling

Authors: Carsten Hartmann, Lorenz Richter

Abstract: Importance sampling is a popular variance reduction method for Monte Carlo estimation, where a notorious question is how to design good proposal distributions. While in most cases optimal (zero-variance) estimators are theoretically possible, in practice only suboptimal proposal distributions are available and it can often be observed numerically that those can reduce statistical performance significantly, leading to large relative errors and therefore counteracting the original intention. In this article, we provide nonasymptotic lower and upper bounds on the relative error in importance sampling that depend on the deviation of the actual proposal from optimality, and we thus identify potential robustness issues that importance sampling may have, especially in high dimensions. We focus on path sampling problems for diffusion processes, for which generating good proposals comes with additional technical challenges, and we provide numerous numerical examples that support our findings.

Submitted 18 February, 2021; originally announced February 2021.

arXiv:2102.01582 [pdf, other]

doi 10.1007/978-3-030-86340-1_11

Size Matters

Authors: Mats L. Richter, Wolf Byttner, Ulf Krumnack, Ludwdig Schallner, Justin Shenk

Abstract: Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant differences in performance: presenting the same image at different scales can result in different outcomes. A closer look reveals that there is no simple relationship between input size and model performance (no `bigger is better'), but that each network has a preferred input size, for which it shows best results. We investigate this phenomenon by applying different methods, including spectral analysis of layer activations and probe classifiers, showing that there are characteristic features depending on the network architecture. From this we find that the size of discriminatory features is critically influencing how the inference process is distributed among the layers.

Submitted 9 February, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: Preprint

Journal ref: Artificial Neural Networks and Machine Learning ICANN 2021 133-144

arXiv:2010.10436 [pdf, other]

VarGrad: A Low-Variance Gradient Estimator for Variational Inference

Authors: Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco J. R. Ruiz, Ömer Deniz Akyildiz

Abstract: We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the $\textit{log-variance loss}$. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative) ELBO. We show theoretically that this gradient estimator, which we call $\textit{VarGrad}$ due to its connection to the log-variance loss, exhibits lower variance than the score function method in certain settings, and that the leave-one-out control variate coefficients are close to the optimal ones. We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.

Submitted 29 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

arXiv:2008.12288 [pdf, other]

Model Order Reduction for (Stochastic-) Delay Equations With Error Bounds

Authors: Simon Becker, Lorenz Richter

Abstract: We analyze a structure-preserving model order reduction technique for delay and stochastic delay equations based on the balanced truncation method and provide a system theoretic interpretation. Transferring error bounds based on Hankel operators to delay systems, we find error estimates for the difference between the dynamics of the full and reduced model. This analysis also yields new error bounds for bilinear systems and stochastic systems with multiplicative noise and non-zero initial states.

Submitted 27 August, 2020; originally announced August 2020.

arXiv:2006.08679 [pdf, other]

Feature Space Saturation during Training

Authors: Mats L. Richter, Justin Shenk, Wolf Byttner, Anders Arpteg, Mikael Huss

Abstract: We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We propose a computationally lightweight method for approximating the variance matrix during training. From the dimension of its lossless eigenspace we derive layer saturation - the ratio between the eigenspace dimension and layer width. We show that saturation seems to indicate which layers contribute to network performance. We demonstrate how to alter layer saturation in a neural network by changing network depth, filter sizes and input resolution. Furthermore, we show that well-chosen input resolution increases network performance by distributing the inference process more evenly across the network.

Submitted 22 November, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 45 pages, 41 figures; author order changed in v5 to reflect additional contribution; for code see http://github.com/MLRichter/phd-lab and http://github.com/delve-team/delve

MSC Class: 68T07 ACM Class: I.2.6

Journal ref: British Machine Vision Conference (BMVC) 2021

arXiv:2005.05409 [pdf, other]

Solving high-dimensional Hamilton-Jacobi-Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space

Authors: Nikolas Nüsken, Lorenz Richter

Abstract: Optimal control of diffusion processes is intimately connected to the problem of solving certain Hamilton-Jacobi-Bellman equations. Building on recent machine learning inspired approaches towards high-dimensional PDEs, we investigate the potential of $\textit{iterative diffusion optimisation}$ techniques, in particular considering applications in importance sampling and rare event simulation, and focusing on problems without diffusion control, with linearly controlled drift and running costs that depend quadratically on the control. More generally, our methods apply to nonlinear parabolic PDEs with a certain shift invariance. The choice of an appropriate loss function being a central element in the algorithmic design, we develop a principled framework based on divergences between path measures, encompassing various existing methods. Motivated by connections to forward-backward SDEs, we propose and study the novel $\textit{log-variance}$ divergence, showing favourable properties of corresponding Monte Carlo estimators. The promise of the developed approach is exemplified by a range of high-dimensional and metastable numerical examples.

Submitted 29 January, 2023; v1 submitted 11 May, 2020; originally announced May 2020.

arXiv:1912.06212 [pdf, other]

doi 10.3847/1538-4357/ab5d2d

The 1.28 GHz MeerKAT DEEP2 Image

Authors: T. Mauch, W. D. Cotton, J. J. Condon, A. M. Matthews, T. D. Abbott, R. M. Adam, M. A. Aldera, K. M. B. Asad, E. F. Bauermeister, T. G. H. Bennett, H. Bester, D. H. Botha, L. R. S. Brederode, Z. B. Brits, S. J. Buchner, J. P. Burger, F. Camilo, J. M. Chalmers, T. Cheetham, D. de Villiers, M. S. de Villiers, M. A. Dikgale-Mahlakoana, L. J. du Toit, S. W. P. Esterhuyse, G. Fadana, et al. (79 additional authors not shown)

Abstract: We present the confusion-limited 1.28 GHz MeerKAT DEEP2 image covering one $\approx 68'$ FWHM primary beam area with $7.6''$ FWHM resolution and $0.55 \pm 0.01$ $μ$Jy/beam rms noise. Its J2000 center position $α=04^h 13^m 26.4^s$, $δ=-80^\circ 00' 00''$ was selected to minimize artifacts caused by bright sources. We introduce the new 64-element MeerKAT array and describe commissioning observations to measure the primary beam attenuation pattern, estimate telescope pointing errors, and pinpoint $(u,v)$ coordinate errors caused by offsets in frequency or time. We constructed a 1.4 GHz differential source count by combining a power-law count fit to the DEEP2 confusion $P(D)$ distribution from $0.25$ to $10$ $μ$Jy with counts of individual DEEP2 sources between $10$ $μ$Jy and $2.5$ mJy. Most sources fainter than $S \sim 100$ $μ$Jy are distant star-forming galaxies obeying the FIR/radio correlation, and sources stronger than $0.25$ $μ$Jy account for $\sim93\%$ of the radio background produced by star-forming galaxies. For the first time, the DEEP2 source count has reached the depth needed to reveal the majority of the star formation history of the universe. A pure luminosity evolution of the 1.4 GHz local luminosity function consistent with the Madau & Dickinson (2014) model for the evolution of star-forming galaxies based on UV and infrared data underpredicts our 1.4 GHz source count in the range $-5 \lesssim \log[S(\mathrm{Jy})] \lesssim -4$.

Submitted 12 December, 2019; originally announced December 2019.

Comments: 20 pages, 18 figures. Accepted for publication in ApJ

arXiv:1912.06113 [pdf, other]

Error bounds for model reduction of feedback-controlled linear stochastic dynamics on Hilbert spaces

Authors: Simon Becker, Carsten Hartmann, Martin Redmann, Lorenz Richter

Abstract: We analyze structure-preserving model order reduction methods for Ornstein-Uhlenbeck processes and linear S(P)DEs with multiplicative noise based on balanced truncation. For the first time, we include in this study the analysis of non-zero initial conditions. We moreover allow for feedback-controlled dynamics for solving stochastic optimal control problems with reduced-order models and prove novel error bounds for a class of linear quadratic regulator problems. We provide numerical evidence for the bounds and discuss the application of our approach to enhanced sampling methods from non-equilibrium statistical mechanics.

Submitted 17 March, 2022; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: comments welcome

arXiv:1907.08589 [pdf, other]

Spectral Analysis of Latent Representations

Authors: Justin Shenk, Mats L. Richter, Anders Arpteg, Mikael Huss

Abstract: We propose a metric, Layer Saturation, defined as the proportion of the number of eigenvalues needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers. Saturation is based on spectral analysis and can be computed efficiently, making live analysis of the representations practical during training. We provide an outlook for future applications of this metric by outlining the behaviour of layer saturation in different neural architectures and problems. We further show that saturation is related to the generalization and predictive performance of neural networks.

Submitted 19 July, 2019; originally announced July 2019.

Comments: 13 pages, 16 figures, code: https://github.com/delve-team/delve

arXiv:1901.09195 [pdf, other]

doi 10.1063/1.5090271

Variational approach to rare event simulation using least-squares regression

Authors: Carsten Hartmann, Omar Kebiri, Lara Neureither, Lorenz Richter

Abstract: We propose an adaptive importance sampling scheme for the simulation of rare events when the underlying dynamics is given by a diffusion. The scheme is based on a Gibbs variational principle that is used to determine the optimal (i.e. zero-variance) change of measure and exploits the fact that the latter can be rephrased as a stochastic optimal control problem. The control problem can be solved by a stochastic approximation algorithm, using the Feynman-Kac representation of the associated dynamic programming equations, and we discuss numerical aspects for high-dimensional problems along with simple toy examples.

Submitted 16 April, 2019; v1 submitted 26 January, 2019; originally announced January 2019.

Comments: 28 pages, 7 figures

MSC Class: 65C05 (primary); 65C30; 92C40 (secondary)

arXiv:1806.10140 [pdf, other]

doi 10.1093/mnras/sty1789

The Stripe 82 1-2 GHz Very Large Array Snapshot Survey: Multiwavelength Counterparts

Authors: Matthew Prescott, I. H. Whittam, M. J. Jarvis, K. McAlpine, L. L. Richter, S. Fine, T. Mauch, I. Heywood, M. Vaccari

Abstract: We have combined spectroscopic and photometric data from the Sloan Digital Sky Survey (SDSS) with $1.4$ GHz radio observations, conducted as part of the Stripe 82 $1-2$ GHz Snapshot Survey using the Karl G. Jansky Very Large Array (VLA), which covers $\sim100$ sq degrees, to a flux limit of 88 $μ$Jy rms. Cross-matching the $11\,768$ radio source components with optical data via visual inspection results in a final sample of $4\,795$ cross-matched objects, of which $1\,996$ have spectroscopic redshifts and $2\,799$ objects have photometric redshifts. Three previously undiscovered Giant Radio Galaxies (GRGs) were found during the cross-matching process, which would have been missed using automated techniques. For the objects with spectroscopy we separate radio-loud Active Galactic Nuclei (AGN) and star-forming galaxies (SFGs) using three diagnostics and then further divide our radio-loud AGN into the HERG and LERG populations. A control matched sample of HERGs and LERGs, matched on stellar mass, redshift and radio luminosity, reveals that the host galaxies of LERGs are redder and more concentrated than HERGs. By combining with near-infrared data, we demonstrate that LERGs also follow a tight $K-z$ relationship. These results imply the LERG population are hosted by a population of massive, passively evolving early-type galaxies. We go on to show that HERGs, LERGs, QSOs and star-forming galaxies in our sample all reside in different regions of a WISE colour-colour diagram.
This cross-matched sample bridges the gap between previous `wide but shallow' and `deep but narrow' samples and will be useful for a number of future investigations.

Submitted 26 June, 2018; originally announced June 2018.

Comments: 17 pages, 19 figures. Resubmitted to MNRAS after the initial comments

arXiv:1804.01933 [pdf, other]

doi 10.3847/1538-4357/aab35a

Revival of the magnetar PSR J1622-4950: observations with MeerKAT, Parkes, XMM-Newton, Swift, Chandra, and NuSTAR

Authors: F. Camilo, P. Scholz, M. Serylak, S. Buchner, M. Merryfield, V. M. Kaspi, R. F. Archibald, M. Bailes, A. Jameson, W. van Straten, J. Sarkissian, J. E. Reynolds, S. Johnston, G. Hobbs, T. D. Abbott, R. M. Adam, G. B. Adams, T. Alberts, R. Andreas, K. M. B. Asad, D. E. Baker, T. Baloyi, E. F. Bauermeister, T. Baxana, T. G. H. Bennett , et al. (183 additional authors not shown)

Abstract: New radio (MeerKAT and Parkes) and X-ray (XMM-Newton, Swift, Chandra, and NuSTAR) observations of PSR J1622-4950 indicate that the magnetar, in a quiescent state since at least early 2015, reactivated between 2017 March 19 and April 5. The radio flux density, while variable, is approximately 100x larger than during its dormant state. The X-ray flux one month after reactivation was at least 800x larger than during quiescence, and has been decaying exponentially on a 111+/-19 day timescale. This high-flux state, together with a radio-derived rotational ephemeris, enabled for the first time the detection of X-ray pulsations for this magnetar. At 5%, the 0.3-6 keV pulsed fraction is comparable to the smallest observed for magnetars. The overall pulsar geometry inferred from polarized radio emission appears to be broadly consistent with that determined 6-8 years earlier. However, rotating vector model fits suggest that we are now seeing radio emission from a different location in the magnetosphere than previously. This indicates a novel way in which radio emission from magnetars can differ from that of ordinary pulsars. The torque on the neutron star is varying rapidly and unsteadily, as is common for magnetars following outburst, having changed by a factor of 7 within six months of reactivation.

Submitted 5 April, 2018; originally announced April 2018.

Comments: Published in ApJ (2018 April 5); 13 pages, 4 figures

Journal ref: ApJ 856 (2018) 180

arXiv:1709.01289 [pdf, other]

The MeerKAT Fornax Survey

Authors: P. Serra, W. J. G. de Blok, G. L. Bryan, S. Colafrancesco, R. -J. Dettmar, B. S. Frank, F. Govoni, G. I. G. Józsa, R. C. Kraan-Korteweg, S. I. Loubser, F. M. Maccagni, M. Murgia, T. A. Oosterloo, R. F. Peletier, R. Pizzo, M. Ramatsoku, L. Richter, M. W. L. Smith, S. C. Trager, J. H. van Gorkom, M. A. W. Verheijen

Abstract: We present the science case and observations plan of the MeerKAT Fornax Survey, an HI and radio continuum survey of the Fornax galaxy cluster to be carried out with the SKA precursor MeerKAT. Fornax is the second most massive cluster within 20 Mpc and the largest nearby cluster in the southern hemisphere. Its low X-ray luminosity makes it representative of the environment where most galaxies live and where substantial galaxy evolution takes place. Fornax's ongoing growth makes it an excellent laboratory for studying the assembly of clusters, the physics of gas accretion and stripping in galaxies falling in the cluster, and the connection between these processes and the neutral medium in the cosmic web. We will observe a region of 12 deg$^2$ reaching a projected distance of 1.5 Mpc from the cluster centre. This will cover a wide range of environment density out to the outskirts of the cluster, where gas-rich in-falling groups are found. We will: study the HI morphology of resolved galaxies down to a column density of a few times 1e+19 cm$^{-2}$ at a resolution of 1 kpc; measure the slope of the HI mass function down to M(HI) 5e+5 M(sun); and attempt to detect HI in the cosmic web reaching a column density of 1e+18 cm$^{-2}$ at a resolution of 10 kpc.

Submitted 5 September, 2017; originally announced September 2017.

Comments: Proceedings of Science, "MeerKAT Science: On the Pathway to the SKA", Stellenbosch, 25-27 May 2016

arXiv:1703.09921 [pdf, other]

doi 10.1088/1475-7516/2017/07/025

Dark matter in the Reticulum II dSph: a radio search

Authors: Marco Regis, Laura Richter, Sergio Colafrancesco

Abstract: We present a deep radio search in the Reticulum II dwarf spheroidal (dSph) galaxy performed with the Australia Telescope Compact Array. Observations were conducted at 16 cm wavelength, with an rms sensitivity of 0.01 mJy/beam, and with the goal of searching for synchrotron emission induced by annihilation or decay of weakly interacting massive particles (WIMPs). Data were complemented with observations on large angular scales taken with the KAT-7 telescope. We find no evidence for a diffuse emission from the dSph and we derive competitive bounds on the WIMP properties. In addition, we detect more than 200 new background radio sources. Among them, we show there are two compelling candidates for being the radio counterpart of the possible gamma-ray emission reported by other groups using Fermi-LAT data.

Submitted 25 July, 2017; v1 submitted 29 March, 2017; originally announced March 2017.

Comments: 24 pages, 13 figure panels, 2 tables. v2 to match published version

Journal ref: JCAP 07 (2017) 025

arXiv:1606.02929 [pdf, ps, other]

doi 10.1093/mnras/stw1040

Engineering and Science Highlights of the KAT-7 Radio Telescope

Authors: A. R. Foley, T. Alberts, R P. Armstrong, A. Barta, E. F. Bauermeister, H. Bester, S. Blose, R. S. Booth, D. H. Botha, S. J. Buchner, C. Carignan, T. Cheetham, K. Cloete, G. Coreejes, R. C. Crida, S. D. Cross, F. Curtolo, A. Dikgale, M. S. de Villiers, L. J. du Toit, S. W. P. Esterhuyse, B. Fanaroff, R. P. Fender, M. Fijalkowski, D. Fourie , et al. (78 additional authors not shown)

Abstract: The construction of the KAT-7 array in the Karoo region of the Northern Cape in South Africa was intended primarily as an engineering prototype for technologies and techniques applicable to the MeerKAT telescope. This paper looks at the main engineering and scientific highlights from this effort, and discusses their applicability to both MeerKAT and other next-generation radio telescopes. In particular we found that the composite dish surface works well, but it becomes complicated to fabricate for a dish lacking circular symmetry; the Stirling cycle cryogenic system with ion pump to achieve vacuum works but demands much higher maintenance than an equivalent Gifford-McMahon cycle system; the ROACH (Reconfigurable Open Architecture Computing Hardware)-based correlator with SPEAD (Streaming Protocol for Exchanging Astronomical Data) protocol data transfer works very well and KATCP (Karoo Array Telescope Control Protocol) control protocol has proven very flexible and convenient. KAT-7 has also been used for scientific observations where it has a niche in mapping low surface-brightness continuum sources, some extended HI halos and OH masers in star-forming regions. It can also be used to monitor continuum source variability, observe pulsars, and make VLBI observations.

Submitted 9 June, 2016; originally announced June 2016.

arXiv:1605.09572 [pdf, ps, other]

doi 10.1093/mnras/stw1302

Simultaneous VLBA polarimetric observations of the v=$\{$1,2$\}$ J=1-0 and v=1, J=2-1 SiO maser emission toward VY CMa II: component-level polarization analysis

Authors: L. Richter, A. Kemball, J. Jonas

Abstract: This paper presents a component-level comparison of the polarized v=1 J=1-0, v=2 J=1-0 and v=1 J=2-1 SiO maser emission towards the supergiant star VY CMa at milliarcsecond-scale, as observed using the VLBA at $λ=7$mm and $λ=3$mm. An earlier paper considered overall maser morphology and constraints on SiO maser excitation and pumping derived from these data. The goal of the current paper is to use the measured polarization properties of individual co-spatial components detected across multiple transitions to provide constraints on several competing theories for the transport of polarized maser emission. This approach minimizes the significant effects of spatial blending. We present several diagnostic tests designed to distinguish key features of competing theoretical models for maser polarization. The number of coincident features is limited by sensitivity however, particularly in the v=1 J=2-1 transition at 86 GHz, and deeper observations are needed. Preliminary conclusions based on the current data provide some support for: i) spin-independent solutions for linear polarization; ii) the influence of geometry on the distribution of fractional linear polarization with intensity; and, iii) $π/2$ rotations in linear polarization position angle arising from transitions across the Van Vleck angle ($\sin^2θ=2/3$) between the maser line-of-sight and magnetic field. There is weaker evidence for several enumerated non-Zeeman explanations for circular polarization.
The expected 2:1 ratio in circular polarization between J=1-0 and J=2-1 predicted by standard Zeeman theory cannot unfortunately be tested conclusively due to insufficient coincident components.

Submitted 31 May, 2016; originally announced May 2016.

arXiv:1601.00891 [pdf, other]

doi 10.1186/s13059-016-1037-6

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Authors: Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Rosalba Lepore, Christopher S Funk, Indika Kahanda, Karin M Verspoor, Asa Ben-Hur, Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed ME Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca , et al. (122 additional authors not shown)

Abstract: Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2.
Conclusions: The top performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction.

Submitted 2 January, 2016; originally announced January 2016.

Comments: Submitted to Genome Biology

arXiv:1407.5482 [pdf, other]

doi 10.1093/mnras/stv127

Local Group dSph radio survey with ATCA (II): Non-thermal diffuse emission

Authors: M. Regis, L. Richter, S. Colafrancesco, S. Profumo, W. J. G. de Blok, M. Massardi

Abstract: Our closest neighbours, the Local Group dwarf spheroidal (dSph) galaxies, are extremely quiescent and dim objects, where thermal and non-thermal diffuse emissions have so far not been detected. In order to possibly study the dSph interstellar medium, deep observations are required. They could reveal non-thermal emissions associated with the very-low level of star formation, or to particle dark matter annihilating or decaying in the dSph halo. In this work, we employ radio observations of six dSphs, conducted with the Australia Telescope Compact Array in the frequency band 1.1-3.1 GHz, to test the presence of a diffuse component over typical scales of few arcmin and at an rms sensitivity below 0.05 mJy/beam. We observed the dSph fields with both a compact array and long baselines. Short spacings led to a synthesized beam of about 1 arcmin and were used for the extended emission search. The high-resolution data mapped background sources, which in turn were subtracted in the short-baseline maps, to reduce their confusion limit. We found no significant detection of a diffuse radio continuum component. After a detailed discussion on the modelling of the cosmic-ray (CR) electron distribution and on the dSph magnetic properties, we present bounds on several physical quantities related to the dSphs, such as the total radio flux, the angular shape of the radio emissivity, the equipartition magnetic field, and the injection and equilibrium distributions of CR electrons. Finally, we discuss the connection to far-infrared and X-ray observations.

Submitted 9 February, 2015; v1 submitted 18 July, 2014; originally announced July 2014.

Comments: 21 pages, 11 figure panels. Companion papers: arXiv:1407.5479 and arXiv:1407.4948. v3: minor revision, matches version accepted in MNRAS

Journal ref: MNRAS 448, 3747-3765 (2015)

arXiv:1407.5479 [pdf, other]

doi 10.1093/mnras/stu2747

Local Group dSph radio survey with ATCA (I): Observations and background sources

Authors: M. Regis, L. Richter, S. Colafrancesco, M. Massardi, W. J. G. de Blok, S. Profumo, N. Orford

Abstract: Dwarf spheroidal (dSph) galaxies are key objects in near-field cosmology, especially in connection to the study of galaxy formation and evolution at small scales. In addition, dSphs are optimal targets to investigate the nature of dark matter. However, while we begin to have deep optical photometric observations of the stellar population in these objects, little is known so far about their diffuse emission at any observing frequency, and hence on thermal and non-thermal plasma possibly residing within dSphs. In this paper, we present deep radio observations of six local dSphs performed with the Australia Telescope Compact Array at 16 cm wavelength. We mosaiced a region of radius of about one degree around three "classical" dSphs, Carina, Fornax, and Sculptor, and of about half a degree around three "ultra-faint" dSphs, BootesII, Segue2, and Hercules. The rms noise level is below 0.05 mJy for all the maps. The restoring beams FWHM ranged from 4.2 x 2.5 arcseconds to 30.0 x 2.1 arcseconds in the most elongated case. A catalogue including the 1392 sources detected in the six dSph fields is reported. The main properties of the background sources are discussed, with positions and fluxes of brightest objects compared with the FIRST, NVSS, and SUMSS observations of the same fields. The observed population of radio emitters in these fields is dominated by synchrotron sources. We compute the associated source number counts at 2 GHz down to fluxes of 0.25 mJy, which prove to be in agreement with AGN count models.

Submitted 9 February, 2015; v1 submitted 18 July, 2014; originally announced July 2014.

Comments: 18 pages, 9 figure panels. Companion papers: arXiv:1407.5482 and arXiv:1407.4948. v3: minor revision, matches version accepted in MNRAS

Journal ref: MNRAS 448, 3731-3746 (2015)

arXiv:1407.4948 [pdf, ps, other]

doi 10.1088/1475-7516/2014/10/016

Local Group dSph radio survey with ATCA (III): Constraints on Particle Dark Matter

Authors: M. Regis, S. Colafrancesco, S. Profumo, W. J. G. de Blok, M. Massardi, L. Richter

Abstract: We performed a deep search for radio synchrotron emissions induced by weakly interacting massive particles (WIMPs) annihilation or decay in six dwarf spheroidal (dSph) galaxies of the Local Group. Observations were conducted with the Australia Telescope Compact Array (ATCA) at 16 cm wavelength, with an rms sensitivity better than 0.05 mJy/beam in each field. In this work, we first discuss the uncertainties associated with the modeling of the expected signal, such as the shape of the dark matter (DM) profile and the dSph magnetic properties. We then investigate the possibility that point-sources detected in the proximity of the dSph optical center might be due to the emission from a DM cuspy profile. No evidence for an extended emission over a size of few arcmin (which is the DM halo size) has been detected. We present the associated bounds on the WIMP parameter space for different annihilation/decay final states and for different astrophysical assumptions. If the confinement of electrons and positrons in the dSph is such that the majority of their power is radiated within the dSph region, we obtain constraints on the WIMP annihilation rate which are well below the thermal value for masses up to few TeV. On the other hand, for conservative assumptions on the dSph magnetic properties, the bounds can be dramatically relaxed.
We show however that, within the next 10 years and regardless of the astrophysical assumptions, it will be possible to progressively close in on the full parameter space of WIMPs by searching for radio signals in dSphs with SKA and its precursors.

Submitted 10 October, 2014; v1 submitted 18 July, 2014; originally announced July 2014.

Comments: 17 pages, 6 figure panels. Companion papers: arXiv:1407.5479 and arXiv:1407.5482. v3: minor revision, matches published version

Journal ref: JCAP 10 (2014) 016

arXiv:1309.2260 [pdf, ps, other]

doi 10.1093/mnras/stt1686

Simultaneous VLBA polarimetric observations of the v={1,2} J=1-0 and v=1, J=2-1 SiO maser emission toward VY CMa: maser morphology and pumping

Authors: Laura Richter, Athol Kemball, Justin Jonas

Abstract: This paper presents a milliarcsecond-scale comparison of the polarised component-level v=1 J=1-0, v=2 J=1-0 and v=1 J=2-1 SiO maser emission toward the supergiant star VY CMa. These observations used the VLBA at λ=7mm and λ=3mm over two epochs. The goal is to use the relative characteristics and spatial distribution of the transitions in individual resolved maser components to provide observational constraints on SiO maser excitation and pumping models. We find that many v=1 J=1-0 and v=2 J=1-0 features overlap in the second observing epoch, at comparable sensitivity. The v=2 J=1-0 emission is primarily located within several v=1 J=1-0 regions, and has a high degree of morphological similarity in the northeastern envelope, supportive of local collisional pumping. We find significantly higher spatial overlap between v=1 J=1-0 and J=2-1 features than generally previously reported. However, the overlapping v=1 J=2-1 emission is usually weaker than predicted by hydrodynamical models, possibly due to low-density collisional pumping. The overall maser morphology contains several large-scale features that are persistent over multiple years and are likely associated with intrinsic physical circumstellar conditions. Our data cannot distinguish between competing near-circumstellar bulk kinematic models but do provide evidence for localized inhomogeneous mass-loss.

Submitted 9 September, 2013; originally announced September 2013.

Comments: 14 pages, 14 figures, 1 table

arXiv:1305.3399 [pdf, other]

doi 10.1093/mnras/stt860

A return to strong radio flaring by Circinus X-1 observed with the Karoo Array Telescope test array KAT-7

Authors: R. P. Armstrong, R. P. Fender, G. D. Nicolson, S. Ratcliffe, M. Linares, J. Horrell, L. Richter, M. P. E. Schurch, M. Coriat, P. Woudt, J. Jonas, R. Booth, B. Fanaroff

Abstract: Circinus X-1 is a bright and highly variable X-ray binary which displays strong and rapid evolution in all wavebands. Radio flaring, associated with the production of a relativistic jet, occurs periodically on a ~17-day timescale. A longer-term envelope modulates the peak radio fluxes in flares, ranging from peaks in excess of a Jansky in the 1970s to an historic low of milliJanskys during the years 1994 to 2007. Here we report first observations of this source with the MeerKAT test array, KAT-7, part of the pathfinder development for the African dish component of the Square Kilometre Array (SKA), demonstrating successful scientific operation for variable and transient sources with the test array. The KAT-7 observations at 1.9 GHz during the period 13 December 2011 to 16 January 2012 reveal in temporal detail the return to the Jansky-level events observed in the 1970s. We compare these data to contemporaneous single-dish measurements at 4.8 and 8.5 GHz with the HartRAO 26-m telescope and X-ray monitoring from MAXI. We discuss whether the overall modulation and recent dramatic brightening is likely to be due to an increase in the power of the jet due to changes in accretion rate or changing Doppler boosting associated with a varying angle to the line of sight.

Submitted 15 May, 2013; originally announced May 2013.

Comments: 7 pages, 5 figures, accepted for publication in MNRAS 14 May 2013

arXiv:1110.5094 [pdf, ps, other]

doi 10.1088/0004-637X/743/1/69

Electric vector rotations of π/2 in polarized circumstellar SiO maser emission

Authors: A. J. Kemball, P. J. Diamond, L. Richter, I. Gonidakis, R. Xue

Abstract: This paper examines the detailed sub-milliarcsecond polarization properties of an individual SiO maser feature displaying a rotation in polarization electric vector position angle of approximately π/2 across the feature. Such rotations are a characteristic observational signature of circumstellar SiO masers detected toward a number of late-type, evolved stars. We employ a new calibration method for accurate circular VLBI polarimetry at millimeter wavelengths, to present the detailed Stokes {I,Q,U,V} properties for this feature. We analyze the fractional linear and circular polarization as a function of projected angular distance across the extent of the feature, and compare these measurements against several theoretical models proposed for sharp rotations of electric vector position angle in polarized SiO maser emission. We find that the rotation is most likely caused by the angle θ between the line of sight and a projected magnetic field crossing the critical Van Vleck angle for maser propagation. The fractional linear polarization profile m_l(θ) is well-fit by standard models for polarized maser transport, but we find less agreement for the fractional circular polarization profile m_c(θ).

Submitted 23 October, 2011; originally announced October 2011.

Comments: 14 pages, 11 figures; to appear in Ap. J

Showing 1–50 of 52 results for author: Richter, L