Search | arXiv e-print repository

Injective Flows for parametric hypersurfaces

Authors: Marcello Massimo Negri, Jonathan Aellen, Volker Roth

Abstract: Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for parametric hypersurfaces and show that for such manifolds we can compute the Jacobian determinant exactly and efficiently, with the same cost as NFs. Furthermore, we show that for the subclass of star-like manifolds we can extend the proposed framework to always allow for a Cartesian representation of the density. We showcase the relevance of modeling densities on hypersurfaces in two settings. Firstly, we introduce a novel Objective Bayesian approach to penalized likelihood models by interpreting level-sets of the penalty as star-like manifolds. Secondly, we consider Bayesian mixture models and introduce a general method for variational inference by defining the posterior of mixture weights on the probability simplex.
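The abstract does not spell out the construction, so the following is only a hedged illustration of why hypersurfaces can admit a cheap, exact volume term. It assumes the simplest possible case, a hypersurface given as a graph $x = (u, f(u))$, which is a choice made here for illustration and not necessarily the authors' parameterization. For such a map the generic injective-flow term $\sqrt{\det(J^\top J)}$ reduces to the closed form $\sqrt{1 + \|\nabla f(u)\|^2}$; the function names below are hypothetical.

```python
import math


def graph_jacobian(grad):
    """Jacobian of the embedding u -> (u, f(u)), given grad f(u).

    Returns a (d+1) x d matrix: the d x d identity stacked on the gradient row.
    """
    d = len(grad)
    rows = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    rows.append(list(grad))
    return rows


def gram_det_2d(J):
    """det(J^T J) for a 3 x 2 Jacobian, written out explicitly."""
    a = sum(r[0] * r[0] for r in J)  # (J^T J)[0][0]
    b = sum(r[0] * r[1] for r in J)  # (J^T J)[0][1]
    c = sum(r[1] * r[1] for r in J)  # (J^T J)[1][1]
    return a * c - b * b


def volume_element_closed_form(grad):
    """Closed form sqrt(1 + ||grad f||^2), valid for graph hypersurfaces."""
    return math.sqrt(1.0 + sum(g * g for g in grad))


# Illustrative gradient of f at some point u (assumed values).
grad = [0.3, -1.2]
generic = math.sqrt(gram_det_2d(graph_jacobian(grad)))
closed = volume_element_closed_form(grad)
```

In this toy case the closed form costs one pass over the gradient, whereas the generic expression needs a Gram matrix and a determinant.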

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2403.04573 [pdf, other]

Dynamic critical behavior of the chiral phase transition from the real-time functional renormalization group

Authors: Johannes V. Roth, Yunxin Ye, Sören Schlichting, Lorenz von Smekal

Abstract: In the chiral limit the complicated many-body dynamics around the second-order chiral phase transition of two-flavor QCD can be understood by appealing to universality. We present a novel formulation of the real-time functional renormalization group that describes the stochastic hydrodynamic equations of motion for systems in the same dynamic universality class, which corresponds to Model G in the Halperin-Hohenberg classification. Our approach preserves all relevant symmetries of such systems with reversible mode couplings. We show that the calculations indeed produce the non-trivial value $z=d/2$ for the dynamic critical exponent, where $d$ is the number of spatial dimensions. From the momentum and temperature dependence of the diffusion coefficient of the conserved charge densities, we extract the dimensionless universal scaling function.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 80 pages, 13 figures

arXiv:2308.09571 [pdf]

doi 10.1016/j.jocs.2024.102355

Physics-Informed Boundary Integral Networks (PIBI-Nets): A Data-Driven Approach for Solving Partial Differential Equations

Authors: Monika Nagy-Huber, Volker Roth

Abstract: Partial differential equations (PDEs) are widely used to describe relevant phenomena in dynamical systems. In real-world applications, we commonly need to combine formal PDE models with (potentially noisy) observations. This is especially relevant in settings where we lack information about boundary or initial conditions, or where we need to identify unknown model parameters. In recent years, Physics-Informed Neural Networks (PINNs) have become a popular tool for this kind of problem. In high-dimensional settings, however, PINNs often suffer from computational problems because they usually require dense collocation points over the entire computational domain. To address this problem, we present Physics-Informed Boundary Integral Networks (PIBI-Nets) as a data-driven approach for solving PDEs in one dimension less than the original problem space. PIBI-Nets only require points at the computational domain boundary, while still achieving highly accurate results. Moreover, PIBI-Nets clearly outperform PINNs in several practical settings. Exploiting elementary properties of fundamental solutions of linear differential operators, we present a principled and simple way to handle point sources in inverse problems. We demonstrate the excellent performance of PIBI-Nets for the Laplace and Poisson equations, both on artificial datasets and within a real-world application concerning the reconstruction of groundwater flows.

Submitted 3 July, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Report number: volume 81, special issue "Machine Learning and Data Assimilation for Dynamical Systems II"

Journal ref: Journal of Computational Science, Elsevier, 2024

arXiv:2306.07255 [pdf, other]

Conditional Matrix Flows for Gaussian Graphical Models

Authors: Marcello Massimo Negri, F. Arend Torres, Volker Roth

Abstract: Studying conditional independence among many variables with few observations is a challenging task. Gaussian Graphical Models (GGMs) tackle this problem by encouraging sparsity in the precision matrix through $l_q$ regularization with $q\leq1$. However, most GGMs rely on the $l_1$ norm because the objective is highly non-convex for sub-$l_1$ pseudo-norms. In the frequentist formulation, the $l_1$ norm relaxation provides the solution path as a function of the shrinkage parameter $λ$. In the Bayesian formulation, sparsity is instead encouraged through a Laplace prior, but posterior inference for different $λ$ requires repeated runs of expensive Gibbs samplers. Here we propose a general framework for variational inference with matrix-variate Normalizing Flow in GGMs, which unifies the benefits of frequentist and Bayesian frameworks. As a key improvement on previous work, we train with one flow a continuum of sparse regression models jointly for all regularization parameters $λ$ and all $l_q$ norms, including non-convex sub-$l_1$ pseudo-norms. Within one model we thus have access to (i) the evolution of the posterior for any $λ$ and any $l_q$ (pseudo-) norm, (ii) the marginal log-likelihood for model selection, and (iii) the frequentist solution paths through simulated annealing in the MAP limit.

Submitted 16 November, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: NeurIPS23 version

arXiv:2305.16846 [pdf, other]

Lagrangian Flow Networks for Conservation Laws

Authors: F. Arend Torres, Marcello Massimo Negri, Marco Inversi, Jonathan Aellen, Volker Roth

Abstract: We introduce Lagrangian Flow Networks (LFlows) for modeling fluid densities and velocities continuously in space and time. By construction, the proposed LFlows satisfy the continuity equation, a PDE describing mass conservation in its differentiable form. Our model is based on the insight that solutions to the continuity equation can be expressed as time-dependent density transformations via differentiable and invertible maps. This follows from classical theory of the existence and uniqueness of Lagrangian flows for smooth vector fields. Hence, we model fluid densities by transforming a base density with parameterized diffeomorphisms conditioned on time. The key benefit compared to methods relying on numerical ODE solvers or PINNs is that the analytic expression of the velocity is always consistent with changes in density. Furthermore, we require neither expensive numerical solvers, nor additional penalties to enforce the PDE. LFlows show higher predictive accuracy in density modeling tasks compared to competing models in 2D and 3D, while being computationally efficient. As a real-world application, we model bird migration based on sparse weather radar measurements.
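The claim that a density pushed through a time-dependent diffeomorphism satisfies the continuity equation $\partial_t \rho + \partial_x(\rho v) = 0$ can be checked numerically on a toy case. The sketch below is not the authors' model: it assumes a one-dimensional standard normal base density and the hand-picked map $\Phi_t(z) = z\,e^{t}$, for which the velocity is $v(x,t) = x$. All function names are hypothetical.

```python
import math


def base_density(z):
    """Standard normal density, used as the (assumed) base distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def inverse_flow(x, t):
    """Inverse of the assumed map Phi_t(z) = z * exp(t)."""
    return x * math.exp(-t)


def density(x, t):
    """Change of variables: rho(x, t) = p(Phi_t^{-1}(x)) * |d Phi_t^{-1} / dx|."""
    return base_density(inverse_flow(x, t)) * math.exp(-t)


def velocity(x, t):
    """v(x, t) = (d/dt Phi_t)(Phi_t^{-1}(x)), which equals x for this map."""
    return x


def continuity_residual(x, t, h=1e-4):
    """Central-difference estimate of d_t rho + d_x (rho * v)."""
    d_rho_dt = (density(x, t + h) - density(x, t - h)) / (2.0 * h)

    def flux(y):
        return density(y, t) * velocity(y, t)

    d_flux_dx = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return d_rho_dt + d_flux_dx


# Residuals at a few (x, t) pairs; they should vanish up to discretization error.
residuals = [
    continuity_residual(x, t)
    for x in (-1.5, 0.2, 0.9)
    for t in (0.0, 0.5, 1.0)
]
```

Because density and velocity are both derived from the same map, the residual is zero analytically; the finite differences only confirm this up to truncation error.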

Submitted 13 December, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2303.11817 [pdf, other]

Critical dynamics in a real-time formulation of the functional renormalization group

Authors: Johannes V. Roth, Lorenz von Smekal

Abstract: We present first calculations of critical spectral functions of the relaxational Models A, B, and C in the Halperin-Hohenberg classification using a real-time formulation of the functional renormalization group (FRG). We revisit the prediction by Son and Stephanov that the linear coupling of a conserved density to the non-conserved order parameter of Model A gives rise to critical Model-B dynamics. We formulate both 1-loop and 2-loop self-consistent expansion schemes in the 1PI vertex functions as truncations of the effective average action suitable for real-time applications, and analyze in detail how the different critical dynamics are properly incorporated in the framework of the FRG on the closed-time path. We present results for the corresponding critical spectral functions, extract the dynamic critical exponents for Models A, B, and C, in two and three spatial dimensions, respectively, and compare the resulting values with recent results from the literature.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 52 pages, 9 figures

arXiv:2302.08739 [pdf]

Significantly increased magnetic anisotropy in Co nano-columnar multilayer structure via a unique sequential oblique-normal deposition approach

Authors: Arun Singh Dev, Sharanjeet Singh, Anup Kumar Bera, Pooja Gupta, Velaga Srihari, Pallavi Pandit, Matthias Schwartzkopf, Stephan V. Roth, Dileep Kumar

Abstract: Oblique/normal sequential deposition technique is used to create Co based unique multilayer structure [Co-oblique(4.4nm)/Co-normal (4.2 nm)]x10, where each Co-oblique layer is deposited at an oblique angle of 75deg, to induce large in-plane uniaxial magnetic anisotropy (UMA). Compared to the previous ripple, stress and oblique angle deposition (OAD) related studies on Cobalt in literature, one-order higher UMA with the easy axis of magnetization along the projection of the tilted nano-columns in the multilayer plane is observed. The multilayer retains magnetic anisotropy even after annealing at 450C. The in-plane UMA in this multilayer is found to be the combination of shape, and magneto-crystalline anisotropy (MCA) confirmed by the temperature-dependent grazing incidence small angle X-ray scattering (GISAXS), in situ reflection high energy electron diffraction (RHEED) and grazing incidence X-ray diffraction (GIXRD) measurements. The crystalline texturing of hcp Co in the multilayer minimizes spin-orbit coupling energy along the column direction, which couples with the shape anisotropy energies and results in preferential orientation of the easy magnetic axis along the projection of the columns in the multilayer plane. Reduction in UMA after annealing is attributed to diffusion/merging of columns and annihilating crystallographic texturing. The obtained one-order high UMA demonstrates the potential application of the unique structure engineering technique, which may have far-reaching advantages in magnetic thin films/multilayers and spintronic devices.

Submitted 17 February, 2023; originally announced February 2023.

Comments: 22 pages, 10 figures

arXiv:2302.03582 [pdf]

Direct Linearly-Polarised Electroluminescence from Perovskite Nanoplatelet Superlattices

Authors: Junzhi Ye, Aobo Ren, Linjie Dai, Tomi Baikie, Renjun Guo, Debapriya Pal, Sebastian Gorgon, Julian E. Heger, Junyang Huang, Yuqi Sun, Rakesh Arul, Gianluca Grimaldi, Kaiwen Zhang, Javad Shamsi, Yi-Teng Huang, Hao Wang, Jiang Wu, A. Femius Koenderink, Laura Torrente Murciano, Matthias Schwartzkopf, Stephen V. Roth, Peter Muller-Buschbaum, Jeremy J. Baumberg, Samuel D. Stranks, Neil C. Greenham, et al. (4 additional authors not shown)

Abstract: Polarised light is critical for a wide range of applications, but is usually generated by filtering unpolarised light, which leads to significant energy losses and requires additional optics. Herein, the direct emission of linearly-polarised light is achieved from light-emitting diodes (LEDs) made of CsPbI3 perovskite nanoplatelet superlattices. Through use of solvents with different vapour pressures, the self-assembly of perovskite nanoplatelets is achieved to enable fine control over the orientation (either face-up or edge-up) and therefore the transition dipole moment. As a result of the highly-uniform alignment of the nanoplatelets, as well as their strong quantum and dielectric confinement, large exciton fine-structure splitting is achieved at the film level, leading to pure-red LEDs exhibiting a high degree of linear polarisation of 74.4% without any photonic structures. This work unveils the possibilities of perovskite nanoplatelets as a highly promising source of linearly-polarised electroluminescence, opening up the development of next-generation 3D displays and optical communications from this highly versatile, solution-processable system.

Submitted 8 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 26 pages, 5 figures

arXiv:2302.00283 [pdf]

doi 10.1016/j.jmmm.2022.169663

Evolution of interface magnetism in Fe/Alq3 bilayer

Authors: Avinash Ganesh Khanderao, Sonia Kaushik, Arun Singh Dev, V. R. Reddy, Ilya Sergueev, Hans-Christian Wille, Pallavi Pandit, Stephan V Roth, Dileep Kumar

Abstract: Interface magnetism and topological structure of Fe on organic semiconductor film (Alq3) have been studied and compared with Fe film deposited directly on Si (100) substrate. To get information on the diffused Fe layer at the Fe/Alq3 interface, grazing incident nuclear resonance scattering (GINRS) measurements are made depth selective by introducing a 95% enriched thin 57Fe layer at the Interface and producing x-ray standing wave within the layered structure. Compared with Fe growth on Si substrate, where film exhibits a hyperfine field value of 32 T (Bulk Fe), a thick Fe-Alq3 interface has been found with reduced electron density and hyperfine fields providing evidence of deep penetration of Fe atoms into Alq3 film. Due to the soft nature of Alq3, Fe moments relax in the film plane. At the same time, Fe on Si has a resultant ~43 deg out-of-plane orientation of Fe moments at the Interface due to the stressed and rough Fe layer near Si. The evolution of magnetism at the Fe-Alq3 Interface is monitored using in-situ magneto-optical Kerr effect (MOKE) during the growth of Fe on the Alq3 surface and small-angle x-ray scattering (SAXS) measurements. It is found that the Fe atom tries to organize into clusters to minimize their surface/interface energy. The origin of the 2.4 nm thick magnetic dead layer at the Interface is attributed to the small Fe clusters of paramagnetic or superparamagnetic nature. The present work provides an understanding of interfacial magnetism at metal-organic interfaces and the topological study using the GI-NRS technique, which is made depth selective to probe magnetism of the diffused ferromagnetic layer, which is otherwise difficult for lab-based techniques.

Submitted 1 February, 2023; originally announced February 2023.

Journal ref: Journal of Magnetism and Magnetic Materials, 560 (2022) 169663

arXiv:2206.01545 [pdf, other]

Mesh-free Eulerian Physics-Informed Neural Networks

Authors: Fabricio Arend Torres, Marcello Massimo Negri, Monika Nagy-Huber, Maxim Samarin, Volker Roth

Abstract: Physics-informed Neural Networks (PINNs) have recently emerged as a principled way to include prior physical knowledge in the form of partial differential equations (PDEs) into neural networks. Although PINNs are generally viewed as mesh-free, current approaches still rely on collocation points within a bounded region, even in settings with spatially sparse signals. Furthermore, if the boundaries are not known, the selection of such a region is difficult and often results in a large proportion of collocation points being selected in areas of low relevance. To resolve this severe drawback of current methods, we present a mesh-free and adaptive approach termed particle-density PINN (pdPINN), which is inspired by the microscopic viewpoint of fluid dynamics. The method is based on the Eulerian formulation and, unlike classical mesh-free methods, does not require the introduction of Lagrangian updates. We propose to sample directly from the distribution over the particle positions, eliminating the need to introduce boundaries while adaptively focusing on the most relevant regions. This is achieved by interpreting a non-negative physical quantity (such as the density or temperature) as an unnormalized probability distribution from which we sample with dynamic Monte Carlo methods. The proposed method leads to higher sample efficiency and improved performance of PINNs. These advantages are demonstrated on various experiments based on the continuity equations, Fokker-Planck equations, and the heat equation.

Submitted 1 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Preprint

arXiv:2204.07009 [pdf, other]

Learning Invariances with Generalised Input-Convex Neural Networks

Authors: Vitali Nesterov, Fabricio Arend Torres, Monika Nagy-Huber, Maxim Samarin, Volker Roth

Abstract: Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain, which are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for different informed data exploration tasks, such as building grid-based approximations, sampling points along the level curves, or finding trajectories on the manifold. However, global parameterisations can only exist if the level sets are connected. For this purpose, we introduce a novel and flexible class of neural networks that generalise input-convex networks. These networks represent functions that are guaranteed to have connected level sets forming smooth manifolds on the input space. We further show that global parameterisations of these level sets can be always found efficiently. Lastly, we demonstrate that our novel technique for characterising invariances is a powerful generative data exploration tool in real-world applications, such as computational chemistry.

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2112.12568 [pdf, other]

doi 10.1103/PhysRevD.105.116017

Real-time methods for spectral functions

Authors: Johannes V. Roth, Dominik Schweitzer, Leon J. Sieke, Lorenz von Smekal

Abstract: In this paper we develop and compare different real-time methods to calculate spectral functions. These are classical-statistical simulations, the Gaussian state approximation (GSA), and the functional renormalization group (FRG) formulated on the Keldysh closed-time path. Our test-bed system is the quartic anharmonic oscillator, a single self-interacting bosonic degree of freedom, coupled to an external heat bath providing dissipation analogous to the Caldeira-Leggett model. As our benchmark we use the spectral function from exact diagonalization with constant Ohmic damping. To extend the GSA for the open system, we solve the corresponding Heisenberg-Langevin equations in the Gaussian approximation. For the real-time FRG, we introduce a novel general prescription to construct causal regulators based on introducing scale-dependent fictitious heat baths. Our results explicitly demonstrate how the discrete transition lines of the quantum system gradually build up the broad continuous structures in the classical spectral function as temperature increases. At sufficiently high temperatures, classical, GSA and exact-diagonalization results all coincide. The real-time FRG is able to reproduce the effective thermal mass, but overestimates broadening and only qualitatively describes higher excitations, at the present order of our combined vertex and loop expansion. As temperature is lowered, the GSA follows the ensemble average of the exact solution better than the classical spectral function. In the low-temperature strong-coupling regime, the qualitative features of the exact result are best captured by our real-time FRG calculation, with quantitative improvements to be expected at higher truncation orders.

Submitted 31 May, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: 35 pages, 8 figures, revised version accepted for publication in PRD

arXiv:2111.13185 [pdf, other]

doi 10.1007/978-3-030-92659-5_24

Learning Conditional Invariance through Cycle Consistency

Authors: Maxim Samarin, Vitali Nesterov, Mario Wieser, Aleksander Wieczorek, Sonali Parbhoo, Volker Roth

Abstract: Identifying meaningful and independent factors of variation in a dataset is a challenging learning task frequently addressed by means of deep latent variable models. This task can be viewed as learning symmetry transformations preserving the value of a chosen property along latent dimensions. However, existing approaches exhibit severe drawbacks in enforcing the invariance property in the latent space. We address these shortcomings with a novel approach to cycle consistency. Our method involves two separate latent subspaces for the target property and the remaining input information, respectively. In order to enforce invariance as well as sparsity in the latent space, we incorporate semantic knowledge by using cycle consistency constraints relying on property side information. The proposed method is based on the deep information bottleneck and, in contrast to other approaches, allows using continuous target properties and provides inherent model selection capabilities. We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models with improved invariance properties.

Submitted 25 November, 2021; originally announced November 2021.

Comments: 16 pages, 3 figures, published at the DAGM German Conference on Pattern Recognition, Sep. 28 - Oct. 1, 2021

arXiv:2010.06477 [pdf, other]

3DMolNet: A Generative Network for Molecular Structures

Authors: Vitali Nesterov, Mario Wieser, Volker Roth

Abstract: With the recent advances in machine learning for quantum chemistry, it is now possible to predict the chemical properties of compounds and to generate novel molecules. Existing generative models mostly use a string- or graph-based representation, but the precise three-dimensional coordinates of the atoms are usually not encoded. First attempts in this direction have been proposed, where autoregressive or GAN-based models generate atom coordinates. Those either lack a latent space in the autoregressive setting, such that a smooth exploration of the compound space is not possible, or cannot generalize to varying chemical compositions. We propose a new approach to efficiently generate molecular structures that are not restricted to a fixed size or composition. Our model is based on the variational autoencoder which learns a translation-, rotation-, and permutation-invariant low-dimensional representation of molecules. Our experiments yield a mean reconstruction error below 0.05 Angstrom, outperforming the current state-of-the-art methods by a factor of four, and which is even lower than the spatial quantization error of most chemical descriptors. The compositional and structural validity of newly generated molecules has been confirmed by quantum chemical methods in a set of experiments.

Submitted 8 October, 2020; originally announced October 2020.

arXiv:2006.13645 [pdf, other]

On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures

Authors: Maxim Samarin, Volker Roth, David Belius

Abstract: The Neural Tangent Kernel (NTK) is an important milestone in the ongoing effort to build a theory for deep learning. Its prediction that sufficiently wide neural networks behave as kernel methods, or equivalently as random feature models, has been confirmed empirically for certain wide architectures. It remains an open question how well NTK theory models standard neural network architectures of widths common in practice, trained on complex datasets such as ImageNet. We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet, and find that their behavior deviates significantly from their finite-width NTK counterparts. For wider versions of these networks, where the number of channels and widths of fully-connected layers are increased, the deviation decreases.
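For readers unfamiliar with the term, the empirical (finite-width) NTK of a network $f_\theta$ is the inner product of parameter gradients, $\Theta(x, x') = \nabla_\theta f_\theta(x) \cdot \nabla_\theta f_\theta(x')$. The sketch below computes it for a toy one-hidden-layer scalar network with hand-derived gradients. The architecture, width, and function names are assumptions made for illustration; this is not the AlexNet/LeNet setup studied in the paper, nor the authors' code.

```python
import math
import random


def init_params(width, seed=0):
    """Draw hidden weights w and output weights a from N(0, 1)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, a


def param_gradient(x, w, a):
    """Gradient of f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x) w.r.t. (w, a)."""
    scale = 1.0 / math.sqrt(len(w))
    # d f / d w_j = scale * a_j * x * (1 - tanh(w_j * x)^2)
    grad_w = [
        scale * a_j * x * (1.0 - math.tanh(w_j * x) ** 2)
        for w_j, a_j in zip(w, a)
    ]
    # d f / d a_j = scale * tanh(w_j * x)
    grad_a = [scale * math.tanh(w_j * x) for w_j in w]
    return grad_w + grad_a


def empirical_ntk(x1, x2, w, a):
    """Empirical NTK entry: inner product of the two parameter gradients."""
    g1 = param_gradient(x1, w, a)
    g2 = param_gradient(x2, w, a)
    return sum(p * q for p, q in zip(g1, g2))


w, a = init_params(width=64)
k11 = empirical_ntk(0.5, 0.5, w, a)
k12 = empirical_ntk(0.5, -1.0, w, a)
k21 = empirical_ntk(-1.0, 0.5, w, a)
k22 = empirical_ntk(-1.0, -1.0, w, a)
```

Being a Gram matrix of gradients, the resulting kernel is symmetric and positive semi-definite by construction, whatever the width.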

Submitted 24 June, 2020; originally announced June 2020.

Comments: 10 pages

arXiv:2002.02782 [pdf, other]

Inverse Learning of Symmetries

Authors: Mario Wieser, Sonali Parbhoo, Aleksander Wieczorek, Volker Roth

Abstract: Symmetry transformations induce invariances which are frequently described with deep latent variable models. In many complex domains, such as the chemical space, invariances can be observed, yet the corresponding symmetry transformation cannot be formulated analytically. We propose to learn the symmetry transformation with a model consisting of two latent subspaces, where the first subspace captures the target and the second subspace the remaining invariant information. Our approach is based on the deep information bottleneck in combination with a continuous mutual information regulariser. Unlike previous methods, we focus on the challenging task of minimising mutual information in continuous domains. To this end, we base the calculation of mutual information on correlation matrices in combination with a bijective variable transformation. Extensive experiments demonstrate that our model outperforms state-of-the-art methods on artificial and molecular datasets.

Submitted 22 October, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: Accepted for publication at NeurIPS 2020

arXiv:2002.00815 [pdf, other]

Learning Extremal Representations with Deep Archetypal Analysis

Authors: Sebastian Mathias Keller, Maxim Samarin, Fabricio Arend Torres, Mario Wieser, Volker Roth

Abstract: Archetypes are typical population representatives in an extremal sense, where typicality is understood as the most extreme manifestation of a trait or feature. In linear feature space, archetypes approximate the data convex hull allowing all data points to be expressed as convex mixtures of archetypes. However, it might not always be possible to identify meaningful archetypes in a given feature space. Learning an appropriate feature space and identifying suitable archetypes simultaneously addresses this problem. This paper introduces a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a variational autoencoder, and an optimal representation with respect to the unknown archetypes can be learned end-to-end. The reformulation of linear Archetypal Analysis as deep variational information bottleneck, allows the incorporation of arbitrarily complex side information during training. Furthermore, an alternative prior, based on a modified Dirichlet distribution, is proposed. The real-world applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions while using multi-rater based emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. In this experiment, it is demonstrated that exchanging the side information but keeping the same set of molecules, e.g. using as side information the heat capacity of each molecule instead of the band gap energy, will result in the identification of different archetypes. As an application, these learned representations of chemical space might reveal distinct starting points for de novo molecular design.

Submitted 3 February, 2020; originally announced February 2020.

Comments: Under review for publication at the International Journal of Computer Vision (IJCV). Extended version of our GCPR2019 paper "Deep Archetypal Analysis"

arXiv:1912.13480 [pdf, other]

doi 10.3390/e22020131

On the Difference Between the Information Bottleneck and the Deep Information Bottleneck

Authors: Aleksander Wieczorek, Volker Roth

Abstract: Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. The two assumed properties of the data $X$, $Y$ and their latent representation $T$ take the form of two Markov chains $T-X-Y$ and $X-T-Y$. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions $P(X,Y,T)$. We therefore show how to circumvent this limitation by optimising a lower bound for $I(T;Y)$ for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement $T-X-Y$ is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model.

Submitted 31 December, 2019; originally announced December 2019.

arXiv:1908.05254 [pdf, other]

Optimizing for Interpretability in Deep Neural Networks with Tree Regularization

Authors: Mike Wu, Sonali Parbhoo, Michael C. Hughes, Volker Roth, Finale Doshi-Velez

Abstract: Deep models have advanced prediction in many domains, but their lack of interpretability remains a key barrier to the adoption in many real world applications. There exists a large body of work aiming to help humans understand these black box functions to varying levels of granularity -- for example, through distillation, gradients, or adversarial examples. These methods however, all tackle interpretability as a separate process after training. In this work, we take a different approach and explicitly regularize deep models so that they are well-approximated by processes that humans can step-through in little time. Specifically, we train several families of deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. The resulting axis-aligned decision functions uniquely make tree regularized models easy for humans to interpret. Moreover, for situations in which a single, global tree is a poor estimator, we introduce a regional tree regularizer that encourages the deep model to resemble a compact, axis-aligned decision tree in predefined, human-interpretable contexts. Using intuitive toy examples as well as medical tasks for patients in critical care and with HIV, we demonstrate that this new family of tree regularizers yield models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power.

Submitted 14 August, 2019; originally announced August 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1908.04494, arXiv:1711.06178

arXiv:1908.04494 [pdf, other]

Regional Tree Regularization for Interpretability in Black Box Models

Authors: Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez

Abstract: The lack of interpretability remains a barrier to the adoption of deep neural networks. Recently, tree regularization has been proposed to encourage deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. However, it may be unreasonable to expect that a single tree can predict well across all possible inputs. In this work, we propose regional tree regularization, which encourages a deep model to be well-approximated by several separate decision trees specific to predefined regions of the input space. Practitioners can define regions based on domain knowledge of contexts where different decision-making logic is needed. Across many datasets, our approach delivers more accurate predictions than simply training separate decision trees for each region, while producing simpler explanations than other neural net regularization schemes without sacrificing predictive power. Two healthcare case studies in critical care and HIV demonstrate how experts can improve understanding of deep models via our approach.

Submitted 16 March, 2020; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: AAAI 2020 (Oral)

arXiv:1904.03057 [pdf, other]

Tensor B-Spline Numerical Methods for PDEs: a High-Performance Alternative to FEM

Authors: Dmytro Shulga, Oleksii Morozov, Volker Roth, Felix Friedrich, Patrick Hunziker

Abstract: Tensor B-spline methods are a high-performance alternative to solve partial differential equations (PDEs). This paper gives an overview on the principles of Tensor B-spline methodology, shows their use and analyzes their performance in application examples, and discusses its merits. Tensors preserve the dimensional structure of a discretized PDE, which makes it possible to develop highly efficient computational solvers. B-splines provide high-quality approximations, lead to a sparse structure of the system operator represented by shift-invariant separable kernels in the domain, and are mesh-free by construction. Further, high-order bases can easily be constructed from B-splines. In order to demonstrate the advantageous numerical performance of tensor B-spline methods, we studied the solution of a large-scale heat-equation problem (consisting of roughly 0.8 billion nodes!) on a heterogeneous workstation consisting of multi-core CPU and GPUs. Our experimental results nicely confirm the excellent numerical approximation properties of tensor B-splines, and their unique combination of high computational efficiency and low memory consumption, thereby showing huge improvements over standard finite-element methods (FEM).

Submitted 5 April, 2019; originally announced April 2019.

arXiv:1901.10799 [pdf, other]

Deep Archetypal Analysis

Authors: Sebastian Mathias Keller, Maxim Samarin, Mario Wieser, Volker Roth

Abstract: "Deep Archetypal Analysis" generates latent representations of high-dimensional datasets in terms of fractions of intuitively understandable basic entities called archetypes. The proposed method is an extension of linear "Archetypal Analysis" (AA), an unsupervised method to represent multivariate data points as sparse convex combinations of extremal elements of the dataset. Unlike the original formulation of AA, "Deep AA" can also handle side information and provides the ability for data-driven representation learning which reduces the dependence on expert knowledge. Our method is motivated by studies of evolutionary trade-offs in biology where archetypes are species highly adapted to a single task. Along these lines, we demonstrate that "Deep AA" also lends itself to the supervised exploration of chemical space, marking a distinct starting point for de novo molecular design. In the unsupervised setting we show how "Deep AA" is used on CelebA to identify archetypal faces. These can then be superimposed in order to generate new faces which inherit dominant traits of the archetypes they are based on.

Submitted 24 January, 2020; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: Published at the German Conference on Pattern Recognition 2019 (GCPR)

Journal ref: 41th German Conference on Pattern Recognition, GCPR 2019

arXiv:1812.06594 [pdf, other]

Computational EEG in Personalized Medicine: A study in Parkinson's Disease

Authors: Sebastian Mathias Keller, Maxim Samarin, Antonia Meyer, Vitalii Kosak, Ute Gschwandtner, Peter Fuhr, Volker Roth

Abstract: Recordings of electrical brain activity carry information about a person's cognitive health. For recording EEG signals, a very common setting is for a subject to be at rest with its eyes closed. Analysis of these recordings often involve a dimensionality reduction step in which electrodes are grouped into 10 or more regions (depending on the number of electrodes available). Then an average over each group is taken which serves as a feature in subsequent evaluation. Currently, the most prominent features used in clinical practice are based on spectral power densities. In our work we consider a simplified grouping of electrodes into two regions only. In addition to spectral features we introduce a secondary, non-redundant view on brain activity through the lens of Tsallis Entropy $S_{q=2}$. We further take EEG measurements not only in an eyes closed (ec) but also in an eyes open (eo) state. For our cohort of healthy controls (HC) and individuals suffering from Parkinson's disease (PD), the question we are asking is the following: How well can one discriminate between HC and PD within this simplified, binary grouping? This question is motivated by the commercial availability of inexpensive and easy to use portable EEG devices. If enough information is retained in this binary grouping, then such simple devices could potentially be used as personal monitoring tools, as standard screening tools by general practitioners or as digital biomarkers for easy long term monitoring during neurological studies.

Submitted 2 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

arXiv:1811.10347 [pdf, other]

Estimating Causal Effects With Partial Covariates For Clinical Interpretability

Authors: Sonali Parbhoo, Mario Wieser, Volker Roth

Abstract: Estimating the causal effects of an intervention in the presence of confounding is a frequently occurring problem in applications such as medicine. The task is challenging since there may be multiple confounding factors, some of which may be missing, and inferences must be made from high-dimensional, noisy measurements. In this paper, we propose a decision-theoretic approach to estimate the causal effects of interventions where a subset of the covariates is unavailable for some patients during testing. Our approach uses the information bottleneck principle to perform a discrete, low-dimensional sufficient reduction of the covariate data to estimate a distribution over confounders. In doing so, we can estimate the causal effect of an intervention where only partial covariate information is available. Our results on a causal inference benchmark and a real application for treating sepsis show that our method achieves state-of-the-art performance, without sacrificing interpretability.

Submitted 26 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

arXiv:1811.07969 [pdf, other]

Informed MCMC with Bayesian Neural Networks for Facial Image Analysis

Authors: Adam Kortylewski, Mario Wieser, Andreas Morel-Forster, Aleksander Wieczorek, Sonali Parbhoo, Volker Roth, Thomas Vetter

Abstract: Computer vision tasks are difficult because of the large variability in the data that is induced by changes in light, background, partial occlusion as well as the varying pose, texture, and shape of objects. Generative approaches to computer vision allow us to overcome this difficulty by explicitly modeling the physical image formation process. Using generative object models, the analysis of an observed image is performed via Bayesian inference of the posterior distribution. This conceptually simple approach tends to fail in practice because of several difficulties stemming from sampling the posterior distribution: high-dimensionality and multi-modality of the posterior distribution as well as expensive simulation of the rendering process. The main difficulty of sampling approaches in a computer vision context is choosing the proposal distribution accurately so that maxima of the posterior are explored early and the algorithm quickly converges to a valid image interpretation. In this work, we propose to use a Bayesian Neural Network for estimating an image dependent proposal distribution. Compared to a standard Gaussian random walk proposal, this accelerates the sampler in finding regions of the posterior with high value. In this way, we can significantly reduce the number of samples needed to perform facial image analysis.

Submitted 29 November, 2018; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 2018

arXiv:1807.02326 [pdf, other]

Cause-Effect Deep Information Bottleneck For Systematically Missing Covariates

Authors: Sonali Parbhoo, Mario Wieser, Aleksander Wieczorek, Volker Roth

Abstract: Estimating the causal effects of an intervention from high-dimensional observational data is difficult due to the presence of confounding. The task is often complicated by the fact that we may have a systematic missingness in our data at test time. Our approach uses the information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of information. Based on the sufficiently reduced covariate, we transfer the relevant information to cases where data is missing at test time, allowing us to reliably and accurately estimate the effects of an intervention, even where data is incomplete. Our results on causal inference benchmarks and a real application for treating sepsis show that our method achieves state-of-the-art performance, without sacrificing interpretability.

Submitted 28 February, 2020; v1 submitted 6 July, 2018; originally announced July 2018.

arXiv:1804.06216 [pdf, other]

Learning Sparse Latent Representations with the Deep Copula Information Bottleneck

Authors: Aleksander Wieczorek, Mario Wieser, Damian Murezzan, Volker Roth

Abstract: Deep latent variable models are powerful tools for representation learning. In this paper, we adopt the deep information bottleneck model, identify its shortcomings and propose a model that circumvents them. To this end, we apply a copula transformation which, by restoring the invariance properties of the information bottleneck method, leads to disentanglement of the features in the latent space. Building on that, we show how this transformation translates to sparsity of the latent space in the new model. We evaluate our method on artificial and real data.

Submitted 19 April, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

Comments: Published as a conference paper at ICLR 2018. Aleksander Wieczorek and Mario Wieser contributed equally to this work

Journal ref: Conference track - ICLR 2018

arXiv:1801.07558 [pdf, other]

doi 10.1039/C9SM01975H

Dynamic characterization of cellulose nanofibrils in sheared and extended semi-dilute dispersions

Authors: Tomas Rosén, Nitesh Mittal, Stephan V. Roth, Peng Zhang, L. Daniel Söderberg, Fredrik Lundell

Abstract: New materials made through controlled assembly of dispersed cellulose nanofibrils (CNF) has the potential to develop into biobased competitors to some of the highest performing materials today. The performance of these new cellulose materials depends on how easily CNF alignment can be controlled with hydrodynamic forces, which are always in competition with a different process driving the system towards isotropy, called rotary diffusion. In this work, we present a flow-stop experiment using polarized optical microscopy (POM) to study the rotary diffusion of CNF dispersions in process relevant flows and concentrations. This is combined with small angle X-ray scattering (SAXS) experiments to analyze the true orientation distribution function (ODF) of the flowing fibrils. It is found that the rotary diffusion process of CNF occurs at multiple time scales, where the fastest scale seems to be dependent on the deformation history of the dispersion before the stop. At the same time, the hypothesis that rotary diffusion is dependent on the initial ODF does not hold as the same distribution can result in different diffusion time scales. The rotary diffusion is found to be faster in flows dominated by shear compared to pure extensional flows. Furthermore, the experimental setup can be used to quickly characterize the dynamic properties of flowing CNF and thus aid in determining the quality of the dispersion and its usability in material processes.

Submitted 23 January, 2018; originally announced January 2018.

Comments: 45 pages, 13 figures

Journal ref: Soft Matter, 2020, Advance Article

arXiv:1801.00933 [pdf, other]

New Directions for Trust in the Certificate Authority Ecosystem

Authors: Jan-Ole Malchow, Benjamin Güldenring, Volker Roth

Abstract: Many of the benefits we derive from the Internet require trust in the authenticity of HTTPS connections. Unfortunately, the public key certification ecosystem that underwrites this trust has failed us on numerous occasions. Towards an exploration of the root causes we present an update to the common knowledge about the Certificate Authority (CA) ecosystem. Based on our findings the certificate ecosystem currently undergoes a drastic transformation. Big steps towards ubiquitous encryption were made, however, on the expense of trust for authentication of communication partners. Furthermore we describe systemic problems rooted in misaligned incentives between players in the ecosystem. We depict that proposed security extensions do not correctly realign these incentives. As such we argue that it is worth considering alternative methods of authentication. As a first step in this direction we propose an insurance-based mechanism and we demonstrate that it is technically feasible.

Submitted 3 January, 2018; originally announced January 2018.

arXiv:1711.06178 [pdf, other]

Beyond Sparsity: Tree Regularization of Deep Models for Interpretability

Authors: Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez

Abstract: The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power.

Submitted 16 November, 2017; originally announced November 2017.

Comments: To appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material

arXiv:1711.02489 [pdf, other]

doi 10.1021/acs.jpcc.7b11105

Evaluating alignment of elongated nanoparticles in cylindrical geometries through small angle X-ray scattering experiments

Authors: Tomas Rosén, Christophe Brouzet, Stephan V. Roth, Fredrik Lundell, L. Daniel Söderberg

Abstract: The increased availability and brilliance of new X-ray facilities have in the recent years opened up the possibility to characterize the motion of dispersed nanoparticles in various microfluidic applications. One of these applications is the process of making strong continuous filaments through hydrodynamic alignment and assembly of cellulose nanofibrils (CNF) demonstrated by Håkansson et al. [Nature communications 5, 2014]. In this process it is vital to study the alignment of the nanofibrils in the flow, as this in turn affects the final material properties of the dried filament. Small angle X-ray scattering (SAXS) is a well-suited characterization technique for this, which typically provides the alignment in a projected plane perpendicular to the beam direction. In this work, we demonstrate a simple method to reconstruct the full three-dimensional (3D) orientation distribution function (ODF) from a SAXS-experiment through the assumption that the azimuthal angle of the nanofibril around the flow direction is distributed uniformly; an assumption that is approximately valid in the flow-focusing process. For demonstrational purposes, the experimental results from Håkansson et al. (2014) have been revised, resulting in a small correction to the presented order parameters. The results are then directly compared with simple numerical models to describe the increased alignment of CNF both in the flowing system and during the drying process. The proposed reconstruction method will allow for further improvements of theoretical or numerical simulations and consequently open up new possibilities for optimizing assembly processes, which include flow-alignment of elongated nanoparticles.

Submitted 3 November, 2017; originally announced November 2017.

Comments: 37 pages, 10 figures, supplementary material

arXiv:1701.06171 [pdf, other]

Greedy Structure Learning of Hierarchical Compositional Models

Authors: Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, Thomas Vetter

Abstract: In this work, we consider the problem of learning a hierarchical generative model of an object from a set of images which show examples of the object in the presence of variable background clutter. Existing approaches to this problem are limited by making strong a-priori assumptions about the object's geometric structure and require segmented training data for learning. In this paper, we propose a novel framework for learning hierarchical compositional models (HCMs) which do not suffer from the mentioned limitations. We present a generalized formulation of HCMs and describe a greedy structure learning framework that consists of two phases: Bottom-up part learning and top-down model composition. Our framework integrates the foreground-background segmentation problem into the structure learning task via a background model. As a result, we can jointly optimize for the number of layers in the hierarchy, the number of parts per layer and a foreground-background segmentation based on class labels only. We show that the learned HCMs are semantically meaningful and achieve competitive results when compared to other generative object models at object classification on a standard transfer learning dataset.

Submitted 14 April, 2019; v1 submitted 22 January, 2017; originally announced January 2017.

Comments: CVPR 2019

arXiv:1611.00261 [pdf, other]

Causal Compression

Authors: Aleksander Wieczorek, Volker Roth

Abstract: We propose a new method of discovering causal relationships in temporal data based on the notion of causal compression. To this end, we adopt the Pearlian graph setting and the directed information as an information theoretic tool for quantifying causality. We introduce chain rule for directed information and use it to motivate causal sparsity. We show two applications of the proposed method: causal time series segmentation which selects time points capturing the incoming and outgoing causal flow between time points belonging to different signals, and causal bipartite graph recovery. We prove that modelling of causality in the adopted set-up only requires estimating the copula density of the data distribution and thus does not depend on its marginals. We evaluate the method on time resolved gene expression data.

Submitted 1 November, 2016; originally announced November 2016.

arXiv:1605.05856 [pdf, other]

doi 10.1145/2987443.2987457

Towards Better Internet Citizenship: Reducing the Footprint of Internet-wide Scans by Topology Aware Prefix Selection

Authors: Johannes Klick, Stephan Lau, Matthias Wählisch, Volker Roth

Abstract: Internet service discovery is an emerging topic to study the deployment of protocols. Towards this end, our community periodically scans the entire advertised IPv4 address space. In this paper, we question this principle. Being good Internet citizens means that we should limit scan traffic to what is necessary. We conducted a study of scan data, which shows that several prefixes do not accommodate any host of interest and the network topology is fairly stable. We argue that this allows us to collect representative data by scanning less. In our paper, we explore the idea to scan all prefixes once and then identify prefixes of interest for future scanning. Based on our analysis of the censys.io data set (4.1 TB data encompassing 28 full IPv4 scans within 6 months) we found that we can reduce scan traffic between 25-90% and miss only 1-10% of the hosts, depending on desired trade-offs and protocols.

Submitted 14 September, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: 7 pages, 6 figures, 1 table. Published in Proc. of ACM IMC, 2016

ACM Class: C.2.5; C.2.1; C.2.3

Journal ref: Proceedings of ACM Internet Measurement Conference (IMC) 2016

arXiv:1510.01485 [pdf, other]

Bayesian Markov Blanket Estimation

Authors: Dinu Kaufmann, Sonali Parbhoo, Aleksander Wieczorek, Sebastian Keller, David Adametz, Volker Roth

Abstract: This paper considers a Bayesian view for estimating a sub-network in a Markov random field. The sub-network corresponds to the Markov blanket of a set of query variables, where the set of potential neighbours here is big. We factorize the posterior such that the Markov blanket is conditionally independent of the network of the potential neighbours. By exploiting this blockwise decoupling, we derive analytic expressions for posterior conditionals. Subsequently, we develop an inference scheme which makes use of the factorization. As a result, estimation of a sub-network is possible without inferring an entire network. Since the resulting Gibbs sampler scales linearly with the number of variables, it can handle relatively large neighbourhoods. The proposed scheme results in faster convergence and superior mixing of the Markov chain than existing Bayesian network estimation techniques.

Submitted 6 October, 2015; originally announced October 2015.

Comments: 16 pages, 5 figures

arXiv:1504.03701 [pdf, other]

Probabilistic Clustering of Time-Evolving Distance Data

Authors: Julia E. Vogt, Marius Kloft, Stefan Stark, Sudhir S. Raman, Sandhya Prabhakaran, Volker Roth, Gunnar Rätsch

Abstract: We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identification on the identities of the objects is needed. Further, the model does not require the number of clusters being specified in advance -- they are instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.

Submitted 14 April, 2015; originally announced April 2015.

arXiv:1301.6263 [pdf, other]

A Secure Submission System for Online Whistleblowing Platforms

Authors: Volker Roth, Benjamin Güldenring, Eleanor Rieffel, Sven Dietrich, Lars Ries

Abstract: Whistleblower laws protect individuals who inform the public or an authority about governmental or corporate misconduct. Despite these laws, whistleblowers frequently risk reprisals and sites such as WikiLeaks emerged to provide a level of anonymity to these individuals. However, as countries increase their level of network surveillance and Internet protocol data retention, the mere act of using anonymizing software such as Tor, or accessing a whistleblowing website through an SSL channel might be incriminating enough to lead to investigations and repercussions. As an alternative submission system we propose an online advertising network called AdLeaks. AdLeaks leverages the ubiquity of unsolicited online advertising to provide complete sender unobservability when submitting disclosures. AdLeaks ads compute a random function in a browser and submit the outcome to the AdLeaks infrastructure. Such a whistleblower's browser replaces the output with encrypted information so that the transmission is indistinguishable from that of a regular browser. Its back-end design assures that AdLeaks must process only a fraction of the resulting traffic in order to receive disclosures with high probability. We describe the design of AdLeaks and evaluate its performance through analysis and experimentation.

Submitted 26 January, 2013; originally announced January 2013.

Comments: An abridged version has been accepted for publication in the proceedings of Financial Cryptography and Data Security 2013

arXiv:1206.6433 [pdf]

Copula Mixture Model for Dependency-seeking Clustering

Authors: Melanie Rey, Volker Roth

Abstract: We introduce a copula mixture model to perform dependency-seeking clustering when co-occurring samples from different data sources are available. The model takes advantage of the great flexibility offered by the copulas framework to extend mixtures of Canonical Correlation Analysis to multivariate data with arbitrary continuous marginal densities. We formulate our model as a non-parametric Bayesian mixture, while providing efficient MCMC inference. Experiments on synthetic and real data demonstrate that the increased flexibility of the copula mixture significantly improves the clustering and the interpretability of the results.

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1206.4632 [pdf]

A Complete Analysis of the l_1,p Group-Lasso

Authors: Julia Vogt, Volker Roth

Abstract: The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,\infty} version have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 <= p <= \infty, and we present a unified active set algorithm. For all p-norms, a highly efficient projected gradient algorithm is presented. This new algorithm enables us to compare the prediction performance of many variants of the Group-Lasso in a multi-task learning setting, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We conduct large-scale experiments on synthetic data and on two real-world data sets. In accordance with theoretical characterizations of the different norms we observe that the weak-coupling norms with p between 1.5 and 2 consistently outperform the strong-coupling norms with p >> 2.

Submitted 18 June, 2012; originally announced June 2012.

Comments: ICML2012

arXiv:0912.0398 [pdf]

doi 10.1063/1.3380823

Complete description of re-entrant phase behaviour in a charge variable colloidal model system

Authors: Patrick Wette, Ina Klassen, Dirk Holland-Moritz, Dieter M. Herlach, Hans Joachim Schoepe, Nina Lorenz, Holger Reiber, Thomas Palberg, Stephan V. Roth

Abstract: In titration experiments with NaOH we have determined the full phase diagram of charged colloidal spheres in dependence on the particle density n, the particle effective charge Zeff and the concentration of screening electrolyte c using microscopy, light and Ultra Small Angle X-Ray Scattering (USAXS). For sufficiently large n the system crystallizes upon increasing Zeff at constant c and melts upon increasing c at only slightly altered Zeff. In contrast to earlier work equilibrium phase boundaries are consistent with a universal melting line prediction from computer simulation, if the elasticity effective charge is used. This charge accounts for both counter-ion condensation and many body effects.

Submitted 2 December, 2009; originally announced December 2009.

Comments: 10 pages, 3 figures

Journal ref: THE JOURNAL OF CHEMICAL PHYSICS 132, 131102 (2010)

arXiv:0901.1046 [pdf, ps, other]

doi 10.1021/nn800147a

Preparation and electrical properties of cobalt-platinum nanoparticle monolayers deposited by the Langmuir-Blodgett technique

Authors: Vesna Aleksandrovic, Denis Greshnykh, Igor Randjelovic, Andreas Frömsdorf, Andreas Kornowski, Stephan Volkher Roth, Christian Klinke, Horst Weller

Abstract: The Langmuir-Blodgett technique was utilized and optimized to produce closed monolayers of cobalt-platinum nanoparticles over vast areas. It is shown that sample preparation, "dipping angle", and subphase type have a strong impact on the quality of the produced films. The amount of ligands on the nanoparticles surface must be minimized, the dipping angle must be around 105$^{\circ}$, while the glycol subphase is necessary to obtain nanoparticle monolayers. The achieved films were characterized by scanning electron microscopy (SEM) and grazing incidence x-ray scattering (GISAXS). The electrical properties of the deposited films were studied by direct current (DC) measurements showing a discrepancy to the variable range hopping transport from the granular metal model, and favoring the simple thermal activated charge transport. SEM, GISAXS as well as DC measurements confirm a narrow size distribution and high ordering of the deposited films.

Submitted 8 January, 2009; originally announced January 2009.

Comments: 8 pages, 6 figures

Journal ref: ACS Nano 2 (2008) 1123

Showing 1–41 of 41 results for author: Roth, V