Search | arXiv e-print repository

GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations

Authors: Rick Wilming, Artur Dox, Hjalmar Schulz, Marta Oliveira, Benedict Clark, Stefan Haufe

Abstract: Large pre-trained language models have become popular for many applications and form an important backbone of many downstream tasks in natural language processing (NLP). Applying 'explainable artificial intelligence' (XAI) techniques to enrich such models' outputs is considered crucial for assuring their quality and shedding light on their inner workings. However, large language models are trained on a plethora of data containing a variety of biases, such as gender biases, affecting model weights and, potentially, behavior. Currently, it is unclear to what extent such biases also impact model explanations in possibly unfavorable ways. We create a gender-controlled text dataset, GECO, in which otherwise identical sentences appear in male and female forms. This gives rise to ground-truth 'world explanations' for gender classification tasks, enabling the objective evaluation of the correctness of XAI methods. We also provide GECOBench, a rigorous quantitative evaluation framework benchmarking popular XAI methods, applying them to pre-trained language models fine-tuned to different degrees. This allows us to investigate how pre-training induces undesirable bias in model explanations and to what extent fine-tuning can mitigate such explanation bias. We show a clear dependency between explanation performance and the number of fine-tuned layers, where XAI methods are observed to particularly benefit from fine-tuning or complete retraining of embedding layers. Remarkably, this relationship holds for models achieving similar classification performance on the same task. With that, we highlight the utility of the proposed gender-controlled dataset and novel benchmarking approach for research and development of novel XAI methods. All code, including dataset generation, model training, evaluation, and visualization, is available at: https://github.com/braindatalab/gecobench

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under review
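Because GECO pairs otherwise identical male and female sentences, the tokens that differ between the two forms can act as a ground-truth explanation mask. The following is a minimal Python sketch under that assumption: it assumes token-aligned pairs and uses top-k precision purely as an illustrative metric. The function names are hypothetical and are not GECOBench's API or its exact metrics.

```python
def ground_truth_mask(male_tokens, female_tokens):
    """Mark positions where the male and female forms differ (assumed token-aligned)."""
    if len(male_tokens) != len(female_tokens):
        raise ValueError("paired sentences must have the same number of tokens")
    return [m != f for m, f in zip(male_tokens, female_tokens)]


def topk_precision(attributions, mask):
    """Fraction of the k highest-|attribution| tokens that lie in the ground-truth mask,
    with k set to the number of ground-truth tokens."""
    k = sum(mask)
    if k == 0:
        raise ValueError("mask contains no ground-truth tokens")
    ranked = sorted(range(len(attributions)), key=lambda i: abs(attributions[i]), reverse=True)
    return sum(mask[i] for i in ranked[:k]) / k


male = "he bought his car".split()
female = "she bought her car".split()
mask = ground_truth_mask(male, female)               # [True, False, True, False]
print(topk_precision([0.9, 0.1, 0.7, 0.2], mask))    # 1.0: attribution on gendered words
print(topk_precision([0.1, 0.9, 0.2, 0.8], mask))    # 0.0: attribution on neutral words
```

The explanation in the second call scores zero even though a classifier could still be accurate. This is the kind of mismatch between model performance and explanation correctness that a ground-truth mask makes measurable.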

arXiv:2406.03386 [pdf, other]

Learning Long Range Dependencies on Graphs via Random Walks

Authors: Dexiong Chen, Till Hendrik Schulz, Karsten Borgwardt

Abstract: Message-passing graph neural networks (GNNs), while excelling at capturing local relationships, often struggle with long-range dependencies on graphs. Conversely, graph transformers (GTs) enable information exchange between all nodes but oversimplify the graph structure by treating the graph as a set of fixed-length vectors. This work proposes a novel architecture, NeuralWalker, that overcomes the limitations of both methods by combining random walks with message passing. NeuralWalker achieves this by treating random walks as sequences, allowing for the application of recent advances in sequence models in order to capture long-range dependencies within these walks. Based on this concept, we propose a framework that offers (1) more expressive graph representations through random walk sequences, (2) the ability to utilize any sequence model for capturing long-range dependencies, and (3) the flexibility to integrate various GNN and GT architectures. Our experimental evaluations demonstrate that NeuralWalker achieves significant performance improvements on 19 graph and node benchmark datasets, notably outperforming existing methods by up to 13% on the PascalVoc-SP and COCO-SP datasets. Code is available at https://github.com/BorgwardtLab/NeuralWalker.

Submitted 5 June, 2024; originally announced June 2024.
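The core idea of treating random walks as sequences can be illustrated with a small Python sketch. It samples uniform random walks and runs a toy causal moving average along each walk as a stand-in for a learned sequence model. It then averages the per-position outputs back onto the visited nodes. The moving average, the window size, and all function names are hypothetical simplifications, not the NeuralWalker architecture, which uses learned sequence models together with message passing.

```python
import random
from collections import defaultdict


def sample_random_walks(adj, num_walks, walk_length, seed=0):
    """Sample uniform random walks; each walk is a list of node ids (a sequence)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        walk = [rng.choice(nodes)]
        for _ in range(walk_length - 1):
            neighbors = adj[walk[-1]]
            if not neighbors:  # dead end: stop this walk early
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


def walks_to_node_features(walks, x, window=3):
    """Run a toy causal moving average along each walk (stand-in sequence model)
    and average the per-position outputs back onto the visited nodes."""
    sums, counts = defaultdict(float), defaultdict(int)
    for walk in walks:
        seq = [x[v] for v in walk]
        for i, v in enumerate(walk):
            start = max(0, i - window + 1)
            h = sum(seq[start:i + 1]) / (i + 1 - start)
            sums[v] += h
            counts[v] += 1
    return {v: sums[v] / counts[v] for v in sums}


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
walks = sample_random_walks(adj, num_walks=5, walk_length=6)
features = walks_to_node_features(walks, {v: float(v) for v in adj})
```

Because each walk is an ordinary sequence, the moving average could be swapped for any sequence model without changing the sampling or pooling code. That interchangeability is the flexibility the abstract refers to.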

arXiv:2406.00935 [pdf, other]

VERTECS: A COTS-based payload interface board to enable next generation astronomical imaging payloads

Authors: Ezra Fielding, Victor H. Schulz, Keenan A. A. Chatar, Kei Sano, Akitoshi Hanazawa

Abstract: Due to advances in observation and imaging technologies, modern astronomical satellites generate large volumes of data. This necessitates efficient onboard data processing and high-speed data downlink. Reflecting this trend is the VERTECS 6U Astronomical Nanosatellite. Designed for the observation of Extragalactic Background Light (EBL), this mission is expected to generate a substantial amount of image data, particularly within the confines of CubeSat capabilities. This paper introduces the VERTECS Camera Control Board (CCB), an open-source payload interface board leveraging Commercial Off-The-Shelf (COTS) components, with a Raspberry Pi Compute Module 4 at its core. The VERTECS CCB hardware and software have been designed from the ground up to serve as the sole interface between the VERTECS bus system and the astronomical imaging payload, while providing compute capability not usually seen in nanosatellites of this class. Responsible for mission data processing, it will facilitate high-speed data transfer from the imaging payload via gigabit Ethernet, while also providing a high-bitrate serial connection to the payload X-band transmitter for mission data downlink. Additional interfaces for secondary payloads are provided via USB-C and standard 15-pin camera connectors. The Raspberry Pi embedded within the VERTECS CCB operates on a standard Linux distribution, streamlining the software development process. Beyond addressing the current mission's payload control and data handling requirements, the CCB sets the stage for future missions with heightened data demands. Furthermore, it supports the adoption of machine learning and other compute-intensive applications in orbit. This paper delves into the development of the VERTECS CCB, offering insights into the design and validation of this next-generation payload interface, to ensure that it can survive the rigors of space flight.

Submitted 2 June, 2024; originally announced June 2024.

Comments: 10 pages, to be presented at SPIE Software and Cyberinfrastructure for Astronomy VIII

arXiv:2405.12261 [pdf]

EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods

Authors: Benedict Clark, Rick Wilming, Artur Dox, Paul Eschenbach, Sami Hached, Daniel ** Wodke, Michias Taye Zewdie, Uladzislau Bruila, Marta Oliveira, Hjalmar Schulz, Luca Matteo Cornils, Danny Panknin, Ahcène Boubekki, Stefan Haufe

Abstract: The evolving landscape of explainable artificial intelligence (XAI) aims to improve the interpretability of intricate machine learning (ML) models, yet faces challenges in formalisation and empirical validation, since explanation is an inherently unsupervised process. In this paper, we bring together various benchmark datasets and novel performance metrics in an initial benchmarking platform, the Explainable AI Comparison Toolkit (EXACT), providing a standardised foundation for evaluating XAI methods. Our datasets incorporate ground-truth explanations for class-conditional features, and, leveraging novel quantitative metrics, this platform assesses post-hoc XAI methods by the quality of the explanations they produce. Our recent findings have highlighted the limitations of popular XAI methods, as they often struggle to surpass random baselines, attributing significance to irrelevant features. Moreover, we show that explanations vary across different, equally performing model architectures. This initial benchmarking platform therefore aims to allow XAI researchers to test and assure the high quality of their newly developed methods.

Submitted 20 May, 2024; originally announced May 2024.
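To make the random-baseline comparison concrete, the following Python sketch scores attributions with precision@k against a ground-truth mask. It estimates the score that random attributions would achieve by Monte Carlo simulation. The metric and function names are illustrative assumptions, not EXACT's own metrics or API.

```python
import random


def precision_at_k(attributions, mask):
    """Share of the k top-|attribution| features that are truly relevant (k = #relevant)."""
    k = sum(mask)
    ranked = sorted(range(len(attributions)), key=lambda i: abs(attributions[i]), reverse=True)
    return sum(mask[i] for i in ranked[:k]) / k


def random_baseline(mask, trials=20000, seed=0):
    """Monte Carlo estimate of precision@k when attributions are random noise."""
    rng = random.Random(seed)
    n = len(mask)
    total = 0.0
    for _ in range(trials):
        total += precision_at_k([rng.random() for _ in range(n)], mask)
    return total / trials


mask = [True] * 3 + [False] * 7   # 3 relevant out of 10 features
baseline = random_baseline(mask)  # close to 3 / 10 in expectation
method_score = precision_at_k([0.8, 0.1, 0.5, 0.9, 0, 0, 0, 0, 0, 0], mask)
```

For this particular metric, the random expectation has a closed form: the fraction of relevant features (here 0.3). The Monte Carlo route is mainly useful for metrics where no such formula is at hand.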

arXiv:2312.03687 [pdf, other]

MatterGen: a generative model for inorganic materials design

Authors: Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Sasha Shysheya, Jonathan Crabbé, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Ryota Tomioka, Tian Xie

Abstract: The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture. Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints. Despite recent progress, current generative models have a low success rate in proposing stable crystals, or can only satisfy a very limited set of property constraints. Here, we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. To enable this, we introduce a new diffusion-based generative process that produces crystalline structures by gradually refining atom types, coordinates, and the periodic lattice. We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset. Compared to prior generative models, structures produced by MatterGen are more than twice as likely to be novel and stable, and more than 15 times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, novel materials with desired chemistry, symmetry, as well as mechanical, electronic and magnetic properties. Finally, we demonstrate multi-property materials design capabilities by proposing structures that have both high magnetic density and a chemical composition with low supply-chain risk. We believe that the quality of generated materials and the breadth of MatterGen's capabilities represent a major advancement towards creating a universal generative model for materials design.

Submitted 29 January, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: 13 pages main text, 35 pages supplementary information
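The abstract states that MatterGen's diffusion process refines atom types, coordinates, and the periodic lattice. The coordinate part has to respect the periodicity of the unit cell. The following NumPy sketch shows one way a forward (noising) step on fractional coordinates can do that: Gaussian noise followed by wrapping into [0, 1), which yields a wrapped normal distribution. The geometric noise schedule and all names are assumptions. The sketch ignores atom types and the lattice and is not MatterGen's implementation.

```python
import numpy as np


def wrap_unit_cell(frac):
    """Map fractional coordinates into [0, 1), guarding against round-off to 1.0."""
    wrapped = np.mod(frac, 1.0)
    return np.where(wrapped >= 1.0, 0.0, wrapped)


def noise_fractional_coords(frac, sigma, rng):
    """Forward noising step: Gaussian noise followed by wrapping, so the
    result follows a wrapped normal distribution on the periodic unit cell."""
    return wrap_unit_cell(frac + sigma * rng.standard_normal(frac.shape))


def periodic_displacement(a, b):
    """Minimum-image displacement from a to b in fractional coordinates."""
    d = b - a
    return d - np.round(d)


rng = np.random.default_rng(0)
frac = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])  # two atoms in one cell
sigmas = np.geomspace(0.01, 1.0, num=5)               # assumed noise schedule
trajectory = [noise_fractional_coords(frac, s, rng) for s in sigmas]
```

The minimum-image helper shows why wrapping matters: two atoms at fractional positions 0.95 and 0.05 are 0.1 apart across the cell boundary, not 0.9.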

arXiv:2312.03368 [pdf, other]

Bottom-Up Instance Segmentation of Catheters for Chest X-Rays

Authors: Francesca Boccardi, Axel Saalbach, Heinrich Schulz, Samuele Salti, Ilyas Sirazitdinov

Abstract: Chest X-ray (CXR) is frequently employed in emergency departments and intensive care units to verify the proper placement of central lines and tubes and to rule out related complications. The automation of the X-ray reading process can be a valuable support tool for non-specialist technicians and minimize reporting delays due to the unavailability of experts. While existing solutions for automated catheter segmentation and malposition detection show promising results, the disentanglement of individual catheters remains an open challenge, especially in complex cases where multiple devices appear superimposed in the X-ray projection. Moreover, conventional top-down instance segmentation methods are ineffective on such thin and long devices, which often extend through the entire image. In this paper, we propose a deep learning approach based on associative embeddings for catheter instance segmentation, able to overcome those limitations and effectively handle device intersections.

Submitted 6 December, 2023; originally announced December 2023.
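In associative-embedding approaches, a network predicts an embedding for each foreground pixel. Training pulls pixels of the same instance together and pushes different instances apart, and instances are then recovered by grouping. The Python sketch below shows a simple greedy grouping step under that assumption. The threshold, the greedy rule, and the function name are hypothetical choices, not the paper's procedure.

```python
import numpy as np


def group_by_embedding(embeddings, threshold):
    """Greedily assign each pixel embedding to the nearest running instance mean;
    start a new instance if no mean lies within `threshold`."""
    embeddings = np.asarray(embeddings, dtype=float)
    centers, counts, labels = [], [], []
    for e in embeddings:
        if centers:
            dists = np.linalg.norm(np.stack(centers) - e, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < threshold:
                counts[j] += 1
                centers[j] = centers[j] + (e - centers[j]) / counts[j]  # update running mean
                labels.append(j)
                continue
        centers.append(e.copy())
        counts.append(1)
        labels.append(len(centers) - 1)
    return np.array(labels)


# Pixels of two catheters, interleaved in scan order (1-D tags for brevity).
tags = [[0.0], [5.0], [0.1], [5.1], [-0.1]]
print(group_by_embedding(tags, threshold=1.0))  # [0 1 0 1 0]
```

Grouping by embedding rather than by bounding box is what lets this family of methods separate long, thin, overlapping devices. This toy version assigns each pixel to exactly one instance, so it does not model pixels shared at intersections.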

arXiv:2309.08604 [pdf, other]

Tuning Pythia for Forward Physics Experiments

Authors: Max Fieg, Felix Kling, Holger Schulz, Torbjörn Sjöstrand

Abstract: Event generators like Pythia play an important role in physics studies at the Large Hadron Collider (LHC). While they make accurate predictions in the central region, i.e., at pseudorapidities $η<5$, a disagreement between Pythia and measurements in the forward region, $η>7$, has been observed. We introduce a dedicated forward physics tune for the Pythia event generator to be used for forward physics studies at the LHC, which uses a more flexible modelling of beam remnant hadronization and is tuned to available particle spectra measured by LHCf. Furthermore, we provide a data-driven uncertainty estimate on the new tune, which can be used to estimate flux uncertainties in future forward physics studies. We demonstrate an application of our tune by showing the updated neutrino and dark photon spectra at the FASER experiment.

Submitted 15 September, 2023; originally announced September 2023.

Report number: DESY-23-133
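For orientation, pseudorapidity is related to the polar angle θ measured from the beam axis by η = -ln tan(θ/2). The short Python sketch below converts between the two, which shows how narrow the forward region η > 7 is. The function names are illustrative.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)) for polar angle theta (radians) from the beam axis."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


for eta in (5.0, 7.0):
    print(f"eta = {eta}: theta = {1e3 * polar_angle(eta):.2f} mrad")
```

η = 5 corresponds to roughly 13.5 mrad from the beam axis. η = 7 corresponds to about 1.8 mrad, so the forward measurements probe particles travelling almost exactly along the beam line.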

arXiv:2301.05722 [pdf, other]

doi 10.1016/j.cpc.2023.108834

ULYSSES, Universal LeptogeneSiS Equation Solver: version 2

Authors: Alessandro Granelli, Christopher Leslie, Yuber F. Perez-Gonzalez, Holger Schulz, Brian Shuve, Jessica Turner, Rosie Walker

Abstract: ULYSSES is a Python package that calculates the baryon asymmetry produced from leptogenesis in the context of a type-I seesaw mechanism. In this release, the new features include code which solves the Boltzmann equations for low-scale leptogenesis; the complete Boltzmann equations for thermal leptogenesis applying proper quantum statistics without assuming kinetic equilibrium of the right-handed neutrinos; and primordial black hole-induced leptogenesis. ULYSSES version 2 also adds a pre-provided script for a two-dimensional grid scan of the parameter space. As before, the code emphasizes user flexibility and rapid evaluation, and it is publicly available at https://github.com/earlyuniverse/ulysses.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 24 pages, 2 figures

Journal ref: Comput. Phys. Commun. 291 (2023) 108834
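The abstract mentions a script for two-dimensional grid scans but does not describe its interface. The NumPy sketch below therefore only illustrates the generic pattern of evaluating a model on a 2-D parameter grid and locating its maximum. `toy_observable` is a hypothetical placeholder, not a ULYSSES model or API.

```python
import numpy as np


def grid_scan(model, x_values, y_values):
    """Evaluate model(x, y) on a 2-D grid; Z[i, j] = model(x_values[i], y_values[j])."""
    X, Y = np.meshgrid(x_values, y_values, indexing="ij")
    Z = np.vectorize(model)(X, Y)
    return X, Y, Z


def toy_observable(log_a, log_b):
    """Hypothetical placeholder function; NOT a ULYSSES model."""
    return np.exp(-(log_a - 1.0) ** 2 - (log_b + 2.0) ** 2)


log_a = np.linspace(-2.0, 4.0, 61)  # first scan axis (log-spaced parameter)
log_b = np.linspace(-5.0, 1.0, 31)  # second scan axis (log-spaced parameter)
X, Y, Z = grid_scan(toy_observable, log_a, log_b)
i, j = np.unravel_index(np.argmax(Z), Z.shape)
print(log_a[i], log_b[j])  # grid point where the placeholder peaks
```

Scanning in logarithmic coordinates is a common choice when parameters span many orders of magnitude. `np.vectorize` keeps the sketch agnostic to whether the model itself accepts arrays.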

arXiv:2301.01840 [pdf, other]

Towards a Unified User Interface for Visual Analysis of Retinal Data in Ophthalmology

Authors: Martin Röhlig, Lars Nonnemann, Hans-Jörg Schulz, Oliver Stachs, Heidrun Schumann

Abstract: The visual analysis of retinal data contributes to the understanding of a wide range of eye diseases. For the evaluation of cross-sectional studies, ophthalmologists rely on workflows and toolsets established in their work environment. That is, they know what tools and data are needed at each step of their workflow. Yet, manually operating the various tools, including activation, data handling, or view arrangement, can be cumbersome and time-consuming. We thus introduce a new visualization-supported toolchaining approach that combines workflow, tools, and data. First, we provide access to the tools required for each step of the workflow. Second, we handle the exchange of data between these tools. Third, we organize the views of the tools on screen using suitable layouts. Fourth, we visualize the connection between workflow, tools, and data to support the data analysis. We demonstrate our approach with a use case in ophthalmic research and report on initial feedback from experts.

Submitted 4 January, 2023; originally announced January 2023.

Comments: 12 pages, 4 figures

ACM Class: I.3.8

arXiv:2212.09081 [pdf, other]

Riemannian Optimization for Variance Estimation in Linear Mixed Models

Authors: Lena Sembach, Jan Pablo Burgard, Volker H. Schulz

Abstract: Variance parameter estimation in linear mixed models is a challenge for many classical nonlinear optimization algorithms due to the positive-definiteness constraint of the random effects covariance matrix. We take a completely novel view on parameter estimation in linear mixed models by exploiting the intrinsic geometry of the parameter space. We formulate the problem of residual maximum likelihood estimation as an optimization problem on a Riemannian manifold. Based on this formulation, we provide higher-order geometric information on the problem via the Riemannian gradient and the Riemannian Hessian, and we then test our approach numerically with Riemannian optimization algorithms. Our approach yields higher-quality variance parameter estimates than existing approaches.

Submitted 18 December, 2022; originally announced December 2022.
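The following NumPy sketch illustrates optimization on the manifold of symmetric positive-definite (SPD) matrices. It does not use the paper's REML objective. Instead, it assumes the toy objective f(P) = tr(P⁻¹S) + log det P, a Gaussian negative log-likelihood up to constants that is minimized at P = S. It also assumes the affine-invariant metric, under which the Riemannian gradient is P ∇f(P) P, and uses the exponential map as the update. Positive definiteness is then preserved by construction rather than by explicit constraints. Step size, iteration count, and names are assumptions, and this is not the paper's algorithm.

```python
import numpy as np


def _sym_apply(A, fn):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    return (V * fn(w)) @ V.T


def spd_exp(P, xi):
    """Exponential map at P on the SPD manifold with the affine-invariant metric."""
    P_half = _sym_apply(P, np.sqrt)
    P_inv_half = _sym_apply(P, lambda w: 1.0 / np.sqrt(w))
    return P_half @ _sym_apply(P_inv_half @ xi @ P_inv_half, np.exp) @ P_half


def riemannian_gradient_descent(S, step=0.5, iters=200):
    """Minimize f(P) = tr(P^{-1} S) + log det P over SPD matrices P."""
    n = S.shape[0]
    P = np.eye(n) * np.trace(S) / n       # SPD starting point
    for _ in range(iters):
        P_inv = np.linalg.inv(P)
        egrad = P_inv - P_inv @ S @ P_inv  # Euclidean gradient of f
        rgrad = P @ egrad @ P              # Riemannian gradient (equals P - S)
        P = spd_exp(P, -step * rgrad)      # move along the geodesic
    return P


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A @ A.T + 3.0 * np.eye(3)              # SPD "sample covariance"
P_hat = riemannian_gradient_descent(S)
```

Every iterate stays SPD because the exponential map lands on the manifold. A classical solver would instead need a reparametrization or a projection step to enforce the constraint.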

arXiv:2208.03687 [pdf, ps, other]

Gâteaux semiderivative approach applied to shape optimization for contact problems

Authors: Nico Goldammer, Volker H. Schulz, Kathrin Welker

Abstract: Shape optimization problems constrained by variational inequalities (VI) are non-smooth and non-convex optimization problems. The non-smoothness arises due to the variational inequality constraint, which makes it challenging to derive optimality conditions. Besides the non-smoothness, there are complementarity aspects due to the VIs as well as distributed, non-linear, non-convex and infinite-dimensional aspects due to the shapes, which complicate setting up an optimality system and, thus, developing efficient solution algorithms. In this paper, we consider Gâteaux semiderivatives in order to formulate optimality conditions. In the application, we concentrate on a shape optimization problem constrained by the contact problem.

Submitted 11 June, 2024; v1 submitted 7 August, 2022; originally announced August 2022.

MSC Class: 49Q10; 49J40; 35Q93; 65K15
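For orientation, the Gâteaux semiderivative of a functional J at x in direction h is the one-sided limit dJ(x; h) = lim_{t↓0} (J(x + t h) - J(x)) / t. It can exist where J is not differentiable, and it need not be linear in h. The Python sketch below checks this numerically for the toy function max(0, x) at x = 0, the kind of kink that complementarity (contact) conditions produce. It is an illustration only, not the shape-optimization setting of the paper.

```python
def semiderivative(f, x, h, t=1e-8):
    """Forward difference approximating the one-sided limit (f(x + t*h) - f(x)) / t, t -> 0+."""
    return (f(x + t * h) - f(x)) / t


def relu(x):
    return max(0.0, x)


d_plus = semiderivative(relu, 0.0, 1.0)    # approximately 1.0
d_minus = semiderivative(relu, 0.0, -1.0)  # approximately 0.0
# Not linear in h: a linear map would give d_plus + d_minus = dJ(0; 0) = 0.
print(d_plus, d_minus)
```

Both directional limits exist, yet no single gradient reproduces them. This is why optimality conditions for VI-constrained problems are phrased in terms of semiderivatives rather than classical derivatives.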

arXiv:2205.03168 [pdf, other]

Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-Ray Data

Authors: Joceline Ziegler, Bjarne Pfitzner, Heinrich Schulz, Axel Saalbach, Bert Arnrich

Abstract: Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of $0.94$ on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets $ε \in \{1, 3, 6, 10\}$. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of $0.94$ for $ε = 6$. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of $0.76$ in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.

Submitted 30 May, 2022; v1 submitted 6 May, 2022; originally announced May 2022.
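A Gaussian noise mechanism in local training is commonly realized in the DP-SGD style. Each per-example gradient is clipped to an L2 bound C, the clipped gradients are summed, Gaussian noise with standard deviation σC is added, and the result is averaged. The following NumPy sketch assumes that recipe. The parameter names are hypothetical, and the Rényi DP accounting that converts σ, sampling rate, and number of steps into a budget ε is omitted. It is not the paper's training code.

```python
import numpy as np


def clip_rows(grads, clip_norm):
    """Scale each per-example gradient (row) so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))


def private_mean_gradient(grads, clip_norm, noise_multiplier, rng):
    """Clip, sum, add N(0, (noise_multiplier * clip_norm)^2) noise per coordinate, then average."""
    clipped = clip_rows(grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


rng = np.random.default_rng(0)
grads = rng.standard_normal((32, 10)) * 5.0  # 32 examples, 10 parameters
update = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds each example's influence on the update, which is what makes the added Gaussian noise sufficient to mask any single record. Smaller privacy budgets ε require a larger noise multiplier, which is the utility-privacy trade-off reported above.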

arXiv:2204.11977 [pdf, other]

Surfaces of section for geodesic flows of closed surfaces

Authors: Gonzalo Contreras, Gerhard Knieper, Marco Mazzucchelli, Benjamin H. Schulz

Abstract: We prove several results concerning the existence of surfaces of section for the geodesic flows of closed orientable Riemannian surfaces. The surfaces of section $Σ$ that we construct are either Birkhoff sections, meaning that they intersect every sufficiently long orbit segment of the geodesic flow, or at least they have some hyperbolic components in $\partial Σ$ as limit sets of the orbits of the geodesic flow that do not return to $Σ$. In order to prove these theorems, we provide a study of configurations of simple closed geodesics of closed orientable Riemannian surfaces, which may have independent interest. Our arguments are based on Grayson's curve shortening flow.

Submitted 25 April, 2022; originally announced April 2022.

Comments: 32 pages, 11 figures

MSC Class: 53C22; 37D40; 53D25

arXiv:2203.07175 [pdf, other]

A Linear View on Shape Optimization

Authors: Stephan Schmidt, Volker H. Schulz

Abstract: Shapes do not define a linear space. This paper explores the linear structure of deformations as a representation of shapes. This transforms shape optimization into a variant of optimal control. The numerical challenges of this point of view are highlighted, and a novel linear version of the second shape derivative is proposed, leading to particular algorithms of shape Newton type.

Submitted 14 March, 2022; originally announced March 2022.

MSC Class: 49M15; 49M41; 49Q10

arXiv:2203.05090 [pdf, other]

doi 10.1088/1361-6471/ac865e

The Forward Physics Facility at the High-Luminosity LHC

Authors: Jonathan L. Feng, Felix Kling, Mary Hall Reno, Juan Rojo, Dennis Soldin, Luis A. Anchordoqui, Jamie Boyd, Ahmed Ismail, Lucian Harland-Lang, Kevin J. Kelly, Vishvas Pandey, Sebastian Trojanowski, Yu-Dai Tsai, Jean-Marco Alameddine, Takeshi Araki, Akitaka Ariga, Tomoko Ariga, Kento Asai, Alessandro Bacchetta, Kincso Balazs, Alan J. Barr, Michele Battistin, Jianming Bian, Caterina Bertone, Weidong Bai, et al. (211 additional authors not shown)

Abstract: High-energy collisions at the High-Luminosity Large Hadron Collider (LHC) produce a large number of particles along the beam collision axis, outside of the acceptance of existing LHC experiments. The proposed Forward Physics Facility (FPF), to be located several hundred meters from the ATLAS interaction point and shielded by concrete and rock, will host a suite of experiments to probe Standard Model (SM) processes and search for physics beyond the Standard Model (BSM). In this report, we review the status of the civil engineering plans and the experiments to explore the diverse physics signals that can be uniquely probed in the forward region. FPF experiments will be sensitive to a broad range of BSM physics through searches for new particle scattering or decay signatures and deviations from SM expectations in high-statistics analyses with TeV neutrinos in this low-background environment. High-statistics neutrino detection will also provide valuable data for fundamental topics in perturbative and non-perturbative QCD and in weak interactions. Experiments at the FPF will enable synergies between forward particle production at the LHC and astroparticle physics to be exploited. We report here on these physics topics, on infrastructure, detector, and simulation studies, and on future directions to realize the FPF's physics potential.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 429 pages, contribution to Snowmass 2021

Report number: UCI-TR-2022-01, CERN-PBC-Notes-2022-001, FERMILAB-PUB-22-094-ND-SCD-T, INT-PUB-22-006, BONN-TH-2022-04

arXiv:2203.01174 [pdf, other]

doi 10.1093/mnras/stac1991

Spherical accretion of collisional gas in modified gravity I: self-similar solutions and a new cosmological hydrodynamical code

Authors: Han Zhang, Tobias Weinzierl, Holger Schulz, Baojiu Li

Abstract: The spherical collapse scenario has great importance in cosmology since it captures several crucial aspects of structure formation. The presence of self-similar solutions in the Einstein-de Sitter (EdS) model greatly simplifies its analysis, making it a powerful tool to gain valuable insights into the real and more complicated physical processes involved in galaxy formation. While there has been a large body of research on incorporating various additional physical processes into spherical collapse, the effect of modified gravity (MG) models, which are popular alternatives to the $\Lambda$CDM paradigm to explain the cosmic acceleration, is still not well understood in this scenario. In this paper, we study the spherical accretion of collisional gas in a particular MG model, which is a rare case that also admits self-similar solutions. The model displays interesting behaviours caused by the enhanced gravity and a screening mechanism. Despite the strong effects of MG, we find that its self-similar solution agrees well with that of the EdS model. These results are used to assess a new cosmological hydrodynamical code for spherical collapse simulations introduced here, which is based on the hyperbolic partial differential equation engine ExaHyPE 2. Its good agreement with the theoretical predictions confirms the reliability of this code in modelling astrophysical processes in spherical collapse. We will use this code to study the evolution of gas in more realistic MG models in future work.

Submitted 2 March, 2022; originally announced March 2022.

Comments: 20 pages, 13 figures

arXiv:2202.05084 [pdf, ps, other]

Geodesic Anosov flows, hyperbolic closed geodesics and stable ergodicity

Authors: Gerhard Knieper, Benjamin H. Schulz

Abstract: In this paper, we show that the geodesic flow of a Finsler metric is Anosov if and only if there exists a $C^2$ open neighborhood of Finsler metrics all of whose closed geodesics are hyperbolic. For surfaces, this result also holds for Riemannian metrics. This follows from a recent result of Contreras and Mazzucchelli. Furthermore, geodesic flows of Riemannian or Finsler metrics on surfaces are $C^2$ stably ergodic if and only if they are Anosov.

Submitted 10 February, 2022; originally announced February 2022.

Comments: 8 pages

MSC Class: 37J46 (Primary); 53C22 (Secondary)

arXiv:2110.11862

Graph Filtration Kernels

Authors: Till Hendrik Schulz, Pascal Welke, Stefan Wrobel

Abstract: The majority of popular graph kernels are based on the concept of Haussler's $\mathcal{R}$-convolution kernel and define graph similarities in terms of mutual substructures. In this work, we enrich these similarity measures by considering graph filtrations: using meaningful orders on the set of edges, which allow us to construct a sequence of nested graphs, we can consider a graph at multiple granularities. For one thing, this provides access to features on different levels of resolution. Furthermore, rather than simply comparing frequencies of features in graphs, it allows for their comparison in terms of when and for how long they exist in the sequences. In this work, we propose a family of graph kernels that incorporate these existence intervals of features. While our approach can be applied to arbitrary graph features, we particularly highlight Weisfeiler-Lehman vertex labels, leading to efficient kernels. We show that using Weisfeiler-Lehman labels over certain filtrations strictly increases the expressive power over the ordinary Weisfeiler-Lehman procedure in terms of deciding graph isomorphism. In fact, this result directly yields more powerful graph kernels based on such features and has implications for graph neural networks due to their close relationship to the Weisfeiler-Lehman method. We empirically validate the expressive power of our graph kernels and show significant improvements over state-of-the-art graph kernels in terms of predictive performance on various real-world benchmark datasets.

Submitted 22 October, 2021; originally announced October 2021.
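The filtration kernels above build on Weisfeiler-Lehman (WL) vertex labels. As a minimal sketch of that underlying ingredient only (the ordinary WL subtree kernel, not the authors' filtration kernel with existence intervals), WL relabeling and a feature-count dot product can be written as follows. The adjacency-dict graph representation and all function names are hypothetical choices for illustration; labels are kept as nested tuples instead of being compressed to integers, which keeps them comparable across graphs without a shared lookup table.

```python
from collections import Counter


def wl_iteration(adj, labels):
    """One WL refinement step: the new label of v is its old label plus the
    sorted multiset of its neighbours' labels (kept as a nested tuple)."""
    return {v: (labels[v], tuple(sorted(labels[u] for u in adj[v]))) for v in adj}


def wl_features(adj, labels, iterations=2):
    """Count every label seen at iterations 0..iterations (WL subtree features)."""
    counts = Counter(labels.values())
    for _ in range(iterations):
        labels = wl_iteration(adj, labels)
        counts.update(labels.values())
    return counts


def wl_kernel(g1, g2, iterations=2):
    """Dot product of the WL feature-count vectors of two (adj, labels) graphs."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())


if __name__ == "__main__":
    path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "x", 1: "x", 2: "x"})
    triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "x", 1: "x", 2: "x"})
    print(wl_kernel(path, path), wl_kernel(path, triangle))
```

The filtration variant described in the abstract would, roughly speaking, compute such features on each nested graph of the sequence and compare when and for how long each label exists; that extension is not attempted here.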

arXiv:2109.10905

doi 10.1016/j.physrep.2022.04.004

The Forward Physics Facility: Sites, Experiments, and Physics Potential

Authors: Luis A. Anchordoqui, Akitaka Ariga, Tomoko Ariga, Weidong Bai, Kincso Balazs, Brian Batell, Jamie Boyd, Joseph Bramante, Mario Campanelli, Adrian Carmona, Francesco G. Celiberto, Grigorios Chachamis, Matthew Citron, Giovanni De Lellis, Albert De Roeck, Hans Dembinski, Peter B. Denton, Antonia Di Crecsenzo, Milind V. Diwan, Liam Dougherty, Herbi K. Dreiner, Yong Du, Rikard Enberg, Yasaman Farzan, Jonathan L. Feng, et al. (56 additional authors not shown)

Abstract: The Forward Physics Facility (FPF) is a proposal to create a cavern with the space and infrastructure to support a suite of far-forward experiments at the Large Hadron Collider during the High Luminosity era. Located along the beam collision axis and shielded from the interaction point by at least 100 m of concrete and rock, the FPF will house experiments that will detect particles outside the acceptance of the existing large LHC experiments and will observe rare and exotic processes in an extremely low-background environment. In this work, we summarize the current status of plans for the FPF, including recent progress in civil engineering in identifying promising sites for the FPF and the experiments currently envisioned to realize the FPF's physics potential. We then review the many Standard Model and new physics topics that will be advanced by the FPF, including searches for long-lived particles, probes of dark matter and dark sectors, high-statistics studies of TeV neutrinos of all three flavors, aspects of perturbative and non-perturbative QCD, and high-energy astroparticle physics.

Submitted 25 May, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: revised version, accepted by Physics Reports

Report number: BNL-222142-2021-FORE, CERN-PBC-Notes-2021-025, DESY-21-142, FERMILAB-CONF-21-452-AE-E-ND-PPD-T, KYUSHU-RCAPP-2021-01, LU TP 21-36, PITT-PACC-2118, SMU-HEP-21-10, UCI-TR-2021-22

Journal ref: Phys. Rept. 968 (2022), 1-50

arXiv:2106.13401

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Authors: Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

Abstract: Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.

Submitted 24 June, 2021; originally announced June 2021.

Comments: ICML 2021
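The decomposition described in the DEMI abstract rests on the standard chain rule for mutual information. Writing $x$ for one view and $y_1, \dots, y_K$ for progressively more informed subviews of the other (this notation is assumed here for illustration, not taken from the paper), the two-subview case and the general case read:

```latex
I(x; y_1, y_2) = I(x; y_1) + I(x; y_2 \mid y_1)

I(x; y_1, \dots, y_K) = \sum_{k=1}^{K} I(x; y_k \mid y_1, \dots, y_{k-1})
```

Each right-hand term measures only part of the total MI, which is why a contrastive bound on each term is expected to suffer less from the underestimation bias than a single bound on the left-hand side.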

arXiv:2105.12739

doi 10.1007/978-3-030-85262-7_8

Task inefficiency patterns for a wave equation solver

Authors: Holger Schulz, Gonzalo Brito Gadeschi, Oleksandr Rudyy, Tobias Weinzierl

Abstract: The orchestration of complex algorithms demands high levels of automation to use modern hardware efficiently. Task-based programming with OpenMP 5.0 is a prominent candidate to accomplish this goal. We study OpenMP 5.0's tasking in the context of a wave equation solver (ExaHyPE) using three different architectures and runtimes. We describe several task-scheduling flaws present in currently available runtimes, demonstrate how they impact performance and show how to work around them. Finally, we propose extensions to the OpenMP standard.

Submitted 12 July, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

arXiv:2104.14957

doi 10.1007/s11222-021-10071-1

A Riemannian Newton Trust-Region Method for Fitting Gaussian Mixture Models

Authors: Lena Sembach, Jan Pablo Burgard, Volker H. Schulz

Abstract: Gaussian Mixture Models are a powerful tool in Data Science and Statistics that are mainly used for clustering and density approximation. In practice, the task of estimating the model parameters is often solved by the Expectation Maximization (EM) algorithm, whose benefits are its simplicity and low per-iteration cost. However, EM converges slowly if there is a large share of hidden information or overlapping clusters. Recent advances in manifold optimization for Gaussian Mixture Models have gained increasing interest. We introduce an explicit formula for the Riemannian Hessian for Gaussian Mixture Models. On top of this, we propose a new Riemannian Newton Trust-Region method which outperforms current approaches both in terms of runtime and number of iterations. We apply our method to clustering problems and density approximation tasks. Compared to existing methods, our method is very powerful for data with a large share of hidden information.

Submitted 24 August, 2022; v1 submitted 30 April, 2021; originally announced April 2021.

Comments: 32 pages

ACM Class: G.1.6; G.3

Journal ref: Stat Comput 32, 8 (2022)
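For context, the EM baseline mentioned in the GMM abstract alternates soft cluster assignments (E-step) with closed-form parameter updates (M-step). The sketch below is a minimal one-dimensional version of that baseline in NumPy; it is not the paper's Riemannian Newton Trust-Region method, and the function name and argument layout are hypothetical.

```python
import numpy as np


def gmm_em_1d(x, means, variances, weights, n_iter=100):
    """Plain EM for a 1-D Gaussian mixture with K components."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float).copy()
    variances = np.asarray(variances, dtype=float).copy()
    weights = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to w_k * N(x_n | mu_k, var_k)
        diff = x[:, None] - means
        dens = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2.0 * np.pi * variances)
        r = weights * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of mixing weights, means and variances
        nk = r.sum(axis=0)
        weights = nk / len(x)
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return means, variances, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
    print(gmm_em_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]))
```

Each iteration is cheap, which matches the low per-iteration cost cited in the abstract; when components overlap heavily, the responsibilities stay ambiguous and the updates move slowly, which is the regime where the authors report their second-order method pays off.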

arXiv:2104.06777

Existence, Uniqueness and Numerical Modeling of Wine Fermentation Based on Integro-Differential Equations

Authors: Christina Schenk, Volker H. Schulz

Abstract: Predictive modeling is the key factor for saving time and resources in manufacturing processes such as fermentation processes arising, e.g., in food and chemical manufacturing. According to Zhang et al. (2002), the open-loop dynamics of yeast are highly dependent on the initial cell mass distribution. This can be modeled via population balance models describing the single-cell behavior of the yeast cell. Several population balance models for wine fermentation already exist in the literature. However, the new model introduced in this paper is much more detailed than the ones studied previously. This new model for the white wine fermentation process is based on a combination of components previously introduced in the literature. The combination results in a system of highly nonlinear, weakly hyperbolic partial/ordinary integro-differential equations, which is very challenging from both a theoretical and a numerical point of view. Existence and uniqueness of solutions to a simplified version of the introduced problem are studied based on semigroup theory. For its numerical solution, a methodology based on a finite volume scheme combined with a time-implicit scheme is derived. The impact of the initial cell mass distribution on the solution is studied and underlined with numerical results. The detailed model is compared to a simpler model based on ordinary differential equations. The observed differences for different initial distributions and the different models turn out to be smaller than expected. The outcomes of this paper are useful for applied mathematicians, winemakers and process engineers.

Submitted 14 April, 2021; originally announced April 2021.

Comments: 26 pages, 20 figures

arXiv:2103.05751

BROOD: Bilevel and Robust Optimization and Outlier Detection for Efficient Tuning of High-Energy Physics Event Generators

Authors: Wenjing Wang, Mohan Krishnamoorthy, Juliane Muller, Stephen Mrenna, Holger Schulz, Xiangyang Ju, Sven Leyffer, Zachary Marshall

Abstract: The parameters in Monte Carlo (MC) event generators are tuned on experimental measurements by evaluating the goodness of fit between the data and the MC predictions. The relative importance of each measurement is adjusted manually in an often time-consuming, iterative process to meet different experimental needs. In this work, we introduce several optimization formulations and algorithms with new decision criteria for streamlining and automating this process. These algorithms are designed for two formulations: bilevel optimization and robust optimization. Both formulations are applied to the datasets used in the ATLAS A14 tune and to the dedicated hadronization datasets generated by the Sherpa generator, respectively. The corresponding tuned generator parameters are compared using three metrics. We compare the quality of our automatic tunes to the published ATLAS A14 tune. Moreover, we analyze the impact of a pre-processing step that excludes data that cannot be described by the physics models used in the MC event generators.

Submitted 11 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: 87 pages, Submission to SciPost

arXiv:2103.05748

doi 10.1051/epjconf/202125103060

Apprentice for Event Generator Tuning

Authors: Mohan Krishnamoorthy, Holger Schulz, Xiangyang Ju, Wenjing Wang, Sven Leyffer, Zachary Marshall, Stephen Mrenna, Juliane Muller, James B. Kowalkowski

Abstract: Apprentice is a tool developed for event generator tuning. It contains a range of conceptual improvements and extensions over the tuning tool Professor. Its core functionality remains the construction of a multivariate analytic surrogate model of computationally expensive Monte-Carlo event generator predictions. The surrogate model is used for numerical optimization in chi-square minimization and likelihood evaluation. Apprentice also introduces algorithms to automate the selection of observable weights to minimize the effect of mis-modeling in the event generators. We illustrate our improvements for the tasks of MC-generator tuning and limit setting.

Submitted 9 March, 2021; originally announced March 2021.

Comments: 9 pages, 2 figures, submitted to the 25th International Conference on Computing in High-Energy and Nuclear Physics
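The Apprentice workflow described above (fit a cheap surrogate to expensive generator runs, then minimize a weighted chi-square against data) can be illustrated for a single tuning parameter. This is a minimal sketch under stated assumptions: per-bin quadratic polynomials fitted with NumPy stand in for the surrogate (the polynomial form used by Professor; Apprentice's own approximations are not reproduced here), a grid search stands in for a numerical optimizer, and the toy generator, weights and function names are hypothetical.

```python
import numpy as np


def fit_surrogates(params, outputs, degree=2):
    """Fit one polynomial per observable bin: outputs[i, b] ~ poly_b(params[i])."""
    return [
        np.polynomial.Polynomial.fit(params, outputs[:, b], degree)
        for b in range(outputs.shape[1])
    ]


def chi2(p, surrogates, data, errors, weights):
    """Weighted chi-square between surrogate predictions at p and the data."""
    pred = np.array([s(p) for s in surrogates])
    return float(np.sum(weights * (pred - data) ** 2 / errors**2))


def tune(surrogates, data, errors, weights, grid):
    """Return the grid point with the smallest weighted chi-square."""
    values = [chi2(p, surrogates, data, errors, weights) for p in grid]
    return grid[int(np.argmin(values))]


if __name__ == "__main__":
    def toy_generator(p):
        # Hypothetical stand-in for an expensive MC run: three "bins".
        return np.array([p, p**2, 2.0 * p + 1.0])

    params = np.linspace(0.0, 1.5, 8)
    outputs = np.array([toy_generator(p) for p in params])
    surrogates = fit_surrogates(params, outputs)
    data = toy_generator(0.7)
    print(tune(surrogates, data, np.full(3, 0.1), np.ones(3), np.linspace(0.0, 1.5, 1501)))
```

The per-observable `weights` array is where an automated weight-selection scheme, like the one the abstract mentions, would plug in; here it is simply set to ones.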

arXiv:2101.08104

A Generalized Weisfeiler-Lehman Graph Kernel

Authors: Till Hendrik Schulz, Tamás Horváth, Pascal Welke, Stefan Wrobel

Abstract: The Weisfeiler-Lehman graph kernels are among the most prevalent graph kernels due to their favorable time complexity and predictive performance. Their key concept is based on an implicit comparison of neighborhood-representing trees with respect to equality (i.e., isomorphism). This binary-valued comparison is, however, arguably too rigid for defining suitable similarity measures over graphs. To overcome this limitation, we propose a generalization of Weisfeiler-Lehman graph kernels which takes into account the similarity between trees rather than equality. We achieve this using a specifically fitted variation of the well-known tree edit distance which can be calculated efficiently. We empirically show that our approach significantly outperforms state-of-the-art methods in terms of predictive performance on datasets containing structurally more complex graphs beyond the typically considered molecular graphs.

Submitted 20 January, 2021; originally announced January 2021.

Comments: n/a

arXiv:2012.13423

doi 10.1109/ACCESS.2020.3028571

Improving Predictability of User-Affecting Metrics to Support Anomaly Detection in Cloud Services

Authors: Vilc Rufino, Mateus Nogueira, Alberto Avritzer, Daniel Menasché, Barbara Russo, Andrea Janes, Vincenzo Ferme, André Van Hoorn, Henning Schulz, Cabral Lima

Abstract: Anomaly detection systems aim to detect and report attacks or unexpected behavior in networked systems. Previous work has shown that anomalies have an impact on system performance, and that performance signatures can be used effectively to implement an intrusion detection system (IDS). In this paper, we present an analytical and an experimental study on the trade-off between anomaly detection based on performance signatures and system scalability. The proposed approach combines analytical modeling and load testing to find optimal configurations for the signature-based IDS. We apply a heavy-tail bi-modal modeling approach, where "long" jobs represent large resource-consuming transactions, e.g., generated by DDoS attacks; the model was parametrized using results obtained from controlled experiments. For performance purposes, mean response time is the key metric to be minimized, whereas for security purposes, response time variance and classification accuracy must be taken into account. The key insights from our analysis are: (i) there is an optimal number of servers which minimizes the response time variance, and (ii) the sweet-spot number of servers that minimizes response time variance and maximizes classification accuracy is typically smaller than or equal to the one that minimizes mean response time. Therefore, for security purposes, it may be worth slightly sacrificing performance to increase classification accuracy.

Submitted 24 December, 2020; originally announced December 2020.

Journal ref: IEEE Access, vol. 8, p.198152-198167, 2020
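Why a bi-modal workload makes variance, and not only the mean, a metric worth tracking can be seen from the law of total variance: a small share of "long" jobs barely moves the mean service time but inflates its variance. The sketch below assumes exponentially distributed service times within each job class (a two-phase hyperexponential, chosen for illustration; it is not the paper's heavy-tail model), and the function name and example numbers are hypothetical.

```python
def bimodal_service_moments(p_long, mean_short, mean_long):
    """Mean and variance of a two-class mixture of exponential service times.

    p_long is the fraction of "long" jobs; each class is assumed exponential.
    """
    p_short = 1.0 - p_long
    mean = p_short * mean_short + p_long * mean_long
    # For an exponential with mean m, E[S^2] = 2 m^2.
    second_moment = p_short * 2.0 * mean_short**2 + p_long * 2.0 * mean_long**2
    return mean, second_moment - mean**2


if __name__ == "__main__":
    print(bimodal_service_moments(0.0, 1.0, 100.0))   # short jobs only
    print(bimodal_service_moments(0.01, 1.0, 100.0))  # 1% long jobs
```

With 1% of jobs being 100 times longer, the mean roughly doubles (1 to 1.99) while the variance grows about 200-fold (1 to about 198), which illustrates how a configuration that is fine for mean response time can still behave poorly in terms of variance.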

arXiv:2010.03012

doi 10.1109/DLS51937.2020.00008

Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Authors: Bita Hasheminezhad, Shahrzad Shirzad, Nanmiao Wu, Patrick Diehl, Hannes Schulz, Hartmut Kaiser

Abstract: Although recent scale-up approaches to training deep neural networks have proven effective, the computational intensity of large and complex models, as well as the availability of large-scale datasets, require deep learning frameworks to utilize scale-out techniques. Parallelization approaches and distribution requirements are not considered in the preliminary designs of most available distributed deep learning frameworks, and most of them are still unable to perform effective and efficient fine-grained inter-node communication. We present Phylanx, which has the potential to alleviate these shortcomings. Phylanx offers a productivity-oriented frontend where user Python code is translated to a futurized execution tree that can be executed efficiently on multiple nodes using the C++ standard library for parallelism and concurrency (HPX), leveraging fine-grained threading and an active messaging task-based runtime system.

Submitted 19 April, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

arXiv:2010.00907

doi 10.1007/s11548-022-02621-3

Tubular Shape Aware Data Generation for Semantic Segmentation in Medical Imaging

Authors: Ilyas Sirazitdinov, Heinrich Schulz, Axel Saalbach, Steffen Renisch, Dmitry V. Dylov

Abstract: Chest X-ray is one of the most widespread examinations of the human body. In interventional radiology, its use is frequently associated with the need to visualize various tube-like objects, such as puncture needles, guiding sheaths, wires, and catheters. Detection and precise localization of these tube-like objects in X-ray images is, therefore, of utmost value, catalyzing the development of accurate target-specific segmentation algorithms. As in other medical imaging tasks, manual pixel-wise annotation of the tubes is a resource-consuming process. In this work, we aim to alleviate the lack of annotated images by using artificial data. Specifically, we present an approach for synthetic data generation of tube-shaped objects, using a generative adversarial network regularized with a prior-shape constraint. Our method eliminates the need for paired image-mask data and requires only a weakly-labeled dataset (10–20 images) to reach the accuracy of fully-supervised models. We report the applicability of the approach for the task of segmenting tubes and catheters in X-ray images, and the results should also hold for other imaging modalities.

Submitted 7 December, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

Journal ref: International Journal of Computer Assisted Radiology and Surgery, V. 17, pp.1091-1099, 2022

arXiv:2008.13636

doi 10.5281/zenodo.4009114

HL-LHC Computing Review: Common Tools and Community Software

Authors: HEP Software Foundation: Thea Aarrestad, Simone Amoroso, Markus Julian Atkinson, Joshua Bendavid, Tommaso Boccali, Andrea Bocci, Andy Buckley, Matteo Cacciari, Paolo Calafiura, Philippe Canal, Federico Carminati, Taylor Childers, Vitaliano Ciulli, Gloria Corti, Davide Costanzo, Justin Gage Dezoort, Caterina Doglioni, Javier Mauricio Duarte, Agnieszka Dziurda, Peter Elmer, Markus Elsing, V. Daniel Elvira, Giulio Eulisse, et al. (85 additional authors not shown)

Abstract: Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC's success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.

Submitted 31 August, 2020; originally announced August 2020.

Comments: 40 pages contribution to Snowmass 2021

Report number: HSF-DOC-2020-01

arXiv:2007.09150

doi 10.1016/j.cpc.2020.107813

ULYSSES: Universal LeptogeneSiS Equation Solver

Authors: Alessandro Granelli, Kristian Moffat, Yuber Perez-Gonzalez, Holger Schulz, Jessica Turner

Abstract: ULYSSES is a Python package that calculates the baryon asymmetry produced from leptogenesis in the context of a type-I seesaw mechanism. The code solves the semi-classical Boltzmann equations for points in the model parameter space as specified by the user. We provide a selection of predefined Boltzmann equations as well as a plugin mechanism for externally provided models of leptogenesis. Furthermore, the ULYSSES code provides tools for multi-dimensional parameter space exploration. The emphasis of the code is on user flexibility and rapid evaluation. It is publicly available at https://github.com/earlyuniverse/ulysses

Submitted 17 July, 2020; originally announced July 2020.

Comments: 20 pages, 2 figures, 4 tables

Journal ref: Comput. Phys. Commun. 262 (2021) 107813

arXiv:2007.00015

doi 10.1103/PhysRevD.102.053010

Tau neutrinos at DUNE: new strategies, new opportunities

Authors: Pedro Machado, Holger Schulz, Jessica Turner

Abstract: We propose a novel analysis strategy that leverages the unique capabilities of the DUNE experiment to study tau neutrinos. We integrate collider physics ideas, such as jet clustering algorithms in combination with machine learning techniques, into neutrino measurements. Through the construction of a set of observables and kinematic cuts, we obtain a superior discrimination of the signal ($S$) over the background ($B$). In a single year, using the nominal neutrino beam mode, DUNE may achieve $S/\sqrt{B}$ of $3.3$ and $2.3$ for the hadronic and leptonic decay channels of the tau, respectively. Operating in the tau-optimized beam mode would increase $S/\sqrt{B}$ to $8.8$ and $11$ for each of these channels. We pioneer the use in neutrino physics of the analysis software Rivet, a tool used ubiquitously by the LHC experiments. For wider accessibility, we provide our analysis code.

Submitted 14 August, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

Comments: 10 pages, 9 figures. Figure labels enlarged, expanded captions and included url for code in references. Matches PRD accepted version

Journal ref: Phys. Rev. D 102, 053010 (2020)
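The figure of merit quoted in the tau-neutrino abstract is the signal count divided by the square root of the background count, a common approximation to the significance of a counting experiment when the background is large. A minimal sketch of the arithmetic follows; the event counts in the usage lines are hypothetical and are not the paper's yields, and the function names are illustrative only.

```python
import math


def s_over_sqrt_b(signal, background):
    """Approximate significance S / sqrt(B), reasonable when B >> 1."""
    if background <= 0:
        raise ValueError("background must be positive")
    return signal / math.sqrt(background)


def scale_with_exposure(signal, background, factor):
    """Scale S and B by the same exposure factor (e.g. more years of running).

    Since S -> f*S and B -> f*B, S/sqrt(B) grows by sqrt(f).
    """
    return s_over_sqrt_b(signal * factor, background * factor)


if __name__ == "__main__":
    print(s_over_sqrt_b(33, 100))           # hypothetical counts
    print(scale_with_exposure(33, 100, 4))  # four times the exposure
```

The second function makes explicit that running longer improves $S/\sqrt{B}$ only as the square root of the exposure, whereas better signal/background discrimination (as with the kinematic cuts or the tau-optimized beam mode) can improve it more directly.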

arXiv:2006.13265

doi 10.1109/ACCESS.2021.3107163

Anomaly Detection in Medical Imaging with Deep Perceptual Autoencoders

Authors: Nina Shvetsova, Bart Bakker, Irina Fedulova, Heinrich Schulz, Dmitry V. Dylov

Abstract: Anomaly detection is the problem of recognizing abnormal inputs based on the seen examples of normal data. Despite recent advances of deep learning in recognizing image anomalies, these methods still prove incapable of handling complex medical images, such as barely visible abnormalities in chest X-rays and metastases in lymph nodes. To address this problem, we introduce a powerful new method of image anomaly detection. It relies on the classical autoencoder approach with a re-designed training pipeline to handle high-resolution, complex images and a robust way of computing an image abnormality score. We revisit the very problem statement of fully unsupervised anomaly detection, where no abnormal examples at all are provided during the model setup. We propose to relax this unrealistic assumption by using a very small number of anomalies of confined variability merely to initiate the hyperparameter search for the model. We evaluate our solution on natural image datasets with a known benchmark, as well as on two medical datasets containing radiology and digital pathology images. The proposed approach suggests a new strong baseline for image anomaly detection and outperforms state-of-the-art approaches in complex medical image analysis tasks.

Submitted 13 September, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: The final authenticated publication is available online at https://ieeexplore.ieee.org/abstract/document/9521238

Journal ref: IEEE Access, vol. 9, pp. 118571-118583, 2021

arXiv:2004.13687

doi 10.1007/s41781-021-00055-1

Challenges in Monte Carlo event generator software for High-Luminosity LHC

Authors: The HSF Physics Event Generator WG: Andrea Valassi, Efe Yazgan, Josh McFayden, Simone Amoroso, Joshua Bendavid, Andy Buckley, Matteo Cacciari, Taylor Childers, Vitaliano Ciulli, Rikkert Frederix, Stefano Frixione, Francesco Giuli, Alexander Grohsjean, Christian Gütschow, Stefan Höche, Walter Hopkins, Philip Ilten, Dmitri Konstantinov, Frank Krauss, Qiang Li, Leif Lönnblad, Fabio Maltoni, Michelangelo Mangano, et al. (16 additional authors not shown)

Abstract: We review the main software and computing challenges for the Monte Carlo physics event generators used by the LHC experiments, in view of the High-Luminosity LHC (HL-LHC) physics programme. This paper has been prepared by the HEP Software Foundation (HSF) Physics Event Generator Working Group as an input to the LHCC review of HL-LHC computing, which started in May 2020.

Submitted 18 February, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: 20 pages; editors Andrea Valassi, Efe Yazgan and Josh McFayden; addressed additional comments by journal reviewers

Report number: CERN-LPCC-2020-002; FERMILAB-PUB-20-183-SCD-T; MCNET-20-15

Journal ref: Comput Softw Big Sci 5, 12 (2021)

arXiv:2003.01680

Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation

Authors: Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz

Abstract: Domain adaptation has recently become a key problem in dialogue systems research. Deep learning, while being the preferred technique for modeling such systems, works best given massive training data. However, in real-world scenarios such resources are not available for every new domain, so the ability to train with a few dialogue examples can be considered essential. Pre-training on large data sources and adapting to the target data has become the standard method for few-shot problems within the deep learning framework. In this paper, we present the winning entry at the fast domain adaptation task of DSTC8, a hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain MetaLWOz dataset. Robust and diverse in response generation, our model uses retrieval logic as a fallback, achieving state-of-the-art results on MetaLWOz in human evaluation (>4% improvement over the second-place system) and competitive generalization performance in adaptation to the unseen MultiWOZ dataset.

Submitted 6 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: Presented at DSTC8@AAAI 2020

ACM Class: I.2.7

arXiv:2002.07858

Grid-based minimization at scale: Feldman-Cousins corrections for SBN

Authors: Holger Schulz, Marianette Wospakrik, Mark Ross-Lonergan, Guanqun Ge, Saba Sehrish, Marc Paterno, Jim Kowalkowski, Wes Ketchum, Georgia Karagiorgi

Abstract: We present a computational model for the construction of Feldman-Cousins (FC) corrections, which are frequently used in High Energy Physics (HEP) analysis. The program is built around a grid-based minimization and is written in C++. Our algorithms exploit vectorization through Eigen3, yielding a single-core speed-up by a factor of 350 compared to the original implementation, and achieve MPI data parallelism by using DIY. We demonstrate that the application scales very well at High Performance Computing (HPC) sites. We use HDF5 in conjunction with HighFive to write the results of the calculation to file.

Submitted 18 February, 2020; originally announced February 2020.

Report number: FERMILAB-PUB-20-069-SCD
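
Feldman-Cousins intervals are built by scanning a grid of parameter values and, at each grid point, ranking the possible outcomes by a likelihood ratio. The toy sketch below applies that ordering to the textbook case of a single Poisson count with known background b; it is not the SBN oscillation fit or the C++/Eigen3 code described above, and the function names and defaults (90% CL, grid step 0.01, mu_max 10) are illustrative assumptions.

```python
import math

def poisson_pmf(n: int, mean: float) -> float:
    """Poisson probability P(n | mean), evaluated in log space for stability."""
    if mean <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def acceptance_region(mu: float, b: float, cl: float, n_max: int) -> set:
    """Counts n accepted at signal mean mu under the Feldman-Cousins ordering."""
    ranked = []
    for n in range(n_max + 1):
        mu_best = max(0.0, n - b)            # best-fit signal, restricted to mu >= 0
        ratio = poisson_pmf(n, mu + b) / poisson_pmf(n, mu_best + b)
        ranked.append((ratio, n))
    ranked.sort(reverse=True)                # largest likelihood ratio first
    accepted, total = set(), 0.0
    for _, n in ranked:
        accepted.add(n)
        total += poisson_pmf(n, mu + b)
        if total >= cl:                      # stop once the region holds >= cl probability
            break
    return accepted

def fc_interval(n_obs: int, b: float, cl: float = 0.90,
                mu_max: float = 10.0, step: float = 0.01, n_max: int = 100):
    """Smallest and largest grid values of mu whose acceptance region contains n_obs."""
    grid = [i * step for i in range(int(round(mu_max / step)) + 1)]
    inside = [mu for mu in grid if n_obs in acceptance_region(mu, b, cl, n_max)]
    return min(inside), max(inside)

print(fc_interval(0, 0.0))
```

For n_obs = 0 and b = 0 the upper limit on this grid should land close to the 2.44 tabulated by Feldman and Cousins. Every grid point is evaluated independently, which is why this kind of construction lends itself to the vectorized, MPI-parallel treatment the abstract describes.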

arXiv:2001.10028

doi 10.1103/PhysRevD.101.076002

Event Generation with Normalizing Flows

Authors: Christina Gao, Stefan Hoeche, Joshua Isaacson, Claudius Krause, Holger Schulz

Abstract: We present a novel integrator based on normalizing flows, which can be used to improve the unweighting efficiency of Monte Carlo event generators for collider physics simulations. In contrast to machine learning approaches based on surrogate models, our method generates the correct result even if the underlying neural networks are not optimally trained. We demonstrate the new strategy on Drell-Yan type processes at the LHC, both at leading order and, partially, at next-to-leading order in QCD.

Submitted 20 April, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

Comments: 9 pages, 2 figures, 2 tables; v2: matches Journal version

Report number: FERMILAB-PUB-20-009-SCD-T, MCNET-20-03

Journal ref: Phys. Rev. D 101, 076002 (2020)
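
Unweighting efficiency is commonly defined as the mean event weight divided by the maximum weight, where each weight is f(x)/q(x) for a sampling density q. The sketch below uses fixed inverse-CDF mappings as stand-ins for a trained normalizing flow (an assumption made to keep it short and dependency-free) to show the two properties the abstract relies on: the integral estimate stays unbiased for any valid q, and the efficiency rises as q approaches the shape of f. The toy integrand is hypothetical.

```python
import numpy as np

def integrand(x):
    return 3.0 * x**2           # toy "matrix element"; exact integral on [0, 1] is 1

def sample_uniform(u):
    return u, np.ones_like(u)   # returns (x, q(x)) for q(x) = 1

def sample_linear(u):
    x = np.sqrt(u)              # inverse CDF of q(x) = 2x
    return x, 2.0 * x

def sample_exact(u):
    x = np.cbrt(u)              # inverse CDF of q(x) = 3x^2, a "perfectly trained" mapping
    return x, 3.0 * x**2

def integrate(sampler, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random(n)     # uniform on (0, 1], avoids q(0) = 0
    x, q = sampler(u)
    w = integrand(x) / q        # importance weights
    estimate = w.mean()         # unbiased whenever q > 0 wherever f > 0
    efficiency = w.mean() / w.max()
    return estimate, efficiency

for name, sampler in [("uniform", sample_uniform), ("linear", sample_linear), ("exact", sample_exact)]:
    est, eff = integrate(sampler)
    print(f"{name:8s} integral={est:.4f} unweighting efficiency={eff:.3f}")
```

The three efficiencies should come out near 1/3, 2/3 and 1, while all three integral estimates stay close to the exact value 1.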

arXiv:2001.05922

Continual Learning for Domain Adaptation in Chest X-ray Classification

Authors: Matthias Lenga, Heinrich Schulz, Axel Saalbach

Abstract: Over recent years, Deep Learning has been successfully applied to a broad range of medical applications. Especially in the context of chest X-ray classification, results have been reported that are on par with, or even superior to, those of experienced radiologists. Despite this success in controlled experimental environments, it has been noted that the ability of Deep Learning models to generalize to data from a new domain (with potentially different tasks) is often limited. To address this challenge, we investigate techniques from the field of Continual Learning (CL), including Joint Training (JT), Elastic Weight Consolidation (EWC) and Learning Without Forgetting (LWF). Using the ChestX-ray14 and MIMIC-CXR datasets, we demonstrate empirically that these methods provide promising options to improve the performance of Deep Learning models on a target domain and to effectively mitigate catastrophic forgetting for the source domain. The best overall performance was obtained using JT, while LWF achieved competitive results even without access to data from the source domain.

Submitted 16 January, 2020; originally announced January 2020.

Journal ref: Proceedings of the Third Conference on Medical Imaging with Deep Learning, PMLR 121:413-423, 2020
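
EWC, one of the Continual Learning techniques named above, adds a quadratic penalty (lambda/2) * sum_i F_i (theta_i - theta*_i)^2 that anchors each parameter to its source-domain value theta* in proportion to its estimated Fisher information F_i. The sketch below computes the empirical diagonal Fisher (a common approximation) for a hypothetical logistic-regression model; the model, data and lambda are assumptions, not the chest X-ray networks from the paper.

```python
import numpy as np

def diagonal_fisher(X: np.ndarray, y: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Empirical diagonal Fisher information of a logistic-regression model at theta."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    per_sample_grads = (y - p)[:, None] * X      # gradient of log p(y | x, theta)
    return np.mean(per_sample_grads ** 2, axis=0)

def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))

rng = np.random.default_rng(0)
# Hypothetical source-domain data: feature 0 is informative, feature 1 is always zero.
X_src = np.column_stack([rng.normal(size=200), np.zeros(200)])
y_src = (X_src[:, 0] > 0).astype(float)
theta_star = np.array([1.0, 0.0])                # weights after training on the source domain
F = diagonal_fisher(X_src, y_src, theta_star)

lam = 10.0
print(ewc_penalty(theta_star + np.array([0.5, 0.0]), theta_star, F, lam))  # penalized move
print(ewc_penalty(theta_star + np.array([0.0, 5.0]), theta_star, F, lam))  # free move
```

During target-domain training, this term would simply be added to the target loss. Moving a weight the source task never relied on (here feature 1, whose Fisher entry is zero) costs nothing, while moving an informative weight is penalized.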

arXiv:2001.03073

Technical Proposal: FASERnu

Authors: FASER Collaboration, Henso Abreu, Marco Andreini, Claire Antel, Akitaka Ariga, Tomoko Ariga, Caterina Bertone, Jamie Boyd, Andy Buckley, Franck Cadoux, David W. Casper, Francesco Cerutti, Xin Chen, Andrea Coccaro, Salvatore Danzeca, Liam Dougherty, Candan Dozen, Peter B. Denton, Yannick Favre, Deion Fellers, Jonathan L. Feng, Didier Ferrere, Jonathan Gall, Iftah Galon, Stephen Gibson , et al. (47 additional authors not shown)

Abstract: FASERnu is a proposed small and inexpensive emulsion detector designed to detect collider neutrinos for the first time and study their properties. FASERnu will be located directly in front of FASER, 480 m from the ATLAS interaction point along the beam collision axis in the unused service tunnel TI12. From 2021-23, during Run 3 of the 14 TeV LHC, roughly 1,300 electron neutrinos, 20,000 muon neutrinos, and 20 tau neutrinos will interact in FASERnu with TeV-scale energies. With the ability to observe these interactions, reconstruct their energies, and distinguish flavors, FASERnu will probe the production, propagation, and interactions of neutrinos at the highest human-made energies ever recorded. The FASERnu detector will be composed of 1000 emulsion layers interleaved with tungsten plates. The total volume of the emulsion and tungsten is 25 cm × 25 cm × 1.35 m, and the tungsten target mass is 1.2 tonnes. From 2021-23, 7 sets of emulsion layers will be installed, with replacement roughly every 20-50 fb⁻¹ during planned Technical Stops. In this document, we summarize FASERnu's physics goals and discuss estimates of the neutrino flux and interaction rates. We then describe the FASERnu detector in detail, including plans for assembly, transport, installation, and emulsion replacement, and procedures for emulsion readout and data analysis. We close with cost estimates for the detector components and infrastructure work and a timeline for the experiment.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 49 pages, 25 figures; submitted to the CERN LHCC on 28 October 2019

Report number: CERN-LHCC-2019-017, LHCC-P-015, UCI-TR-2019-25

arXiv:1912.05451

doi 10.21468/SciPostPhys.8.2.026

Robust Independent Validation of Experiment and Theory: Rivet version 3

Authors: C. Bierlich, A. Buckley, J. M. Butterworth, C. H. Christensen, L. Corpe, D. Grellscheid, J. F. Grosse-Oetringhaus, C. Gutschow, P. Karczmarczyk, J. Klein, L. Lonnblad, C. S. Pollard, P. Richardson, H. Schulz, F. Siegert

Abstract: First released in 2010, the Rivet library forms an important repository for analysis code, facilitating comparisons between measurements of the final state in particle collisions and theoretical calculations of those final states. We give an overview of Rivet's current design and implementation, its uptake for analysis preservation and physics results, and summarise recent developments including propagation of MC systematic-uncertainty weights, heavy-ion and ep physics, and systems for detector emulation. In addition, we provide a short user guide that supplements and updates the Rivet user manual.

Submitted 6 February, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

Report number: MCnet-19-26

Journal ref: SciPost Phys. 8, 026 (2020)

arXiv:1912.02272

doi 10.1016/j.cpc.2020.107663

Multivariate Rational Approximation

Authors: Anthony P. Austin, Mohan Krishnamoorthy, Sven Leyffer, Stephen Mrenna, Juliane Muller, Holger Schulz

Abstract: We present two approaches for computing rational approximations to multivariate functions, motivated by their effectiveness as surrogate models for high-energy physics (HEP) applications. Our first approach builds on the Stieltjes process to efficiently and robustly compute the coefficients of the rational approximation. Our second approach is based on an optimization formulation that allows us to include structural constraints on the rational approximation, resulting in a semi-infinite optimization problem that we solve using an outer approximation approach. We present results for synthetic and real-life HEP data, and we compare the approximation quality of our approaches with that of traditional polynomial approximations.

Submitted 2 December, 2019; originally announced December 2019.

MSC Class: 2010 MSC: 41A20; 41A63; 65D15
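
The simplest way to fit a multivariate rational function p(x)/q(x) from samples is to fix the constant term of q to 1 and solve the linearized least-squares problem p(x_k) - f_k (q(x_k) - 1) = f_k. This is a swapped-in baseline for illustration only, not the Stieltjes-process or semi-infinite optimization approaches described in the abstract, and it does not enforce q > 0; the target function and degrees are hypothetical.

```python
import itertools
import numpy as np

def monomials(X: np.ndarray, degree: int) -> np.ndarray:
    """Columns prod_j x_j^e_j for every exponent tuple e with total degree <= degree.

    The first column is always the constant term (all exponents zero).
    """
    dim = X.shape[1]
    exps = [e for e in itertools.product(range(degree + 1), repeat=dim) if sum(e) <= degree]
    return np.column_stack([np.prod(X ** np.array(e), axis=1) for e in exps])

def fit_rational(X, f, num_deg, den_deg):
    """Linearized least squares for p/q with the constant term of q fixed to 1."""
    P = monomials(X, num_deg)
    Q = monomials(X, den_deg)[:, 1:]              # drop q's constant column
    A = np.hstack([P, -f[:, None] * Q])           # p(x_k) - f_k * (q(x_k) - 1) = f_k
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    return coef[: P.shape[1]], coef[P.shape[1]:]

def eval_rational(X, a, b, num_deg, den_deg):
    P = monomials(X, num_deg)
    Q = monomials(X, den_deg)[:, 1:]
    return (P @ a) / (1.0 + Q @ b)

def target(X):
    # Hypothetical test function that is exactly rational of degree (1, 2).
    x, y = X[:, 0], X[:, 1]
    return (1.0 + x + y) / (1.0 + 0.5 * x**2 + 0.5 * y**2)

rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))
a, b = fit_rational(X_train, target(X_train), num_deg=1, den_deg=2)
X_test = rng.uniform(-1.0, 1.0, size=(50, 2))
print(np.max(np.abs(eval_rational(X_test, a, b, 1, 2) - target(X_test))))
```

Because the target here is exactly rational with matching degrees, the linear system is consistent and the fit reproduces it to round-off. On noisy data, this linearized fit can place poles (zeros of q) inside the domain, so a practical tool needs extra safeguards.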

arXiv:1911.06394

The Eighth Dialog System Technology Challenge

Authors: Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta

Abstract: This paper introduces the Eighth Dialog System Technology Challenge. In line with recent challenges, the eighth edition focuses on applying end-to-end dialog technologies in a pragmatic way for multi-domain task-completion, noetic response selection, audio visual scene-aware dialog, and schema-guided dialog state tracking tasks. This paper describes the task definition, provided datasets, and evaluation set-up for each track. We also summarize the results of the submitted systems to highlight the overall trends of the state-of-the-art technologies for the tasks.

Submitted 14 November, 2019; originally announced November 2019.

Comments: Submitted to NeurIPS 2019 3rd Conversational AI Workshop

arXiv:1909.08942

Synthetic CT Generation from MRI Using Improved DualGAN

Authors: Denis Prokopenko, Joël Valentin Stadelmann, Heinrich Schulz, Steffen Renisch, Dmitry V. Dylov

Abstract: Synthetic CT image generation from MRI scans is necessary to create radiotherapy plans without the need for co-registered MRI and CT scans. The chosen baseline adversarial model with cycle consistency permits unpaired image-to-image translation. A perceptual loss term and a coordinate convolutional layer were added to improve the quality of the translated images. The proposed architecture was tested on a paired MRI-CT dataset, where the synthetic CTs were compared to the corresponding original CT images. The mean absolute error (MAE) between the synthetic CT images and the real CT scans is 61 HU, computed inside the body contour of the true CTs.

Submitted 19 September, 2019; originally announced September 2019.

Report number: MIDL/2019/ExtendedAbstract/S1em7ZOkFN
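
The reported figure is a mean absolute error restricted to voxels inside the body. A minimal sketch of such a masked metric follows; how the body shape is segmented is not stated in the abstract, so the simple threshold mask used below (-500 HU) and the toy image are hypothetical stand-ins.

```python
import numpy as np

def masked_mae_hu(synthetic_ct: np.ndarray, real_ct: np.ndarray, body_mask: np.ndarray) -> float:
    """Mean absolute error in Hounsfield units over voxels where body_mask is True."""
    diff = np.abs(synthetic_ct.astype(np.float64) - real_ct.astype(np.float64))
    return float(diff[body_mask].mean())

# Toy 2-D "slice": air (-1000 HU) surrounding a 2x2 patch of soft tissue (40 HU).
real = np.full((4, 4), -1000.0)
real[1:3, 1:3] = 40.0
synth = real.copy()
synth[1:3, 1:3] += 60.0        # 60 HU error inside the body
synth[0, 0] = 0.0              # large error in the air, outside the body
mask = real > -500.0           # hypothetical body mask by thresholding the real CT
print(masked_mae_hu(synth, real, mask))  # 60.0
```

Restricting the average to the body keeps large but clinically irrelevant errors in the surrounding air from dominating the score.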

arXiv:1906.01906

Combining crowd-sourcing and deep learning to explore the meso-scale organization of shallow convection

Authors: Stephan Rasp, Hauke Schulz, Sandrine Bony, Bjorn Stevens

Abstract: Humans excel at detecting interesting patterns in images, for example those taken from satellites. This kind of anecdotal evidence can lead to the discovery of new phenomena. However, it is often difficult to gather enough data on subjective features for significant analysis. This paper presents an example of how two tools that have recently become accessible to a wide range of researchers, crowd-sourcing and deep learning, can be combined to explore satellite imagery at scale. In particular, the focus is on the organization of shallow cumulus convection in the trade wind regions. Shallow clouds play a large role in the Earth's radiation balance yet are poorly represented in climate models. For this project, four subjective patterns of organization were defined: Sugar, Flower, Fish and Gravel. On cloud labeling days at two institutes, 67 scientists screened 10,000 satellite images on a crowd-sourcing platform and classified almost 50,000 mesoscale cloud clusters. This dataset is then used to train deep learning algorithms that make it possible to automate the pattern detection and create global climatologies of the four patterns. Analysis of the geographical distribution and large-scale environmental conditions indicates that the four patterns have some overlap with established modes of organization, such as open and closed cellular convection, but also differ in important ways. The results and dataset from this project suggest promising research questions. Further, this study illustrates that crowd-sourcing and deep learning complement each other well for the exploration of image datasets.

Submitted 21 April, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

arXiv:1905.09127

doi 10.21468/SciPostPhys.7.3.034

Event Generation with Sherpa 2.2

Authors: Enrico Bothmann, Gurpreet Singh Chahal, Stefan Höche, Johannes Krause, Frank Krauss, Silvan Kuttimalai, Sebastian Liebschner, Davide Napoletano, Marek Schönherr, Holger Schulz, Steffen Schumann, Frank Siegert

Abstract: Sherpa is a general-purpose Monte Carlo event generator for the simulation of particle collisions in high-energy collider experiments. We summarize essential features and improvements of the Sherpa 2.2 release series, which is heavily used for event generation in the analysis and interpretation of LHC Run 1 and Run 2 data. We highlight a decade of developments towards ever higher precision in the simulation of particle-collision events.

Submitted 3 September, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

Comments: 39 pages, 18 figures, extended discussion

Report number: FERMILAB-PUB-19-218-T, SLAC-PUB-17433, IPPP/19/42, MCNET-19-11

Journal ref: SciPost Phys. 7, 034 (2019)

arXiv:1905.05120

doi 10.1103/PhysRevD.100.014024

Simulation of vector boson plus many jet final states at the high luminosity LHC

Authors: Stefan Höche, Stefan Prestel, Holger Schulz

Abstract: We present a novel event generation framework for the efficient simulation of vector boson plus multi-jet backgrounds at the high-luminosity LHC and at possible future hadron colliders. MPI parallelization of parton-level and particle-level event generation and storage of parton-level event information using the HDF5 data format allow us to obtain leading-order merged Monte Carlo predictions with up to nine jets in the final state. The parton-level event samples generated in this manner correspond to an integrated luminosity of 3 ab⁻¹ and are made publicly available for future phenomenological studies.

Submitted 13 May, 2019; originally announced May 2019.

Comments: 10 pages, 8 figures, 5 tables

Report number: FERMILAB-PUB-19-192-T, LU-TP 19-14, MCNET-19-09

Journal ref: Phys. Rev. D 100, 014024 (2019)

arXiv:1904.08650

doi 10.1137/19M1257226

Efficient Techniques for Shape Optimization with Variational Inequalities using Adjoints

Authors: Daniel Luft, Volker H. Schulz, Kathrin Welker

Abstract: In general, standard necessary optimality conditions cannot be formulated in a straightforward manner for semi-smooth shape optimization problems. In this paper, we consider shape optimization problems constrained by variational inequalities of the first kind, so-called obstacle-type problems. Under appropriate assumptions, we prove existence of adjoints for regularized problems and convergence to limiting objects of the unregularized problem. Moreover, we derive existence and closed form of shape derivatives for the regularized problem and prove convergence to a limit object. Based on this analysis, an efficient optimization algorithm is devised and tested numerically.

Submitted 29 January, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

MSC Class: 65K15; 49Q10; 49M29; 35Q93; 35J86; 49J40

Journal ref: SIAM J. Optim. 30(3) 1922-1953

arXiv:1812.07617

Towards Deep Conversational Recommendations

Authors: Raymond Li, Samira Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, Chris Pal

Abstract: There has been growing interest in using neural networks and deep learning techniques to create dialogue systems. Conversational recommendation is an interesting setting for the scientific exploration of dialogue with natural language, as the associated discourse involves goal-driven dialogue that often transforms naturally into more free-form chat. This paper provides two contributions. First, until now there has been no publicly available large-scale dataset consisting of real-world dialogues centered around recommendations. To address this issue and to facilitate our exploration here, we have collected ReDial, a dataset consisting of over 10,000 conversations centered around the theme of providing movie recommendations. We make this data available to the community for further research. Second, we use this dataset to explore multiple facets of conversational recommendations. In particular, we explore new neural architectures, mechanisms, and methods suitable for composing conversational recommendation systems. Our dataset allows us to systematically probe model sub-components addressing different parts of the overall problem domain, ranging from sentiment analysis and cold-start recommendation generation to detailed aspects of how natural language is used in this setting in the real world. We combine such sub-components into a full-blown dialogue system and examine its behavior.

Submitted 4 March, 2019; v1 submitted 18 December, 2018; originally announced December 2018.

Comments: 17 pages, 5 figures, Accepted at 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada

arXiv:1812.07023

From FiLM to Video: Multi-turn Question Answering with Multi-modal Context

Authors: Dat Tien Nguyen, Shikhar Sharma, Hannes Schulz, Layla El Asri

Abstract: Understanding audio-visual content and the ability to have an informative conversation about it have both been challenging areas for intelligent systems. The Audio Visual Scene-aware Dialog (AVSD) challenge, organized as a track of the Dialog System Technology Challenge 7 (DSTC7), proposes a combined task, where a system has to answer questions pertaining to a video given a dialogue with previous question-answer pairs and the video itself. We propose for this task a hierarchical encoder-decoder model which computes a multi-modal embedding of the dialogue context. It first embeds the dialogue history using two LSTMs. We extract video and audio frames at regular intervals and compute semantic features using pre-trained I3D and VGGish models, respectively. Before summarizing both modalities into fixed-length vectors using LSTMs, we use FiLM blocks to condition them on the embeddings of the current question, which allows us to reduce the dimensionality considerably. Finally, we use an LSTM decoder that we train with scheduled sampling and evaluate using beam search. Compared to the modality-fusing baseline model released by the AVSD challenge organizers, our model achieves relative improvements of more than 16% (scoring 0.36 BLEU-4) and more than 33% (scoring 0.997 CIDEr).

Submitted 17 December, 2018; originally announced December 2018.

Comments: Accepted for an Oral presentation at the DSTC7 workshop at AAAI 2019
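
FiLM (feature-wise linear modulation) conditions a feature sequence by scaling and shifting each channel with values predicted from a conditioning vector: FiLM(x) = gamma(c) * x + beta(c). The sketch below applies it to per-frame features with a question embedding as c; the single linear layers for gamma and beta, the dimensions and the initialization are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

class FiLM:
    """Feature-wise linear modulation: out = gamma(c) * x + beta(c)."""

    def __init__(self, cond_dim: int, feat_dim: int, rng: np.random.Generator):
        # Hypothetical single linear layers mapping the condition to gamma and beta.
        self.w_gamma = rng.normal(scale=0.1, size=(cond_dim, feat_dim))
        self.b_gamma = np.ones(feat_dim)     # gamma starts at 1 (identity scaling)
        self.w_beta = rng.normal(scale=0.1, size=(cond_dim, feat_dim))
        self.b_beta = np.zeros(feat_dim)     # beta starts at 0 (no shift)

    def __call__(self, features: np.ndarray, cond: np.ndarray) -> np.ndarray:
        # features: (time, feat_dim) per-frame video or audio features
        # cond:     (cond_dim,) embedding of the current question
        gamma = cond @ self.w_gamma + self.b_gamma
        beta = cond @ self.w_beta + self.b_beta
        return features * gamma + beta       # broadcasts over the time axis

rng = np.random.default_rng(0)
film = FiLM(cond_dim=8, feat_dim=16, rng=rng)
feats = rng.normal(size=(10, 16))            # 10 frames of 16-D features (hypothetical)
question = rng.normal(size=8)                # hypothetical question embedding
out = film(feats, question)
print(out.shape)  # (10, 16)
```

A downstream LSTM would then summarize the modulated sequence into a fixed-length vector, as the abstract describes.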

arXiv:1811.09845

Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction

Authors: Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor

Abstract: Conditional text-to-image generation is an active area of research, with many possible applications. Existing research has primarily focused on generating a single image from available conditioning information in one step. One practical extension beyond one-step generation is a system that generates an image iteratively, conditioned on ongoing linguistic input or feedback. This is significantly more challenging than one-step generation tasks, as such a system must understand the contents of its generated images with respect to the feedback history, the current feedback, as well as the interactions among concepts present in the feedback history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, and apply simple transformations to existing objects. We believe our approach is an important step toward interactive generation. Code and data are available at https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/

Submitted 23 September, 2019; v1 submitted 24 November, 2018; originally announced November 2018.

Comments: Accepted at ICCV 2019

Journal ref: Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV)
