-
Measurement Uncertainty: Relating the uncertainties of physical and virtual measurements
Authors:
Simon Cramer,
Tobias Müller,
Robert H. Schmitt
Abstract:
In the context of industrially mass-manufactured products, quality management is based on physically inspecting a small sample from a large batch and reasoning about the batch's quality conformance. When complementing physical inspections with predictions from machine learning models, it is crucial that the uncertainty of the prediction is known. Otherwise, the application of established quality management concepts is not legitimate. Deterministic (machine learning) models lack quantification of their predictive uncertainty and are therefore unsuitable. Probabilistic (machine learning) models provide a predictive uncertainty along with the prediction. However, a concise relationship is missing between the measurement uncertainty of physical inspections and the predictive uncertainty of probabilistic models in their application in quality management. Here, we show how the predictive uncertainty of probabilistic (machine learning) models is related to the measurement uncertainty of physical inspections. This enables the use of probabilistic models for virtual inspections and integrates them into existing quality management concepts. Thus, we can provide a virtual measurement for any quality characteristic based on the process data and achieve a 100 percent inspection rate. In the field of Predictive Quality, the virtual measurement is of great interest. Based on our results, physical inspections with a low sampling rate can be accompanied by virtual measurements that allow an inspection rate of 100 percent. We add substantial value, especially to complex process chains, as faulty products/parts are identified promptly and upcoming process steps can be aborted.
Submitted 21 February, 2024;
originally announced February 2024.
-
Multi-Tracer Groundwater Dating in Southern Oman using Bayesian Modelling
Authors:
Viola Rädle,
Arne Kersting,
Maximilian Schmidt,
Lisa Ringena,
Julian Robertz,
Werner Aeschbach,
Markus Oberthaler,
Thomas Müller
Abstract:
In the scope of assessing aquifer systems in areas where freshwater is scarce, estimation of transit times is a vital step to quantify the effect of groundwater abstraction. Transit time distributions of different shapes, mean residence times, and contributions are used to represent the hydrogeological conditions in aquifer systems and are typically inferred from measured tracer concentrations by inverse modeling. In this study, a multi-tracer sampling campaign was conducted in the Salalah Plain in Southern Oman including CFCs, SF6, 39Ar, 14C, and 4He. Based on the data of three tracers, a two-component Dispersion Model (DMmix) and a nonparametric model with six age bins were assumed and evaluated using Bayesian statistics. In a Markov Chain Monte Carlo approach, the maximum likelihood parameter estimates and their uncertainties were determined. Model performance was assessed using Bayes factor and leave-one-out cross-validation. Both models suggest that the groundwater in the Salalah Plain is composed of a very young component below 30 yr and a very old component beyond 1,000 yr, with the nonparametric model performing slightly better than the DMmix model. All wells except one exhibit reasonable goodness of fit. Our results support the relevance of Bayesian modeling in hydrology and the potential of nonparametric models for an adequate representation of aquifer dynamics.
Submitted 5 September, 2022;
originally announced September 2022.
-
A Deep Variational Approach to Clustering Survival Data
Authors:
Laura Manduchi,
Ričards Marcinkevičs,
Michela C. Massi,
Thomas Weikert,
Alexander Sauter,
Verena Gotta,
Timothy Müller,
Flavio Vasella,
Marian C. Neidert,
Marc Pfister,
Bram Stieltjes,
Julia E. Vogt
Abstract:
In this work, we study the problem of clustering survival data, a challenging and so far under-explored task. We introduce a novel semi-supervised probabilistic approach to cluster survival data by leveraging recent advances in stochastic gradient variational inference. In contrast to previous work, our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and censored survival times. We compare our model to the related work on clustering and mixture models for survival data in comprehensive experiments on a wide range of synthetic, semi-synthetic, and real-world datasets, including medical imaging data. Our method performs better at identifying clusters and is competitive at predicting survival times. Relying on novel generative assumptions, the proposed model offers a holistic perspective on clustering survival data and holds a promise of discovering subpopulations whose survival is regulated by different generative mechanisms.
Submitted 10 March, 2022; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Neural Control Variates
Authors:
Thomas Müller,
Fabrice Rousselle,
Jan Novák,
Alexander Keller
Abstract:
We propose neural control variates (NCV) for unbiased variance reduction in parametric Monte Carlo integration. So far, the core challenge of applying the method of control variates has been finding a good approximation of the integrand that is cheap to integrate. We show that a set of neural networks can face that challenge: a normalizing flow that approximates the shape of the integrand and another neural network that infers the solution of the integral equation. We also propose to leverage a neural importance sampler to estimate the difference between the original integrand and the learned control variate. To optimize the resulting parametric estimator, we derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice. When applied to light transport simulation, neural control variates are capable of matching the state-of-the-art performance of other unbiased approaches, while providing means to develop more performant, practical solutions. Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
Submitted 4 September, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
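The control-variate idea named in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the neural networks are replaced by a fixed, hand-picked approximation, and the integrand `f`, the control variate `g`, and its known integral `G` are all hypothetical choices made for illustration. The estimator adds the exact integral of `g` to a Monte Carlo estimate of the residual `f - g`, which stays unbiased and has lower variance when `g` tracks `f` well.

```python
import math
import random
import statistics


def f(x):
    """Integrand (assumed for illustration); its integral over [0, 1] is e - 1."""
    return math.exp(x)


def g(x):
    """Hypothetical control variate: a cheap approximation of f."""
    return 1.0 + x


G = 1.5  # exact integral of g over [0, 1]


def plain_mc(n, rng):
    """Plain Monte Carlo: returns (estimate, per-sample variance)."""
    samples = [f(rng.random()) for _ in range(n)]
    return statistics.fmean(samples), statistics.variance(samples)


def control_variate_mc(n, rng):
    """Control variates: known integral of g plus the mean residual f - g."""
    residuals = []
    for _ in range(n):
        x = rng.random()
        residuals.append(f(x) - g(x))
    return G + statistics.fmean(residuals), statistics.variance(residuals)


if __name__ == "__main__":
    rng = random.Random(0)
    print("plain:           ", plain_mc(10_000, rng))
    print("control variate: ", control_variate_mc(10_000, rng))
```

Both estimators target the same integral; the second only has to estimate the small residual, so its per-sample variance is lower for this choice of `g`.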
-
How a minimal learning agent can infer the existence of unobserved variables in a complex environment
Authors:
Katja Ried,
Benjamin Eva,
Thomas Müller,
Hans J. Briegel
Abstract:
According to a mainstream position in contemporary cognitive science and philosophy, the use of abstract compositional concepts is both a necessary and a sufficient condition for the presence of genuine thought. In this article, we show how the ability to develop and utilise abstract conceptual structures can be achieved by a particular kind of learning agents. More specifically, we provide and motivate a concrete operational definition of what it means for these agents to be in possession of abstract concepts, before presenting an explicit example of a minimal architecture that supports this capability. We then proceed to demonstrate how the existence of abstract conceptual structures can be operationally useful in the process of employing previously acquired knowledge in the face of new experiences, thereby vindicating the natural conjecture that the cognitive functions of abstraction and generalisation are closely related.
Keywords: concept formation, projective simulation, reinforcement learning, transparent artificial intelligence, theory formation, explainable artificial intelligence (XAI)
Submitted 15 October, 2019;
originally announced October 2019.
-
DLGNet: A Transformer-based Model for Dialogue Response Generation
Authors:
Oluwatobi Olabiyi,
Erik T. Mueller
Abstract:
Neural dialogue models, despite their successes, still suffer from lack of relevance, diversity, and in many cases coherence in their generated responses. These issues can be attributed to reasons including (1) short-range model architectures that capture limited temporal dependencies, (2) limitations of the maximum likelihood training objective, (3) the concave entropy profile of dialogue datasets resulting in short and generic responses, and (4) the out-of-vocabulary problem leading to generation of a large number of <UNK> tokens. On the other hand, transformer-based models such as GPT-2 have demonstrated an excellent ability to capture long-range structures in language modeling tasks. In this paper, we present DLGNet, a transformer-based model for dialogue modeling. We specifically examine the use of DLGNet for multi-turn dialogue response generation. In our experiments, we evaluate DLGNet on the open-domain Movie Triples dataset and the closed-domain Ubuntu Dialogue dataset. DLGNet models, although trained with only the maximum likelihood objective, achieve significant improvements over state-of-the-art multi-turn dialogue models. They also produce the best performance to date on the two datasets based on several metrics, including BLEU, ROUGE, and distinct n-gram. Our analysis shows that the performance improvement is mostly due to the combination of (1) the long-range transformer architecture with (2) the injection of random informative paddings. Other contributing factors include the joint modeling of dialogue context and response, and the 100% tokenization coverage from the byte pair encoding (BPE).
Submitted 4 September, 2019; v1 submitted 26 July, 2019;
originally announced August 2019.
-
A Persona-based Multi-turn Conversation Model in an Adversarial Learning Framework
Authors:
Oluwatobi O. Olabiyi,
Anish Khazane,
Erik T. Mueller
Abstract:
In this paper, we extend the persona-based sequence-to-sequence (Seq2Seq) neural network conversation model to multi-turn dialogue by modifying the state-of-the-art hredGAN architecture. To achieve this, we introduce an additional input modality into the encoder and decoder of hredGAN to capture other attributes such as speaker identity, location, sub-topics, and other external attributes that might be available from the corpus of human-to-human interactions. The resulting persona hredGAN ($phredGAN$) shows better performance than both the existing persona-based Seq2Seq and hredGAN models when those external attributes are available in a multi-turn dialogue corpus. This superiority is demonstrated on TV drama series with character consistency (such as Big Bang Theory and Friends) and customer service interaction datasets such as Ubuntu dialogue corpus in terms of perplexity, BLEU, ROUGE, and Distinct n-gram scores.
Submitted 29 April, 2019;
originally announced May 2019.
-
An Adversarial Learning Framework For A Persona-Based Multi-Turn Dialogue Model
Authors:
Oluwatobi Olabiyi,
Anish Khazane,
Alan Salimov,
Erik T. Mueller
Abstract:
In this paper, we extend the persona-based sequence-to-sequence (Seq2Seq) neural network conversation model to a multi-turn dialogue scenario by modifying the state-of-the-art hredGAN architecture to simultaneously capture utterance attributes such as speaker identity, dialogue topic, speaker sentiments and so on. The proposed system, phredGAN has a persona-based HRED generator (PHRED) and a conditional discriminator. We also explore two approaches to accomplish the conditional discriminator: (1) phredGAN_a, a system that passes the attribute representation as an additional input into a traditional adversarial discriminator, and (2) phredGAN_d, a dual discriminator system which in addition to the adversarial discriminator, collaboratively predicts the attribute(s) that generated the input utterance. To demonstrate the superior performance of phredGAN over the persona Seq2Seq model, we experiment with two conversational datasets, the Ubuntu Dialogue Corpus (UDC) and TV series transcripts from the Big Bang Theory and Friends. Performance comparison is made with respect to a variety of quantitative measures as well as crowd-sourced human evaluation. We also explore the trade-offs from using either variant of phredGAN on datasets with many but weak attribute modalities (such as with Big Bang Theory and Friends) and ones with few but strong attribute modalities (customer-agent interactions in Ubuntu dataset).
Submitted 26 June, 2019; v1 submitted 29 April, 2019;
originally announced May 2019.
-
Neural Importance Sampling
Authors:
Thomas Müller,
Brian McWilliams,
Fabrice Rousselle,
Markus Gross,
Jan Novák
Abstract:
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the modeling power of individual coupling layers. Second, we propose to preprocess the inputs of neural networks using one-blob encoding, which stimulates localization of computation and improves inference. Third, we derive a gradient-descent-based optimization for the KL and the $χ^2$ divergence for the specific application of Monte Carlo integration with unnormalized stochastic estimates of the target distribution. Our approach enables fast and accurate inference and efficient sample generation independently of the dimensionality of the integration domain. We show its benefits on generating natural images and in two applications to light-transport simulation: first, we demonstrate learning of joint path-sampling densities in the primary sample space and importance sampling of multi-dimensional path prefixes thereof. Second, we use our technique to extract conditional directional densities driven by the product of incident illumination and the BSDF in the rendering equation, and we leverage the densities for path guiding. In all applications, our approach yields on-par or higher performance than competing techniques at equal sample count.
Submitted 3 September, 2019; v1 submitted 11 August, 2018;
originally announced August 2018.
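As a point of reference for the abstract above, the following is a minimal sketch of plain importance sampling in Monte Carlo integration. It is not the paper's neural sampler: the learned density is replaced by a fixed analytic proposal, and the integrand `f` and the proposal `p(x) = 2x` (sampled by inverting its CDF) are hypothetical choices made for illustration. Samples are drawn from `p` and each contribution is weighted by `f(x) / p(x)`, which keeps the estimate unbiased while reducing variance when `p` roughly follows the shape of `f`.

```python
import math
import random
import statistics


def f(x):
    """Integrand (assumed for illustration); its integral over [0, 1] is 1."""
    return 3.0 * x * x


def uniform_mc(n, rng):
    """Uniform sampling on [0, 1): returns (estimate, per-sample variance)."""
    samples = [f(rng.random()) for _ in range(n)]
    return statistics.fmean(samples), statistics.variance(samples)


def importance_mc(n, rng):
    """Importance sampling with the hypothetical proposal p(x) = 2x on (0, 1]."""
    ratios = []
    for _ in range(n):
        u = 1.0 - rng.random()        # u in (0, 1], avoids dividing by zero
        x = math.sqrt(u)              # inverse CDF of p, since CDF(x) = x^2
        ratios.append(f(x) / (2.0 * x))
    return statistics.fmean(ratios), statistics.variance(ratios)


if __name__ == "__main__":
    rng = random.Random(0)
    print("uniform:    ", uniform_mc(10_000, rng))
    print("importance: ", importance_mc(10_000, rng))
```

The closer the proposal is to being proportional to the integrand, the flatter the ratio `f(x) / p(x)` becomes and the lower the variance of the estimate.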
-
Multi-turn Dialogue Response Generation in an Adversarial Learning Framework
Authors:
Oluwatobi Olabiyi,
Alan Salimov,
Anish Khazane,
Erik T. Mueller
Abstract:
We propose an adversarial learning approach for generating multi-turn dialogue responses. Our proposed framework, hredGAN, is based on conditional generative adversarial networks (GANs). The GAN's generator is a modified hierarchical recurrent encoder-decoder network (HRED) and the discriminator is a word-level bidirectional RNN that shares context and word embeddings with the generator. During inference, noise samples conditioned on the dialogue history are used to perturb the generator's latent space to generate several possible responses. The final response is the one ranked best by the discriminator. The hredGAN shows improved performance over existing methods: (1) it generalizes better than networks trained using only the log-likelihood criterion, and (2) it generates longer, more informative and more diverse responses with high utterance and topic relevance even with limited training data. This improvement is demonstrated on the Movie triples and Ubuntu dialogue datasets using both automatic and human evaluations.
Submitted 26 June, 2019; v1 submitted 29 May, 2018;
originally announced May 2018.
-
Generalisation of structural knowledge in the hippocampal-entorhinal system
Authors:
James C. R. Whittington,
Timothy H. Muller,
Shirley Mark,
Caswell Barry,
Timothy E. J. Behrens
Abstract:
A central problem to understanding intelligence is the concept of generalisation. This allows previously learnt structure to be exploited to solve tasks in novel situations differing in their particularities. We take inspiration from neuroscience, specifically the hippocampal-entorhinal system known to be important for generalisation. We propose that to generalise structural knowledge, the representations of the structure of the world, i.e. how entities in the world relate to each other, need to be separated from representations of the entities themselves. We show, under these principles, artificial neural networks embedded with hierarchy and fast Hebbian memory, can learn the statistics of memories and generalise structural knowledge. Spatial neuronal representations mirroring those found in the brain emerge, suggesting spatial cognition is an instance of more general organising principles. We further unify many entorhinal cell types as basis functions for constructing transition graphs, and show these representations effectively utilise memories. We experimentally support model assumptions, showing a preserved relationship between entorhinal grid and hippocampal place cells across environments.
Submitted 29 October, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Modelling collective motion based on the principle of agency
Authors:
Katja Ried,
Thomas Müller,
Hans J. Briegel
Abstract:
Collective motion is an intriguing phenomenon, especially considering that it arises from a set of simple rules governing local interactions between individuals. In theoretical models, these rules are normally \emph{assumed} to take a particular form, possibly constrained by heuristic arguments. We propose a new class of models, which describe the individuals as \emph{agents}, capable of deciding for themselves how to act and learning from their experiences. The local interaction rules do not need to be postulated in this model, since they \emph{emerge} from the learning process. We apply this ansatz to a concrete scenario involving marching locusts, in order to model the phenomenon of density-dependent alignment. We show that our learning agent-based model can account for a Fokker-Planck equation that describes the collective motion and, most notably, that the agents can learn the appropriate local interactions, requiring no strong previous assumptions on their form. These results suggest that learning agent-based models are a powerful tool for studying a broader class of problems involving collective motion and animal agency in general.
Submitted 4 December, 2017;
originally announced December 2017.
-
Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks
Authors:
Simon Kallweit,
Thomas Müller,
Brian McWilliams,
Markus Gross,
Jan Novák
Abstract:
We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds (e.g. the characteristic silver lining and the "whiteness" of the inner body) challenging for methods based solely on Monte Carlo integration or diffusion theory. We approach the problem differently. Instead of simulating all light transport during rendering, we pre-learn the spatial and directional distribution of radiant flux from tens of cloud exemplars. To render a new scene, we sample visible points of the cloud and, for each, extract a hierarchical 3D descriptor of the cloud geometry with respect to the shading location and the light source. The descriptor is input to a deep neural network that predicts the radiance function for each shading configuration. We make the key observation that progressively feeding the hierarchical descriptor into the network enhances the network's ability to learn faster and predict with high accuracy while using few coefficients. We also employ a block design with residual connections to further improve performance. A GPU implementation of our method synthesizes images of clouds that are nearly indistinguishable from the reference solution within seconds interactively. Our method thus represents a viable solution for applications such as cloud design and, thanks to its temporal stability, also for high-quality production of animated content.
Submitted 15 September, 2017;
originally announced September 2017.
-
Biologists meet statisticians: A workshop for young scientists to foster interdisciplinary team work
Authors:
Benjamin Hofner,
Lea Vaas,
John-Philip Lawo,
Tina Müller,
Johannes Sikorski,
Dirk Repsilber
Abstract:
Life science and statistics have necessarily become essential partners. The need to plan complex, structured experiments, involving elaborated designs, and the need to analyse datasets in the era of systems biology and high throughput technologies has to build upon professional statistical expertise. On the other hand, conducting such analyses and also developing improved or new methods, also for novel kinds of data, has to build upon solid biological understanding and practice. However, the meeting of scientists of both fields is often hampered by a variety of communicative hurdles, which are based on field-specific working languages and cultural differences.
As a step towards a better mutual understanding, we developed a workshop concept bringing together young experimental biologists and statisticians, to work as pairs and learn to value each other's competences and practise interdisciplinary communication in a casual atmosphere. The first implementation of our concept was a cooperation of the German Region of the International Biometrical Society and the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures (short: DSMZ), Braunschweig, Germany. We collected feedback in the form of three questionnaires, oral comments, and gathered experiences for the improvement of this concept. The long-term challenge for both disciplines is the establishment of systematic schedules and strategic partnerships which use the proposed workshop concept to foster mutual understanding, to seed the necessary interdisciplinary cooperation network, and to start training the indispensable communication skills at the earliest possible phase of education.
Submitted 28 August, 2012;
originally announced August 2012.