Search | arXiv e-print repository

Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry

Authors: Jonas Gregor Wiese, Lisa Wimmer, Theodore Papamarkou, Bernd Bischl, Stephan Günnemann, David Rügamer

Abstract: Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that… ▽ More Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning. △ Less

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2005.13629 [pdf, ps, other]

Simultaneous Diagonalization of Incomplete Matrices and Applications

Authors: Jean-Sébastien Coron, Luca Notarnicola, Gabor Wiese

Abstract: We consider the problem of recovering the entries of diagonal matrices $\{U_a\}_a$ for $a = 1,\ldots,t$ from multiple "incomplete" samples $\{W_a\}_a$ of the form $W_a=PU_aQ$, where $P$ and $Q$ are unknown matrices of low rank. We devise practical algorithms for this problem depending on the ranks of $P$ and $Q$. This problem finds its motivation in cryptanalysis: we show how to significantly impr… ▽ More We consider the problem of recovering the entries of diagonal matrices $\{U_a\}_a$ for $a = 1,\ldots,t$ from multiple "incomplete" samples $\{W_a\}_a$ of the form $W_a=PU_aQ$, where $P$ and $Q$ are unknown matrices of low rank. We devise practical algorithms for this problem depending on the ranks of $P$ and $Q$. This problem finds its motivation in cryptanalysis: we show how to significantly improve previous algorithms for solving the approximate common divisor problem and breaking CLT13 cryptographic multilinear maps. △ Less

Submitted 27 May, 2020; originally announced May 2020.

Comments: 16 pages

arXiv:2004.07754 [pdf, other]

Investigating Efficient Learning and Compositionality in Generative LSTM Networks

Authors: Sarah Fabi, Sebastian Otte, Jonas Gregor Wiese, Martin V. Butz

Abstract: When comparing human with artificial intelligence, one major difference is apparent: Humans can generalize very broadly from sparse data sets because they are able to recombine and reintegrate data components in compositional manners. To investigate differences in efficient learning, Joshua B. Tenenbaum and colleagues developed the character challenge: First an algorithm is trained in generating h… ▽ More When comparing human with artificial intelligence, one major difference is apparent: Humans can generalize very broadly from sparse data sets because they are able to recombine and reintegrate data components in compositional manners. To investigate differences in efficient learning, Joshua B. Tenenbaum and colleagues developed the character challenge: First an algorithm is trained in generating handwritten characters. In a next step, one version of a new type of character is presented. An efficient learning algorithm is expected to be able to re-generate this new character, to identify similar versions of this character, to generate new variants of it, and to create completely new character types. In the past, the character challenge was only met by complex algorithms that were provided with stochastic primitives. Here, we tackle the challenge without providing primitives. We apply a minimal recurrent neural network (RNN) model with one feedforward layer and one LSTM layer and train it to generate sequential handwritten character trajectories from one-hot encoded inputs. To manage the re-generation of untrained characters, when presented with only one example of them, we introduce a one-shot inference mechanism: the gradient signal is backpropagated to the feedforward layer weights only, leaving the LSTM layer untouched. We show that our model is able to meet the character challenge by recombining previously learned dynamic substructures, which are visible in the hidden LSTM states. Making use of the compositional abilities of RNNs in this way might be an important step towards bridging the gap between human and artificial intelligence. △ Less

Submitted 20 October, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: ICANN 2020

arXiv:1706.08568 [pdf, other]

Neural Question Answering at BioASQ 5B

Authors: Georg Wiese, Dirk Weissenborn, Mariana Neves

Abstract: This paper describes our submission to the 2017 BioASQ challenge. We participated in Task B, Phase B which is concerned with biomedical question answering (QA). We focus on factoid and list question, using an extractive QA model, that is, we restrict our system to output substrings of the provided text snippets. At the core of our system, we use FastQA, a state-of-the-art neural QA system. We exte… ▽ More This paper describes our submission to the 2017 BioASQ challenge. We participated in Task B, Phase B which is concerned with biomedical question answering (QA). We focus on factoid and list question, using an extractive QA model, that is, we restrict our system to output substrings of the provided text snippets. At the core of our system, we use FastQA, a state-of-the-art neural QA system. We extended it with biomedical word embeddings and changed its answer layer to be able to answer list questions in addition to factoid questions. We pre-trained the model on a large-scale open-domain QA dataset, SQuAD, and then fine-tuned the parameters on the BioASQ training set. With our approach, we achieve state-of-the-art results on factoid questions and competitive results on list questions. △ Less

Submitted 26 June, 2017; originally announced June 2017.

arXiv:1706.03610 [pdf, other]

Neural Domain Adaptation for Biomedical Question Answering

Authors: Georg Wiese, Dirk Weissenborn, Mariana Neves

Abstract: Factoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such as biomedicine, because datasets are generally too… ▽ More Factoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such as biomedicine, because datasets are generally too small to train a DL system from scratch. For example, the BioASQ dataset for biomedical QA comprises less then 900 factoid (single answer) and list (multiple answers) QA instances. In this work, we adapt a neural QA system trained on a large open-domain dataset (SQuAD, source) to a biomedical dataset (BioASQ, target) by employing various transfer learning techniques. Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create. Despite this fact, our systems achieve state-of-the-art results on factoid questions and competitive results on list questions. △ Less

Submitted 15 June, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

arXiv:1703.04816 [pdf, other]

Making Neural QA as Simple as Possible but not Simpler

Authors: Dirk Weissenborn, Georg Wiese, Laura Seiffe

Abstract: Recent development of large-scale question answering (QA) datasets triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to simpler neural baseline systems that would justify their complexity. In this work, we propose a simple heuristic that guides the development of neural baseline systems for th… ▽ More Recent development of large-scale question answering (QA) datasets triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to simpler neural baseline systems that would justify their complexity. In this work, we propose a simple heuristic that guides the development of neural baseline systems for the extractive QA task. We find that there are two ingredients necessary for building a high-performing neural QA system: first, the awareness of question words while processing the context and second, a composition function that goes beyond simple bag-of-words modeling, such as recurrent neural networks. Our results show that FastQA, a system that meets these two requirements, can achieve very competitive performance compared with existing models. We argue that this surprising finding puts results of previous systems and the complexity of recent QA datasets into perspective. △ Less

Submitted 8 June, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

Showing 1–6 of 6 results for author: Wiese, G