Kalman Bayesian Neural Networks for Closed-form Online Learning

Authors: Philipp Wagner, Xinyang Wu, Marco F. Huber

Abstract: Compared to point estimates calculated by standard neural networks, Bayesian neural networks (BNNs) provide probability distributions over the output predictions and model parameters, i.e., the weights. Training the weight distribution of a BNN, however, is more involved due to the intractability of the underlying Bayesian inference problem and thus requires efficient approximations. In this paper, we propose a novel approach for BNN learning via closed-form Bayesian inference. For this purpose, the calculation of the predictive distribution of the output and the update of the weight distribution are treated as Bayesian filtering and smoothing problems, where the weights are modeled as Gaussian random variables. This allows closed-form expressions for training the network's parameters in a sequential/online fashion without gradient descent. We demonstrate our method on several UCI datasets and compare it to the state of the art.

Submitted 30 November, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: 37th AAAI Conference on Artificial Intelligence (AAAI)

arXiv:2107.00360 [pdf, ps, other]

Towards Measuring Bias in Image Classification

Authors: Nina Schaaf, Omar de Mitri, Hang Beom Kim, Alexander Windberger, Marco F. Huber

Abstract: Convolutional Neural Networks (CNNs) have become the de facto state of the art for the main computer vision tasks. However, due to their complex underlying structure, their decisions are hard to understand, which limits their use in some contexts of the industrial world. A common and hard-to-detect challenge in machine learning (ML) tasks is data bias. In this work, we present a systematic approach to uncover data bias by means of attribution maps. For this purpose, first an artificial dataset with a known bias is created and used to train intentionally biased CNNs. The networks' decisions are then inspected using attribution maps. Finally, meaningful metrics are used to measure the attribution maps' representativeness with respect to the known bias. The proposed study shows that some attribution map techniques highlight the presence of bias in the data better than others and that metrics can support the identification of bias.

Submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted for publication at the 30th International Conference on Artificial Neural Networks (ICANN)

arXiv:2011.07876 [pdf, other]

doi 10.1613/jair.1.12228

A Survey on the Explainability of Supervised Machine Learning

Authors: Nadia Burkart, Marco F. Huber

Abstract: Predictions obtained by, e.g., artificial neural networks have a high accuracy, but humans often perceive the models as black boxes. Insights about the decision making are mostly opaque for humans. Understanding the decision making is of paramount importance, particularly in highly sensitive areas such as healthcare or finance. The decision making behind the black boxes needs to be more transparent, accountable, and understandable for humans. This survey paper provides essential definitions and an overview of the different principles and methodologies of explainable Supervised Machine Learning (SML). We conduct a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions. Finally, we illustrate principles by means of an explanatory case study and discuss important future directions.

Submitted 16 November, 2020; originally announced November 2020.

Comments: Accepted for publication at the Journal of Artificial Intelligence Research (JAIR)

Journal ref: Journal of Artificial Intelligence Research (JAIR), 70:245-317, 2021

arXiv:2009.01730 [pdf, other]

Bayesian Perceptron: Towards fully Bayesian Neural Networks

Authors: Marco F. Huber

Abstract: Artificial neural networks (NNs) have become the de facto standard in machine learning. They allow learning highly nonlinear transformations in a plethora of applications. However, NNs usually only provide point estimates without systematically quantifying corresponding uncertainties. In this paper, a novel approach towards fully Bayesian NNs is proposed, where training and predictions of a perceptron are performed within the Bayesian inference framework in closed form. The weights and the predictions of the perceptron are considered Gaussian random variables. Analytical expressions for predicting the perceptron's output and for learning the weights are provided for commonly used activation functions like sigmoid or ReLU. This approach requires no computationally expensive gradient calculations and further allows sequential learning.

Submitted 10 September, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: Accepted for publication at the 59th IEEE Conference on Decision and Control (CDC) 2020. v2: correction of typos

arXiv:1904.12054 [pdf, other]

Benchmark and Survey of Automated Machine Learning Frameworks

Authors: Marc-André Zöller, Marco F. Huber

Abstract: Machine learning (ML) has become a vital part of many aspects of our daily life. However, building well-performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically without extensive knowledge of statistics and machine learning. This paper is a combination of a survey on current AutoML methods and a benchmark of popular AutoML frameworks on real data sets. Driven by the selected frameworks for evaluation, we summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline. The selected AutoML frameworks are evaluated on 137 data sets from established AutoML benchmark suites.

Submitted 26 January, 2021; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: Revised version accepted for publication at Journal of Artificial Intelligence Research (JAIR)

Journal ref: Journal of Artificial Intelligence Research 70 (2021) 409-472

arXiv:1904.05394 [pdf, ps, other]

Enhancing Decision Tree based Interpretation of Deep Neural Networks through L1-Orthogonal Regularization

Authors: Nina Schaaf, Marco F. Huber, Johannes Maucher

Abstract: One obstacle that so far prevents the introduction of machine learning models primarily in critical areas is the lack of explainability. In this work, a practicable approach of gaining explainability of deep artificial neural networks (NN) using an interpretable surrogate model based on decision trees is presented. Simply fitting a decision tree to a trained NN usually leads to unsatisfactory results in terms of accuracy and fidelity. Using L1-orthogonal regularization during training, however, preserves the accuracy of the NN, while it can be closely approximated by small decision trees. Tests with different data sets confirm that L1-orthogonal regularization yields models of lower complexity and at the same time higher fidelity compared to other regularizers.

Submitted 3 October, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: 8 pages, 18th IEEE International Conference on Machine Learning and Applications (ICMLA) 2019

arXiv:1203.6754 [pdf, other]

doi 10.1109/CIP.2010.5604100

On Multi-Step Sensor Scheduling via Convex Optimization

Authors: Marco F. Huber

Abstract: Effective sensor scheduling requires the consideration of long-term effects and thus optimization over long time horizons. Determining the optimal sensor schedule, however, is equivalent to solving a binary integer program, which is computationally demanding for long time horizons and many sensors. For linear Gaussian systems, two efficient multi-step sensor scheduling approaches are proposed in this paper. The first approach determines approximate but close-to-optimal sensor schedules via convex optimization. The second approach combines convex optimization with a branch-and-bound search for efficiently determining the optimal sensor schedule.

Submitted 30 March, 2012; originally announced March 2012.

Comments: 6 pages, appeared in the proceedings of the 2nd International Workshop on Cognitive Information Processing (CIP), Elba, Italy, June 2010

arXiv:1203.6750 [pdf, other]

Adaptive Gaussian Mixture Filter Based on Statistical Linearization

Authors: Marco F. Huber

Abstract: Gaussian mixtures are a common density representation in nonlinear, non-Gaussian Bayesian state estimation. Selecting an appropriate number of Gaussian components, however, is difficult as one has to trade off computational complexity against estimation accuracy. In this paper, an adaptive Gaussian mixture filter based on statistical linearization is proposed. Depending on the nonlinearity of the considered estimation problem, this filter dynamically increases the number of components via splitting. For this purpose, a measure is introduced that allows for quantifying the locally induced linearization error at each Gaussian mixture component. The deviation between the nonlinear and the linearized state space model is evaluated for determining the splitting direction. The proposed approach is not restricted to a specific statistical linearization method. Simulations show the superior estimation performance compared to related approaches and common filtering algorithms.

Submitted 30 March, 2012; originally announced March 2012.

Comments: 8 pages, appeared in the proceedings of the 14th International Conference on Information Fusion, Chicago, Illinois, USA, July 2011. Correction of an error in formula (22). http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5977694&isnumber=5977431

arXiv:1203.4345 [pdf, other]

doi 10.1109/TAC.2011.2179426

Robust Filtering and Smoothing with Gaussian Processes

Authors: Marc Peter Deisenroth, Ryan Turner, Marco F. Huber, Uwe D. Hanebeck, Carl Edward Rasmussen

Abstract: We propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of "system identification" is more robust than finding point estimates of a parametric function representation. In this article, we present a principled algorithm for robust analytic smoothing in GP dynamic systems, which are increasingly used in robotics and control. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.

Submitted 20 March, 2012; originally announced March 2012.

Comments: 7 pages, 1 figure, draft version of paper accepted at IEEE Transactions on Automatic Control

Showing 1–9 of 9 results for author: Huber, M F