
On the Convexity and Reliability of the Bethe Free Energy Approximation

Authors: Harald Leisenberger, Christian Knoll, Franz Pernkopf

Abstract: The Bethe free energy approximation provides an effective way of relaxing NP-hard problems of probabilistic inference. However, its accuracy depends on the model parameters and particularly degrades if a phase transition in the model occurs. In this work, we analyze when the Bethe approximation is reliable and how this can be verified. We argue and show by experiment that it is mostly accurate if it is convex on a submanifold of its domain, the 'Bethe box'. For verifying its convexity, we derive two sufficient conditions that are based on the definiteness properties of the Bethe Hessian matrix: the first uses the concept of diagonal dominance, and the second decomposes the Bethe Hessian matrix into a sum of sparse matrices and characterizes the definiteness properties of the individual matrices in that sum. These theoretical results provide a simple way to estimate the critical phase transition temperature of a model. As a practical contribution, we propose BETHE-MIN, a projected quasi-Newton method to efficiently find a minimum of the Bethe free energy.
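For readers who want to try the first kind of condition numerically, the sketch below is a generic test, not the authors' derivation for the Bethe Hessian. It checks the textbook fact (via the Gershgorin circle theorem) that a symmetric matrix with a positive diagonal that is strictly diagonally dominant is positive definite. The function names are hypothetical, and the second test matrix in the usage note shows that the condition is only sufficient.

```python
import numpy as np


def is_strictly_diagonally_dominant(H: np.ndarray) -> bool:
    """Row-wise strict diagonal dominance: |H_ii| > sum_{j != i} |H_ij| for every row i."""
    H = np.asarray(H, dtype=float)
    diag = np.abs(np.diag(H))
    off_diag = np.abs(H).sum(axis=1) - diag
    return bool(np.all(diag > off_diag))


def sufficient_pd_check(H: np.ndarray) -> bool:
    """Sufficient (not necessary) test for positive definiteness of a symmetric matrix.

    By the Gershgorin circle theorem, every eigenvalue lies in a disc centred at H_ii
    with radius sum_{j != i} |H_ij|; a positive diagonal plus strict dominance keeps
    all discs, and hence all (real) eigenvalues, strictly to the right of zero.
    """
    H = np.asarray(H, dtype=float)
    if not np.allclose(H, H.T):
        raise ValueError("H must be symmetric")
    return bool(np.all(np.diag(H) > 0) and is_strictly_diagonally_dominant(H))
```

For example, [[2, -0.5], [-0.5, 1]] passes the test, whereas [[1, 2], [2, 5]] fails it even though its eigenvalues are both positive, so a failed check is inconclusive.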

Submitted 24 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the Journal of Machine Learning Research (JMLR) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2402.14781

Rao-Blackwellising Bayesian Causal Inference

Authors: Christian Toth, Christian Knoll, Franz Pernkopf, Robert Peharz

Abstract: Bayesian causal inference, i.e., inferring a posterior over causal models for use in downstream causal reasoning tasks, poses a hard computational inference problem that is little explored in the literature. In this work, we combine techniques from order-based MCMC structure learning with recent advances in gradient-based graph learning into an effective Bayesian causal inference framework. Specifically, we decompose the problem of inferring the causal structure into (i) inferring a topological order over variables and (ii) inferring the parent sets for each variable. When limiting the number of parents per variable, we can exactly marginalise over the parent sets in polynomial time. We further use Gaussian processes to model the unknown causal mechanisms, which also allows their exact marginalisation. This introduces a Rao-Blackwellization scheme, where all components are eliminated from the model, except for the causal order, for which we learn a distribution via gradient-based optimisation. The combination of Rao-Blackwellization with our sequential inference procedure for causal orders yields state-of-the-art results on linear and non-linear additive noise benchmarks with scale-free and Erdős-Rényi graph structures.
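The polynomial-time marginalisation over parent sets can be illustrated with a small sketch. It assumes a score that decomposes into per-node local terms and a uniform prior over parent sets; `local_log_score` is a hypothetical placeholder (the abstract models mechanisms with Gaussian processes, which this sketch does not implement). Given a fixed order, the sum over all compatible graphs factorises into one sum per node, and with at most K parents each node has only O(d^K) candidate parent sets among d variables.

```python
import itertools
import math
from typing import Callable, Iterator, Sequence, Tuple


def candidate_parent_sets(order: Sequence[str], node: str, max_parents: int) -> Iterator[Tuple[str, ...]]:
    """All subsets of the node's predecessors in `order` with at most `max_parents` elements."""
    predecessors = list(order[: order.index(node)])
    for k in range(min(max_parents, len(predecessors)) + 1):
        yield from itertools.combinations(predecessors, k)


def log_marginal_given_order(
    order: Sequence[str],
    max_parents: int,
    local_log_score: Callable[[str, Tuple[str, ...]], float],  # hypothetical local score
) -> float:
    """log of the sum, over all DAGs consistent with `order` (bounded in-degree),
    of exp(sum of local scores). Decomposability turns this into a sum of per-node
    log-sum-exps instead of a sum over exponentially many graphs."""
    total = 0.0
    for node in order:
        scores = [local_log_score(node, ps) for ps in candidate_parent_sets(order, node, max_parents)]
        m = max(scores)  # log-sum-exp shift for numerical stability
        total += m + math.log(sum(math.exp(s - m) for s in scores))
    return total
```

With a constant score of zero, the result is simply the log of the number of compatible graphs, e.g. log 6 for the order A, B, C with at most one parent.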

Submitted 22 February, 2024; originally announced February 2024.

Comments: 8 pages + references + appendices (19 pages total)

arXiv:2401.05385

doi 10.23919/EuRAD58043.2023.10289631

Angle-Equivariant Convolutional Neural Networks for Interference Mitigation in Automotive Radar

Authors: Christian Oswald, Mate Toth, Paul Meissner, Franz Pernkopf

Abstract: In automotive applications, frequency modulated continuous wave (FMCW) radar is an established technology to determine the distance, velocity and angle of objects in the vicinity of the vehicle. The quality of predictions might be seriously impaired if mutual interference between radar sensors occurs. Previous work processes data from the entire receiver array in parallel to increase interference mitigation quality using neural networks (NNs). However, these architectures do not generalize well across different angles of arrival (AoAs) of interferences and objects. In this paper, we introduce a fully convolutional neural network (CNN) with rank-three convolutions that is able to transfer learned patterns between different AoAs. Our proposed architecture outperforms previous work while having higher robustness and a lower number of trainable parameters. We evaluate our network on a diverse data set and demonstrate its angle equivariance.

Submitted 18 December, 2023; originally announced January 2024.

Comments: 4 pages, 3 figures

Journal ref: 2023 20th European Radar Conference (EuRAD) (pp. 135-138). IEEE

arXiv:2312.09790

End-to-End Training of Neural Networks for Automotive Radar Interference Mitigation

Authors: Christian Oswald, Mate Toth, Paul Meissner, Franz Pernkopf

Abstract: In this paper we propose a new method for training neural networks (NNs) for frequency modulated continuous wave (FMCW) radar mutual interference mitigation. Instead of training NNs to regress from interfered to clean radar signals as in previous work, we train NNs directly on object detection maps. We do so by performing a continuous relaxation of the cell-averaging constant false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm for object detection using radar. With this new training objective we are able to increase object detection performance by a large margin. Furthermore, we introduce separable convolution kernels to strongly reduce the number of parameters and computational complexity of convolutional NN architectures for radar applications. We validate our contributions with experiments on real-world measurement data and compare them against signal processing interference mitigation methods.
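For reference, a plain one-dimensional CA-CFAR detector and one possible smooth surrogate are sketched below. The window sizes and scale factor are illustrative defaults, and the sigmoid relaxation in the hypothetical `soft_ca_cfar` is an assumption chosen for illustration; the abstract does not specify how the authors relax the detector.

```python
import numpy as np


def ca_cfar(power, num_train=8, num_guard=2, scale=5.0):
    """One-dimensional cell-averaging CFAR on a power profile.

    For each cell under test, the noise level is the mean of `num_train` training
    cells on each side, skipping `num_guard` guard cells next to the cell under test.
    Cells too close to the edges get no threshold (NaN) and no detection.
    """
    power = np.asarray(power, dtype=float)
    n = power.size
    half = num_train + num_guard
    threshold = np.full(n, np.nan)
    for i in range(half, n - half):
        left = power[i - half : i - num_guard]            # num_train cells
        right = power[i + num_guard + 1 : i + half + 1]   # num_train cells
        threshold[i] = scale * np.mean(np.concatenate([left, right]))
    valid = ~np.isnan(threshold)
    detections = np.zeros(n, dtype=bool)
    detections[valid] = power[valid] > threshold[valid]
    return detections, threshold


def soft_ca_cfar(power, threshold, temperature=1.0):
    """Hypothetical smooth surrogate: sigmoid((power - threshold) / temperature)."""
    power = np.asarray(power, dtype=float)
    valid = ~np.isnan(threshold)
    soft = np.zeros_like(power)
    margin = (power[valid] - threshold[valid]) / temperature
    soft[valid] = np.exp(-np.logaddexp(0.0, -margin))  # numerically stable sigmoid
    return soft
```

Unlike the hard comparison, the sigmoid output is differentiable in the input power, which is the property any relaxation used as a training objective needs.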

Submitted 15 December, 2023; originally announced December 2023.

Comments: 2023 IEEE International Radar Conference (RADAR), 6 pages, 4 figures

arXiv:2311.10478

"UWBCarGraz" Dataset for Car Occupancy Detection using Ultra-Wideband Radar

Authors: Jakob Möderl, Stefan Posch, Franz Pernkopf, Klaus Witrisal

Abstract: We present a data-driven car occupancy detection algorithm using ultra-wideband radar based on the ResNet architecture. The algorithm is trained on a dataset of channel impulse responses obtained from measurements at three different activity levels of the occupants (i.e., breathing, talking, moving). We compare the presented algorithm against a state-of-the-art car occupancy detection algorithm based on variational message passing (VMP). Our presented ResNet architecture is able to outperform the VMP algorithm in terms of the area under the receiver operating characteristic curve (AUC) at low signal-to-noise ratios (SNRs) for all three activity levels of the target. Specifically, for an SNR of -20 dB the VMP detector achieves an AUC of 0.87 while the ResNet architecture achieves an AUC of 0.91 if the target is sitting still and breathing naturally. The difference in performance for the other activities is similar. To facilitate the implementation in the onboard computer of a car, we perform an ablation study to optimize the tradeoff between performance and computational complexity for several ResNet architectures. The dataset used to train and evaluate the algorithm is openly accessible, which facilitates easy comparison in future work.
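AUC values like those quoted above can be computed directly from detector scores via the Mann-Whitney statistic, i.e., the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. The following is a minimal sketch with a hypothetical function name, not the evaluation code used in the paper.

```python
import numpy as np


def auc_mann_whitney(scores_pos, scores_neg):
    """AUC = P(S_pos > S_neg) + 0.5 * P(S_pos == S_neg), averaged over all pos/neg pairs."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]  # column: one row per positive
    neg = np.asarray(scores_neg, dtype=float)[None, :]  # row: one column per negative
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

The pairwise comparison costs O(PN) memory for P positives and N negatives; for large evaluation sets a rank-based formulation is the usual alternative.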

Submitted 17 November, 2023; originally announced November 2023.

Comments: v1 (17.11.2023). 6 pages, 5 figures

arXiv:2306.00442

Fast Variational Block-Sparse Bayesian Learning

Authors: Jakob Möderl, Franz Pernkopf, Klaus Witrisal, Erik Leitinger

Abstract: We present a fast update rule for variational block-sparse Bayesian learning (SBL) methods. Based on a variational Bayesian approximation, we show that iterative updates of probability density functions (PDFs) of the prior precisions and weights can be expressed as a nonlinear first-order recurrence from one estimate of the parameters of the proxy PDFs to the next. In particular, for commonly used prior PDFs such as Jeffreys prior, the recurrence relation turns out to be a strictly increasing rational function. This property is the basis for two important analytical results: first, the fixed points can be determined by solving for the roots of a polynomial; second, the limit of the prior precision after an infinite sequence of update steps can be determined. These results are combined into a simplified single-step check for convergence/divergence of each prior precision. Consequently, our proposed criterion significantly reduces the computational complexity of the variational block-SBL algorithm, improving convergence speed by two orders of magnitude in simulations. Moreover, the criterion provides valuable insights into the sparsity of the estimators obtained by different prior choices.
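The two analytical results can be illustrated on a toy recurrence. The sketch assumes a hypothetical map f(x) = (2x + 1)/(x + 1), which is strictly increasing for x > -1; it is not the SBL update from the paper. Fixed points of x -> p(x)/q(x) are the real roots of p(x) - x q(x), and because f is continuous and increasing, the iterates are monotone, so they either converge to a fixed point or diverge.

```python
import numpy as np


def fixed_points(p_coeffs, q_coeffs):
    """Real fixed points of x -> p(x)/q(x), i.e. real roots of p(x) - x*q(x).

    Coefficients are given highest degree first (numpy.poly1d convention).
    """
    residual = np.polysub(p_coeffs, np.polymul(q_coeffs, [1.0, 0.0]))
    roots = np.roots(residual)
    return np.sort(roots[np.abs(roots.imag) < 1e-9].real)


def iterate(p_coeffs, q_coeffs, x0, steps=50):
    """Apply the rational recurrence x_{n+1} = p(x_n) / q(x_n) a fixed number of times."""
    x = x0
    for _ in range(steps):
        x = np.polyval(p_coeffs, x) / np.polyval(q_coeffs, x)
    return x


# Toy example (hypothetical): f(x) = (2x + 1) / (x + 1)
p, q = [2.0, 1.0], [1.0, 1.0]
print(fixed_points(p, q))  # the roots (1 - sqrt 5)/2 and (1 + sqrt 5)/2
print(iterate(p, q, 1.0))  # converges to the positive fixed point
```

Starting from x = 1, the iterates 1, 1.5, 1.6, ... increase monotonically toward (1 + √5)/2 ≈ 1.618, the first fixed point above the starting value, so the limit is known without running the iteration to convergence.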

Submitted 13 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 10 pages, 4 figures, submitted to IEEE Transactions on Signal Processing on 1st of June, 2023, Major Revision on Dec. 3, 2023

arXiv:2303.07821

Self-attention for Enhanced OAMP Detection in MIMO Systems

Authors: Alexander Fuchs, Christian Knoll, Nima N. Moghadam, Alexey Pak, **liang Huang, Erik Leitinger, Franz Pernkopf

Abstract: Multiple-Input Multiple-Output (MIMO) systems are essential for wireless communications. Since classical algorithms for symbol detection in MIMO setups require large computational resources or provide poor results, data-driven algorithms are becoming more popular. Most of the proposed algorithms, however, introduce approximations leading to degraded performance for realistic MIMO systems. In this paper, we introduce a neural-enhanced hybrid model, augmenting the analytic backbone algorithm with state-of-the-art neural network components. In particular, we introduce a self-attention model for the enhancement of the iterative Orthogonal Approximate Message Passing (OAMP)-based decoding algorithm. In our experiments, we show that the proposed model can outperform existing data-driven approaches for OAMP while having improved generalization to other SNR values at limited computational overhead.

Submitted 14 March, 2023; originally announced March 2023.

Comments: 8 pages, 2 figures, ICASSP 2023

ACM Class: I.2.1; H.1.1

arXiv:2303.03017

Variational Inference of Structured Line Spectra Exploiting Group-Sparsity

Authors: Jakob Möderl, Franz Pernkopf, Klaus Witrisal, Erik Leitinger

Abstract: In this paper, we present a variational inference algorithm that decomposes a signal into multiple groups of related spectral lines. The spectral lines in each group are associated with a group parameter common to all spectral lines within the group. The proposed algorithm jointly estimates the group parameters, the number of spectral lines within a group, and the number of groups, exploiting a Bernoulli-Gamma-Gaussian hierarchical prior model which promotes sparse solutions. Aiming to maximize the evidence lower bound (ELBO), variational inference provides analytic approximations of the posterior probability density functions (PDFs) and also gives estimates of the additional model parameters such as the measurement noise variance. While the activation variables of the groups and the associated group parameters (such as fundamental frequencies and the corresponding higher order harmonics) are estimated as point estimates, the remaining parameters such as the complex amplitudes of the spectral lines and their precision parameters are estimated as approximate posterior PDFs. We demonstrate the versatility and performance of the proposed algorithm on three different inference problems. In particular, the proposed algorithm is applied to the multi-pitch estimation problem, the radar signal-based extended object estimation problem, and variational mode decomposition (VMD) using synthetic measurements, and to a real multi-pitch estimation problem using the Bach-10 dataset.
The results show that the proposed algorithm outperforms state-of-the-art model-based and pre-trained algorithms on all three inference problems.

Submitted 31 May, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 13 Pages, 5 Figures. Submitted to IEEE Transactions on Signal Processing on 6th of March, 2023. Update 31.05.2023: Fixed wrong/missing internal references

arXiv:2210.07619

Variational Message Passing-Based Respiratory Motion Estimation and Detection Using Radar Signals

Authors: Jakob Möderl, Erik Leitinger, Franz Pernkopf, Klaus Witrisal

Abstract: We present a variational message passing (VMP) approach to detect the presence of a person based on their respiratory chest motion using multistatic ultra-wideband (UWB) radar. In the process, the respiratory motion is estimated for contact-free vital sign monitoring. The received signal is modeled by a backscatter channel and the respiratory motion and propagation channels are estimated using VMP. We use the evidence lower bound (ELBO) to approximate the model evidence for the detection. Numerical analyses and measurements demonstrate that the proposed method leads to a significant improvement in the detection performance compared to a fast Fourier transform (FFT)-based detector or an estimator-correlator, since the multipath components (MPCs) are better incorporated into the detection procedure. Specifically, the proposed method has a detection probability of 0.95 at -20 dB signal-to-noise ratio (SNR), while the estimator-correlator and FFT-based detector have 0.32 and 0.05, respectively.

Submitted 29 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 29.10.22: Updated with extension to multistatic radar systems. Submitted to ICASSP 2023, 4 pages + references, 4 figures, UWB radar car occupancy detection, variational message passing

arXiv:2206.02063

Active Bayesian Causal Inference

Authors: Christian Toth, Lars Lorch, Christian Knoll, Andreas Krause, Franz Pernkopf, Robert Peharz, Julius von Kügelgen

Abstract: Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally-sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph.
This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.

Submitted 15 October, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022 camera-ready version. RP & JvK are shared last authors. 10 pages + Bibliography + Appendix (34 pages total)

arXiv:2202.05610

doi 10.1103/PhysRevAccelBeams.25.104601

Explainable Machine Learning for Breakdown Prediction in High Gradient RF Cavities

Authors: Christoph Obermair, Thomas Cartier-Michaud, Andrea Apollonio, William Millar, Lukas Felsberger, Lorenz Fischl, Holger Severin Bovbjerg, Daniel Wollmann, Walter Wuensch, Nuria Catalan-Lasheras, Marçà Boronat, Franz Pernkopf, Graeme Burt

Abstract: The occurrence of vacuum arcs or radio frequency (rf) breakdowns is one of the most prevalent factors limiting the high-gradient performance of normal conducting rf cavities in particle accelerators. In this paper, we search for the existence of previously unrecognized features related to the incidence of rf breakdowns by applying a machine learning strategy to high-gradient cavity data from CERN's test stand for the Compact Linear Collider (CLIC). By interpreting the parameters of the learned models with explainable artificial intelligence (AI), we reverse-engineer physical properties for deriving fast, reliable, and simple rule-based models. Based on 6 months of historical data and dedicated experiments, our models show fractions of data with a high influence on the occurrence of breakdowns. Specifically, it is shown that the field emitted current following an initial breakdown is closely related to the probability of another breakdown occurring shortly thereafter. Results also indicate that the cavity pressure should be monitored with increased temporal resolution in future experiments, to further explore the vacuum activity associated with breakdowns.

Submitted 8 December, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2201.10360

doi 10.1109/JSTSP.2021.3062452

Resource-efficient Deep Neural Networks for Automotive Radar Interference Mitigation

Authors: Johanna Rock, Wolfgang Roth, Mate Toth, Paul Meissner, Franz Pernkopf

Abstract: Radar sensors are crucial for environment perception of driver assistance systems as well as autonomous vehicles. With a rising number of radar sensors and the so far unregulated automotive radar frequency band, mutual interference is inevitable and must be dealt with. Algorithms and models operating on radar data are required to run the early processing steps on specialized radar sensor hardware. This specialized hardware typically has strict resource constraints, i.e., a low memory capacity and low computational power. Convolutional Neural Network (CNN)-based approaches for denoising and interference mitigation yield promising results for radar processing in terms of performance. Regarding resource constraints, however, CNNs typically exceed the hardware's capacities by far. In this paper we investigate quantization techniques for CNN-based denoising and interference mitigation of radar signals. We analyze the quantization of (i) weights and (ii) activations of different CNN-based model architectures. This quantization results in reduced memory requirements for model storage and during inference. We compare models with fixed and learned bit-widths and contrast two different methodologies for training quantized CNNs, i.e., the straight-through gradient estimator and training distributions over discrete weights. We illustrate the importance of structurally small real-valued base models for quantization and show that learned bit-widths yield the smallest models. We achieve a memory reduction of around 80% compared to the real-valued baseline.
For practical reasons, however, we recommend the use of 8 bits for weights and activations, which results in models that require only 0.2 megabytes of memory.
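As a generic illustration of weight quantization and its storage cost, the sketch below uses a symmetric uniform quantizer with a single per-tensor scale. This is an assumption for illustration, not either of the training methods compared in the paper, and the function names are hypothetical. During training, the straight-through estimator would treat the rounding step as the identity in the backward pass.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Symmetric uniform quantizer: integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1].

    Returns the de-quantized weights and the per-tensor step size (scale).
    """
    w = np.asarray(w, dtype=float)
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / levels if np.any(w) else 1.0
    codes = np.round(w / scale)  # non-differentiable; STE would pass gradients straight through
    return codes * scale, scale


def storage_bytes(num_params, bits):
    """Storage for `num_params` values at `bits` bits each, ignoring scale factors."""
    return num_params * bits / 8
```

The rounding error per weight is at most half a step, so fewer bits trade a coarser step size for proportionally less storage.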

Submitted 25 January, 2022; originally announced January 2022.

Comments: 15 pages; published in IEEE Journal of Selected Topics in Signal Processing, Special Issue on Recent Advances in Automotive Radar Signal Processing, Volume: 15, Issue: 4, June 2021. arXiv admin note: text overlap with arXiv:2011.12706

Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 4, pp. 927-940, June 2021

arXiv:2110.01955

Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks

Authors: Alexander Fuchs, Christian Knoll, Franz Pernkopf

Abstract: Deep neural networks rely heavily on normalization methods to improve their performance and learning behavior. Although normalization methods spurred the development of increasingly deep and efficient architectures, they also increase the vulnerability with respect to noise and input corruptions. In most applications, however, noise is ubiquitous and diverse; this can often lead to complete failure of machine learning systems as they fail to cope with mismatches between the input distribution during training- and test-time. The most common normalization method, batch normalization, reduces the distribution shift during training but is agnostic to changes in the input distribution during test time. This makes batch normalization prone to performance degradation whenever noise is present during test-time. Sample-based normalization methods can correct linear transformations of the activation distribution but cannot mitigate changes in the distribution shape; this makes the network vulnerable to distribution changes that cannot be reflected in the normalization parameters. We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer. This reduces the mismatch between the training and test-time distribution by minimizing the 1-D Wasserstein distance. In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions and thus improves the classification performance without the need for retraining or fine-tuning the model.
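In one dimension, the Wasserstein-1 distance between two equally sized empirical samples reduces to the mean absolute difference of the sorted samples, and a monotone (rank-preserving) rearrangement is an optimal transport map. The sketch below uses this to map test-time activations onto stored reference activations by quantile matching. It is a generic illustration under that assumption, not necessarily the authors' correction procedure, and the function names are hypothetical.

```python
import numpy as np


def wasserstein_1d(a, b):
    """W1 between two equally sized empirical samples: mean |sorted(a) - sorted(b)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def quantile_match(test_acts, reference_acts):
    """Replace each test activation by the reference value of the same (rescaled) rank.

    This is the monotone rearrangement, i.e. a 1-D optimal transport map between
    the empirical test distribution and the stored reference distribution.
    """
    test_acts = np.asarray(test_acts, dtype=float)
    ref_sorted = np.sort(np.asarray(reference_acts, dtype=float))
    ranks = np.argsort(np.argsort(test_acts))  # rank of each test value, 0..n-1
    # Rescale ranks to reference indices so the two sample sizes need not match.
    idx = np.round(ranks * (ref_sorted.size - 1) / max(test_acts.size - 1, 1)).astype(int)
    return ref_sorted[idx]
```

Because the mapping preserves the order of the test activations, it changes the shape of the distribution (not only its mean and scale) while keeping the relative ordering of units intact.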

Submitted 5 October, 2021; originally announced October 2021.

ACM Class: I.2.0; I.4.0

arXiv:2108.01991

Lung Sound Classification Using Co-tuning and Stochastic Normalization

Authors: Truc Nguyen, Franz Pernkopf

Abstract: In this paper, we use pre-trained ResNet models as backbone architectures for classification of adventitious lung sounds and respiratory diseases. The knowledge of the pre-trained model is transferred by using vanilla fine-tuning, co-tuning, stochastic normalization and the combination of the co-tuning and stochastic normalization techniques. Furthermore, data augmentation in both time domain and time-frequency domain is used to account for the class imbalance of the ICBHI and our multi-channel lung sound dataset. Additionally, we apply spectrum correction to consider the variations of the recording device properties on the ICBHI dataset. Empirically, our proposed systems mostly outperform all state-of-the-art lung sound classification systems for the adventitious lung sounds and respiratory diseases of both datasets.

Submitted 4 August, 2021; originally announced August 2021.

Comments: Submitted to IEEE BE Transaction

arXiv:2105.00929

Complex-valued Convolutional Neural Networks for Enhanced Radar Signal Denoising and Interference Mitigation

Authors: Alexander Fuchs, Johanna Rock, Mate Toth, Paul Meissner, Franz Pernkopf

Abstract: Autonomous driving highly depends on capable sensors to perceive the environment and to deliver reliable information to the vehicles' control systems. To increase its robustness, a diversified set of sensors is used, including radar sensors. Radar is a vital source of sensory information, providing high-resolution range as well as velocity measurements. The increased use of radar sensors in road traffic introduces new challenges. As the so far unregulated frequency band becomes increasingly crowded, radar sensors suffer from mutual interference between multiple radar sensors. This interference must be mitigated in order to ensure a high and consistent detection sensitivity. In this paper, we propose the use of Complex-Valued Convolutional Neural Networks (CVCNNs) to address the issue of mutual interference between radar sensors. We extend previously developed methods to the complex domain in order to process radar data according to its physical characteristics. This not only increases data efficiency, but also improves the conservation of phase information during filtering, which is crucial for further processing, such as angle estimation. Our experiments show that the use of CVCNNs increases data efficiency, speeds up network training and substantially improves the conservation of phase information during interference removal.
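A complex-valued convolution can be built from real-valued operations, which is one common way to implement CVCNN layers in frameworks without native complex support. The sketch below is a generic 1-D example under that assumption, not the authors' architecture: it combines four real convolutions according to (a + ib)(c + id) = (ac - bd) + i(ad + bc), so the phase relationship between real and imaginary parts is preserved.

```python
import numpy as np


def complex_conv_via_real(x, h):
    """Full 1-D complex convolution x * h assembled from four real convolutions."""
    x = np.asarray(x, dtype=complex)
    h = np.asarray(h, dtype=complex)
    xr, xi, hr, hi = x.real, x.imag, h.real, h.imag
    real = np.convolve(xr, hr) - np.convolve(xi, hi)  # Re: ac - bd
    imag = np.convolve(xr, hi) + np.convolve(xi, hr)  # Im: ad + bc
    return real + 1j * imag
```

By contrast, treating the real and imaginary parts as two independent input channels lets a real-valued network mix them arbitrarily, which does not enforce this structure.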

Submitted 29 April, 2021; originally announced May 2021.

Journal ref: IEEE International Radar Conference 2021

arXiv:2104.14921

Crackle Detection In Lung Sounds Using Transfer Learning And Multi-Input Convolutional Neural Networks

Authors: Truc Nguyen, Franz Pernkopf

Abstract: Large annotated lung sound databases are publicly available and might be used to train algorithms for diagnosis systems. However, it might be a challenge to develop a well-performing algorithm for small non-public data, which have only a few subjects and show differences in recording devices and setup. In this paper, we use transfer learning to tackle the mismatch of the recording setup. This allows us to transfer knowledge from one dataset to another dataset for crackle detection in lung sounds. In particular, a single input convolutional neural network (CNN) model is pre-trained on a source domain using ICBHI 2017, the largest publicly available database of lung sounds. We use log-mel spectrogram features of respiratory cycles of lung sounds. The pre-trained network is used to build a multi-input CNN model, which shares the same network architecture for respiratory cycles and their corresponding respiratory phases. The multi-input model is then fine-tuned on the target domain of our self-collected lung sound database for classifying crackles and normal lung sounds. Our experimental results show significant performance improvements of 9.84% (absolute) in F-score on the target domain using the multi-input CNN model based on transfer learning for crackle detection in adventitious lung sound classification task.

Submitted 30 April, 2021; originally announced April 2021.

Comments: Under review for the Proceedings of EMBC 2021

arXiv:2104.06666 [pdf, other]

End-to-end Keyword Spotting using Neural Architecture Search and Quantization

Authors: David Peter, Wolfgang Roth, Franz Pernkopf

Abstract: This paper introduces neural architecture search (NAS) for the automatic discovery of end-to-end keyword spotting (KWS) models in limited resource environments. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) operating on raw audio waveforms. After a suitable KWS model is found with NAS, we conduct quantization of weights and activations to reduce the memory footprint. We conduct extensive experiments on the Google speech commands dataset. In particular, we compare our end-to-end approach to mel-frequency cepstral coefficient (MFCC) based systems. For quantization, we compare fixed bit-width quantization and trained bit-width quantization. Using NAS only, we were able to obtain a highly efficient model with an accuracy of 95.55% using 75.7k parameters and 13.6M operations. Using trained bit-width quantization, the same model achieves a test accuracy of 93.76% while using on average only 2.91 bits per activation and 2.51 bits per weight.

Submitted 14 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:2012.10138

arXiv:2103.13443 [pdf, other]

Blind Speech Separation and Dereverberation using Neural Beamforming

Authors: Lukas Pfeifenberger, Franz Pernkopf

Abstract: In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed by using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, as they occur in meeting scenarios. We evaluate our system in terms of Scale Independent Signal to Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).

Submitted 4 November, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 13 pages, 9 figures

arXiv:2012.10138 [pdf, other]

Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization

Authors: David Peter, Wolfgang Roth, Franz Pernkopf

Abstract: This paper introduces neural architecture search (NAS) for the automatic discovery of small models for keyword spotting (KWS) in limited resource environments. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) to maximize the classification accuracy while minimizing the number of operations per inference. Using NAS only, we were able to obtain a highly efficient model with 95.4% accuracy on the Google speech commands dataset with 494.8 kB of memory usage and 19.6 million operations. Additionally, weight quantization is used to reduce the memory consumption even further. We show that weight quantization to low bit-widths (e.g. 1 bit) can be used without substantial loss in accuracy. By increasing the number of input features from 10 MFCC to 20 MFCC we were able to increase the accuracy to 96.3% at 340.1 kB of memory usage and 27.1 million operations.

Submitted 18 December, 2020; originally announced December 2020.

arXiv:2012.02529 [pdf, other]

doi 10.1109/RADAR42522.2020.9114627

Deep Interference Mitigation and Denoising of Real-World FMCW Radar Signals

Authors: Johanna Rock, Mate Toth, Paul Meissner, Franz Pernkopf

Abstract: Radar sensors are crucial for environment perception of driver assistance systems as well as autonomous cars. Key performance factors are a fine range resolution and the possibility to directly measure velocity. With a rising number of radar sensors and the so far unregulated automotive radar frequency band, mutual interference is inevitable and must be dealt with. Sensors must be capable of detecting, or even mitigating, the harmful effects of interference, which include a decreased detection sensitivity. In this paper, we evaluate a Convolutional Neural Network (CNN)-based approach for interference mitigation on real-world radar measurements. We combine real measurements with simulated interference in order to create input-output data suitable for training the model. We analyze the relation between performance and model complexity on simulated and measurement data, based on an extensive parameter search. Further, a finite sample size performance comparison shows the effectiveness of the model trained on either simulated or real data as well as for transfer learning. A comparative performance analysis with the state of the art emphasizes the potential of CNN-based models for interference mitigation and denoising of real-world measurements, also considering resource constraints of the hardware.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 2020 IEEE International Radar Conference (RADAR)

arXiv:2011.12706 [pdf, other]

Quantized Neural Networks for Radar Interference Mitigation

Authors: Johanna Rock, Wolfgang Roth, Paul Meissner, Franz Pernkopf

Abstract: Radar sensors are crucial for environment perception of driver assistance systems as well as autonomous vehicles. Key performance factors are weather resistance and the possibility to directly measure velocity. With a rising number of radar sensors and the so far unregulated automotive radar frequency band, mutual interference is inevitable and must be dealt with. Algorithms and models operating on radar data in early processing stages are required to run directly on specialized hardware, i.e., the radar sensor. This specialized hardware typically has strict resource constraints, i.e., low memory capacity and low computational power. Convolutional Neural Network (CNN)-based approaches for denoising and interference mitigation yield promising results for radar processing in terms of performance. However, these models typically contain millions of parameters, stored in hundreds of megabytes of memory, and require additional memory during execution. In this paper we investigate quantization techniques for CNN-based denoising and interference mitigation of radar signals. We analyze the quantization potential of different CNN-based model architectures and sizes by considering (i) quantized weights and (ii) piecewise constant activation functions, which reduce the memory requirements for model storage and during inference, respectively.

Submitted 1 December, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: ITEM Workshop at ECML-PKDD 2020

arXiv:2010.11773 [pdf, other]

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

Authors: Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf

Abstract: We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to a few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach by also considering the model size. Both methods are motivated by recent developments in the deep learning community, and they provide effective means to trade off between model size and prediction accuracy, which is demonstrated in extensive experiments. Furthermore, we contrast quantized BN classifiers with quantized deep neural networks (DNNs) for small-scale scenarios which have hardly been investigated in the literature. We show Pareto optimal models with respect to model size, number of operations, and test error and find that both model classes are viable options.
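As a rough illustration of the straight-through estimator named above, the following pure-Python sketch keeps a real-valued "shadow" weight, quantizes it to a few bits in the forward pass, and treats the rounding as the identity in the backward pass (zeroing the gradient outside the clipping range). The uniform quantizer on [-x_max, x_max] and the helper names `quantize`, `ste_grad` and `train_step` are assumptions for illustration; this is not the authors' implementation for BN parameters.

```python
def quantize(x, num_bits, x_max=1.0):
    """Uniform quantizer with 2**num_bits levels on [-x_max, x_max] (hypothetical helper)."""
    step = 2.0 * x_max / (2 ** num_bits - 1)  # spacing between neighbouring levels
    clipped = max(-x_max, min(x_max, x))
    return -x_max + round((clipped + x_max) / step) * step


def ste_grad(x, upstream_grad, x_max=1.0):
    """Straight-through estimator: pass the gradient through round() unchanged,
    but only where x lies inside the clipping range."""
    return upstream_grad if -x_max <= x <= x_max else 0.0


def train_step(w, target, num_bits, lr=0.1):
    """One gradient step on the squared error (quantize(w) - target)**2."""
    q = quantize(w, num_bits)              # forward pass uses the quantized weight
    dloss_dq = 2.0 * (q - target)          # gradient w.r.t. the quantized weight
    return w - lr * ste_grad(w, dloss_dq)  # ...applied to the real-valued weight


w = 0.1
for _ in range(20):
    w = train_step(w, target=1.0, num_bits=2)
print(quantize(w, 2))  # expected to have reached the quantization level 1.0
```

With 2 bits the levels are -1, -1/3, 1/3 and 1; the shadow weight keeps moving even while its quantized value is stuck at 1/3, which is exactly what the straight-through gradient makes possible.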

Submitted 22 September, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: Accepted at ICPR 2020, fixed Figure 5

arXiv:2008.09566 [pdf, other]

Differentiable TAN Structure Learning for Bayesian Network Classifiers

Authors: Wolfgang Roth, Franz Pernkopf

Abstract: Learning the structure of Bayesian networks is a difficult combinatorial optimization problem. In this paper, we consider learning of tree-augmented naive Bayes (TAN) structures for Bayesian network classifiers with discrete input features. Instead of performing a combinatorial optimization over the space of possible graph structures, the proposed method learns a distribution over graph structures. After training, we select the most probable structure of this distribution. This allows for a joint training of the Bayesian network parameters along with its TAN structure using gradient-based optimization. The proposed method is agnostic to the specific loss and only requires that it is differentiable. We perform extensive experiments using a hybrid generative-discriminative loss based on the discriminative probabilistic margin. Our method consistently outperforms random TAN structures and Chow-Liu TAN structures.
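For context on the Chow-Liu TAN baseline mentioned above: the classic construction weights every feature pair by its empirical conditional mutual information given the class and keeps a maximum-weight spanning tree; in the final TAN model each feature additionally has the class as a parent. A minimal pure-Python sketch, assuming discrete features stored as `(features, label)` pairs; `cond_mutual_info` and `chow_liu_tan` are hypothetical names, and this is the baseline, not the paper's differentiable method.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(data, i, j):
    """Empirical I(X_i; X_j | C) in nats; data is a list of (features, label) pairs."""
    n = len(data)
    n_c = Counter(label for _, label in data)
    n_ci = Counter((label, x[i]) for x, label in data)
    n_cj = Counter((label, x[j]) for x, label in data)
    n_cij = Counter((label, x[i], x[j]) for x, label in data)
    cmi = 0.0
    for (c, xi, xj), count in n_cij.items():
        # p(xi, xj, c) * log[p(xi, xj | c) / (p(xi | c) * p(xj | c))]
        cmi += count / n * math.log(count * n_c[c] / (n_ci[(c, xi)] * n_cj[(c, xj)]))
    return cmi


def chow_liu_tan(data, num_features):
    """Parent of each feature in a maximum-weight spanning tree rooted at feature 0
    (None for the root). In the full TAN model every feature also depends on the class."""
    weight = {}
    for i, j in combinations(range(num_features), 2):
        weight[(i, j)] = weight[(j, i)] = cond_mutual_info(data, i, j)
    parent = {0: None}
    while len(parent) < num_features:
        # Prim's algorithm: attach the new feature reachable by the heaviest edge
        src, dst = max(((u, v) for u in parent for v in range(num_features) if v not in parent),
                       key=lambda edge: weight[edge])
        parent[dst] = src
    return [parent[k] for k in range(num_features)]


# Toy data: feature 1 copies feature 0; feature 2 is independent of both given the class.
data = [((x0, x0, x2), c) for c in (0, 1) for x0 in (0, 1) for x2 in (0, 1)]
print(chow_liu_tan(data, 3))  # -> [None, 0, 0]
```

Ties (here the zero-weight edges to feature 2) are broken by `max` returning the first maximal candidate, so feature 2 attaches to the root.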

Submitted 21 August, 2020; originally announced August 2020.

Comments: Accepted at PGM 2020

arXiv:2007.11477 [pdf, other]

Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement

Authors: Lukas Pfeifenberger, Matthias Zöhrer, Günther Schindler, Wolfgang Roth, Holger Fröning, Franz Pernkopf

Abstract: While machine learning techniques are traditionally resource intensive, we are currently witnessing an increased interest in hardware- and energy-efficient approaches. This need for resource-efficient machine learning is primarily driven by the demand for embedded systems and their usage in ubiquitous computing and IoT applications. In this article, we provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs). In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations. This speech mask is used to obtain either the Minimum Variance Distortionless Response (MVDR) or Generalized Eigenvalue (GEV) beamformer. In the extreme case of binary weights and reduced precision activations, a significant reduction of execution time and memory footprint is possible while still obtaining an audio quality almost on par with single-precision DNNs and a slightly larger Word Error Rate (WER) for single speaker scenarios using the WSJ0 speech corpus.

Submitted 22 July, 2020; originally announced July 2020.

arXiv:2007.11465 [pdf, ps, other]

Wasserstein Routed Capsule Networks

Authors: Alexander Fuchs, Franz Pernkopf

Abstract: Capsule networks offer interesting properties and provide an alternative to today's deep neural network architectures. However, recent approaches have failed to consistently achieve competitive results across different image datasets. We propose a new parameter-efficient capsule architecture that is able to tackle complex tasks by using neural networks trained with an approximate Wasserstein objective to dynamically select capsules throughout the entire architecture. This approach focuses on implementing a robust routing scheme, which can deliver improved results using little overhead. We perform several ablation studies verifying the proposed concepts and show that our network is able to substantially outperform other capsule approaches by over 1.2% on CIFAR-10, using fewer parameters.

Submitted 22 July, 2020; originally announced July 2020.

Comments: 8 pages, 3 figures

ACM Class: I.2.10

arXiv:2001.03048 [pdf, other]

Resource-Efficient Neural Networks for Embedded Systems

Authors: Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani

Abstract: While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction.
We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality.

Submitted 7 April, 2024; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: arXiv admin note: text overlap with arXiv:1812.02240; accepted at JMLR

arXiv:1910.04536 [pdf, other]

Deep Structured Mixtures of Gaussian Processes

Authors: Martin Trapp, Robert Peharz, Franz Pernkopf, Carl E. Rasmussen

Abstract: Gaussian Processes (GPs) are powerful non-parametric Bayesian regression models that allow exact posterior inference, but exhibit high computational and memory costs. In order to improve scalability of GPs, approximate posterior inference is frequently employed, where a prominent class of approximation techniques is based on local GP experts. However, local-expert techniques proposed so far are either not well-principled, come with limited approximation guarantees, or lead to intractable models. In this paper, we introduce deep structured mixtures of GP experts, a stochastic process model which i) allows exact posterior inference, ii) has attractive computational and memory costs, and iii) when used as GP approximation, captures predictive uncertainties consistently better than previous expert-based approximations. In a variety of experiments, we show that deep structured mixtures have a low approximation error and often perform competitively with or outperform prior work.

Submitted 26 April, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: AISTATS 2020

arXiv:1907.04708 [pdf, other]

Learning a Behavior Model of Hybrid Systems Through Combining Model-Based Testing and Machine Learning (Full Version)

Authors: Bernhard K. Aichernig, Roderick Bloem, Masoud Ebrahimi, Martin Horn, Franz Pernkopf, Wolfgang Roth, Astrid Rupp, Martin Tappler, Markus Tranninger

Abstract: Models play an essential role in the design process of cyber-physical systems. They form the basis for simulation and analysis and help in identifying design problems as early as possible. However, the construction of models that comprise physical and digital behavior is challenging. Therefore, there is considerable interest in learning such hybrid behavior by means of machine learning which requires sufficient and representative training data covering the behavior of the physical system adequately. In this work, we exploit a combination of automata learning and model-based testing to generate sufficient training data fully automatically. Experimental results on a platooning scenario show that recurrent neural networks learned with this data achieved significantly better results compared to models learned from randomly generated data. In particular, the classification error for crash detection is reduced by a factor of five and a similar F1-score is obtained with up to three orders of magnitude fewer training samples.

Submitted 10 July, 2019; originally announced July 2019.

Comments: This is an extended version of the conference paper "Learning a Behavior Model of Hybrid Systems Through Combining Model-Based Testing and Machine Learning" accepted for presentation at IFIP-ICTSS 2019, the 31st International Conference on Testing Software and Systems in Paris, France

arXiv:1906.10044 [pdf, other]

Complex Signal Denoising and Interference Mitigation for Automotive Radar Using Convolutional Neural Networks

Authors: Johanna Rock, Mate Toth, Elmar Messner, Paul Meissner, Franz Pernkopf

Abstract: Driver assistance systems as well as autonomous cars have to rely on sensors to perceive their environment. A heterogeneous set of sensors is used to perform this task robustly. Among them, radar sensors are indispensable because of their range resolution and the possibility to directly measure velocity. Since more and more radar sensors are deployed on the streets, mutual interference must be dealt with. In the so far unregulated automotive radar frequency band, a sensor must be capable of detecting, or even mitigating the harmful effects of interference, which include a decreased detection sensitivity. In this paper, we address this issue with Convolutional Neural Networks (CNNs), which are state-of-the-art machine learning tools. We show that the ability of CNNs to find structured information in data while preserving local information enables superior denoising performance. To achieve this, CNN parameters are found using training with simulated data and integrated into the automotive radar signal processing chain. The presented method is compared with the state of the art, highlighting its promising performance. Hence, CNNs can be employed for interference mitigation as an alternative to conventional signal processing methods. Code and pre-trained models are available at https://github.com/johanna-rock/imRICnn.

Submitted 25 June, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: FUSION 2019; 8 pages

arXiv:1906.05180 [pdf, other]

Parameterized Structured Pruning for Deep Neural Networks

Authors: Guenther Schindler, Wolfgang Roth, Franz Pernkopf, Holger Froening

Abstract: As a result of the growing size of Deep Neural Networks (DNNs), the gap to hardware capabilities in terms of memory and compute increases. To effectively compress DNNs, quantization and connection pruning are usually considered. However, unconstrained pruning usually leads to unstructured parallelism, which maps poorly to massively parallel processors, and substantially reduces the efficiency of general-purpose processors. The same applies to quantization, which often requires dedicated hardware. We propose Parameterized Structured Pruning (PSP), a novel method to dynamically learn the shape of DNNs through structured sparsity. PSP parameterizes structures (e.g., channel- or layer-wise) in a weight tensor and leverages weight decay to learn a clear distinction between important and unimportant structures. As a result, PSP maintains prediction performance and creates a substantial amount of sparsity that is structured and, thus, easy and efficient to map to a variety of massively parallel processors, which are mandatory for utmost compute power and energy efficiency. PSP is experimentally validated on the popular CIFAR10/100 and ILSVRC2012 datasets using ResNet and DenseNet architectures, respectively.

Submitted 12 June, 2019; originally announced June 2019.

arXiv:1905.10884 [pdf, other]

Bayesian Learning of Sum-Product Networks

Authors: Martin Trapp, Robert Peharz, Hong Ge, Franz Pernkopf, Zoubin Ghahramani

Abstract: Sum-product networks (SPNs) are flexible density estimators and have received significant attention due to their attractive inference properties. While parameter learning in SPNs is well developed, structure learning leaves something to be desired: Even though there is a plethora of SPN structure learners, most of them are somewhat ad-hoc and based on intuition rather than a clear learning principle. In this paper, we introduce a well-principled Bayesian framework for SPN structure learning. First, we decompose the problem into i) laying out a computational graph, and ii) learning the so-called scope function over the graph. The first is rather unproblematic and akin to neural network architecture validation. The second represents the effective structure of the SPN and needs to respect the usual structural constraints in SPNs, i.e., completeness and decomposability. While representing and learning the scope function is somewhat involved in general, in this paper, we propose a natural parametrisation for an important and widely used special case of SPNs. These structural parameters are incorporated into a Bayesian model, such that simultaneous structure and parameter learning is cast into monolithic Bayesian posterior inference. In various experiments, our Bayesian SPNs often improve test likelihoods over greedy SPN learners.
Further, since the Bayesian framework protects against overfitting, we can evaluate hyper-parameters directly on the Bayesian model score, waiving the need for a separate validation set, which is especially beneficial in low data regimes. Bayesian SPNs can be applied to heterogeneous domains and can easily be extended to nonparametric formulations. Moreover, our Bayesian approach is the first that consistently and robustly learns SPN structures under missing data.
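The two structural constraints named above can be made concrete: a sum node is complete if all its children have the same scope, and a product node is decomposable if its children's scopes are pairwise disjoint. The following pure-Python sketch checks both for an explicitly given SPN graph; the tuple encoding of nodes and the function name are assumptions for illustration, and the sketch does not implement the paper's parametrisation or Bayesian learning of the scope function.

```python
def scope_and_validity(node):
    """Return (scope, ok): the set of variables below `node` and whether that
    sub-network is complete and decomposable. Nodes are hypothetical tuples:
    ("leaf", var), ("sum", [children]) or ("product", [children])."""
    kind = node[0]
    if kind == "leaf":
        return frozenset([node[1]]), True
    results = [scope_and_validity(child) for child in node[1]]
    scopes = [scope for scope, _ in results]
    ok = all(child_ok for _, child_ok in results)
    if kind == "sum":
        # completeness: every child of a sum node has the same scope
        ok = ok and all(scope == scopes[0] for scope in scopes)
    elif kind == "product":
        # decomposability: children of a product node have pairwise disjoint scopes
        ok = ok and sum(len(s) for s in scopes) == len(frozenset().union(*scopes))
    else:
        raise ValueError(f"unknown node type: {kind!r}")
    return frozenset().union(*scopes), ok


mixture = ("sum", [("product", [("leaf", "A"), ("leaf", "B")]),
                   ("product", [("leaf", "B"), ("leaf", "A")])])
print(scope_and_validity(mixture))                                      # scope {A, B}, valid
print(scope_and_validity(("product", [("leaf", "A"), ("leaf", "A")])))  # not decomposable
print(scope_and_validity(("sum", [("leaf", "A"), ("leaf", "B")])))      # not complete
```

Computing scopes bottom-up makes both checks a single pass over the graph; in a learned SPN the scope function would be what determines these sets.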

Submitted 4 November, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019; See conference page for supplement

arXiv:1905.08196 [pdf, other]

Optimisation of Overparametrized Sum-Product Networks

Authors: Martin Trapp, Robert Peharz, Franz Pernkopf

Abstract: It seems to be a pearl of conventional wisdom that parameter learning in deep sum-product networks is surprisingly fast compared to shallow mixture models. This paper examines the effects of overparameterization in sum-product networks on the speed of parameter optimisation. Using theoretical analysis and empirical experiments, we show that deep sum-product networks exhibit an implicit acceleration compared to their shallow counterparts. In fact, gradient-based optimisation in deep tree-structured sum-product networks is equal to gradient ascent with adaptive and time-varying learning rates and additional momentum terms.

Submitted 29 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: Workshop on Tractable Probabilistic Models (TPM) at ICML 2019

arXiv:1812.02240 [pdf, other]

Efficient and Robust Machine Learning for Real-World Systems

Authors: Franz Pernkopf, Wolfgang Roth, Matthias Zoehrer, Lukas Pfeifenberger, Guenther Schindler, Holger Froening, Sebastian Tschiatschek, Robert Peharz, Matthew Mattina, Zoubin Ghahramani

Abstract: While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation and the vision of the Internet-of-Things fuel the interest in resource efficient approaches. These approaches require a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. On top of this, it is crucial to treat uncertainty in a consistent manner in all but the simplest applications of machine learning systems. In particular, a desideratum for any real-world system is to be robust in the presence of outliers and corrupted data, as well as being 'aware' of its limits, i.e., the system should maintain and provide an uncertainty estimate over its own predictions. These complex demands are among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. First we provide a comprehensive review of resource-efficiency in deep neural networks with focus on techniques for model size reduction, compression and reduced precision. These techniques can be applied during training or as post-processing and are widely used to reduce both computational complexity and memory footprint.
As most (practical) neural networks are limited in their ways to treat uncertainty, we contrast them with probabilistic graphical models, which readily serve these desiderata by means of probabilistic inference. In that way, we provide an extensive overview of the current state-of-the-art of robust and efficient machine learning for real-world systems.

Submitted 5 December, 2018; originally announced December 2018.

arXiv:1812.01339

Self-Guided Belief Propagation -- A Homotopy Continuation Method

Authors: Christian Knoll, Adrian Weller, Franz Pernkopf

Abstract: Belief propagation (BP) is a popular method for performing probabilistic inference on graphical models. In this work, we enhance BP and propose self-guided belief propagation (SBP) that incorporates the pairwise potentials only gradually. This homotopy continuation method converges to a unique solution and increases the accuracy without increasing the computational burden. We provide a formal analysis to demonstrate that SBP finds the global optimum of the Bethe approximation for attractive models where all variables favor the same state. Moreover, we apply SBP to various graphs with random potentials and empirically show that: (i) SBP is superior in terms of accuracy whenever BP converges, and (ii) SBP obtains a unique, stable, and accurate solution whenever BP does not converge.

Submitted 19 March, 2021; v1 submitted 4 December, 2018; originally announced December 2018.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1810.06897

Sound event detection using weakly-labeled semi-supervised data with GCRNNS, VAT and Self-Adaptive Label Refinement

Authors: Robert Harb, Franz Pernkopf

Abstract: In this paper, we present a gated convolutional recurrent neural network based approach to solve task 4 of the DCASE 2018 challenge, large-scale weakly labelled semi-supervised sound event detection in domestic environments. Gated linear units and a temporal attention layer are used to predict the onset and offset of sound events in 10 s long audio clips. For training, only weakly labelled data is used. Virtual adversarial training is used for regularization, utilizing both labelled and unlabelled data. Furthermore, we introduce self-adaptive label refinement, a method that allows unsupervised adaptation of our trained system to refine the accuracy of frame-level class predictions. The proposed system reaches an overall macro-averaged event-based F-score of 34.6%, a relative improvement of 20.5% over the baseline system.

Submitted 16 October, 2018; originally announced October 2018.

Comments: Accepted at DCASE 2018 Workshop for oral presentation

arXiv:1809.04400

Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks

Authors: Martin Trapp, Robert Peharz, Carl E. Rasmussen, Franz Pernkopf

Abstract: While Gaussian processes (GPs) are the method of choice for regression tasks, they also come with practical difficulties, as inference cost scales cubically in time and quadratically in memory with the number of data points. In this paper, we introduce a natural and expressive way to tackle these problems by incorporating GPs in sum-product networks (SPNs), a recently proposed tractable probabilistic model allowing exact and efficient inference. In particular, by using GPs as leaves of an SPN, we obtain a novel flexible prior over functions, which implicitly represents an exponentially large mixture of local GPs. Exact and efficient posterior inference in this model can be done through a natural interplay of the inference mechanisms in GPs and SPNs. Each GP is thereby, similarly to a mixture-of-experts approach, responsible only for a subset of data points, which effectively reduces inference cost in a divide-and-conquer fashion. We show that integrating GPs into the SPN framework leads to a promising probabilistic regression model which (1) is computationally and memory efficient, (2) allows efficient and exact posterior inference, (3) is flexible enough to mix different kernel functions, and (4) naturally accounts for non-stationarities in time series. In a variety of experiments, we show that the SPN-GP model can learn input-dependent parameters and hyper-parameters and is on par with or outperforms traditional GPs as well as state-of-the-art approximations on real-world data.

Submitted 12 September, 2018; originally announced September 2018.

Comments: Presented at the Workshop on Tractable Probabilistic Models (TPM 2018), ICML 2018

arXiv:1807.02324

Sum-Product Networks for Sequence Labeling

Authors: Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf

Abstract: We consider higher-order linear-chain conditional random fields (HO-LC-CRFs) for sequence modelling and use sum-product networks (SPNs) for representing higher-order input- and output-dependent factors. SPNs are a recently introduced class of deep models for which exact and efficient inference can be performed. By combining HO-LC-CRFs with SPNs, expressive models over both the output labels and the hidden variables are instantiated while still enabling efficient exact inference. Furthermore, the use of higher-order factors allows us to capture relations between multiple input segments and multiple output labels, as often present in real-world data. These relations cannot be modelled by the commonly used first-order models or by higher-order models with local factors including only a single output label. We demonstrate the effectiveness of our proposed models for sequence labeling. In extensive experiments, we outperform other state-of-the-art methods in optical character recognition and achieve competitive results in phone classification.

Submitted 6 July, 2018; originally announced July 2018.

arXiv:1806.00981

Automatic Clustering of a Network Protocol with Weakly-Supervised Clustering

Authors: Tobias Schrank, Franz Pernkopf

Abstract: Abstraction is a fundamental part of learning behavioral models of systems. Usually, the abstraction is defined manually by domain experts. This paper presents a method to perform automatic abstraction for network protocols. In particular, a weakly supervised clustering algorithm is used to build an abstraction with a small vocabulary size for the widely used TLS protocol. To show the effectiveness of the proposed method, we compare the resulting abstract messages to a manually constructed (reference) abstraction. With a small amount of side information in the form of a few labeled examples, this method finds an abstraction that matches the reference abstraction perfectly.

Submitted 4 June, 2018; originally announced June 2018.

arXiv:1710.03444

Safe Semi-Supervised Learning of Sum-Product Networks

Authors: Martin Trapp, Tamas Madl, Robert Peharz, Franz Pernkopf, Robert Trappl

Abstract: In several domains, obtaining class annotations is expensive, while unlabelled data are abundant. While most semi-supervised approaches enforce restrictive assumptions on the data distribution, recent work has managed to learn semi-supervised models in a non-restrictive regime. However, so far such approaches have only been proposed for linear models. In this work, we introduce semi-supervised parameter learning for Sum-Product Networks (SPNs). SPNs are deep probabilistic models admitting inference in time linear in the number of network edges. Our approach has several advantages, as it (1) allows generative and discriminative semi-supervised learning, (2) guarantees that adding unlabelled data can increase, but not degrade, the performance (safe), and (3) is computationally efficient and does not enforce restrictive assumptions on the data distribution. We show on a variety of data sets that safe semi-supervised learning with SPNs is competitive with the state of the art and can lead to a better generative and discriminative objective value than a purely supervised approach.

Submitted 10 October, 2017; originally announced October 2017.

Comments: Conference on Uncertainty in Artificial Intelligence (UAI), 2017

arXiv:1605.06451

Fixed Points of Belief Propagation -- An Analysis via Polynomial Homotopy Continuation

Authors: Christian Knoll, Franz Pernkopf, Dhagash Mehta, Tianran Chen

Abstract: Belief propagation (BP) is an iterative method to perform approximate inference on arbitrary graphical models. Whether BP converges and whether the solution is a unique fixed point depend on both the structure and the parametrization of the model. To understand this dependence, it is interesting to find all fixed points. In this work, we formulate a set of polynomial equations, the solutions of which correspond to BP fixed points. To solve such a nonlinear system, we present the numerical polynomial-homotopy-continuation (NPHC) method. Experiments on binary Ising models and on error-correcting codes show how our method is capable of obtaining all BP fixed points. On Ising models with fixed parameters, we show how the structure influences both the number of fixed points and the convergence properties. We further assess the accuracy of the marginals and of weighted combinations thereof. Weighting marginals with their respective partition function increases the accuracy in all experiments. Contrary to the conjecture that uniqueness of BP fixed points implies convergence, we find graphs for which BP fails to converge even though a unique fixed point exists. Moreover, we show that this fixed point gives a good approximation and that the NPHC method is able to obtain it.

Submitted 30 May, 2017; v1 submitted 20 May, 2016; originally announced May 2016.

arXiv:1601.06180

On the Latent Variable Interpretation in Sum-Product Networks

Authors: Robert Peharz, Robert Gens, Franz Pernkopf, Pedro Domingos

Abstract: One of the central themes in Sum-Product networks (SPNs) is the interpretation of sum nodes as marginalized latent variables (LVs). This interpretation yields an increased syntactic or semantic structure, allows the application of the EM algorithm, and enables efficient MPE inference. In the literature, the LV interpretation was justified by explicitly introducing the indicator variables corresponding to the LVs' states. However, as pointed out in this paper, this approach is in conflict with the completeness condition in SPNs and does not fully specify the probabilistic model. We propose a remedy for this problem by modifying the original approach for introducing the LVs, which we call SPN augmentation. We discuss conditional independencies in augmented SPNs, formally establish the probabilistic interpretation of the sum-weights, and give an interpretation of augmented SPNs as Bayesian networks. Based on these results, we find a sound derivation of the EM algorithm for SPNs. Furthermore, the Viterbi-style algorithm for MPE proposed in the literature was never proven to be correct. We show that it is indeed a correct algorithm when applied to selective SPNs, and in particular when applied to augmented SPNs. Our theoretical results are confirmed in experiments on synthetic data and 103 real-world datasets.

Submitted 28 October, 2016; v1 submitted 22 January, 2016; originally announced January 2016.

Comments: Revised version, accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Shortened and revised Section 4, thanks to our reviewers pointing out that Theorem 2 holds for selective SPNs. Added a paragraph in Section 2.1 relating the sizes of original/augmented SPNs. Fixed typos, rephrased sentences, revised references.

MSC Class: 62

arXiv:1206.6431

Exact Maximum Margin Structure Learning of Bayesian Networks

Authors: Robert Peharz, Franz Pernkopf

Abstract: Recently, there has been much interest in finding globally optimal Bayesian network structures. These techniques were developed for generative scores and cannot be directly extended to discriminative scores, as desired for classification. In this paper, we propose an exact method for finding network structures maximizing the probabilistic soft margin, a successfully applied discriminative score. Our method is based on branch-and-bound techniques within a linear programming framework and maintains an any-time solution, together with worst-case sub-optimality bounds. We apply a set of order constraints to enforce acyclicity of the network structure, which allows a compact problem representation and the use of general-purpose optimization techniques. In classification experiments, our methods clearly outperform generatively trained network structures and compete with support vector machines.

Submitted 27 June, 2012; originally announced June 2012.

Comments: ICML

Showing 1–42 of 42 results for author: Pernkopf, F