-
Exploring DNN Robustness Against Adversarial Attacks Using Approximate Multipliers
Authors:
Mohammad Javad Askarizadeh,
Ebrahim Farahmand,
Jorge Castro-Godinez,
Ali Mahani,
Laura Cabrera-Quiros,
Carlos Salazar-Garcia
Abstract:
Deep Neural Networks (DNNs) have advanced in many real-world applications, such as healthcare and autonomous driving. However, their high computational complexity and vulnerability to adversarial attacks are ongoing challenges. In this letter, approximate multipliers are used to explore DNN robustness improvement against adversarial attacks. By uniformly replacing accurate multipliers with state-of-the-art approximate ones in DNN layer models, we explore DNN robustness against various adversarial attacks in a feasible time. Results show an accuracy drop of up to 7% due to approximations when no attack is present, while robust accuracy improves by up to 10% when attacks are applied.
Submitted 17 April, 2024;
originally announced April 2024.
-
Exploration of Activation Fault Reliability in Quantized Systolic Array-Based DNN Accelerators
Authors:
Mahdi Taheri,
Natalia Cherezova,
Mohammad Saeed Ansari,
Maksim Jenihhin,
Ali Mahani,
Masoud Daneshtalab,
Jaan Raik
Abstract:
The stringent reliability requirements for Deep Neural Network (DNN) accelerators go hand in hand with the need to reduce the computational burden on hardware platforms, i.e. reducing energy consumption and execution time as well as increasing the efficiency of DNN accelerators. Moreover, the growing demand for specialized DNN accelerators with tailored requirements, particularly for safety-critical applications, necessitates a comprehensive design space exploration to enable the development of efficient and robust accelerators that meet those requirements. Therefore, the trade-off between hardware performance, i.e. area and delay, and the reliability of the DNN accelerator implementation becomes critical and requires tools for analysis. This paper presents a comprehensive methodology for exploring and enabling a holistic assessment of the trilateral impact of quantization on model accuracy, activation fault reliability, and hardware efficiency. A fully automated framework is introduced that is capable of applying various quantization-aware techniques, fault injection, and hardware implementation, thus enabling the measurement of hardware parameters. Moreover, this paper proposes a novel lightweight protection technique integrated within the framework to ensure the dependable deployment of the final systolic-array-based FPGA implementation. The experiments on established benchmarks demonstrate the analysis flow and the profound implications of quantization on reliability, hardware performance, and network accuracy, particularly concerning transient faults in the network's activations.
Submitted 17 January, 2024;
originally announced January 2024.
-
A Novel Fault-Tolerant Logic Style with Self-Checking Capability
Authors:
Mahdi Taheri,
Saeideh Sheikhpour,
Ali Mahani,
Maksim Jenihhin
Abstract:
We introduce a novel logic style with self-checking capability to enhance hardware reliability at the logic level. The proposed logic cells have two-rail inputs/outputs, and the functionality for each rail of outputs enables construction of fault-tolerant configurable circuits. The AND and OR gates consist of 8 transistors based on CNFET technology, while the proposed XOR gate benefits from both CNFET and low-power MGDI technologies in its transistor arrangement. To demonstrate the feasibility of our new logic gates, we used an AES S-box implementation as the use case. The extensive simulation results using HSPICE indicate that the case-study circuit based on the proposed gates has superior speed and power consumption compared to other implementations with error-detection capability.
Submitted 31 May, 2023;
originally announced June 2023.
-
LRDB: LSTM Raw data DNA Base-caller based on long-short term models in an active learning environment
Authors:
Ahmad Rezaei,
Mahdi Taheri,
Ali Mahani,
Sebastian Magierowski
Abstract:
The first important step in extracting DNA characters is using the output data of MinION devices in the form of electrical current signals. Various cutting-edge base callers use this data to detect the DNA characters based on the input. In this paper, we discuss several shortcomings of prior base callers in the case of time-critical applications, privacy-aware design, and the problem of catastrophic forgetting. Next, we propose the LRDB model, a lightweight open-source model for private developments with a better read-identity (0.35% increase) for the target bacterial samples in the paper. We have limited the extent of training data and benefited from the transfer learning algorithm to make the active usage of LRDB viable in critical applications. Hence, less training time is needed for adapting to new DNA samples (in our case, bacterial samples). Furthermore, LRDB can be modified to meet user constraints, as the results show a negligible accuracy loss when fewer parameters are used. We have also assessed the noise-tolerance property, which shows about a 1.439% decline in accuracy for a 15 dB noise injection, and the performance metrics show that the model executes in a medium speed range compared with current cutting-edge models.
Submitted 15 March, 2023;
originally announced March 2023.
-
An Edge-based WiFi Fingerprinting Indoor Localization Using Convolutional Neural Network and Convolutional Auto-Encoder
Authors:
Amin Kargar-Barzi,
Ebrahim Farahmand,
Nooshin Taheri Chatrudi,
Ali Mahani,
Muhammad Shafique
Abstract:
With the ongoing development of Indoor Location-Based Services, the location information of users in indoor environments has been a challenging issue in recent years. Due to the widespread use of WiFi networks, WiFi fingerprinting has become one of the most practical methods of locating mobile users. In addition to localization accuracy, other critical factors such as latency and users' privacy should be considered in indoor localization systems. In this study, we propose a light Convolutional Neural Network-based method for edge devices (e.g. smartphones) to overcome the above issues by eliminating the need for a cloud/server in the localization system. The proposed method is evaluated on three different open datasets, i.e., UJIIndoorLoc, Tampere and UTSIndoorLoc, as well as on our collected dataset named SBUK-D to verify its scalability. We also evaluate the performance efficiency of our localization method on an Android smartphone to demonstrate its applicability to edge devices. For the UJIIndoorLoc dataset, our model obtains approximately 99% building accuracy, over 90% floor accuracy, and 9.5 m positioning mean error with a model size and inference time of 0.5 MB and 51 µs, respectively, which demonstrates accuracy in the range of state-of-the-art works as well as suitability for resource-constrained edge devices.
Submitted 4 June, 2024; v1 submitted 7 March, 2023;
originally announced March 2023.
-
scaleTRIM: Scalable TRuncation-Based Integer Approximate Multiplier with Linearization and Compensation
Authors:
Ebrahim Farahmand,
Ali Mahani,
Behnam Ghavami,
Muhammad Abdullah Hanif,
Muhammad Shafique
Abstract:
Approximate computing (AC) has become a prominent solution to improve the performance, area, and power/energy efficiency of a digital design at the cost of output accuracy. We propose a novel scalable approximate multiplier that utilizes a lookup table-based compensation unit. To improve energy efficiency, input operands are truncated to a reduced bitwidth representation (e.g., h bits) based on their leading one positions. Then, a curve-fitting method is employed to map the product term to a linear function, and a piecewise constant error-correction term is used to reduce the approximation error. For computing the piecewise constant error-compensation term, we partition the function space into M segments and compute the compensation factor for each segment by averaging the errors in the segment. The multiplier supports various degrees of truncation and error-compensation to exploit the accuracy-efficiency trade-off. The proposed approximate multiplier offers better error metrics, such as the mean and standard deviation of absolute relative error (MARED and StdARED), compared to a state-of-the-art integer approximate multiplier. The proposed approximate multiplier improves the MARED and StdARED by about 38% and 32% when its energy consumption is about equal to that of the state-of-the-art approximate multiplier. Moreover, the performance of the proposed approximate multiplier is evaluated in image classification applications using a Deep Neural Network (DNN). The results indicate that the degradation of DNN accuracy is negligible, especially due to the compensation properties of our approximate multiplier.
Submitted 4 May, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
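The truncation step mentioned in the scaleTRIM abstract above can be illustrated with a minimal Python sketch. It is not the authors' design: it assumes that "truncation based on the leading one position" means keeping the leading one plus the `h` bits below it, and it omits the linearization and lookup-table compensation entirely. The function names are hypothetical. It also shows how a mean absolute relative error (MARED) could be measured exhaustively for small operand widths.

```python
def truncate_to_leading_one(x: int, h: int) -> int:
    """Keep the leading one of x and the h bits below it; zero the rest.

    Assumption: this is only one plausible reading of leading-one-based
    truncation, not the scaleTRIM hardware scheme.
    """
    if x == 0:
        return 0
    lead = x.bit_length() - 1      # position of the leading one
    drop = max(lead - h, 0)        # number of low-order bits discarded
    return (x >> drop) << drop


def approx_mul(a: int, b: int, h: int) -> int:
    """Multiply the truncated operands (no linearization or compensation)."""
    return truncate_to_leading_one(a, h) * truncate_to_leading_one(b, h)


def mared(h: int, bits: int = 8) -> float:
    """Mean absolute relative error over all nonzero operand pairs."""
    total, count = 0.0, 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(approx_mul(a, b, h) - exact) / exact
            count += 1
    return total / count
```

Because truncation only ever discards low-order bits, this sketch always underestimates the product, which is the kind of systematic bias a compensation term would be meant to offset.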
-
Design and Analysis of High Performance Heterogeneous Block-based Approximate Adders
Authors:
Ebrahim Farahmand,
Ali Mahani,
Muhammad Abdullah Hanif,
Muhammad Shafique
Abstract:
Approximate computing is an emerging paradigm to improve the power and performance efficiency of error-resilient applications. As adders are one of the key components in almost all processing systems, a significant amount of research has been carried out towards designing approximate adders that can offer better efficiency than conventional designs, however, at the cost of some accuracy loss. In this paper, we highlight a new class of energy-efficient approximate adders, namely Heterogeneous Block-based Approximate Adders (HBAA), and propose a generic configurable adder model that can be configured to represent a particular HBAA configuration. An HBAA, in general, is composed of heterogeneous sub-adder blocks of equal length, where each sub-adder can be an approximate sub-adder and have a different configuration. The sub-adders are mainly approximated through inexact logic and carry truncation. Compared to the existing design space, HBAAs provide additional design points that fall on the Pareto-front and offer a better quality-efficiency trade-off in certain scenarios. Furthermore, to enable efficient design space exploration based on user-defined constraints, we propose an analytical model to efficiently evaluate the Probability Mass Function (PMF) of approximation error and other error metrics, such as Mean Error Distance (MED), Normalized Mean Error Distance (NMED) and Error Rate (ER) of HBAAs. The results show that HBAA configurations can provide around 15% reduction in area and up to 17% reduction in energy compared to state-of-the-art approximate adders.
Submitted 14 September, 2023; v1 submitted 16 June, 2021;
originally announced June 2021.
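The error metrics named in the HBAA abstract above (ER, MED, NMED) can be made concrete with a small Python sketch. It is a deliberately simplified stand-in, not the paper's configurable model: every block is assumed to be an exact sub-adder, and the only approximation is that the carry between blocks is dropped. The metrics are computed by exhaustive enumeration rather than by the analytical PMF model the abstract describes, and NMED is assumed to be MED divided by the maximum exact sum. All names are hypothetical.

```python
def block_adder(a: int, b: int, width: int = 8, block: int = 4) -> int:
    """Add block by block, discarding the carry between blocks.

    Assumption: exact sub-adders with inter-block carry truncation only;
    the carry out of the top block is kept.
    """
    mask = (1 << block) - 1
    result = 0
    for shift in range(0, width, block):
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        if shift + block < width:
            s &= mask              # drop the carry into the next block
        result |= s << shift
    return result


def error_metrics(width: int = 8, block: int = 4):
    """Exhaustively compute (ER, MED, NMED) over all input pairs."""
    n = 1 << width
    max_sum = 2 * (n - 1)          # largest exact result
    errors = 0
    total_distance = 0
    for a in range(n):
        for b in range(n):
            distance = abs(block_adder(a, b, width, block) - (a + b))
            errors += distance != 0
            total_distance += distance
    pairs = n * n
    med = total_distance / pairs
    return errors / pairs, med, med / max_sum
```

For the default 8-bit, two-block case, an error occurs exactly when the low block overflows, so every error has distance 16; exhaustive enumeration is only practical at such small widths, which is where an analytical model becomes useful.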
-
Quantum Annealing Continuous Optimisation in Renewable Energy
Authors:
Mansour T. A. Sharabiani,
Vibe B. Jakobsen,
Martin Jeppesen,
Alireza S. Mahani
Abstract:
Renewable energy optimisation poses computationally-intensive challenges. Yet, often the continuous nature of the decision space precludes the use of many emerging, non-von-Neumann computing platforms such as quantum annealing, which are limited to discrete problems. We propose Quantum Annealing Continuous Optimisation (QuAnCO), a Trust Region (TR)-based algorithm, where the TR Newton sub-problem is transformed into Quadratic Unconstrained Binary Optimisation (QUBO), thereby allowing the use of Ising solvers such as D-Wave's quantum annealer. This transformation to QUBO is done by 1) using a hyper-rectangular shape for the TR, 2) discrete representation of each continuous dimension using an interval-bounded integer, and 3) binary encoding of the resulting bounded integers. We tackle a real-world challenge of optimising the biomass mix selection for Nature Energy, the largest biogas producer in Europe, thus providing evidence of feasibility and performance advantage in using QuAnCO in green energy production, and beyond.
Submitted 2 April, 2022; v1 submitted 24 May, 2021;
originally announced May 2021.
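The three-step transformation to QUBO listed in the QuAnCO abstract above can be sketched in one dimension. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the trust region is the interval `[lo, hi]`, the continuous variable is placed on a uniform grid of `2**k` points, the grid index is written in plain binary, and a brute-force search stands in for an Ising solver. All function names are hypothetical.

```python
from itertools import product


def build_qubo(g: float, h: float, lo: float, hi: float, k: int):
    """QUBO for m(x) = g*x + 0.5*h*x**2 with x = lo + step*n, 0 <= n < 2**k."""
    step = (hi - lo) / (2 ** k - 1)
    c = [step * 2 ** j for j in range(k)]   # contribution of bit j to x - lo
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        # Linear terms sit on the diagonal because b*b == b for binary b.
        q[i][i] = (g + h * lo) * c[i] + 0.5 * h * c[i] ** 2
        for j in range(i + 1, k):
            q[i][j] = h * c[i] * c[j]
    offset = g * lo + 0.5 * h * lo ** 2
    return q, offset, c


def energy(q, offset, bits):
    """Value of the QUBO objective (upper-triangular form) plus the offset."""
    k = len(bits)
    return offset + sum(q[i][j] * bits[i] * bits[j]
                        for i in range(k) for j in range(i, k))


def decode(bits, lo, c):
    """Map a bit string back to the continuous variable."""
    return lo + sum(ci * bi for ci, bi in zip(c, bits))


def solve_brute_force(q, offset):
    """Stand-in for an Ising/QUBO solver: enumerate all bit strings."""
    return min(product((0, 1), repeat=len(q)),
               key=lambda bits: energy(q, offset, bits))
```

With `g = -2`, `h = 2`, `lo = -1`, `hi = 2` and `k = 4`, the grid contains the exact minimizer `x = 1` of the quadratic model, so `decode(solve_brute_force(q, offset), lo, c)` recovers it up to rounding. In more dimensions the same substitution would add cross terms between the bits of different variables.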
-
A high-performance MEMRISTOR-based Smith-Waterman DNA sequence alignment Using FPNI structure
Authors:
Mahdi Taheri,
Hamed Zandevakili,
Ali Mahani
Abstract:
This paper presents a new reconfigurable sequencing method that handles differences in the read lengths of the input data, a crucial drawback that affects DNA sequencing methods.
Submitted 21 September, 2020;
originally announced September 2020.
-
DMR-based Technique for Fault Tolerant AES S-box Architecture
Authors:
Mahdi Taheri,
Saeideh Sheikhpour,
Mohammad Saeed Ansari,
Ali Mahani
Abstract:
This paper presents a high-throughput fault-resilient hardware implementation of AES S-box, called HFS-box. If a transient natural or even malicious fault is detected in any pipeline stage, the corresponding error signal becomes high and, as a result, the control unit holds the output of our proposed DMR voter until the fault effect disappears. The proposed low-cost HFS-box provides high fault tolerance against transient faults of any duration while imposing a low area overhead, i.e. 137%, and low throughput degradation, i.e. 11.3%, on the original implementation.
Submitted 11 September, 2020;
originally announced September 2020.
-
Combining matching and linear regression: Introducing a mathematical framework and software for simulations, diagnostics and calibration
Authors:
Alireza S. Mahani,
Mansour T. A. Sharabiani
Abstract:
Combining matching and regression for causal inference provides double-robustness in removing treatment effect estimation bias due to confounding variables. In most real-world applications, however, treatment and control populations are not large enough for matching to achieve perfect or near-perfect balance on all confounding variables and their nonlinear/interaction functions, leading to trade-offs. Furthermore, variance is as important a contributor as bias towards total error in small samples, and must therefore be factored into the methodological decisions. In this paper, we develop a mathematical framework for quantifying the combined impact of matching and linear regression on bias and variance of treatment effect estimation. The framework includes expressions for bias and variance in a misspecified linear regression, theorems regarding impact of matching on bias and variance, and a constrained bias estimation approach for quantifying misspecification bias and combining it with variance to arrive at total error. Methodological decisions can thus be based on minimization of this total error, given the practitioner's assumption/belief about an intuitive parameter, which we call 'omitted R-squared'. The proposed methodology excludes the outcome variable from analysis, thereby avoiding overfit creep and making it suitable for observational study designs. All core functions for bias and variance calculation, as well as diagnostic tools for bias-variance trade-off analysis, matching calibration, and power analysis are made available to researchers and practitioners through an open-source R library, MatchLinReg.
Submitted 11 July, 2015;
originally announced July 2015.
-
Stochastic Newton Sampler: R Package sns
Authors:
Alireza S. Mahani,
Asad Hasan,
Marshall Jiang,
Mansour T. A. Sharabiani
Abstract:
The R package sns implements Stochastic Newton Sampler (SNS), a Metropolis-Hastings Markov Chain Monte Carlo algorithm where the proposal density function is a multivariate Gaussian based on a local, second-order Taylor series expansion of log-density. The mean of the proposal function is the full Newton step in the Newton-Raphson optimization algorithm. Taking advantage of the local, multivariate geometry captured in the log-density Hessian allows SNS to be more efficient than univariate samplers, approaching independent sampling as the density function increasingly resembles a multivariate Gaussian. SNS requires the log-density Hessian to be negative-definite everywhere in order to construct a valid proposal function. This property holds, or can be easily checked, for many GLM-like models. When the initial point is far from the density peak, running SNS in non-stochastic mode by taking the Newton step, augmented with line search, allows the MCMC chain to converge to high-density areas faster. For high-dimensional problems, partitioning of state space into lower-dimensional subsets, and applying SNS to the subsets within a Gibbs sampling framework can significantly improve the mixing of SNS chains. In addition to the above strategies for improving convergence and mixing, sns offers diagnostics and visualization capabilities, as well as a function for sample-based calculation of Bayesian predictive posterior distributions.
Submitted 6 February, 2015;
originally announced February 2015.
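The proposal construction described in the sns abstract above can be sketched in one dimension. This Python sketch is an illustration under stated assumptions and does not reproduce the R package's interface: the state is a scalar, the proposal mean is the Newton step `x - f'(x)/f''(x)`, the proposal variance is `-1/f''(x)`, and the usual Metropolis-Hastings correction for an asymmetric proposal is applied. The names `sns_step` and `run_chain` are hypothetical.

```python
import math
import random


def sns_step(x, logf, grad, hess, rng):
    """One Metropolis-Hastings step with a Newton-based Gaussian proposal (1-D)."""
    def proposal(at):
        h = hess(at)               # must be negative (log-concave target)
        return at - grad(at) / h, -1.0 / h   # mean = Newton step, variance = -1/H

    def log_q(value, mean, var):
        return -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)

    mean_fwd, var_fwd = proposal(x)
    y = rng.gauss(mean_fwd, math.sqrt(var_fwd))
    mean_rev, var_rev = proposal(y)
    log_alpha = (logf(y) - logf(x)
                 + log_q(x, mean_rev, var_rev) - log_q(y, mean_fwd, var_fwd))
    u = 1.0 - rng.random()         # in (0, 1], so log(u) is defined
    if math.log(u) <= log_alpha:
        return y, True
    return x, False


def run_chain(x0, n, logf, grad, hess, seed=0):
    """Run n steps and return the samples and the acceptance rate."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n):
        x, ok = sns_step(x, logf, grad, hess, rng)
        samples.append(x)
        accepted += ok
    return samples, accepted / n
```

For a Gaussian target the second-order expansion is exact, so the proposal coincides with the target and nearly every proposal is accepted, which matches the abstract's remark about approaching independent sampling.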
-
Expander Framework for Generating High-Dimensional GLM Gradient and Hessian from Low-Dimensional Base Distributions: R Package RegressionFactory
Authors:
Alireza S. Mahani,
Mansour T. A. Sharabiani
Abstract:
The R package RegressionFactory provides expander functions for constructing the high-dimensional gradient vector and Hessian matrix of the log-likelihood function for generalized linear models (GLMs), from the lower-dimensional base-distribution derivatives. The software follows a modular implementation using the chain rule of derivatives. Such modularity offers a clear separation of case-specific components (base distribution functional form and link functions) from common steps (e.g., matrix algebra operations needed for expansion) in calculating log-likelihood derivatives. In doing so, RegressionFactory offers several advantages: 1) It provides a fast and convenient method for constructing log-likelihood and its derivatives by requiring only the low-dimensional, base-distribution derivatives, 2) The accompanying definiteness-invariance theorem allows researchers to reason about the negative-definiteness of the log-likelihood Hessian in the much lower-dimensional space of the base distributions, 3) The factorized, abstract view of regression suggests opportunities to generate novel regression models, and 4) Computational techniques for performance optimization can be developed generically in the abstract framework and be readily applicable across all the specific regression instances. We expect RegressionFactory to facilitate research and development on optimization and sampling techniques for GLM log-likelihoods as well as construction of composite models from GLM lego blocks, such as Hierarchical Bayesian models.
Submitted 24 January, 2015;
originally announced January 2015.
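The chain-rule expansion described in the RegressionFactory abstract above can be sketched for the simplest case of a base distribution with a single linear predictor. This is an illustrative Python sketch, not the package's R interface: the function names and the `base(u, y)` calling convention are assumptions. With `u_i = x_i . beta`, the gradient is `X^T f'` and the Hessian is `X^T diag(f'') X`, so only the scalar derivatives of the base log-density are case-specific.

```python
import math


def expand_1par(beta, X, y, base):
    """Expand base-distribution derivatives into the GLM gradient and Hessian.

    base(u, y) returns (f, f', f'') for one observation, where u = x . beta.
    The name and signature are hypothetical, chosen for this sketch.
    """
    k = len(beta)
    loglik = 0.0
    grad = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        u = sum(b * v for b, v in zip(beta, xi))
        f, f1, f2 = base(u, yi)
        loglik += f
        for a in range(k):
            grad[a] += f1 * xi[a]                  # chain rule: X^T f'
            for b in range(k):
                hess[a][b] += f2 * xi[a] * xi[b]   # chain rule: X^T diag(f'') X
    return loglik, grad, hess


def poisson_log_base(u, y):
    """Poisson log-likelihood with log link, dropping the log(y!) constant."""
    mu = math.exp(u)
    return y * u - mu, y - mu, -mu
```

In this Poisson example `f''` is negative for every observation, so `X^T diag(f'') X` is negative semi-definite, and negative-definite when `X` has full column rank, which illustrates the kind of reasoning the abstract attributes to the definiteness-invariance theorem.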
-
Multivariate-from-Univariate MCMC Sampler: R Package MfUSampler
Authors:
Alireza S. Mahani,
Mansour T. A. Sharabiani
Abstract:
The R package MfUSampler provides Markov Chain Monte Carlo machinery for generating samples from multivariate probability distributions using univariate sampling algorithms such as Slice Sampler and Adaptive Rejection Sampler. The sampler function performs a full cycle of univariate sampling steps, one coordinate at a time. In each step, the latest sample values obtained for other coordinates are used to form the conditional distributions. The concept is an extension of Gibbs sampling where each step involves, not an independent sample from the conditional distribution, but a Markov transition for which the conditional distribution is invariant. The software relies on proportionality of conditional distributions to the joint distribution to implement a thin wrapper for producing conditionals. Examples illustrate basic usage as well as methods for improving performance. By encapsulating the multivariate-from-univariate logic, MfUSampler provides a reliable library for rapid prototyping of custom Bayesian models while allowing for incremental performance optimizations such as utilization of conjugacy, conditional independence, and porting function evaluations to compiled languages.
Submitted 24 December, 2014;
originally announced December 2014.
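The multivariate-from-univariate cycle described in the MfUSampler abstract above can be sketched in Python. This is an illustration, not the package's R interface: it assumes a univariate slice sampler with unbounded stepping-out and shrinkage as the per-coordinate transition, and the function names are hypothetical. The wrapper forms each conditional simply by evaluating the joint log-density with the other coordinates held at their latest values.

```python
import math
import random


def slice_sample_1d(logf, x0, rng, w=1.0):
    """Univariate slice sampler with stepping-out and shrinkage.

    Assumption: the target is a proper density, so stepping-out terminates.
    """
    log_y = logf(x0) + math.log(1.0 - rng.random())   # slice level under logf(x0)
    left = x0 - w * rng.random()
    right = left + w
    while logf(left) > log_y:      # step out to the left
        left -= w
    while logf(right) > log_y:     # step out to the right
        right += w
    while True:                    # shrink the interval until a point is accepted
        x1 = left + (right - left) * rng.random()
        if logf(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def mfu_cycle(logf, x, rng, w=1.0):
    """One full cycle: update each coordinate from its conditional in turn."""
    x = list(x)
    for i in range(len(x)):
        def conditional(value, i=i):
            # Proportional to the joint with the other coordinates held
            # at their latest values.
            return logf(x[:i] + [value] + x[i + 1:])
        x[i] = slice_sample_1d(conditional, x[i], rng, w)
    return x
```

Calling `mfu_cycle` repeatedly on a joint log-density, for example a correlated bivariate Gaussian, yields a chain whose draws can be summarized after discarding an initial burn-in.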
-
Efficient SIMD RNG for Varying-Parameter Streams: C++ Class BatchRNG
Authors:
Alireza S. Mahani,
Mansour T. A. Sharabiani
Abstract:
Single-Instruction, Multiple-Data (SIMD) random number generators (RNGs) take advantage of vector units to offer significant performance gain over non-vectorized libraries, but they often rely on batch production of deviates from distributions with fixed parameters. In many statistical applications such as Gibbs sampling, parameters of sampled distributions change from one iteration to the next, requiring that random deviates be generated one-at-a-time. This situation can render vectorized RNGs inefficient, and even inferior to their scalar counterparts. The C++ class BatchRNG uses buffers of base distributions such as uniform, Gaussian and exponential to take advantage of vector units while allowing for sequences of deviates to be generated with varying parameters. These small buffers are consumed and replenished as needed during program execution. Performance tests using Intel Vector Statistical Library (VSL) on various probability distributions illustrate the effectiveness of the proposed batching strategy.
Submitted 15 December, 2014;
originally announced December 2014.
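The buffering idea in the BatchRNG abstract above can be sketched as follows. This Python sketch only mirrors the control flow and is not the C++ class: it assumes a single Gaussian buffer, uses the standard library generator as a stand-in for one vectorized batch call, and rescales each buffered standard deviate with the parameters supplied at draw time. The class name and its attributes are hypothetical.

```python
import random


class BufferedNormal:
    """Buffer of standard-normal deviates, rescaled per draw."""

    def __init__(self, size=256, seed=0):
        self._rng = random.Random(seed)
        self._size = size
        self._buf = []
        self.refills = 0           # how many batch fills have happened

    def _refill(self):
        # Stand-in for one vectorized call that fills the whole buffer.
        self._buf = [self._rng.gauss(0.0, 1.0) for _ in range(self._size)]
        self.refills += 1

    def draw(self, mu, sigma):
        """Return one N(mu, sigma^2) deviate; parameters may change every call."""
        if not self._buf:
            self._refill()
        return mu + sigma * self._buf.pop()
```

Because the batch is drawn from the fixed-parameter base distribution and the location-scale transform is applied afterwards, the caller can change `mu` and `sigma` on every call while the expensive generation step still happens in batches.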
-
Fast Estimation of Multinomial Logit Models: R Package mnlogit
Authors:
Asad Hasan,
Wang Zhiyu,
Alireza S. Mahani
Abstract:
We present R package mnlogit for training multinomial logistic regression models, particularly those involving a large number of classes and features. Compared to existing software, mnlogit offers speedups of 10x-50x for modestly sized problems and more than 100x for larger problems. Running mnlogit in parallel mode on a multicore machine gives an additional 2x-4x speedup on up to 8 processor cores. Computational efficiency is achieved by drastically speeding up calculation of the log-likelihood function's Hessian matrix by exploiting structure in matrices that arise in intermediate calculations.
Submitted 16 September, 2014; v1 submitted 11 April, 2014;
originally announced April 2014.
-
SIMD Parallel MCMC Sampling with Applications for Big-Data Bayesian Analytics
Authors:
Alireza S. Mahani,
Mansour T. A. Sharabiani
Abstract:
The computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing applications for big data analytics, necessitate both the identification of potential opportunities to parallelize techniques such as MCMC sampling, and the development of general strategies for mapping such parallel algorithms to modern CPUs in order to push performance up to the compute-based and/or memory-based hardware limits. Two opportunities for Single-Instruction Multiple-Data (SIMD) parallelization of MCMC sampling for probabilistic graphical models are presented. In exchangeable models with many observations such as Bayesian Generalized Linear Models, child-node contributions to the conditional posterior of each node can be calculated concurrently. In undirected graphs with discrete nodes, concurrent sampling of conditionally-independent nodes can be transformed into a SIMD form. High-performance libraries with multi-threading and vectorization capabilities can be readily applied to such SIMD opportunities to gain decent speedup, while a series of high-level source-code and runtime modifications provide a further performance boost by reducing parallelization overhead and increasing data locality for NUMA architectures. For big-data Bayesian GLM graphs, the end result is a routine for evaluating the conditional posterior and its gradient vector that is 5 times faster than a naive implementation using (built-in) multi-threaded Intel MKL BLAS, and comes within striking distance of the memory-bandwidth-induced hardware limit. The proposed optimization strategies improve the scaling of performance with number of cores and width of vector units (applicable to many-core SIMD processors such as Intel Xeon Phi and GPUs), resulting in cost-effectiveness, energy efficiency, and higher speed on multi-core x86 processors.
Submitted 19 November, 2014; v1 submitted 6 October, 2013;
originally announced October 2013.
-
Metropolis-Hastings Sampling Using Multivariate Gaussian Tangents
Authors:
Alireza S. Mahani,
Mansour T. A. Sharabiani
Abstract:
We present MH-MGT, a multivariate technique for sampling from twice-differentiable, log-concave probability density functions. MH-MGT is Metropolis-Hastings sampling using asymmetric, multivariate Gaussian proposal functions constructed from Taylor-series expansion of the log-density function. The mean of the Gaussian proposal function represents the full Newton step, and thus MH-MGT is the stochastic counterpart to Newton optimization. Convergence analysis shows that MH-MGT is well suited for sampling from computationally-expensive log-densities with contributions from many independent observations. We apply the technique to Gibbs sampling analysis of a Hierarchical Bayesian marketing effectiveness model built for a large US foodservice distributor. Compared to univariate slice sampling, MH-MGT shows 6x improvement in sampling efficiency, measured in terms of 'function evaluation equivalents per independent sample'. To facilitate wide applicability of MH-MGT to statistical models, we prove that log-concavity of a twice-differentiable distribution is invariant with respect to 'linear-projection' transformations including, but not restricted to, generalized linear models.
Submitted 2 August, 2013;
originally announced August 2013.