Search | arXiv e-print repository

Scavenging Hyena: Distilling Transformers into Long Convolution Models

Authors: Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Mohammad Sami Nur Islam, Wassim Jabbour, Laurence Liang

Abstract: The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, our method replaces attention heads in transformer models with Hyena, offering a cost-effective alternative to traditional pre-training while confronting the challenge of processing long contextual information, inherent in quadratic attention mechanisms. Unlike conventional compression-focused methods, our technique not only enhances inference speed but also surpasses pre-training in terms of both accuracy and efficiency. In the era of evolving LLMs, our work contributes to the pursuit of sustainable AI solutions, striking a balance between computational power and environmental impact.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 9 pages, 2 figures

arXiv:2310.13791 [pdf]

Comparative Analysis of Machine Learning Algorithms for Solar Irradiance Forecasting in Smart Grids

Authors: Saman Soleymani, Shima Mohammadzadeh

Abstract: The increasing global demand for clean and environmentally friendly energy resources has driven growing interest in harnessing solar power through photovoltaic (PV) systems for smart grids and homes. However, the inherent unpredictability of PV generation poses problems associated with smart grid planning and management, energy trading and market participation, demand response, reliability, etc. Therefore, solar irradiance forecasting is essential for optimizing PV system utilization. This study proposes using next-generation machine learning algorithms such as random forests, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) ensemble, CatBoost, and Multilayer Perceptron Artificial Neural Networks (MLP-ANNs) to forecast solar irradiance. In addition, Bayesian optimization is applied for hyperparameter tuning. Unlike tree-based ensemble algorithms, which select features intrinsically, MLP-ANNs require feature selection as a separate step. The simulation results indicate that the performance of the MLP-ANNs improves when feature selection is applied. Moreover, the random forest outperforms the other learning algorithms.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 6 pages, 4 figures, 3 tables, to appear in the 13th Smart Grid Conference

arXiv:2309.13785 [pdf, other]

Study of Robust Adaptive Beamforming Algorithms Based on Power Method Processing and Spatial Spectrum Matching

Authors: S. Mohammadzadeh, V. H. Nascimento, R. C. de Lamare, O. Kukrer

Abstract: Robust adaptive beamforming (RAB) based on interference-plus-noise covariance (INC) matrix reconstruction can experience performance degradation when model mismatch errors exist, particularly when the input signal-to-noise ratio (SNR) is large. In this work, we devise an efficient RAB technique for dealing with covariance matrix reconstruction issues. The proposed method involves INC matrix reconstruction using an idea in which the power and the steering vector of the interferences are estimated based on the power method. Furthermore, spatial match processing is performed to reconstruct the desired signal-plus-noise covariance matrix. Then, the noise components are excluded to retain the desired signal (DS) covariance matrix. A key feature of the proposed technique is that it avoids eigenvalue decomposition of the INC matrix to obtain the dominant power of the interference-plus-noise region. Moreover, the INC reconstruction is carried out according to the definition of the theoretical INC matrix. Simulation results are shown and discussed to verify the effectiveness of the proposed method against existing approaches.

Submitted 24 September, 2023; originally announced September 2023.

Comments: 7 pages, 2 figures
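The abstract above relies on the power method to estimate interference power and steering vectors without a full eigenvalue decomposition. A minimal, generic power-iteration sketch follows, assuming a Hermitian covariance matrix given as a list of rows; the function names are hypothetical and this is not the authors' implementation, whose exact use of the iteration is not described in the abstract.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    """Euclidean norm of a (possibly complex) vector."""
    return sum(abs(v) ** 2 for v in x) ** 0.5


def power_method(R, iters=500, tol=1e-12):
    """Estimate the dominant eigenvalue and unit eigenvector of a Hermitian matrix R.

    Assumes the all-ones starting vector is not orthogonal to the dominant eigenvector.
    """
    n = len(R)
    v = [complex(1.0 / n ** 0.5)] * n  # unit-norm starting vector (assumed)
    lam = 0.0
    for _ in range(iters):
        w = matvec(R, v)
        nw = norm(w)
        v = [c / nw for c in w]  # renormalize after each multiplication
        # Rayleigh quotient v^H R v, real for Hermitian R
        lam_new = sum(c.conjugate() * d for c, d in zip(v, matvec(R, v))).real
        if abs(lam_new - lam) < tol:
            return lam_new, v
        lam = lam_new
    return lam, v
```

For a covariance matrix, the returned eigenvalue approximates the dominant power and the eigenvector approximates the corresponding direction; each iteration costs only a matrix-vector product.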

arXiv:2309.01040 [pdf, ps, other]

Efficient Covariance Matrix Reconstruction with Iterative Spatial Spectrum Sampling

Authors: S. Mohammadzadeh, V. H. Nascimento, R. C. de Lamare, O. Kukrer

Abstract: This work presents a cost-effective technique for designing robust adaptive beamforming algorithms based on efficient covariance matrix reconstruction with iterative spatial power spectrum (CMR-ISPS). The proposed CMR-ISPS approach reconstructs the interference-plus-noise covariance (INC) matrix based on a simplified maximum entropy power spectral density function that can be used to shape the directional response of the beamformer. First, we estimate the directions of arrival (DoAs) of the interfering sources with the available snapshots. We then develop an algorithm to reconstruct the INC matrix using a weighted sum of outer products of steering vectors whose coefficients can be estimated in the vicinity of the DoAs of the interferences, which lie in a small angular sector. We also devise a cost-effective adaptive algorithm based on conjugate gradient techniques to update the beamforming weights and a method to obtain estimates of the signal of interest (SOI) steering vector from the spatial power spectrum. The proposed CMR-ISPS beamformer can suppress interferers close to the direction of the SOI by producing sufficiently deep notches in the directional response of the array. Simulation results are provided to confirm the validity of the proposed method and to compare it with existing approaches.

Submitted 2 September, 2023; originally announced September 2023.

Comments: 14 pages, 8 figures
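The reconstruction step described above, an INC matrix built as a weighted sum of steering-vector outer products, can be sketched as R = Σₖ pₖ a(θₖ) a(θₖ)ᴴ + σ² I. This minimal sketch assumes a uniform linear array with half-wavelength spacing and takes the DoAs and powers as given inputs; the paper's estimation of the coefficients over a small angular sector around each DoA is not shown, and all names and parameters are hypothetical.

```python
import cmath
import math


def steering_vector(theta_deg, num_sensors, spacing=0.5):
    """Steering vector of a uniform linear array; spacing is in wavelengths (assumed 0.5)."""
    phase = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * m) for m in range(num_sensors)]


def reconstruct_inc(doas_deg, powers, num_sensors, noise_power):
    """Return R = sum_k p_k a(theta_k) a(theta_k)^H + sigma^2 I as a list of rows."""
    R = [[0j] * num_sensors for _ in range(num_sensors)]
    for theta, p in zip(doas_deg, powers):
        a = steering_vector(theta, num_sensors)
        for i in range(num_sensors):
            for j in range(num_sensors):
                R[i][j] += p * a[i] * a[j].conjugate()  # weighted outer product
    for i in range(num_sensors):
        R[i][i] += noise_power  # noise contribution on the diagonal
    return R
```

Because every steering-vector entry has unit magnitude, each diagonal entry of the result equals the total interference power plus the noise power, and the matrix is Hermitian by construction.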

arXiv:2304.10502 [pdf, ps, other]

Study of Robust Adaptive Beamforming with Covariance Matrix Reconstruction Based on Power Spectral Estimation and Uncertainty Region

Authors: S. Mohammadzadeh, V. H. Nascimento, R. C. de Lamare, O. Kukrer

Abstract: In this work, a simple and effective robust adaptive beamforming technique is proposed for uniform linear arrays, which is based on the power spectral estimation and uncertainty region (PSEUR) of the interference-plus-noise (IPN) components. In particular, two algorithms are presented to find the angular sector of interference in every snapshot based on the adopted spatial uncertainty region of the interference direction. Moreover, a power spectrum is introduced based on the estimation of the power of interference and noise components, which allows the development of a robust approach to IPN covariance matrix reconstruction. The proposed method has two main advantages. First, an angular region that contains the interference direction is updated based on the statistics of the array data. Second, the proposed IPN-PSEUR method avoids estimating the power spectrum of the whole range of possible directions of the interference sector. Simulation results show that the performance of the proposed IPN-PSEUR beamformer is almost always close to the optimal value across a wide range of signal-to-noise ratios.

Submitted 18 March, 2023; originally announced April 2023.

Comments: 14 figures, 11 pages

arXiv:2212.00881 [pdf, ps, other]

Investigating Deep Learning Model Calibration for Classification Problems in Mechanics

Authors: Saeed Mohammadzadeh, Peerasait Prachaseree, Emma Lejeune

Abstract: Recently, there has been growing interest in applying machine learning methods to problems in engineering mechanics. In particular, there has been significant interest in applying deep learning techniques to predicting the mechanical behavior of heterogeneous materials and structures. Researchers have shown that deep learning methods are able to effectively predict mechanical behavior with low error for systems ranging from engineered composites, to geometrically complex metamaterials, to heterogeneous biological tissue. However, there has been comparatively little attention paid to deep learning model calibration, i.e., the match between predicted probabilities of outcomes and the true probabilities of outcomes. In this work, we perform a comprehensive investigation into ML model calibration across seven open access engineering mechanics datasets that cover three distinct types of mechanical problems. Specifically, we evaluate both model error and model calibration error for multiple machine learning methods, and investigate the influence of ensemble averaging and post hoc model calibration via temperature scaling. Overall, we find that ensemble averaging of deep neural networks is both an effective and consistent tool for improving model calibration, while temperature scaling has comparatively limited benefits. Looking forward, we anticipate that this investigation will lay the foundation for future work in developing mechanics-specific approaches to deep learning model calibration.

Submitted 14 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: 21 pages, 9 figures

MSC Class: 74B20; 74A40; 68T07 ACM Class: J.2; I.6.3; I.6.5
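The calibration study above compares predicted confidence with observed accuracy and evaluates post hoc temperature scaling. A minimal sketch of both ideas follows, assuming the common equal-width-bin expected calibration error (ECE) with 10 bins; the abstract does not state which calibration-error metric or binning the authors used, fitting the temperature on held-out data is not shown, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens the probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def expected_calibration_error(probs, labels, num_bins=10):
    """Equal-width-bin ECE: bin-size-weighted mean |accuracy - confidence|."""
    n = len(labels)
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)  # predicted class = most probable class
        idx = min(int(conf * num_bins), num_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, pred == y))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece
```

A model that predicts 0.9 confidence and is right 90% of the time has an ECE near zero; one that always predicts 1.0 but is right half the time has an ECE of 0.5.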

arXiv:2203.04183 [pdf, other]

doi 10.1115/1.4054898

Enhancing Mechanical Metamodels with a Generative Model-Based Augmented Training Dataset

Authors: Hiba Kobeissi, Saeed Mohammadzadeh, Emma Lejeune

Abstract: Modeling biological soft tissue is complex in part due to material heterogeneity. Microstructural patterns, which play a major role in defining the mechanical behavior of these tissues, are both challenging to characterize and difficult to simulate. Recently, machine learning-based methods to predict the mechanical behavior of heterogeneous materials have made it possible to more thoroughly explore the massive input parameter space associated with heterogeneous blocks of material. Specifically, we can train machine learning (ML) models to closely approximate computationally expensive heterogeneous material simulations, where the ML model is trained on a dataset of simulations that capture the range of spatial heterogeneity present in the material of interest. However, when it comes to applying these techniques to biological tissue more broadly, there is a major limitation: the relevant microstructural patterns are both challenging to obtain and difficult to analyze. Consequently, the number of useful examples available to characterize the input domain under study is limited. In this work, we investigate the efficacy of ML-based generative models as well as procedural methods as tools for augmenting limited input pattern datasets. We find that a Style-based Generative Adversarial Network with adaptive discriminator augmentation is able to successfully leverage just 1,000 example patterns to create the most authentic generated patterns.
In general, diverse generated patterns with adequate resemblance to the real patterns can be used as inputs to finite element simulations to meaningfully augment the training dataset. To enable this methodological contribution, we have created an open access dataset of Finite Element Analysis simulations based on Cahn-Hilliard patterns. We anticipate that future researchers will be able to leverage this dataset and build on the work presented here.

Submitted 17 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 13 pages, 6 figures

MSC Class: 74A40; 74B20; 74S05 ACM Class: I.6.3; I.6.5; J.2

Journal ref: Journal of Biomechanical Engineering (2022)

arXiv:2108.03995 [pdf, ps, other]

Predicting Mechanically Driven Full-Field Quantities of Interest with Deep Learning-Based Metamodels

Authors: S. Mohammadzadeh, E. Lejeune

Abstract: Using simulation to predict the mechanical behavior of heterogeneous materials has applications ranging from topology optimization to multi-scale structural analysis. However, full-fidelity simulation techniques such as Finite Element Analysis can be prohibitively computationally expensive when they are used to explore the massive input parameter space of heterogeneous materials. Therefore, there has been significant recent interest in machine learning-based models that, once trained, can predict mechanical behavior at a fraction of the computational cost. Over the past several years, research in this area has focused mainly on predicting single Quantities of Interest (QoIs). However, there has recently been increased interest in a more challenging problem: predicting full-field QoIs (e.g., displacement/strain fields, damage fields) for mechanical problems. Due to the added complexity of full-field information, network architectures that perform well on single QoI problems may perform poorly in the full-field QoI problem setting. The work presented in this paper is twofold. First, we made a significant extension to the Mechanical MNIST dataset designed to enable the investigation of full-field QoI prediction. Specifically, we added Finite Element simulation results of quasi-static brittle fracture in a heterogeneous material captured with the phase-field method. Second, we established strong baseline performance for predicting full-field QoI with the MultiRes-WNet architecture.
In addition to presenting the results in this paper, we have released our model implementation and the Mechanical MNIST Crack Path dataset under open-source licenses. We anticipate that future researchers will directly use our model architecture on related datasets and potentially design models that exceed the baseline performance for predicting full-field QoI established in this paper.

Submitted 25 October, 2021; v1 submitted 23 July, 2021; originally announced August 2021.

Comments: 17 pages, 7 figures

MSC Class: 74R10; 74B20; 74A40 ACM Class: J.2; I.6.3; I.6.5

arXiv:2106.12663 [pdf, ps, other]

Study of Robust Adaptive Beamforming Based on Low-Complexity DFT Spatial Sampling

Authors: Saeed Mohammadzadeh, Vitor H. Nascimento, Rodrigo C. de Lamare, Osman Kukrer

Abstract: In this paper, a novel and robust algorithm is proposed for adaptive beamforming based on the idea of reconstructing the autocorrelation sequence (ACS) of a random process from a set of measured data. This is obtained from the first column and the first row of the sample covariance matrix (SCM) after averaging along its diagonals. Then, the power spectrum of the correlation sequence is estimated using the discrete Fourier transform (DFT). The DFT coefficients corresponding to the angles within the noise-plus-interference region are used to reconstruct the noise-plus-interference covariance matrix (NPICM), while the desired signal covariance matrix (DSCM) is estimated by identifying and removing the noise-plus-interference component from the SCM. In particular, the spatial power spectrum of the estimated received signal is utilized to compute the correlation sequence corresponding to the noise-plus-interference in which the dominant DFT coefficient of the noise-plus-interference is captured. A key advantage of the proposed adaptive beamformer is that little prior information is required. Specifically, only imprecise knowledge of the array geometry and of the angular sectors in which the interferences are located is needed. Simulation results demonstrate that compared with previous reconstruction-based beamformers, the proposed approach can achieve better overall performance in the case of multiple mismatches over a very large range of input signal-to-noise ratios.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 12 pages, 12 figures
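The first two steps described above, averaging the SCM along its diagonals to obtain the ACS and then taking a DFT of that sequence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: lag k is taken as the mean of the k-th subdiagonal (the first-column entries after averaging), negative lags follow from Hermitian symmetry, a direct DFT is used instead of an FFT for clarity, and the function names are hypothetical. Mapping frequency bins to arrival angles depends on the array geometry and is not shown.

```python
import cmath
import math


def autocorrelation_from_scm(R):
    """Average each subdiagonal of the M x M sample covariance matrix R.

    Returns r[0..M-1] with r[k] = mean of R[i + k][i]; negative lags are
    implied by Hermitian symmetry, r[-k] = conj(r[k]).
    """
    M = len(R)
    return [sum(R[i + k][i] for i in range(M - k)) / (M - k) for k in range(M)]


def dft_power_spectrum(r, num_bins):
    """Direct DFT of the two-sided sequence r[-(M-1)..M-1] at num_bins frequencies."""
    M = len(r)
    seq = {k: complex(r[k]) for k in range(M)}
    seq.update({-k: complex(r[k]).conjugate() for k in range(1, M)})
    spectrum = []
    for n in range(num_bins):
        omega = 2 * math.pi * n / num_bins
        val = sum(c * cmath.exp(-1j * omega * k) for k, c in seq.items())
        spectrum.append(val.real)  # conjugate-symmetric sequence gives a real spectrum
    return spectrum
```

For example, the lags [2.0, 0.5, 0.0] give the spectrum 2 + cos ω, so four bins evaluate to [3, 2, 1, 2]; white noise (lags [1, 0, 0]) gives a flat spectrum of ones.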

arXiv:2012.14338 [pdf, ps, other]

Low-Cost Maximum Entropy Covariance Matrix Reconstruction Algorithm for Robust Adaptive Beamforming

Authors: S. Mohammadzadeh, V. H. Nascimento, R. C. de Lamare

Abstract: In this letter, we present a novel low-complexity adaptive beamforming technique using a stochastic gradient algorithm to avoid matrix inversions. The proposed method exploits algorithms based on the maximum entropy power spectrum (MEPS) to estimate the noise-plus-interference covariance matrix (MEPS-NPIC) so that the beamforming weights are updated adaptively, thus greatly reducing the computational complexity. MEPS is further used to reconstruct the desired signal covariance matrix and to improve the estimate of the desired signal's steering vector (SV). Simulations show the superiority of the proposed MEPS-NPIC approach over previously proposed beamformers.

Submitted 28 December, 2020; originally announced December 2020.

Comments: 6 pages, 4 figures

arXiv:0711.1621 [pdf, ps, other]

Arens Regularity of Module Actions and the Second Adjoint of a Derivation

Authors: S. Mohammadzadeh, H. R. E. Vishki

Abstract: In this paper, we first give a simple criterion for the Arens regularity of a bilinear mapping on normed spaces, which applies in particular to Banach module actions, and then we investigate the conditions under which the second adjoint of a derivation into a dual Banach module is again a derivation. As a consequence of the main result, a simple and direct proof of several older results is also included.

Submitted 10 November, 2007; originally announced November 2007.

Comments: 15 pages. To appear in Bull. Austral. Math. Soc.

MSC Class: 46H20; 46H25

Showing 1–11 of 11 results for author: Mohammadzadeh, S