Search | arXiv e-print repository

DCT-Based Decorrelated Attention for Vision Transformers

Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet Enis Cetin, Ulas Bagci

Abstract: Central to the Transformer architectures' effectiveness is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values is non-trivial from a state of random initialization. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transf… ▽ More Central to the Transformer architectures' effectiveness is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values is non-trivial from a state of random initialization. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transformers by introducing a simple, yet highly innovative, initialization approach utilizing Discrete Cosine Transform (DCT) coefficients. Our proposed DCT-based attention initialization marks a significant gain compared to traditional initialization strategies; offering a robust foundation for the attention mechanism. Our experiments reveal that the DCT-based initialization enhances the accuracy of Vision Transformers in classification tasks. (ii) We also recognize that since DCT effectively decorrelates image information in the frequency domain, this decorrelation is useful for compression because it allows the quantization step to discard many of the higher-frequency components. Based on this observation, we propose a novel DCT-based compression technique for the attention function of Vision Transformers. Since high-frequency DCT coefficients usually correspond to noise, we truncate the high-frequency DCT components of the input patches. Our DCT-based compression reduces the size of weight matrices for queries, keys, and values. While maintaining the same level of accuracy, our DCT compressed Swin Transformers obtain a considerable decrease in the computational overhead. △ Less

Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2403.18846 [pdf, other]

The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access

Authors: Xin Zhu, Ahmet Enis Cetin

Abstract: The lack of an efficient preamble detection algorithm remains a challenge for solving preamble collision problems in intelligent massive random access (RA) in practical communication scenarios. To solve this problem, we present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model at the first step of the grant-based RA procedure. A novel blind normalized Ste… ▽ More The lack of an efficient preamble detection algorithm remains a challenge for solving preamble collision problems in intelligent massive random access (RA) in practical communication scenarios. To solve this problem, we present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model at the first step of the grant-based RA procedure. A novel blind normalized Stein variational gradient descent (SVGD)-based detector is proposed to obtain an approximate solution to the MLE model. First, by exploring the relationship between the Hadamard transform and wavelet transform, a new modified Hadamard transform (MHT) is developed to separate high-frequencies from important components using the second-order derivative filter. Next, to eliminate noise and mitigate the vanishing gradients problem in the SVGD-based detectors, the block MHT layer is designed based on the MHT, scaling layer, soft-thresholding layer, inverse MHT and sparsity penalty. Then, the blind normalized SVGD algorithm is derived to perform preamble detection without prior knowledge of noise power and the number of active devices. The experimental results show the proposed block MHT layer outperforms other transform-based methods in terms of computation costs and denoising performance. Furthermore, with the assistance of the block MHT layer, the proposed blind normalized SVGD algorithm achieves a higher preamble detection accuracy and throughput than other state-of-the-art detection methods. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.05024 [pdf, other]

A Probabilistic Hadamard U-Net for MRI Bias Field Correction

Authors: Xin Zhu, Hongyi Pan, Yury Velichko, Adam B. Murphy, Ashley Ross, Baris Turkbey, Ahmet Enis Cetin, Ulas Bagci

Abstract: Magnetic field inhomogeneity correction remains a challenging task in MRI analysis. Most established techniques are designed for brain MRI by supposing that image intensities in the identical tissue follow a uniform distribution. Such an assumption cannot be easily applied to other organs, especially those that are small in size and heterogeneous in texture (large variations in intensity), such as… ▽ More Magnetic field inhomogeneity correction remains a challenging task in MRI analysis. Most established techniques are designed for brain MRI by supposing that image intensities in the identical tissue follow a uniform distribution. Such an assumption cannot be easily applied to other organs, especially those that are small in size and heterogeneous in texture (large variations in intensity), such as the prostate. To address this problem, this paper proposes a probabilistic Hadamard U-Net (PHU-Net) for prostate MRI bias field correction. First, a novel Hadamard U-Net (HU-Net) is introduced to extract the low-frequency scalar field, multiplied by the original input to obtain the prototypical corrected image. HU-Net converts the input image from the time domain into the frequency domain via Hadamard transform. In the frequency domain, high-frequency components are eliminated using the trainable filter (scaling layer), hard-thresholding layer, and sparsity penalty. Next, a conditional variational autoencoder is used to encode possible bias field-corrected variants into a low-dimensional latent space. Random samples drawn from latent space are then incorporated with a prototypical corrected image to generate multiple plausible images. Experimental results demonstrate the effectiveness of PHU-Net in correcting bias-field in prostate MRI with a fast inference speed. It has also been shown that prostate MRI segmentation accuracy improves with the high-quality corrected images from PHU-Net. The code will be available in the final version of this manuscript. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2310.02862 [pdf, other]

A novel asymmetrical autoencoder with a sparsifying discrete cosine Stockwell transform layer for gearbox sensor data compression

Authors: Xin Zhu, Daoguang Yang, Hongyi Pan, Hamid Reza Karimi, Didem Ozevin, Ahmet Enis Cetin

Abstract: The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer… ▽ More The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer autoencoder. A trainable filter is implemented in the DCST domain by utilizing the multiplication property of the convolution. A trainable hard-thresholding layer is applied to reduce redundant data in the DCST layer to make the feature map sparse. In comparison to the linear layer, the DCST layer reduces the number of trainable parameters and improves the accuracy of data reconstruction. Second, training the autoencoder with a sparsifying DCST layer only requires a small number of datasets. The proposed method is superior to other autoencoder-based methods on the University of Connecticut (UoC) and Southeast University (SEU) gearbox datasets, as the average quality score is improved by 2.00% at the lowest and 32.35% at the highest with a limited number of training samples △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.12201 [pdf, other]

Electroencephalogram Sensor Data Compression Using An Asymmetrical Sparse Autoencoder With A Discrete Cosine Transform Layer

Authors: Xin Zhu, Hongyi Pan, Shuaiang Rong, Ahmet Enis Cetin

Abstract: Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduc… ▽ More Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduce redundant data using hard-thresholding nonlinearity. Furthermore, the DCT layer includes trainable hard-thresholding parameters and scaling layers to give emphasis or de-emphasis on individual DCT coefficients. Finally, the one-by-one convolutional layer generates the latent space. The sparsity penalty-based cost function is employed to keep the feature map as sparse as possible in the latent space. The latent space data is transmitted to the receiver. The decoder module of the autoencoder is designed using the inverse DCT and two fully connected linear layers to improve the accuracy of data reconstruction. In comparison to other state-of-the-art methods, the proposed method significantly improves the average quality score in various data compression experiments. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.09866 [pdf, other]

Domain Generalization with Fourier Transform and Soft Thresholding

Authors: Hongyi Pan, Bin Wang, Zheyuan Zhang, Xin Zhu, Debesh Jha, Ahmet Enis Cetin, Concetto Spampinato, Ulas Bagci

Abstract: Domain generalization aims to train models on multiple source domains so that they can generalize well to unseen target domains. Among many domain generalization methods, Fourier-transform-based domain generalization methods have gained popularity primarily because they exploit the power of Fourier transformation to capture essential patterns and regularities in the data, making the model more rob… ▽ More Domain generalization aims to train models on multiple source domains so that they can generalize well to unseen target domains. Among many domain generalization methods, Fourier-transform-based domain generalization methods have gained popularity primarily because they exploit the power of Fourier transformation to capture essential patterns and regularities in the data, making the model more robust to domain shifts. The mainstream Fourier-transform-based domain generalization swaps the Fourier amplitude spectrum while preserving the phase spectrum between the source and the target images. However, it neglects background interference in the amplitude spectrum. To overcome this limitation, we introduce a soft-thresholding function in the Fourier domain. We apply this newly designed algorithm to retinal fundus image segmentation, which is important for diagnosing ocular diseases but the neural network's performance can degrade across different sources due to domain shifts. The proposed technique basically enhances fundus image augmentation by eliminating small values in the Fourier domain and providing better generalization. The innovative nature of the soft thresholding fused with Fourier-transform-based domain generalization improves neural network models' performance by reducing the target images' background interference significantly. Experiments on public data validate our approach's effectiveness over conventional and state-of-the-art methods with superior segmentation metrics. △ Less

Submitted 12 December, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: This paper was accepted to ICASSP 2024

arXiv:2309.08782 [pdf, other]

Stein Variational Gradient Descent-based Detection For Random Access With Preambles In MTC

Authors: Xin Zhu, Hongyi Pan, Salih Atici, Ahmet Enis Cetin

Abstract: Traditional preamble detection algorithms have low accuracy in the grant-based random access scheme in massive machine-type communication (mMTC). We present a novel preamble detection algorithm based on Stein variational gradient descent (SVGD) at the second step of the random access procedure. It efficiently leverages deterministic updates of particles for continuous inference. To further enhance… ▽ More Traditional preamble detection algorithms have low accuracy in the grant-based random access scheme in massive machine-type communication (mMTC). We present a novel preamble detection algorithm based on Stein variational gradient descent (SVGD) at the second step of the random access procedure. It efficiently leverages deterministic updates of particles for continuous inference. To further enhance the performance of the SVGD detector, especially in a dense user scenario, we propose a normalized SVGD detector with momentum. It utilizes the momentum and a bias correction term to reduce the preamble estimation errors during the gradient descent process. Simulation results show that the proposed algorithm performs better than Markov Chain Monte Carlo-based approaches in terms of detection accuracy. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2305.17510 [pdf, other]

A Hybrid Quantum-Classical Approach based on the Hadamard Transform for the Convolutional Layer

Authors: Hongyi Pan, Xin Zhu, Salih Atici, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel Hadamard Transform (HT)-based neural network layer for hybrid quantum-classical computing. It implements the regular convolutional layers in the Hadamard transform domain. The idea is based on the HT convolution theorem which states that the dyadic convolution between two vectors is equivalent to the element-wise multiplication of their HT representation. Computin… ▽ More In this paper, we propose a novel Hadamard Transform (HT)-based neural network layer for hybrid quantum-classical computing. It implements the regular convolutional layers in the Hadamard transform domain. The idea is based on the HT convolution theorem which states that the dyadic convolution between two vectors is equivalent to the element-wise multiplication of their HT representation. Computing the HT is simply the application of a Hadamard gate to each qubit individually, so the HT computations of our proposed layer can be implemented on a quantum computer. Compared to the regular Conv2D layer, the proposed HT-perceptron layer is computationally more efficient. Compared to a CNN with the same number of trainable parameters and 99.26\% test accuracy, our HT network reaches 99.31\% test accuracy with 57.1\% MACs reduced in the MNIST dataset; and in our ImageNet-1K experiments, our HT-based ResNet-50 exceeds the accuracy of the baseline ResNet-50 by 0.59\% center-crop top-1 accuracy using 11.5\% fewer parameters with 12.6\% fewer MACs. △ Less

Submitted 22 February, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: To be presented at International Conference on Machine Learning (ICML), 2023

arXiv:2303.06797 [pdf, other]

doi 10.1109/TNNLS.2024.3384316

Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets

Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Salih Atici, Ahmet Enis Cetin

Abstract: In this paper, we propose a set of transform-based neural network layers as an alternative to the $3\times3$ Conv2D layers in Convolutional Neural Networks (CNNs). The proposed layers can be implemented based on orthogonal transforms such as the Discrete Cosine Transform (DCT), Hadamard transform (HT), and biorthogonal Block Wavelet Transform (BWT). Furthermore, by taking advantage of the convolut… ▽ More In this paper, we propose a set of transform-based neural network layers as an alternative to the $3\times3$ Conv2D layers in Convolutional Neural Networks (CNNs). The proposed layers can be implemented based on orthogonal transforms such as the Discrete Cosine Transform (DCT), Hadamard transform (HT), and biorthogonal Block Wavelet Transform (BWT). Furthermore, by taking advantage of the convolution theorems, convolutional filtering operations are performed in the transform domain using element-wise multiplications. Trainable soft-thresholding layers, that remove noise in the transform domain, bring nonlinearity to the transform domain layers. Compared to the Conv2D layer, which is spatial-agnostic and channel-specific, the proposed layers are location-specific and channel-specific. Moreover, these proposed layers reduce the number of parameters and multiplications significantly while improving the accuracy results of regular ResNets on the ImageNet-1K classification task. Furthermore, they can be inserted with a batch normalization layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy. △ Less

Submitted 22 April, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: This work is accepted to IEEE Transactions on Neural Networks and Learning Systems. The initial title is "Orthogonal Transform Domain Approaches for the Convolutional Layer". We changed it to "Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets" based on reviewer's comment. arXiv admin note: text overlap with arXiv:2211.08577

arXiv:2212.09921 [pdf, other]

Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks

Authors: Salih Atici, Hongyi Pan, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our al… ▽ More In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$ and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels compared to different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-18, WResNet-20, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-18 on CIFAR-10 from 92.42\% to 92.71\%, WResNet-20 on CIFAR-100 from 76.20\% to 77.39\%, and ResNet-50 on ImageNet-1K from 75.52\% to 75.67\%. △ Less

Submitted 26 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

arXiv:2211.08577 [pdf, other]

DCT Perceptron Layer: A Transform Domain Approach for Convolution Layer

Authors: Hongyi Pan, Xin Zhu, Salih Atici, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel Discrete Cosine Transform (DCT)-based neural network layer which we call DCT-perceptron to replace the $3\times3$ Conv2D layers in the Residual neural Network (ResNet). Convolutional filtering operations are performed in the DCT domain using element-wise multiplications by taking advantage of the Fourier and DCT Convolution theorems. A trainable soft-thresholding… ▽ More In this paper, we propose a novel Discrete Cosine Transform (DCT)-based neural network layer which we call DCT-perceptron to replace the $3\times3$ Conv2D layers in the Residual neural Network (ResNet). Convolutional filtering operations are performed in the DCT domain using element-wise multiplications by taking advantage of the Fourier and DCT Convolution theorems. A trainable soft-thresholding layer is used as the nonlinearity in the DCT perceptron. Compared to ResNet's Conv2D layer which is spatial-agnostic and channel-specific, the proposed layer is location-specific and channel-specific. The DCT-perceptron layer reduces the number of parameters and multiplications significantly while maintaining comparable accuracy results of regular ResNets in CIFAR-10 and ImageNet-1K. Moreover, the DCT-perceptron layer can be inserted with a batch normalization layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy. △ Less

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.08505 [pdf, other]

Classification of the Cervical Vertebrae Maturation (CVM) stages Using the Tripod Network

Authors: Salih Atici, Hongyi Pan, Mohammed H. Elnagar, Veerasathpurush Allareddy, Omar Suhaym, Rashid Ansari, Ahmet Enis Cetin

Abstract: We present a novel deep learning method for fully automated detection and classification of the Cervical Vertebrae Maturation (CVM) stages. The deep convolutional neural network consists of three parallel networks (TriPodNet) independently trained with different initialization parameters. They also have a built-in set of novel directional filters that highlight the Cervical Verte edges in X-ray im… ▽ More We present a novel deep learning method for fully automated detection and classification of the Cervical Vertebrae Maturation (CVM) stages. The deep convolutional neural network consists of three parallel networks (TriPodNet) independently trained with different initialization parameters. They also have a built-in set of novel directional filters that highlight the Cervical Verte edges in X-ray images. Outputs of the three parallel networks are combined using a fully connected layer. 1018 cephalometric radiographs were labeled, divided by gender, and classified according to the CVM stages. Resulting images, using different training techniques and patches, were used to train TripodNet together with a set of tunable directional edge enhancers. Data augmentation is implemented to avoid overfitting. TripodNet achieves the state-of-the-art accuracy of 81.18\% in female patients and 75.32\% in male patients. The proposed TripodNet achieves a higher accuracy in our dataset than the Swin Transformers and the previous network models that we investigated for CVM stage estimation. △ Less

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.08491 [pdf, other]

Real-time Wireless ECG-derived Respiration Rate Estimation Using an Autoencoder with a DCT Layer

Authors: Hongyi Pan, Xin Zhu, Zhilu Ye, Pai-Yen Chen, Ahmet Enis Cetin

Abstract: In this paper, we present a wireless ECG-derived Respiration Rate (RR) estimation using an autoencoder with a DCT Layer. The wireless wearable system records the ECG data of the subject and the respiration rate is determined from the variations in the baseline level of the ECG data. A straightforward Fourier analysis of the ECG data obtained using the wireless wearable system may lead to incorrect… ▽ More In this paper, we present a wireless ECG-derived Respiration Rate (RR) estimation using an autoencoder with a DCT Layer. The wireless wearable system records the ECG data of the subject and the respiration rate is determined from the variations in the baseline level of the ECG data. A straightforward Fourier analysis of the ECG data obtained using the wireless wearable system may lead to incorrect results due to uneven breathing. To improve the estimation precision, we propose a neural network that uses a novel Discrete Cosine Transform (DCT) layer to denoise and decorrelates the data. The DCT layer has trainable weights and soft-thresholds in the transform domain. In our dataset, we improve the Mean Squared Error (MSE) and Mean Absolute Error (MAE) of the Fourier analysis-based approach using our novel neural network with the DCT layer. △ Less

Submitted 16 February, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: This paper was accepted to ICASSP 2023

arXiv:2206.13613 [pdf, other]

Flexible-Rate Learned Hierarchical Bi-Directional Video Compression With Motion Refinement and Frame-Level Bit Allocation

Authors: Eren Cetin, M. Akin Yilmaz, A. Murat Tekalp

Abstract: This paper presents improvements and novel additions to our recent work on end-to-end optimized hierarchical bi-directional video compression to further advance the state-of-the-art in learned video compression. As an improvement, we combine motion estimation and prediction modules and compress refined residual motion vectors for improved rate-distortion performance. As novel addition, we adapted… ▽ More This paper presents improvements and novel additions to our recent work on end-to-end optimized hierarchical bi-directional video compression to further advance the state-of-the-art in learned video compression. As an improvement, we combine motion estimation and prediction modules and compress refined residual motion vectors for improved rate-distortion performance. As novel addition, we adapted the gain unit proposed for image compression to flexible-rate video compression in two ways: first, the gain unit enables a single encoder model to operate at multiple rate-distortion operating points; second, we exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate that we obtain state-of-the-art rate-distortion performance exceeding those of all prior art in learned video coding. △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP 2022)

Report number: 1850

arXiv:2201.02711 [pdf, other]

Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural Networks

Authors: Hongyi Pan, Diaa Badawi, Ahmet Enis Cetin

Abstract: Convolution has been the core operation of modern deep neural networks. It is well-known that convolutions can be implemented in the Fourier Transform domain. In this paper, we propose to use binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform. We use WHT-based binary layers to replace some of the regular convolution layers in deep neural networks. We utilize both one-dime… ▽ More Convolution has been the core operation of modern deep neural networks. It is well-known that convolutions can be implemented in the Fourier Transform domain. In this paper, we propose to use binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform. We use WHT-based binary layers to replace some of the regular convolution layers in deep neural networks. We utilize both one-dimensional (1-D) and two-dimensional (2-D) binary WHTs in this paper. In both 1-D and 2-D layers, we compute the binary WHT of the input feature map and denoise the WHT domain coefficients using a nonlinearity which is obtained by combining soft-thresholding with the tanh function. After denoising, we compute the inverse WHT. We use 1D-WHT to replace the $1\times 1$ convolutional layers, and 2D-WHT layers can replace the 3$\times$3 convolution layers and Squeeze-and-Excite layers. 2D-WHT layers with trainable weights can be also inserted before the Global Average Pooling (GAP) layers to assist the dense layers. In this way, we can reduce the number of trainable parameters significantly with a slight decrease in trainable parameters. In this paper, we implement the WHT layers into MobileNet-V2, MobileNet-V3-Large, and ResNet to reduce the number of parameters significantly with negligible accuracy loss. Moreover, according to our speed test, the 2D-FWHT layer runs about 24 times as fast as the regular $3\times 3$ convolution with 19.51\% less RAM usage in an NVIDIA Jetson Nano experiment. △ Less

Submitted 27 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: This paper has been accepted by ACM Transactions on Embedded Computing Systems

arXiv:2201.02709 [pdf, other]

Detecting Anomaly in Chemical Sensors via L1-Kernels based Principal Component Analysis

Authors: Hongyi Pan, Diaa Badawi, Ishaan Bassi, Sule Ozev, Ahmet Enis Cetin

Abstract: We propose a kernel-PCA based method to detect anomaly in chemical sensors. We use temporal signals produced by chemical sensors to form vectors to perform the Principal Component Analysis (PCA). We estimate the kernel-covariance matrix of the sensor data and compute the eigenvector corresponding to the largest eigenvalue of the covariance matrix. The anomaly can be detected by comparing the diffe… ▽ More We propose a kernel-PCA based method to detect anomaly in chemical sensors. We use temporal signals produced by chemical sensors to form vectors to perform the Principal Component Analysis (PCA). We estimate the kernel-covariance matrix of the sensor data and compute the eigenvector corresponding to the largest eigenvalue of the covariance matrix. The anomaly can be detected by comparing the difference between the actual sensor data and the reconstructed data from the dominant eigenvector. In this paper, we introduce a new multiplication-free kernel, which is related to the l1-norm for the anomaly detection task. The l1-kernel PCA is not only computationally efficient but also energy-efficient because it does not require any actual multiplications during the kernel covariance matrix computation. Our experimental results show that our kernel-PCA method achieves a higher area under curvature (AUC) score (0.7483) than the baseline regular PCA method (0.7366). △ Less

Submitted 28 September, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: This paper has been accepted to IEEE Sensors Letters

arXiv:2110.12065 [pdf, other]

Multiplication-Avoiding Variant of Power Iteration with Applications

Authors: Hongyi Pan, Diaa Badawi, Runxuan Miao, Erdem Koyuncu, Ahmet Enis Cetin

Abstract: Power iteration is a fundamental algorithm in data analysis. It extracts the eigenvector corresponding to the largest eigenvalue of a given matrix. Applications include ranking algorithms, recommendation systems, principal component analysis (PCA), among many others. In this paper, we introduce multiplication-avoiding power iteration (MAPI), which replaces the standard $\ell_2$-inner products that… ▽ More Power iteration is a fundamental algorithm in data analysis. It extracts the eigenvector corresponding to the largest eigenvalue of a given matrix. Applications include ranking algorithms, recommendation systems, principal component analysis (PCA), among many others. In this paper, we introduce multiplication-avoiding power iteration (MAPI), which replaces the standard $\ell_2$-inner products that appear at the regular power iteration (RPI) with multiplication-free vector products which are Mercer-type kernel operations related with the $\ell_1$ norm. Precisely, for an $n\times n$ matrix, MAPI requires $n$ multiplications, while RPI needs $n^2$ multiplications per iteration. Therefore, MAPI provides a significant reduction of the number of multiplication operations, which are known to be costly in terms of energy consumption. We provide applications of MAPI to PCA-based image reconstruction as well as to graph-based ranking algorithms. When compared to RPI, MAPI not only typically converges much faster, but also provides superior performance. △ Less

Submitted 31 January, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: This is the technique report for the paper "MULTIPLICATION-AVOIDING VARIANT OF POWER ITERATION WITH APPLICATIONS", which has been accepted by ICASSP 2022

arXiv:2106.00668 [pdf, other]

High Resolution Time-Frequency Generation with Generative Adversarial Networks

Authors: Zeynel Deprem, A. Enis Çetin

Abstract: Signal representation in Time-Frequency (TF) domain is valuable in many applications including radar imaging and inverse synthetic aparture radar. TF representation allows us to identify signal components or features in a mixed time and frequency plane. There are several well-known tools, such as Wigner-Ville Distribution (WVD), Short-Time Fourier Transform (STFT) and various other variants for su… ▽ More Signal representation in Time-Frequency (TF) domain is valuable in many applications including radar imaging and inverse synthetic aparture radar. TF representation allows us to identify signal components or features in a mixed time and frequency plane. There are several well-known tools, such as Wigner-Ville Distribution (WVD), Short-Time Fourier Transform (STFT) and various other variants for such a purpose. The main requirement for a TF representation tool is to give a high-resolution view of the signal such that the signal components or features are identifiable. A commonly used method is the reassignment process which reduces the cross-terms by artificially moving smoothed WVD values from their actual location to the center of the gravity for that region. In this article, we propose a novel reassignment method using the Conditional Generative Adversarial Network (CGAN). We train a CGAN to perform the reassignment process. Through examples, it is shown that the method generates high-resolution TF representations which are better than the current reassignment methods. △ Less

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.11634 [pdf, other]

Robust Principal Component Analysis Using a Novel Kernel Related with the L1-Norm

Authors: Hongyi Pan, Diaa Badawi, Erdem Koyuncu, A. Enis Cetin

Abstract: We consider a family of vector dot products that can be implemented using sign changes and addition operations only. The dot products are energy-efficient as they avoid the multiplication operation entirely. Moreover, the dot products induce the $\ell_1$-norm, thus providing robustness to impulsive noise. First, we analytically prove that the dot products yield symmetric, positive semi-definite ge… ▽ More We consider a family of vector dot products that can be implemented using sign changes and addition operations only. The dot products are energy-efficient as they avoid the multiplication operation entirely. Moreover, the dot products induce the $\ell_1$-norm, thus providing robustness to impulsive noise. First, we analytically prove that the dot products yield symmetric, positive semi-definite generalized covariance matrices, thus enabling principal component analysis (PCA). Moreover, the generalized covariance matrices can be constructed in an Energy Efficient (EEF) manner due to the multiplication-free property of the underlying vector products. We present image reconstruction examples in which our EEF PCA method result in the highest peak signal-to-noise ratios compared to the ordinary $\ell_2$-PCA and the recursive $\ell_1$-PCA. △ Less

Submitted 24 May, 2021; originally announced May 2021.

Comments: 6 pages, 3 tables and one figure

arXiv:2104.07085 [pdf, other]

Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary Layers in Deep Neural Networks

Authors: Hongyi Pan, Diaa Dabawi, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel layer based on fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution layers in deep neural networks. In the WHT domain, we denoise the transform domain coefficients using the new smooth-thresholding non-linearity, a smoothed version of the well-known soft-thresholding operator. We also introduce a family of multiplication-f… ▽ More In this paper, we propose a novel layer based on fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution layers in deep neural networks. In the WHT domain, we denoise the transform domain coefficients using the new smooth-thresholding non-linearity, a smoothed version of the well-known soft-thresholding operator. We also introduce a family of multiplication-free operators from the basic 2$\times$2 Hadamard transform to implement $3\times 3$ depthwise separable convolution layers. Using these two types of layers, we replace the bottleneck layers in MobileNet-V2 to reduce the network's number of parameters with a slight loss in accuracy. For example, by replacing the final third bottleneck layers, we reduce the number of parameters from 2.270M to 540K. This reduces the accuracy from 95.21\% to 92.98\% on the CIFAR-10 dataset. Our approach significantly improves the speed of data processing. The fast Walsh-Hadamard transform has a computational complexity of $O(m\log_2 m)$. As a result, it is computationally more efficient than the $1\times1$ convolution layer. The fast Walsh-Hadamard layer processes a tensor in $\mathbb{R}^{10\times32\times32\times1024}$ about 2 times faster than $1\times1$ convolution layer on NVIDIA Jetson Nano computer board. △ Less

Submitted 29 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: The paper (v1) has been accepted to CVPR 2021 BiVision Workshop. We notice the final Conv2D is also a 1x1 convolution layer so we update the result with changing the layer in v2. In v3, we update citation 37 because its authorship changes. In v4, we propose the improved version of smooth thresholding called "weighted smooth thresholding"

arXiv:2102.00035 [pdf, other]

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference using Memory-immersed Data Conversion and Multiplication-free Operators

Authors: Shamma Nasrin, Diaa Badawi, Ahmet Enis Cetin, Wilfred Gomes, Amit Ranjan Trivedi

Abstract: We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on ell_1 norm along with a co-adapted processing array and compute flow. Using the approach, we overcame many deficiencies in the current art of in-SRAM DNN processing such as the need for digital-to-analog converters (DACs) at each operating SRAM… ▽ More We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on ell_1 norm along with a co-adapted processing array and compute flow. Using the approach, we overcame many deficiencies in the current art of in-SRAM DNN processing such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, it doesn't require DACs, and it easily extends to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), where we exploit the parasitic capacitance of bit lines of SRAM array as a capacitive DAC. Since the dominant area overhead in SA-ADC comes due to its capacitive DAC, by exploiting the intrinsic parasitic of SRAM array, our approach allows low area implementation of within-SRAM SA-ADC. Our 8$\times$62 SRAM macro, which requires a 5-bit ADC, achieves $\sim$105 tera operations per second per Watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. △ Less

Submitted 29 January, 2021; originally announced February 2021.

arXiv:2011.06681 [pdf, other]

Discrete Cosine Transform Based Causal Convolutional Neural Network for Drift Compensation in Chemical Sensors

Authors: Diaa Badawi, Agamyrat Agambayev, Sule Ozev, A. Enis Cetin

Abstract: Sensor drift is a major problem in chemical sensors that requires addressing for reliable and accurate detection of chemical analytes. In this paper, we develop a causal convolutional neural network (CNN) with a Discrete Cosine Transform (DCT) layer to estimate the drift signal. In the DCT module, we apply soft-thresholding nonlinearity in the transform domain to denoise the data and obtain a spar… ▽ More Sensor drift is a major problem in chemical sensors that requires addressing for reliable and accurate detection of chemical analytes. In this paper, we develop a causal convolutional neural network (CNN) with a Discrete Cosine Transform (DCT) layer to estimate the drift signal. In the DCT module, we apply soft-thresholding nonlinearity in the transform domain to denoise the data and obtain a sparse representation of the drift signal. The soft-threshold values are learned during training. Our results show that DCT layer-based CNNs are able to produce a slowly varying baseline drift signal. We train the CNN on synthetic data and test it on real chemical sensor data. Our results show that we can have an accurate and smooth drift estimate even when the observed sensor signal is very noisy. △ Less

Submitted 12 November, 2020; originally announced November 2020.

Comments: 5 pages, 3 figure, submitted to ICASSP 2020

arXiv:1910.14096 [pdf, other]

Robust and Computationally-Efficient Anomaly Detection using Powers-of-Two Networks

Authors: Usama Muneeb, Erdem Koyuncu, Yasaman Keshtkarjahromi, Hulya Seferoglu, Mehmet Fatih Erden, Ahmet Enis Cetin

Abstract: Robust and computationally efficient anomaly detection in videos is a problem in video surveillance systems. We propose a technique to increase robustness and reduce computational complexity in a Convolutional Neural Network (CNN) based anomaly detector that utilizes the optical flow information of video data. We reduce the complexity of the network by denoising the intermediate layer outputs of t… ▽ More Robust and computationally efficient anomaly detection in videos is a problem in video surveillance systems. We propose a technique to increase robustness and reduce computational complexity in a Convolutional Neural Network (CNN) based anomaly detector that utilizes the optical flow information of video data. We reduce the complexity of the network by denoising the intermediate layer outputs of the CNN and by using powers-of-two weights, which replaces the computationally expensive multiplication operations with bit-shift operations. Denoising operation during inference forces small valued intermediate layer outputs to zero. The number of zeros in the network significantly increases as a result of denoising, we can implement the CNN about 10% faster than a comparable network while detecting all the anomalies in the testing set. It turns out that denoising operation also provides robustness because the contribution of small intermediate values to the final result is negligible. During training we also generate motion vector images by a Generative Adversarial Network (GAN) to improve the robustness of the overall system. We experimentally observe that the resulting system is robust to background motion. △ Less

Submitted 30 October, 2019; originally announced October 2019.

arXiv:1908.07619 [pdf, other]

Detecting Gas Vapor Leaks Using Uncalibrated Sensors

Authors: Diaa Badawi, Tuba Ayhan, Sule Ozev, Chengmo Yang, Alex Orailoglu, A. Enis Çetin

Abstract: Chemical and infra-red sensors generate distinct responses under similar conditions because of sensor drift, noise or resolution errors. In this work, we use different time-series data sets obtained by infra-red and E-nose sensors in order to detect Volatile Organic Compounds (VOCs) and Ammonia vapor leaks. We process time-series sensor signals using deep neural networks (DNN). Three neural networ… ▽ More Chemical and infra-red sensors generate distinct responses under similar conditions because of sensor drift, noise or resolution errors. In this work, we use different time-series data sets obtained by infra-red and E-nose sensors in order to detect Volatile Organic Compounds (VOCs) and Ammonia vapor leaks. We process time-series sensor signals using deep neural networks (DNN). Three neural network algorithms are utilized for this purpose. Additive neural networks (termed AddNet) are based on a multiplication-devoid operator and consequently exhibit energy-efficiency compared to regular neural networks. The second algorithm uses generative adversarial neural networks so as to expose the classifying neural network to more realistic data points in order to help the classifier network to deliver improved generalization. Finally, we use conventional convolutional neural networks as a baseline method and compare their performance with the two aforementioned deep neural network algorithms in order to evaluate their effectiveness empirically. △ Less

Submitted 20 August, 2019; originally announced August 2019.

arXiv:1905.09472 [pdf, ps, other]

EEG Classification by factoring in Sensor Configuration

Authors: Lubna Shibly Mokatren, Rashid Ansari, Ahmet Enis Cetin, Alex D Leow, Heide Klumpp, Olusola Ajilore, Fatos Yarman Vural

Abstract: Electroencephalography (EEG) serves as an effective diagnostic tool for mental disorders and neurological abnormalities. Enhanced analysis and classification of EEG signals can help improve detection performance. A new approach is examined here for enhancing EEG classification performance by leveraging knowledge of spatial layout of EEG sensors. Performance of two classification models - model 1 t… ▽ More Electroencephalography (EEG) serves as an effective diagnostic tool for mental disorders and neurological abnormalities. Enhanced analysis and classification of EEG signals can help improve detection performance. A new approach is examined here for enhancing EEG classification performance by leveraging knowledge of spatial layout of EEG sensors. Performance of two classification models - model 1 that ignores the sensor layout and model 2 that factors it in - is investigated and found to achieve consistently higher detection accuracy. The analysis is based on the information content of these signals represented in two different ways: concatenation of the channels of the frequency bands and an image-like 2D representation of the EEG channel locations. Performance of these models is examined on two tasks, social anxiety disorder (SAD) detection, and emotion recognition using a dataset for emotion analysis using physiological signals (DEAP). We hypothesized that model 2 will significantly outperform model 1 and this was validated in our results as model 2 yielded $5$--$8\%$ higher accuracy in all machine learning algorithms investigated. Convolutional Neural Networks (CNN) provided the best performance far exceeding that of Support Vector Machine (SVM) and k-Nearest Neighbors (kNNs) algorithms. △ Less

Submitted 7 February, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

Comments: arXiv admin note: text overlap with arXiv:1812.02865

arXiv:1905.04596 [pdf, ps, other]

Deep Layered LMS Predictor

Authors: Lubna Shibly Mokatren, Ahmet Enis Cetin, Rashid Ansari

Abstract: In this study, we present a new approach to design a Least Mean Squares (LMS) predictor. This approach exploits the concept of deep neural networks and their supremacy in terms of performance and accuracy. The new LMS predictor is implemented as a deep neural network using multiple non linear LMS filters. The network consists of multiple layers with nonlinear activation functions, where each neuro… ▽ More In this study, we present a new approach to design a Least Mean Squares (LMS) predictor. This approach exploits the concept of deep neural networks and their supremacy in terms of performance and accuracy. The new LMS predictor is implemented as a deep neural network using multiple non linear LMS filters. The network consists of multiple layers with nonlinear activation functions, where each neuron in the hidden layers corresponds to a certain FIR filter output which goes through nonlinearity. The output of the last layer is the prediction. We hypothesize that this approach will outperform the traditional adaptive filters. △ Less

Submitted 11 May, 2019; originally announced May 2019.

arXiv:1812.02865 [pdf, other]

EEG Classification based on Image Configuration in Social Anxiety Disorder

Authors: Lubna Shibly Mokatren, Rashid Ansari, Ahmet Enis Cetin, Alex D. Leow, Olusola Ajilore, Heide Klumpp, Fatos T. Yarman Vural

Abstract: The problem of detecting the presence of Social Anxiety Disorder (SAD) using Electroencephalography (EEG) for classification has seen limited study and is addressed with a new approach that seeks to exploit the knowledge of EEG sensor spatial configuration. Two classification models, one which ignores the configuration (model 1) and one that exploits it with different interpolation methods (model… ▽ More The problem of detecting the presence of Social Anxiety Disorder (SAD) using Electroencephalography (EEG) for classification has seen limited study and is addressed with a new approach that seeks to exploit the knowledge of EEG sensor spatial configuration. Two classification models, one which ignores the configuration (model 1) and one that exploits it with different interpolation methods (model 2), are studied. Performance of these two models is examined for analyzing 34 EEG data channels each consisting of five frequency bands and further decomposed with a filter bank. The data are collected from 64 subjects consisting of healthy controls and patients with SAD. Validity of our hypothesis that model 2 will significantly outperform model 1 is borne out in the results, with accuracy $6$--$7\%$ higher for model 2 for each machine learning algorithm we investigated. Convolutional Neural Networks (CNN) were found to provide much better performance than SVM and kNNs. △ Less

Submitted 6 December, 2018; originally announced December 2018.

Showing 1–27 of 27 results for author: Cetin, E