-
DCT-Based Decorrelated Attention for Vision Transformers
Authors:
Hongyi Pan,
Emadeldeen Hamdan,
Xin Zhu,
Koushik Biswas,
Ahmet Enis Cetin,
Ulas Bagci
Abstract:
Central to the Transformer architectures' effectiveness is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values is non-trivial from a state of random initialization. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transformers by introducing a simple, yet highly innovative, initialization approach utilizing Discrete Cosine Transform (DCT) coefficients. Our proposed DCT-based attention initialization marks a significant gain compared to traditional initialization strategies, offering a robust foundation for the attention mechanism. Our experiments reveal that the DCT-based initialization enhances the accuracy of Vision Transformers in classification tasks. (ii) We also recognize that since DCT effectively decorrelates image information in the frequency domain, this decorrelation is useful for compression because it allows the quantization step to discard many of the higher-frequency components. Based on this observation, we propose a novel DCT-based compression technique for the attention function of Vision Transformers. Since high-frequency DCT coefficients usually correspond to noise, we truncate the high-frequency DCT components of the input patches. Our DCT-based compression reduces the size of weight matrices for queries, keys, and values. While maintaining the same level of accuracy, our DCT-compressed Swin Transformers obtain a considerable decrease in the computational overhead.
Submitted 28 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
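The truncation idea in the abstract above (dropping high-frequency DCT components so that the query, key, and value weight matrices shrink) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `dct_matrix` and `truncate_dct`, the token dimension of 16, and the choice to keep 8 coefficients are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return c

def truncate_dct(tokens, keep):
    """Keep only the `keep` lowest-frequency DCT coefficients of each token."""
    c = dct_matrix(tokens.shape[-1])
    return tokens @ c[:keep].T

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 16))      # hypothetical: 4 tokens of dimension 16
compressed = truncate_dct(tokens, keep=8)  # shape (4, 8)

# Hypothetical query projection: it now needs 8 input rows instead of 16.
w_q = rng.standard_normal((8, 8))
queries = compressed @ w_q
```

Because the DCT matrix is orthonormal, keeping all coefficients preserves each token's norm; discarding the upper half halves the input dimension of every projection that follows.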
-
The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access
Authors:
Xin Zhu,
Ahmet Enis Cetin
Abstract:
The lack of an efficient preamble detection algorithm remains a challenge for solving preamble collision problems in intelligent massive random access (RA) in practical communication scenarios. To solve this problem, we present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model at the first step of the grant-based RA procedure. A novel blind normalized Stein variational gradient descent (SVGD)-based detector is proposed to obtain an approximate solution to the MLE model. First, by exploring the relationship between the Hadamard transform and wavelet transform, a new modified Hadamard transform (MHT) is developed to separate high-frequencies from important components using the second-order derivative filter. Next, to eliminate noise and mitigate the vanishing gradients problem in the SVGD-based detectors, the block MHT layer is designed based on the MHT, scaling layer, soft-thresholding layer, inverse MHT and sparsity penalty. Then, the blind normalized SVGD algorithm is derived to perform preamble detection without prior knowledge of noise power and the number of active devices. The experimental results show the proposed block MHT layer outperforms other transform-based methods in terms of computation costs and denoising performance. Furthermore, with the assistance of the block MHT layer, the proposed blind normalized SVGD algorithm achieves a higher preamble detection accuracy and throughput than other state-of-the-art detection methods.
Submitted 7 March, 2024;
originally announced March 2024.
-
Simple Ingredients for Offline Reinforcement Learning
Authors:
Edoardo Cetin,
Andrea Tirinzoni,
Matteo Pirotta,
Alessandro Lazaric,
Yann Ollivier,
Ahmed Touati
Abstract:
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer. In light of this finding, we conduct a large empirical study where we formulate and test several hypotheses to explain this failure. Surprisingly, we find that scale, more than algorithmic considerations, is the key factor influencing performance. We show that simple methods like AWAC and IQL with increased network size overcome the paradoxical failure modes from the inclusion of additional data in MOOD, and notably outperform prior state-of-the-art algorithms on the canonical D4RL benchmark.
Submitted 19 March, 2024;
originally announced March 2024.
-
A Probabilistic Hadamard U-Net for MRI Bias Field Correction
Authors:
Xin Zhu,
Hongyi Pan,
Yury Velichko,
Adam B. Murphy,
Ashley Ross,
Baris Turkbey,
Ahmet Enis Cetin,
Ulas Bagci
Abstract:
Magnetic field inhomogeneity correction remains a challenging task in MRI analysis. Most established techniques are designed for brain MRI by supposing that image intensities in the identical tissue follow a uniform distribution. Such an assumption cannot be easily applied to other organs, especially those that are small in size and heterogeneous in texture (large variations in intensity), such as the prostate. To address this problem, this paper proposes a probabilistic Hadamard U-Net (PHU-Net) for prostate MRI bias field correction. First, a novel Hadamard U-Net (HU-Net) is introduced to extract the low-frequency scalar field, multiplied by the original input to obtain the prototypical corrected image. HU-Net converts the input image from the time domain into the frequency domain via Hadamard transform. In the frequency domain, high-frequency components are eliminated using the trainable filter (scaling layer), hard-thresholding layer, and sparsity penalty. Next, a conditional variational autoencoder is used to encode possible bias field-corrected variants into a low-dimensional latent space. Random samples drawn from latent space are then incorporated with a prototypical corrected image to generate multiple plausible images. Experimental results demonstrate the effectiveness of PHU-Net in correcting bias-field in prostate MRI with a fast inference speed. It has also been shown that prostate MRI segmentation accuracy improves with the high-quality corrected images from PHU-Net. The code will be available in the final version of this manuscript.
Submitted 7 March, 2024;
originally announced March 2024.
-
A novel asymmetrical autoencoder with a sparsifying discrete cosine Stockwell transform layer for gearbox sensor data compression
Authors:
Xin Zhu,
Daoguang Yang,
Hongyi Pan,
Hamid Reza Karimi,
Didem Ozevin,
Ahmet Enis Cetin
Abstract:
The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer autoencoder. A trainable filter is implemented in the DCST domain by utilizing the multiplication property of the convolution. A trainable hard-thresholding layer is applied to reduce redundant data in the DCST layer to make the feature map sparse. In comparison to the linear layer, the DCST layer reduces the number of trainable parameters and improves the accuracy of data reconstruction. Second, training the autoencoder with a sparsifying DCST layer only requires a small number of datasets. The proposed method is superior to other autoencoder-based methods on the University of Connecticut (UoC) and Southeast University (SEU) gearbox datasets, as the average quality score is improved by 2.00% at the lowest and 32.35% at the highest with a limited number of training samples.
Submitted 4 October, 2023;
originally announced October 2023.
-
Electroencephalogram Sensor Data Compression Using An Asymmetrical Sparse Autoencoder With A Discrete Cosine Transform Layer
Authors:
Xin Zhu,
Hongyi Pan,
Shuaiang Rong,
Ahmet Enis Cetin
Abstract:
Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduce redundant data using hard-thresholding nonlinearity. Furthermore, the DCT layer includes trainable hard-thresholding parameters and scaling layers to give emphasis or de-emphasis on individual DCT coefficients. Finally, the one-by-one convolutional layer generates the latent space. The sparsity penalty-based cost function is employed to keep the feature map as sparse as possible in the latent space. The latent space data is transmitted to the receiver. The decoder module of the autoencoder is designed using the inverse DCT and two fully connected linear layers to improve the accuracy of data reconstruction. In comparison to other state-of-the-art methods, the proposed method significantly improves the average quality score in various data compression experiments.
Submitted 15 September, 2023;
originally announced September 2023.
-
Domain Generalization with Fourier Transform and Soft Thresholding
Authors:
Hongyi Pan,
Bin Wang,
Zheyuan Zhang,
Xin Zhu,
Debesh Jha,
Ahmet Enis Cetin,
Concetto Spampinato,
Ulas Bagci
Abstract:
Domain generalization aims to train models on multiple source domains so that they can generalize well to unseen target domains. Among many domain generalization methods, Fourier-transform-based domain generalization methods have gained popularity primarily because they exploit the power of Fourier transformation to capture essential patterns and regularities in the data, making the model more robust to domain shifts. The mainstream Fourier-transform-based domain generalization swaps the Fourier amplitude spectrum while preserving the phase spectrum between the source and the target images. However, it neglects background interference in the amplitude spectrum. To overcome this limitation, we introduce a soft-thresholding function in the Fourier domain. We apply this newly designed algorithm to retinal fundus image segmentation, which is important for diagnosing ocular diseases but the neural network's performance can degrade across different sources due to domain shifts. The proposed technique basically enhances fundus image augmentation by eliminating small values in the Fourier domain and providing better generalization. The innovative nature of the soft thresholding fused with Fourier-transform-based domain generalization improves neural network models' performance by reducing the target images' background interference significantly. Experiments on public data validate our approach's effectiveness over conventional and state-of-the-art methods with superior segmentation metrics.
Submitted 12 December, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
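The amplitude swap with soft-thresholding described in the abstract above can be sketched as follows. The abstract does not say where exactly the threshold is applied, so this sketch assumes it acts on the target image's amplitude spectrum before that spectrum is combined with the source phase; the function names, image sizes, and threshold value are hypothetical.

```python
import numpy as np

def soft_threshold_amplitude(amplitude, threshold):
    """Shrink non-negative amplitudes toward zero; small values become exactly 0."""
    return np.maximum(amplitude - threshold, 0.0)

def amplitude_swap(source, target, threshold):
    """Combine the source phase with the soft-thresholded target amplitude."""
    f_source = np.fft.fft2(source)
    f_target = np.fft.fft2(target)
    amplitude = soft_threshold_amplitude(np.abs(f_target), threshold)
    mixed = amplitude * np.exp(1j * np.angle(f_source))
    # Both inputs are real, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(mixed))

rng = np.random.default_rng(0)
source = rng.random((8, 8))  # stand-ins for grayscale images
target = rng.random((8, 8))
augmented = amplitude_swap(source, target, threshold=0.5)
```

With a threshold of zero and identical source and target, the function returns the input unchanged, which is a quick sanity check that phase and amplitude are recombined consistently.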
-
Stein Variational Gradient Descent-based Detection For Random Access With Preambles In MTC
Authors:
Xin Zhu,
Hongyi Pan,
Salih Atici,
Ahmet Enis Cetin
Abstract:
Traditional preamble detection algorithms have low accuracy in the grant-based random access scheme in massive machine-type communication (mMTC). We present a novel preamble detection algorithm based on Stein variational gradient descent (SVGD) at the second step of the random access procedure. It efficiently leverages deterministic updates of particles for continuous inference. To further enhance the performance of the SVGD detector, especially in a dense user scenario, we propose a normalized SVGD detector with momentum. It utilizes the momentum and a bias correction term to reduce the preamble estimation errors during the gradient descent process. Simulation results show that the proposed algorithm performs better than Markov Chain Monte Carlo-based approaches in terms of detection accuracy.
Submitted 15 September, 2023;
originally announced September 2023.
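For readers unfamiliar with the deterministic particle updates mentioned in the abstract above, here is a generic SVGD step on a toy one-dimensional Gaussian target. It is only a sketch of standard SVGD with a fixed-bandwidth RBF kernel; the preamble-detection likelihood, the normalization, and the momentum term of the paper are not reproduced, and the target, step size, and bandwidth are assumptions.

```python
import numpy as np

def svgd_step(particles, score_fn, step=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles with an RBF kernel."""
    diff = particles[:, None] - particles[None, :]    # diff[j, i] = x_j - x_i
    kernel = np.exp(-diff**2 / (2.0 * bandwidth**2))  # k(x_j, x_i)
    grad_kernel = -diff / bandwidth**2 * kernel       # d k(x_j, x_i) / d x_j
    score = score_fn(particles)                       # gradient of log p at each x_j
    # Attraction toward high density plus repulsion between particles.
    phi = (kernel * score[:, None] + grad_kernel).mean(axis=0)
    return particles + step * phi

# Toy target N(2, 1); its score function is -(x - 2).
rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 0.5, size=50)
for _ in range(500):
    particles = svgd_step(particles, lambda x: -(x - 2.0))
```

The kernel-weighted score term pulls particles toward the target mode, while the kernel-gradient term keeps them from collapsing onto a single point.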
-
Wildfire Detection Via Transfer Learning: A Survey
Authors:
Ziliang Hong,
Emadeldeen Hamdan,
Yifei Zhao,
Tianxiao Ye,
Hongyi Pan,
A. Enis Cetin
Abstract:
This paper surveys different publicly available neural network models used for detecting wildfires using regular visible-range cameras which are placed on hilltops or forest lookout towers. The neural network models are pre-trained on ImageNet-1K and fine-tuned on a custom wildfire dataset. The performance of these models is evaluated on a diverse set of wildfire images, and the survey provides useful information for those interested in using transfer learning for wildfire detection. Swin Transformer-tiny has the highest AUC value but ConvNext-tiny detects all the wildfire events and has the lowest false alarm rate in our dataset.
Submitted 21 June, 2023;
originally announced June 2023.
-
A Hybrid Quantum-Classical Approach based on the Hadamard Transform for the Convolutional Layer
Authors:
Hongyi Pan,
Xin Zhu,
Salih Atici,
Ahmet Enis Cetin
Abstract:
In this paper, we propose a novel Hadamard Transform (HT)-based neural network layer for hybrid quantum-classical computing. It implements the regular convolutional layers in the Hadamard transform domain. The idea is based on the HT convolution theorem, which states that the dyadic convolution between two vectors is equivalent to the element-wise multiplication of their HT representations. Computing the HT is simply the application of a Hadamard gate to each qubit individually, so the HT computations of our proposed layer can be implemented on a quantum computer. Compared to the regular Conv2D layer, the proposed HT-perceptron layer is computationally more efficient. Compared to a CNN with the same number of trainable parameters and 99.26% test accuracy, our HT network reaches 99.31% test accuracy with 57.1% fewer MACs on the MNIST dataset; and in our ImageNet-1K experiments, our HT-based ResNet-50 exceeds the accuracy of the baseline ResNet-50 by 0.59% center-crop top-1 accuracy using 11.5% fewer parameters with 12.6% fewer MACs.
Submitted 22 February, 2024; v1 submitted 27 May, 2023;
originally announced May 2023.
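The HT convolution theorem cited in the abstract above can be checked numerically. The sketch below uses an unnormalized fast Walsh-Hadamard transform and a direct dyadic (XOR-indexed) convolution; the function names and the vector length of 8 are assumptions, and nothing here models the quantum implementation.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            a = x[start:start + h].copy()
            b = x[start + h:start + 2 * h].copy()
            x[start:start + h] = a + b          # butterfly: sum
            x[start + h:start + 2 * h] = a - b  # butterfly: difference
        h *= 2
    return x

def dyadic_convolution(x, y):
    """Direct dyadic convolution: out[k] = sum_j x[j] * y[k XOR j]."""
    n = len(x)
    out = np.zeros(n)
    for k in range(n):
        for j in range(n):
            out[k] += x[j] * y[k ^ j]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
y = rng.standard_normal(8)

# Transform-domain route: element-wise product, then inverse transform.
via_transform = fwht(fwht(x) * fwht(y)) / len(x)
direct = dyadic_convolution(x, y)
```

The direct route costs O(n^2) multiplications, whereas the transform route needs only n multiplications plus additions and subtractions, which is the source of the efficiency gain the abstract refers to.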
-
Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets
Authors:
Hongyi Pan,
Emadeldeen Hamdan,
Xin Zhu,
Salih Atici,
Ahmet Enis Cetin
Abstract:
In this paper, we propose a set of transform-based neural network layers as an alternative to the $3\times3$ Conv2D layers in Convolutional Neural Networks (CNNs). The proposed layers can be implemented based on orthogonal transforms such as the Discrete Cosine Transform (DCT), Hadamard transform (HT), and biorthogonal Block Wavelet Transform (BWT). Furthermore, by taking advantage of the convolution theorems, convolutional filtering operations are performed in the transform domain using element-wise multiplications. Trainable soft-thresholding layers, that remove noise in the transform domain, bring nonlinearity to the transform domain layers. Compared to the Conv2D layer, which is spatial-agnostic and channel-specific, the proposed layers are location-specific and channel-specific. Moreover, these proposed layers reduce the number of parameters and multiplications significantly while improving the accuracy results of regular ResNets on the ImageNet-1K classification task. Furthermore, they can be inserted with a batch normalization layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy.
Submitted 22 April, 2024; v1 submitted 12 March, 2023;
originally announced March 2023.
-
Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks
Authors:
Salih Atici,
Hongyi Pan,
Ahmet Enis Cetin
Abstract:
In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$ and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels compared to different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-18, WResNet-20, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-18 on CIFAR-10 from 92.42% to 92.71%, WResNet-20 on CIFAR-100 from 76.20% to 77.39%, and ResNet-50 on ImageNet-1K from 75.52% to 75.67%.
Submitted 26 June, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
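The NLMS algorithm that the abstract above names as its inspiration can be sketched for a single linear neuron. This is the classical $\ell_2$-normalized NLMS update, not the INSGD optimizer itself, whose exact form the abstract does not give; the function name, step size, and toy data are assumptions.

```python
import numpy as np

def nlms_step(weights, x, target, lr=0.5, eps=1e-8):
    """One NLMS update: the step is divided by the squared l2 norm of the input."""
    error = target - weights @ x
    return weights + lr * error * x / (eps + x @ x)

rng = np.random.default_rng(0)
true_weights = np.array([1.0, -2.0, 0.5])
weights = np.zeros(3)
for _ in range(500):
    x = 10.0 * rng.standard_normal(3)  # deliberately large inputs
    weights = nlms_step(weights, x, true_weights @ x)
```

Dividing by the input energy makes the effective step size independent of the input scale, so the same `lr` works even though the inputs here are ten times larger than unit variance.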
-
DCT Perceptron Layer: A Transform Domain Approach for Convolution Layer
Authors:
Hongyi Pan,
Xin Zhu,
Salih Atici,
Ahmet Enis Cetin
Abstract:
In this paper, we propose a novel Discrete Cosine Transform (DCT)-based neural network layer which we call DCT-perceptron to replace the $3\times3$ Conv2D layers in the Residual neural Network (ResNet). Convolutional filtering operations are performed in the DCT domain using element-wise multiplications by taking advantage of the Fourier and DCT Convolution theorems. A trainable soft-thresholding layer is used as the nonlinearity in the DCT perceptron. Compared to ResNet's Conv2D layer which is spatial-agnostic and channel-specific, the proposed layer is location-specific and channel-specific. The DCT-perceptron layer reduces the number of parameters and multiplications significantly while maintaining comparable accuracy results of regular ResNets in CIFAR-10 and ImageNet-1K. Moreover, the DCT-perceptron layer can be inserted with a batch normalization layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy.
Submitted 15 November, 2022;
originally announced November 2022.
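The pipeline described in the abstract above (transform, element-wise scaling, soft-thresholding, inverse transform) can be sketched on a one-dimensional vector. This is a simplified illustration rather than the authors' layer: the names `dct_perceptron`, `scale`, and `threshold` are hypothetical, and in a real network the scale and threshold would be trainable and the transform two-dimensional.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def soft_threshold(x, threshold):
    """sign(x) * max(|x| - threshold, 0): zeroes small coefficients, shrinks the rest."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def dct_perceptron(x, scale, threshold):
    """Filter in the DCT domain, apply the nonlinearity, and transform back."""
    c = dct_matrix(x.shape[0])
    coeffs = c @ x                                       # forward DCT
    coeffs = soft_threshold(scale * coeffs, threshold)   # filtering + nonlinearity
    return c.T @ coeffs                                  # inverse DCT (c is orthonormal)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
scale = rng.standard_normal(8)  # stand-in for trainable per-coefficient weights
y = dct_perceptron(x, scale, threshold=0.1)
```

The element-wise product with `scale` plays the role of convolutional filtering in the transform domain, and the soft-threshold supplies the nonlinearity while suppressing small, noise-like coefficients.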
-
Classification of the Cervical Vertebrae Maturation (CVM) stages Using the Tripod Network
Authors:
Salih Atici,
Hongyi Pan,
Mohammed H. Elnagar,
Veerasathpurush Allareddy,
Omar Suhaym,
Rashid Ansari,
Ahmet Enis Cetin
Abstract:
We present a novel deep learning method for fully automated detection and classification of the Cervical Vertebrae Maturation (CVM) stages. The deep convolutional neural network consists of three parallel networks (TripodNet) independently trained with different initialization parameters. They also have a built-in set of novel directional filters that highlight the cervical vertebrae edges in X-ray images. Outputs of the three parallel networks are combined using a fully connected layer. 1018 cephalometric radiographs were labeled, divided by gender, and classified according to the CVM stages. Resulting images, using different training techniques and patches, were used to train TripodNet together with a set of tunable directional edge enhancers. Data augmentation is implemented to avoid overfitting. TripodNet achieves the state-of-the-art accuracy of 81.18% in female patients and 75.32% in male patients. The proposed TripodNet achieves a higher accuracy in our dataset than the Swin Transformers and the previous network models that we investigated for CVM stage estimation.
Submitted 15 November, 2022;
originally announced November 2022.
-
Real-time Wireless ECG-derived Respiration Rate Estimation Using an Autoencoder with a DCT Layer
Authors:
Hongyi Pan,
Xin Zhu,
Zhilu Ye,
Pai-Yen Chen,
Ahmet Enis Cetin
Abstract:
In this paper, we present a wireless ECG-derived Respiration Rate (RR) estimation using an autoencoder with a DCT Layer. The wireless wearable system records the ECG data of the subject and the respiration rate is determined from the variations in the baseline level of the ECG data. A straightforward Fourier analysis of the ECG data obtained using the wireless wearable system may lead to incorrect results due to uneven breathing. To improve the estimation precision, we propose a neural network that uses a novel Discrete Cosine Transform (DCT) layer to denoise and decorrelate the data. The DCT layer has trainable weights and soft-thresholds in the transform domain. In our dataset, we improve the Mean Squared Error (MSE) and Mean Absolute Error (MAE) of the Fourier analysis-based approach using our novel neural network with the DCT layer.
Submitted 16 February, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Policy Gradient With Serial Markov Chain Reasoning
Authors:
Edoardo Cetin,
Oya Celiktutan
Abstract:
We introduce a new framework that performs decision-making in reinforcement learning (RL) as an iterative reasoning process. We model agent behavior as the steady-state distribution of a parameterized reasoning Markov chain (RMC), optimized with a new tractable estimate of the policy gradient. We perform action selection by simulating the RMC for enough reasoning steps to approach its steady-state distribution. We show our framework has several useful properties that are inherently missing from traditional RL. For instance, it allows agent behavior to approximate any continuous distribution over actions by parameterizing the RMC with a simple Gaussian transition function. Moreover, the number of reasoning steps to reach convergence can scale adaptively with the difficulty of each action selection decision and can be accelerated by re-using past solutions. Our resulting algorithm achieves state-of-the-art performance in popular Mujoco and DeepMind Control benchmarks, both for proprioceptive and pixel-based tasks.
Submitted 13 October, 2022;
originally announced October 2022.
-
Hyperbolic Deep Reinforcement Learning
Authors:
Edoardo Cetin,
Benjamin Chamberlain,
Michael Bronstein,
Jonathan J Hunt
Abstract:
We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators. Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool.
Submitted 4 October, 2022;
originally announced October 2022.
-
Multipod Convolutional Network
Authors:
Hongyi Pan,
Salih Atici,
Ahmet Enis Cetin
Abstract:
In this paper, we introduce a convolutional network which we call MultiPodNet consisting of a combination of two or more convolutional networks which process the input image in parallel to achieve the same goal. Output feature maps of parallel convolutional networks are fused at the fully connected layer of the network. We experimentally observed that three parallel pod networks (TripodNet) produce the best results in commonly used object recognition datasets. Baseline pod networks can be of any type. In this paper, we use ResNets as baseline networks and their inputs are augmented image patches. The number of parameters of the TripodNet is about three times that of a single ResNet. We train the TripodNet using the standard backpropagation type algorithms. In each individual ResNet, parameters are initialized with different random numbers during training. The TripodNet achieved state-of-the-art performance on CIFAR-10 and ImageNet datasets. For example, it improved the accuracy of a single ResNet from 91.66% to 92.47% under the same training process on the CIFAR-10 dataset.
Submitted 2 October, 2022;
originally announced October 2022.
-
Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
Authors:
Edoardo Cetin,
Philip J. Ball,
Steve Roberts,
Oya Celiktutan
Abstract:
Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable. As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. We show that this new visual deadly triad causes unstable training and premature convergence to degenerate solutions, a phenomenon we name catastrophic self-overfitting. Based on our analysis, we propose A-LIX, a method providing adaptive regularization to the encoder's gradients that explicitly prevents the occurrence of catastrophic self-overfitting using a dual objective. By applying A-LIX, we significantly outperform the prior state-of-the-art on the DeepMind Control and Atari 100k benchmarks without any data augmentation or auxiliary losses.
Submitted 3 July, 2022;
originally announced July 2022.
-
Flexible-Rate Learned Hierarchical Bi-Directional Video Compression With Motion Refinement and Frame-Level Bit Allocation
Authors:
Eren Cetin,
M. Akin Yilmaz,
A. Murat Tekalp
Abstract:
This paper presents improvements and novel additions to our recent work on end-to-end optimized hierarchical bi-directional video compression to further advance the state-of-the-art in learned video compression. As an improvement, we combine motion estimation and prediction modules and compress refined residual motion vectors for improved rate-distortion performance. As a novel addition, we adapt the gain unit proposed for image compression to flexible-rate video compression in two ways: first, the gain unit enables a single encoder model to operate at multiple rate-distortion operating points; second, we exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine-tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate that we obtain state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding.
Submitted 27 June, 2022;
originally announced June 2022.
-
Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural Networks
Authors:
Hongyi Pan,
Diaa Badawi,
Ahmet Enis Cetin
Abstract:
Convolution has been the core operation of modern deep neural networks. It is well-known that convolutions can be implemented in the Fourier Transform domain. In this paper, we propose to use binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform. We use WHT-based binary layers to replace some of the regular convolution layers in deep neural networks. We utilize both one-dimensional (1-D) and two-dimensional (2-D) binary WHTs in this paper. In both 1-D and 2-D layers, we compute the binary WHT of the input feature map and denoise the WHT domain coefficients using a nonlinearity which is obtained by combining soft-thresholding with the tanh function. After denoising, we compute the inverse WHT. We use 1D-WHT to replace the $1\times 1$ convolutional layers, and 2D-WHT layers can replace the $3\times 3$ convolution layers and Squeeze-and-Excite layers. 2D-WHT layers with trainable weights can also be inserted before the Global Average Pooling (GAP) layers to assist the dense layers. In this way, we can reduce the number of trainable parameters significantly with a slight decrease in accuracy. In this paper, we implement the WHT layers into MobileNet-V2, MobileNet-V3-Large, and ResNet to reduce the number of parameters significantly with negligible accuracy loss. Moreover, according to our speed test, the 2D-FWHT layer runs about 24 times as fast as the regular $3\times 3$ convolution with 19.51\% less RAM usage in an NVIDIA Jetson Nano experiment.
Submitted 27 January, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Detecting Anomaly in Chemical Sensors via L1-Kernels based Principal Component Analysis
Authors:
Hongyi Pan,
Diaa Badawi,
Ishaan Bassi,
Sule Ozev,
Ahmet Enis Cetin
Abstract:
We propose a kernel-PCA based method to detect anomalies in chemical sensors. We use temporal signals produced by chemical sensors to form vectors to perform the Principal Component Analysis (PCA). We estimate the kernel-covariance matrix of the sensor data and compute the eigenvector corresponding to the largest eigenvalue of the covariance matrix. An anomaly can be detected by comparing the difference between the actual sensor data and the reconstructed data from the dominant eigenvector. In this paper, we introduce a new multiplication-free kernel, which is related to the l1-norm, for the anomaly detection task. The l1-kernel PCA is not only computationally efficient but also energy-efficient because it does not require any actual multiplications during the kernel covariance matrix computation. Our experimental results show that our kernel-PCA method achieves a higher area under the curve (AUC) score (0.7483) than the baseline regular PCA method (0.7366).
Submitted 28 September, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Multiplication-Avoiding Variant of Power Iteration with Applications
Authors:
Hongyi Pan,
Diaa Badawi,
Runxuan Miao,
Erdem Koyuncu,
Ahmet Enis Cetin
Abstract:
Power iteration is a fundamental algorithm in data analysis. It extracts the eigenvector corresponding to the largest eigenvalue of a given matrix. Applications include ranking algorithms, recommendation systems, and principal component analysis (PCA), among many others. In this paper, we introduce multiplication-avoiding power iteration (MAPI), which replaces the standard $\ell_2$-inner products that appear in the regular power iteration (RPI) with multiplication-free vector products, which are Mercer-type kernel operations related to the $\ell_1$ norm. Precisely, for an $n\times n$ matrix, MAPI requires $n$ multiplications, while RPI needs $n^2$ multiplications per iteration. Therefore, MAPI provides a significant reduction in the number of multiplication operations, which are known to be costly in terms of energy consumption. We provide applications of MAPI to PCA-based image reconstruction as well as to graph-based ranking algorithms. When compared to RPI, MAPI not only typically converges much faster, but also provides superior performance.
Submitted 31 January, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
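For reference, the regular power iteration (RPI) baseline named in the abstract above can be sketched in plain Python. This is a minimal illustration and not the authors' MAPI code: the function name `power_iteration`, the starting vector, and the fixed iteration count are assumptions made here. Each iteration performs the $n^2$ multiplications of a matrix-vector product, which is the cost the abstract says MAPI reduces.

```python
import math


def power_iteration(matrix, num_iters=200):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix.

    `matrix` is a list of rows. The starting vector and iteration count
    are illustrative assumptions, not values taken from the paper.
    """
    n = len(matrix)
    # Start from the first standard basis vector (hypothetical choice).
    vector = [1.0] + [0.0] * (n - 1)
    for _ in range(num_iters):
        # Matrix-vector product: n * n multiplications per iteration.
        product = [sum(row[j] * vector[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of the unit-norm estimate gives the eigenvalue.
    product = [sum(row[j] * vector[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    return eigenvalue, vector


eigenvalue, eigenvector = power_iteration([[2.0, 1.0], [1.0, 2.0]])
print(eigenvalue, eigenvector)
```

The example matrix has eigenvalues 3 and 1, so the estimate should approach 3 with an eigenvector proportional to (1, 1).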
-
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
Authors:
Edoardo Cetin,
Oya Celiktutan
Abstract:
Popular off-policy deep reinforcement learning algorithms compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns. In this work, we propose a novel learnable penalty to enact such pessimism, based on a new way to quantify the critic's epistemic uncertainty. Furthermore, we propose to learn the penalty alongside the critic with dual TD-learning, a strategy to estimate and minimize the bias magnitude in the target returns. Our method enables us to accurately counteract overestimation bias throughout training without incurring the downsides of overly pessimistic targets. Empirically, by integrating our method and other orthogonal improvements with popular off-policy algorithms, we achieve state-of-the-art results in continuous control tasks from both proprioceptive and pixel observations.
Submitted 7 October, 2021;
originally announced October 2021.
-
Learning Routines for Effective Off-Policy Reinforcement Learning
Authors:
Edoardo Cetin,
Oya Celiktutan
Abstract:
The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable, yet, granular enough to permit flexible behavior. So far, this process involved non-trivial user choices in terms of the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.
Submitted 5 June, 2021;
originally announced June 2021.
-
High Resolution Time-Frequency Generation with Generative Adversarial Networks
Authors:
Zeynel Deprem,
A. Enis Çetin
Abstract:
Signal representation in the Time-Frequency (TF) domain is valuable in many applications including radar imaging and inverse synthetic aperture radar. TF representation allows us to identify signal components or features in a mixed time and frequency plane. There are several well-known tools, such as the Wigner-Ville Distribution (WVD), the Short-Time Fourier Transform (STFT) and various other variants for such a purpose. The main requirement for a TF representation tool is to give a high-resolution view of the signal such that the signal components or features are identifiable. A commonly used method is the reassignment process, which reduces the cross-terms by artificially moving smoothed WVD values from their actual location to the center of gravity for that region. In this article, we propose a novel reassignment method using the Conditional Generative Adversarial Network (CGAN). We train a CGAN to perform the reassignment process. Through examples, it is shown that the method generates high-resolution TF representations which are better than the current reassignment methods.
Submitted 1 June, 2021;
originally announced June 2021.
-
Robust Principal Component Analysis Using a Novel Kernel Related with the L1-Norm
Authors:
Hongyi Pan,
Diaa Badawi,
Erdem Koyuncu,
A. Enis Cetin
Abstract:
We consider a family of vector dot products that can be implemented using sign changes and addition operations only. The dot products are energy-efficient as they avoid the multiplication operation entirely. Moreover, the dot products induce the $\ell_1$-norm, thus providing robustness to impulsive noise. First, we analytically prove that the dot products yield symmetric, positive semi-definite generalized covariance matrices, thus enabling principal component analysis (PCA). Moreover, the generalized covariance matrices can be constructed in an Energy Efficient (EEF) manner due to the multiplication-free property of the underlying vector products. We present image reconstruction examples in which our EEF PCA method results in the highest peak signal-to-noise ratios compared to the ordinary $\ell_2$-PCA and the recursive $\ell_1$-PCA.
Submitted 24 May, 2021;
originally announced May 2021.
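To make "sign changes and addition operations only" concrete, here is a small sketch of one multiplication-free vector product of this kind. The specific operator, the sum of sign(x_i y_i)(|x_i| + |y_i|), and the name `mf_dot` are assumptions chosen for illustration; the abstract does not spell out which members of the family the paper uses. For this operator, the product of a vector with itself equals twice its $\ell_1$ norm.

```python
def mf_dot(x, y):
    """Multiplication-free vector product (illustrative assumption).

    Each term is sign(x_i * y_i) * (|x_i| + |y_i|), computed with
    comparisons, sign flips, and additions only.
    """
    total = 0.0
    for a, b in zip(x, y):
        if a == 0 or b == 0:
            continue  # sign(a * b) is zero, so the term vanishes
        magnitude = abs(a) + abs(b)
        if (a < 0) != (b < 0):
            total -= magnitude  # opposite signs: negative contribution
        else:
            total += magnitude  # same signs: positive contribution
    return total


x = [1.0, -2.0, 3.0]
y = [-1.0, 0.5, 2.0]
print(mf_dot(x, x), 2 * sum(abs(v) for v in x))
print(mf_dot(x, y), mf_dot(y, x))
```

The first printed pair illustrates the link to the $\ell_1$ norm, and the second shows that the product is symmetric in its arguments.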
-
IB-DRR: Incremental Learning with Information-Back Discrete Representation Replay
Authors:
Jian Jiang,
Edoardo Cetin,
Oya Celiktutan
Abstract:
Incremental learning aims to enable machine learning models to continuously acquire new knowledge given new classes, while maintaining the knowledge already learned for old classes. Saving a subset of training samples of previously seen classes in the memory and replaying them during new training phases is proven to be an efficient and effective way to fulfil this aim. It is evident that the larger the number of exemplars the model inherits, the better the performance it can achieve. However, finding a trade-off between the model performance and the number of samples to save for each class is still an open problem for replay-based incremental learning and is increasingly desirable for real-life applications. In this paper, we approach this open problem by tapping into a two-step compression approach. The first step is a lossy compression: we propose to encode input images and save their discrete latent representations in the form of codes that are learned using a hierarchical Vector Quantised Variational Autoencoder (VQ-VAE). In the second step, we further compress codes losslessly by learning a hierarchical latent variable model with bits-back asymmetric numeral systems (BB-ANS). To compensate for the information lost in the first step compression, we introduce an Information Back (IB) mechanism that utilizes real exemplars for a contrastive learning loss to regularize the training of a classifier. By maintaining all seen exemplars' representations in the format of `codes', Discrete Representation Replay (DRR) outperforms the state-of-the-art method on CIFAR-100 by a margin of 4% accuracy with a much lower memory cost for saving samples. Incorporated with IB and saving a small set of old raw exemplars as well, the accuracy of DRR can be further improved by 2%.
Submitted 21 April, 2021;
originally announced April 2021.
-
Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary Layers in Deep Neural Networks
Authors:
Hongyi Pan,
Diaa Dabawi,
Ahmet Enis Cetin
Abstract:
In this paper, we propose a novel layer based on fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution layers in deep neural networks. In the WHT domain, we denoise the transform domain coefficients using the new smooth-thresholding non-linearity, a smoothed version of the well-known soft-thresholding operator. We also introduce a family of multiplication-free operators from the basic $2\times 2$ Hadamard transform to implement $3\times 3$ depthwise separable convolution layers. Using these two types of layers, we replace the bottleneck layers in MobileNet-V2 to reduce the network's number of parameters with a slight loss in accuracy. For example, by replacing the final third bottleneck layers, we reduce the number of parameters from 2.270M to 540K. This reduces the accuracy from 95.21\% to 92.98\% on the CIFAR-10 dataset. Our approach significantly improves the speed of data processing. The fast Walsh-Hadamard transform has a computational complexity of $O(m\log_2 m)$. As a result, it is computationally more efficient than the $1\times 1$ convolution layer. The fast Walsh-Hadamard layer processes a tensor in $\mathbb{R}^{10\times32\times32\times1024}$ about 2 times faster than a $1\times 1$ convolution layer on an NVIDIA Jetson Nano computer board.
Submitted 29 October, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
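The $O(m\log_2 m)$ transform and the thresholding step described above can be sketched as follows. This is an illustrative sketch only: it uses the plain soft-thresholding operator with a fixed threshold rather than the paper's smooth-thresholding non-linearity, and the names `fwht`, `soft_threshold`, and `wht_denoise` are hypothetical.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering).

    Uses additions and subtractions only, in O(m log2 m) steps.
    The input length must be a power of two.
    """
    data = list(values)
    m = len(data)
    if m == 0 or m & (m - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < m:
        for start in range(0, m, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                data[i], data[i + half] = a + b, a - b  # butterfly
        half *= 2
    return data


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def wht_denoise(values, threshold):
    """Transform, threshold the coefficients, and transform back."""
    coeffs = fwht(values)
    kept = [soft_threshold(c, threshold) for c in coeffs]
    m = len(kept)
    # The unnormalized WHT is its own inverse up to a factor of 1/m.
    return [c / m for c in fwht(kept)]


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
print(wht_denoise([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0], 1.0))
```

In a trainable layer the threshold would be a learned parameter; it is fixed here to keep the sketch self-contained.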
-
Domain-Robust Visual Imitation Learning with Mutual Information Constraints
Authors:
Edoardo Cetin,
Oya Celiktutan
Abstract:
Human beings are able to understand objectives and learn by simply observing others perform a task. Imitation learning methods aim to replicate such capabilities; however, they generally depend on access to a full set of optimal states and actions taken with the agent's actuators and from the agent's point of view. In this paper, we introduce a new algorithm - called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL) - with the purpose of bypassing such constraints. Our algorithm enables autonomous agents to learn directly from high dimensional observations of an expert performing a task, by making use of adversarial learning with a latent representation inside the discriminator network. Such latent representation is regularized through mutual information constraints to incentivize learning only features that encode information about the completion levels of the task being demonstrated. This yields a shared feature space in which imitation can be performed successfully while disregarding the differences between the expert's and the agent's domains. Empirically, our algorithm is able to efficiently imitate in a diverse range of control problems including balancing, manipulation and locomotive tasks, while being robust to various domain differences in terms of both environment appearance and agent embodiment.
Submitted 8 March, 2021;
originally announced March 2021.
-
MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference using Memory-immersed Data Conversion and Multiplication-free Operators
Authors:
Shamma Nasrin,
Diaa Badawi,
Ahmet Enis Cetin,
Wilfred Gomes,
Amit Ranjan Trivedi
Abstract:
We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on the $\ell_1$ norm along with a co-adapted processing array and compute flow. Using the approach, we overcame many deficiencies in the current art of in-SRAM DNN processing such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, it doesn't require DACs, and it easily extends to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), where we exploit the parasitic capacitance of bit lines of the SRAM array as a capacitive DAC. Since the dominant area overhead in SA-ADC comes from its capacitive DAC, by exploiting the intrinsic parasitic capacitance of the SRAM array, our approach allows low area implementation of within-SRAM SA-ADC. Our 8$\times$62 SRAM macro, which requires a 5-bit ADC, achieves $\sim$105 tera operations per second per Watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS.
Submitted 29 January, 2021;
originally announced February 2021.
-
Discrete Cosine Transform Based Causal Convolutional Neural Network for Drift Compensation in Chemical Sensors
Authors:
Diaa Badawi,
Agamyrat Agambayev,
Sule Ozev,
A. Enis Cetin
Abstract:
Sensor drift is a major problem in chemical sensors that requires addressing for reliable and accurate detection of chemical analytes. In this paper, we develop a causal convolutional neural network (CNN) with a Discrete Cosine Transform (DCT) layer to estimate the drift signal. In the DCT module, we apply soft-thresholding nonlinearity in the transform domain to denoise the data and obtain a sparse representation of the drift signal. The soft-threshold values are learned during training. Our results show that DCT layer-based CNNs are able to produce a slowly varying baseline drift signal. We train the CNN on synthetic data and test it on real chemical sensor data. Our results show that we can have an accurate and smooth drift estimate even when the observed sensor signal is very noisy.
Submitted 12 November, 2020;
originally announced November 2020.
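The transform-domain denoising step described above (DCT, soft-thresholding, inverse DCT) can be sketched in plain Python. This is only an illustration of the idea under stated assumptions: it uses a direct $O(N^2)$ orthonormal DCT-II and a single fixed threshold, whereas the abstract says the thresholds are learned during training inside a causal CNN. The names `dct2`, `idct2`, and `dct_denoise` are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(signal):
    """Orthonormal DCT-II of a list of floats (direct O(N^2) form)."""
    n = len(signal)
    return [
        _scale(k, n)
        * sum(signal[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct2(coeffs):
    """Inverse of `dct2` (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dct_denoise(signal, threshold):
    """Sparsify the DCT coefficients and reconstruct a smoother signal."""
    coeffs = dct2(signal)
    return idct2([soft_threshold(c, threshold) for c in coeffs])


noisy = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1]
print(dct_denoise(noisy, 0.2))
```

Small, mostly high-frequency coefficients are set to zero, so the reconstruction varies more slowly than the input, which is the behavior wanted of a baseline drift estimate.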
-
Robust and Computationally-Efficient Anomaly Detection using Powers-of-Two Networks
Authors:
Usama Muneeb,
Erdem Koyuncu,
Yasaman Keshtkarjahromi,
Hulya Seferoglu,
Mehmet Fatih Erden,
Ahmet Enis Cetin
Abstract:
Robust and computationally efficient anomaly detection in videos is a problem in video surveillance systems. We propose a technique to increase robustness and reduce computational complexity in a Convolutional Neural Network (CNN) based anomaly detector that utilizes the optical flow information of video data. We reduce the complexity of the network by denoising the intermediate layer outputs of the CNN and by using powers-of-two weights, which replace the computationally expensive multiplication operations with bit-shift operations. The denoising operation during inference forces small-valued intermediate layer outputs to zero. Because the number of zeros in the network significantly increases as a result of denoising, we can implement the CNN about 10% faster than a comparable network while detecting all the anomalies in the testing set. It turns out that the denoising operation also provides robustness because the contribution of small intermediate values to the final result is negligible. During training we also generate motion vector images by a Generative Adversarial Network (GAN) to improve the robustness of the overall system. We experimentally observe that the resulting system is robust to background motion.
Submitted 30 October, 2019;
originally announced October 2019.
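The two ideas in this abstract, powers-of-two weights applied with bit shifts and zeroing of small intermediate outputs, can be illustrated with a short sketch. The rounding rule (nearest exponent in the log domain), the integer activations, and all function names are assumptions made for illustration; the abstract does not state how the paper quantizes weights.

```python
import math


def nearest_power_of_two(weight):
    """Return (sign, exponent) such that weight is about sign * 2**exponent.

    Rounds in the log2 domain (an illustrative choice).
    """
    if weight == 0:
        return 0, 0
    sign = 1 if weight > 0 else -1
    exponent = round(math.log2(abs(weight)))
    return sign, exponent


def shift_multiply(activation, sign, exponent):
    """Multiply an integer activation by sign * 2**exponent using shifts."""
    if sign == 0:
        return 0
    if exponent >= 0:
        shifted = activation << exponent
    else:
        shifted = activation >> -exponent  # right shift floors the result
    return shifted if sign > 0 else -shifted


def zero_small(values, threshold):
    """Denoising step: force small-magnitude outputs to exactly zero."""
    return [v if abs(v) >= threshold else 0 for v in values]


sign, exponent = nearest_power_of_two(0.26)   # close to 2**-2
print(shift_multiply(64, sign, exponent))
print(zero_small([12, -1, 0, 3, -40], 4))
```

Zeroed outputs need no further arithmetic in later layers, which is where the speed-up reported in the abstract would come from.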
-
Detecting Gas Vapor Leaks Using Uncalibrated Sensors
Authors:
Diaa Badawi,
Tuba Ayhan,
Sule Ozev,
Chengmo Yang,
Alex Orailoglu,
A. Enis Çetin
Abstract:
Chemical and infra-red sensors generate distinct responses under similar conditions because of sensor drift, noise or resolution errors. In this work, we use different time-series data sets obtained by infra-red and E-nose sensors in order to detect Volatile Organic Compounds (VOCs) and Ammonia vapor leaks. We process time-series sensor signals using deep neural networks (DNN). Three neural network algorithms are utilized for this purpose. Additive neural networks (termed AddNet) are based on a multiplication-devoid operator and consequently exhibit energy-efficiency compared to regular neural networks. The second algorithm uses generative adversarial neural networks so as to expose the classifying neural network to more realistic data points in order to help the classifier network to deliver improved generalization. Finally, we use conventional convolutional neural networks as a baseline method and compare their performance with the two aforementioned deep neural network algorithms in order to evaluate their effectiveness empirically.
Submitted 20 August, 2019;
originally announced August 2019.
-
EEG Classification by factoring in Sensor Configuration
Authors:
Lubna Shibly Mokatren,
Rashid Ansari,
Ahmet Enis Cetin,
Alex D Leow,
Heide Klumpp,
Olusola Ajilore,
Fatos Yarman Vural
Abstract:
Electroencephalography (EEG) serves as an effective diagnostic tool for mental disorders and neurological abnormalities. Enhanced analysis and classification of EEG signals can help improve detection performance. A new approach is examined here for enhancing EEG classification performance by leveraging knowledge of the spatial layout of EEG sensors. Performance of two classification models - model 1 that ignores the sensor layout and model 2 that factors it in - is investigated, and model 2 is found to achieve consistently higher detection accuracy. The analysis is based on the information content of these signals represented in two different ways: concatenation of the channels of the frequency bands and an image-like 2D representation of the EEG channel locations. Performance of these models is examined on two tasks, social anxiety disorder (SAD) detection, and emotion recognition using a dataset for emotion analysis using physiological signals (DEAP). We hypothesized that model 2 will significantly outperform model 1 and this was validated in our results as model 2 yielded $5$--$8\%$ higher accuracy across all machine learning algorithms investigated. Convolutional Neural Networks (CNN) provided the best performance, far exceeding that of Support Vector Machine (SVM) and k-Nearest Neighbors (kNNs) algorithms.
Submitted 7 February, 2020; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Deep Layered LMS Predictor
Authors:
Lubna Shibly Mokatren,
Ahmet Enis Cetin,
Rashid Ansari
Abstract:
In this study, we present a new approach to design a Least Mean Squares (LMS) predictor. This approach exploits the concept of deep neural networks and their supremacy in terms of performance and accuracy. The new LMS predictor is implemented as a deep neural network using multiple nonlinear LMS filters. The network consists of multiple layers with nonlinear activation functions, where each neuron in the hidden layers corresponds to a certain FIR filter output which is passed through a nonlinearity. The output of the last layer is the prediction. We hypothesize that this approach will outperform the traditional adaptive filters.
Submitted 11 May, 2019;
originally announced May 2019.
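As background for the entry above, the classical single-layer LMS one-step predictor (the "traditional adaptive filter" that the deep layered version is compared against) can be sketched as follows. This is the textbook LMS update and not the paper's layered network; the filter order, step size, test signal, and the name `lms_predict` are assumptions chosen for illustration.

```python
import math


def lms_predict(signal, order=2, step_size=0.1):
    """One-step-ahead prediction with a linear FIR filter adapted by LMS.

    Returns the final weights and the list of prediction errors.
    """
    weights = [0.0] * order
    errors = []
    for t in range(order, len(signal)):
        # Most recent sample first: x[t-1], x[t-2], ...
        window = signal[t - order:t][::-1]
        prediction = sum(w * x for w, x in zip(weights, window))
        error = signal[t] - prediction
        # LMS update: move the weights along the instantaneous gradient.
        weights = [w + step_size * error * x for w, x in zip(weights, window)]
        errors.append(error)
    return weights, errors


# A noise-free sinusoid is exactly predictable from two past samples.
signal = [math.sin(1.0 * t) for t in range(500)]
weights, errors = lms_predict(signal)
early = sum(e * e for e in errors[:50]) / 50
late = sum(e * e for e in errors[-50:]) / 50
print(weights, early, late)
```

For this test signal the mean squared prediction error over the last samples should be much smaller than over the first ones, since the weights adapt toward the two-tap recurrence that a sinusoid satisfies.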
-
Deep Convolutional Generative Adversarial Networks Based Flame Detection in Video
Authors:
Süleyman Aslan,
Uğur Güdükbay,
B. Uğur Töreyin,
A. Enis Çetin
Abstract:
Real-time flame detection is crucial in video-based surveillance systems. We propose a vision-based method to detect flames using Deep Convolutional Generative Adversarial Neural Networks (DCGANs). Many existing supervised learning approaches using convolutional neural networks do not take temporal information into account and require a substantial amount of labeled data. In order to have a robust representation of sequences with and without flame, we propose a two-stage training of a DCGAN exploiting spatio-temporal flame evolution. Our training framework includes the regular training of a DCGAN with real spatio-temporal images, namely temporal slice images, and noise vectors, followed by training the discriminator separately using the temporal flame images without the generator. Experimental results show that the proposed method effectively detects flame in video in real time with negligible false positive rates.
Submitted 5 February, 2019;
originally announced February 2019.
-
EEG Classification based on Image Configuration in Social Anxiety Disorder
Authors:
Lubna Shibly Mokatren,
Rashid Ansari,
Ahmet Enis Cetin,
Alex D. Leow,
Olusola Ajilore,
Heide Klumpp,
Fatos T. Yarman Vural
Abstract:
The problem of detecting the presence of Social Anxiety Disorder (SAD) using Electroencephalography (EEG) for classification has seen limited study and is addressed with a new approach that seeks to exploit the knowledge of EEG sensor spatial configuration. Two classification models, one which ignores the configuration (model 1) and one that exploits it with different interpolation methods (model 2), are studied. Performance of these two models is examined for analyzing 34 EEG data channels each consisting of five frequency bands and further decomposed with a filter bank. The data are collected from 64 subjects consisting of healthy controls and patients with SAD. Validity of our hypothesis that model 2 will significantly outperform model 1 is borne out in the results, with accuracy $6$--$7\%$ higher for model 2 for each machine learning algorithm we investigated. Convolutional Neural Networks (CNN) were found to provide much better performance than SVM and kNNs.
Submitted 6 December, 2018;
originally announced December 2018.
-
Energy Efficient Hadamard Neural Networks
Authors:
T. Ceren Deveci,
Serdar Cakir,
A. Enis Cetin
Abstract:
Deep learning has made significant improvements in many image processing tasks in recent years, such as image classification, object recognition and object detection. Convolutional neural networks (CNNs), a popular deep learning architecture designed to process data in multiple array form, show great success in almost all detection and recognition problems and computer vision tasks. However, the number of parameters in a CNN is so high that computers require more energy and a larger memory size. In order to solve this problem, we propose a novel energy efficient model, the Binary Weight and Hadamard-transformed Image Network (BWHIN), which is a combination of the Binary Weight Network (BWN) and the Hadamard-transformed Image Network (HIN). It is observed that energy efficiency is achieved with a slight sacrifice in classification accuracy. Among energy efficient networks, our novel ensemble model outperforms the others.
Submitted 14 May, 2018;
originally announced May 2018.
-
Projection onto Epigraph Sets for Rapid Self-Tuning Compressed Sensing MRI
Authors:
Mohammad Shahdloo,
Efe Ilicak,
Mohammad Tofighi,
Emine U. Saritas,
A. Enis Çetin,
Tolga Çukur
Abstract:
The compressed sensing (CS) framework leverages the sparsity of MR images to reconstruct from undersampled acquisitions. CS reconstructions involve one or more regularization parameters that weigh sparsity in transform domains against fidelity to acquired data. While parameter selection is critical for reconstruction quality, the optimal parameters are subject and dataset specific. Thus, commonly practiced heuristic parameter selection generalizes poorly to independent datasets. Recent studies have proposed to tune parameters by estimating the risk of removing significant image coefficients. Line searches are performed across the parameter space to identify the parameter value that minimizes this risk. Although effective, these line searches yield prolonged reconstruction times. Here, we propose a new self-tuning CS method for multi-coil multi-acquisition reconstructions. The proposed method uses computationally efficient projections onto epigraph sets of the $l_1$ and total-variation norms to simultaneously achieve parameter selection and regularization. In vivo demonstrations are provided for balanced steady-state free precession, time-of-flight, and T1-weighted imaging. The proposed method achieves nearly an order of magnitude improvement in computational efficiency over line-search methods while maintaining near-optimal parameter selection.
Submitted 25 January, 2019; v1 submitted 6 February, 2018;
originally announced February 2018.
-
A Blind Deconvolution Technique Based on Projection Onto Convex Sets for Magnetic Particle Imaging
Authors:
Onur Yorulmaz,
Omer Burak Demirel,
Yavuz Muslu,
Tolga Çukur,
Emine U Saritas,
A Enis Çetin
Abstract:
Magnetic Particle Imaging (MPI) is an emerging imaging modality that maps the spatial distribution of magnetic nanoparticles. The x-space reconstruction in MPI results in highly blurry images, where the resolution depends on both system parameters and nanoparticle type. Previous techniques to counteract this blurring rely on the knowledge of the imaging point spread function (PSF), which may not be available or may require additional measurements. This work proposes a blind deconvolution algorithm for MPI to recover the precise spatial distribution of nanoparticles. The proposed algorithm exploits the observation that the imaging PSF in MPI has zero phase in Fourier domain. Thus, even though the reconstructed images are highly blurred, phase remains unaltered. We leverage this powerful property to iteratively enforce consistency of phase and bounded l1 energy information, using an orthogonal Projections Onto Convex Sets (POCS) algorithm. To demonstrate the method, comprehensive simulations were performed without and with nanoparticle relaxation effects, and at various noise levels. In addition, imaging experiments were performed on an in-house MPI scanner using a three-vial phantom that contained different nanoparticle types. Image quality was compared with conventional deconvolution methods, Wiener deconvolution and Lucy-Richardson method, which explicitly rely on the knowledge of PSF. Both the simulation results and experimental imaging results show that the proposed blind deconvolution algorithm outperforms the conventional deconvolution methods. Without utilizing the imaging PSF, the proposed algorithm improves image quality and resolution even in the case of different nanoparticle types, while displaying reliable performance against loss of the fundamental harmonic, nanoparticle relaxation effects, and noise.
Submitted 29 January, 2020; v1 submitted 21 May, 2017;
originally announced May 2017.
-
Energy Saving Additive Neural Network
Authors:
Arman Afrasiyabi,
Ozan Yildiz,
Baris Nasir,
Fatos T. Yarman Vural,
A. Enis Cetin
Abstract:
In recent years, machine learning techniques based on neural networks for mobile computing have become increasingly popular. Classical multi-layer neural networks require matrix multiplications at each stage. Multiplication is not an energy efficient operation and consequently it drains the battery of the mobile device. In this paper, we propose a new energy efficient neural network with the universal approximation property over the space of Lebesgue integrable functions. This network, called the additive neural network, is very suitable for mobile computing. The neural structure is based on a novel vector product definition, called the ef-operator, that permits a multiplier-free implementation. In the ef-operation, the "product" of two real numbers is defined as the sum of their absolute values, with the sign determined by the sign of the product of the numbers. This "product" is used to construct a vector product in $R^N$. The vector product induces the $l_1$ norm. The proposed additive neural network successfully solves the XOR problem. Experiments on the MNIST dataset show that the classification performance of the proposed additive neural networks is very similar to that of the corresponding multi-layer perceptron and convolutional neural networks (LeNet).
Submitted 8 February, 2017;
originally announced February 2017.
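The ef-operation described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sign`, `ef`, `ef_dot` and `l1_norm` are hypothetical, and forming the vector product as the sum of elementwise ef-operations is an assumption chosen to agree with the statement that the product induces the $l_1$ norm.

```python
def sign(v):
    """Return -1.0, 0.0 or 1.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def ef(a, b):
    """Scalar ef-operation: sum of absolute values, signed like the ordinary product a*b.

    This sketch multiplies by +/-1 for readability; a multiplier-free
    implementation would handle the sign with sign-bit logic instead.
    """
    return sign(a) * sign(b) * (abs(a) + abs(b))


def ef_dot(x, y):
    """Assumed vector product: sum of elementwise ef-operations."""
    return sum(ef(a, b) for a, b in zip(x, y))


def l1_norm(x):
    """Ordinary l1 norm, used here only for comparison."""
    return sum(abs(a) for a in x)


x = [1.0, -2.0, 3.0]
# Under the assumption above, the product of a vector with itself is twice its l1 norm.
print(ef_dot(x, x), 2 * l1_norm(x))
```

Under this reading, applying the product to a vector and itself gives twice the $l_1$ norm, which is one way to see the connection stated in the abstract.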
-
Analytic Properties of the Sum $B_{1}(h,k)$
Authors:
Elif Cetin
Abstract:
In earlier work, Cetin et al. defined a new special finite sum, denoted by $B_{1}(h,k)$. In this paper, with the help of the Hardy and Dedekind sums, we give many properties of the sum $B_{1}(h,k)$. We then give the connections of this sum with other well-known finite sums such as the Dedekind sums, the Hardy sums, the Simsek sums $Y(h,k)$ and the sum $C_{1}(h,k)$. By using the Fibonacci numbers and a two-term polynomial relation, we also give a new property of the sum $B_{1}(h,k)$.
Submitted 17 April, 2016;
originally announced April 2016.
-
Ultrafast broadband tuning of resonant optical nanostructures using phase change materials
Authors:
Miquel Rudé,
Vahagn Mkhitaryan,
Arif E. Cetin,
Timothy A. Miller,
Albert Carrilero,
Simon Wall,
F. Javier García de Abajo,
Hatice Altug,
Valerio Pruneri
Abstract:
The functionalities of a wide range of optical and opto-electronic devices are based on resonance effects, and active tuning of the amplitude and wavelength response is often essential. Plasmonic nanostructures are an efficient way to create optical resonances; a prominent example is the extraordinary optical transmission (EOT) through arrays of nanoholes patterned in a metallic film. Tuning of resonances by heating or by applying electrical or optical signals has proven to be more elusive, due to the lack of materials that can induce modulation over a broad spectral range and/or at high speeds. Here we show that nanopatterned metals combined with phase change materials (PCMs) can overcome this limitation due to the large change in optical constants which can be induced thermally or on an ultrafast timescale. We demonstrate resonance wavelength shifts as large as 385 nm - an order of magnitude higher than previously reported - by combining properly designed Au EOT nanostructures with Ge2Sb2Te5 (GST). Moreover, we show, through pump-probe measurements, repeatable and reversible large-amplitude modulations in the resonances, especially at telecommunication wavelengths, over ps time scales and at powers far below those needed to produce a permanent phase transition. Our findings open a pathway to the design of hybrid metal-PCM nanostructures with ultrafast and widely tuneable resonance responses, which hold potential impact on active nanophotonic devices such as tuneable optical filters, smart windows, biosensors and reconfigurable memories.
Submitted 29 October, 2015; v1 submitted 11 June, 2015;
originally announced June 2015.
-
Phase and TV Based Convex Sets for Blind Deconvolution of Microscopic Images
Authors:
Mohammad Tofighi,
Onur Yorulmaz,
A. Enis Cetin
Abstract:
In this article, two closed and convex sets for the blind deconvolution problem are proposed. Most blurring functions in microscopy are symmetric with respect to the origin. Therefore, they do not modify the phase of the Fourier transform (FT) of the original image. As a result, the blurred image and the original image have the same FT phase. Therefore, the set of images with a prescribed FT phase can be used as a constraint set in blind deconvolution problems. Another convex set that can be used during the image reconstruction process is the epigraph set of the Total Variation (TV) function. This set does not need a prescribed upper bound on the total variation of the image. The upper bound is automatically adjusted according to the current image of the restoration process. Both of these closed and convex sets can be used as part of any blind deconvolution algorithm. Simulation examples are presented.
Submitted 16 March, 2015;
originally announced March 2015.
-
Cosine Similarity Measure According to a Convex Cost Function
Authors:
Osman Gunay,
Cem Emre Akbas,
A. Enis Cetin
Abstract:
In this paper, we describe a new vector similarity measure associated with a convex cost function. Given two vectors, we determine the surface normals of the convex function at the vectors. The angle between the two surface normals is the similarity measure. The convex cost function can be the negative entropy function, the total variation (TV) function, or the filtered variation function. The convex cost function need not be differentiable everywhere. In general, we need to compute the gradient of the cost function to compute the surface normals. If the gradient does not exist at a given vector, it is possible to use the subgradients, and the normal producing the smallest angle between the two vectors is used to compute the similarity measure.
Submitted 22 October, 2014;
originally announced October 2014.
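As a rough illustration of the measure described in the abstract above, the sketch below computes the cosine of the angle between surface normals for the negative entropy cost. It rests on assumptions not stated in the abstract: the surface is taken to be the graph $z = f(x)$ in $R^{N+1}$, so that the normal at $x$ is $(\nabla f(x), -1)$; the inputs are strictly positive; and the function names are hypothetical. The subgradient case is not covered.

```python
import math


def neg_entropy_grad(x):
    """Gradient of f(x) = sum_i x_i * log(x_i), valid only for x_i > 0."""
    return [math.log(v) + 1.0 for v in x]


def surface_normal(x, grad):
    """Assumed normal to the surface z = f(x) at x: (grad f(x), -1)."""
    return grad(x) + [-1.0]


def convex_cosine_similarity(x, y, grad=neg_entropy_grad):
    """Cosine of the angle between the surface normals at x and y."""
    nx = surface_normal(x, grad)
    ny = surface_normal(y, grad)
    dot = sum(a * b for a, b in zip(nx, ny))
    norms = math.sqrt(sum(a * a for a in nx)) * math.sqrt(sum(b * b for b in ny))
    return dot / norms


x = [0.2, 0.3, 0.5]
y = [0.1, 0.6, 0.3]
print(convex_cosine_similarity(x, x))  # identical vectors share the same normal
print(convex_cosine_similarity(x, y))
```

The appended $-1$ component keeps both normals nonzero, so the ratio is always defined for positive inputs; taking `math.acos` of the result would give the angle itself.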
-
Classifying Fonts and Calligraphy Styles Using Complex Wavelet Transform
Authors:
Alican Bozkurt,
Pinar Duygulu,
A. Enis Cetin
Abstract:
Recognizing fonts has become an important task in document analysis, due to the increasing number of available digital documents in different fonts and emphases. A generic font-recognition system independent of language, script and content is desirable for processing various types of documents. At the same time, categorizing calligraphy styles in handwritten manuscripts is important for palaeographic analysis, but has not been studied sufficiently in the literature. We address the font-recognition problem as analysis and categorization of textures. We extract features using complex wavelet transform and use support vector machines for classification. Extensive experimental evaluations on different datasets in four languages and comparisons with state-of-the-art studies show that our proposed method achieves higher recognition accuracy while being computationally simpler. Furthermore, on a new dataset generated from Ottoman manuscripts, we show that the proposed method can also be used for categorizing Ottoman calligraphy with high accuracy.
Submitted 9 July, 2014;
originally announced July 2014.
-
Denosing Using Wavelets and Projections onto the L1-Ball
Authors:
A. Enis Cetin,
Mohammad Tofighi
Abstract:
Both wavelet denoising and denoising methods using the concept of sparsity are based on soft-thresholding. In sparsity-based denoising methods, it is assumed that the original signal is sparse in some transform domain, such as the wavelet domain, and the wavelet subsignals of the noisy signal are projected onto L1-balls to reduce noise. In this lecture note, it is shown that the size of the L1-ball, or equivalently the soft threshold value, can be determined using linear algebra. The key step is an orthogonal projection onto the epigraph set of the L1-norm cost function.
Submitted 10 June, 2014;
originally announced June 2014.
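The link between soft-thresholding and projection onto an L1-ball mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the lecture note's method: the ball radius is taken as a given input, the threshold is found with the standard sort-based procedure, and the epigraph-projection step that the note uses to choose the ball size is not implemented. The function names are hypothetical.

```python
import math


def soft_threshold(x, t):
    """Shrink every entry of x toward zero by t, zeroing entries with |v| <= t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in x]


def project_l1_ball(x, radius):
    """Orthogonal projection of x onto the set {v : sum |v_i| <= radius}."""
    if sum(abs(v) for v in x) <= radius:
        return list(x)  # already inside the ball
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    running_sum = 0.0
    threshold = 0.0
    for k, m in enumerate(magnitudes, start=1):
        running_sum += m
        candidate = (running_sum - radius) / k
        if m - candidate > 0:
            threshold = candidate  # keep the candidate from the largest valid k
    # The projection is a soft-threshold with the threshold found above.
    return soft_threshold(x, threshold)


coeffs = [3.0, -1.0, 0.5]
projected = project_l1_ball(coeffs, 2.0)
print(projected, sum(abs(v) for v in projected))
```

For a vector outside the ball, the projection coincides with a soft-threshold at a data-dependent level, which is the equivalence between ball size and threshold value that the abstract refers to.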
-
Deconvolution Using Projections Onto The Epigraph Set of a Convex Cost Function
Authors:
Mohammad Tofighi,
Alican Bozkurt,
A. Enis Cetin
Abstract:
A new deconvolution algorithm based on orthogonal projections onto the epigraph set of a convex cost function is presented. In this algorithm, the dimension of the minimization problem is lifted by one and sets corresponding to the cost function are defined. As the utilized cost function is a convex function in $R^N$, the corresponding epigraph set is also a convex set in $R^{N+1}$. The deconvolution algorithm starts with an arbitrary initial estimate in $R^{N+1}$. At each step of the iterative algorithm, deconvolution projections are first performed onto the epigraphs; an orthogonal projection is then performed onto one of the constraint sets associated with the cost function, in a sequential manner. The method provides globally optimal solutions for total-variation, $\ell_1$, $\ell_2$, and entropic cost functions.
Submitted 24 February, 2014;
originally announced February 2014.
-
Signal Reconstruction Framework Based On Projections Onto Epigraph Set Of A Convex Cost Function (PESC)
Authors:
Mohammad Tofighi,
Kivanc Kose,
A. Enis Cetin
Abstract:
A new signal processing framework based on making orthogonal Projections onto the Epigraph Set of a Convex cost function (PESC) is developed. In this way it is possible to solve convex optimization problems using the well-known Projections onto Convex Sets (POCS) approach. In this algorithm, the dimension of the minimization problem is lifted by one and a convex set corresponding to the epigraph of the cost function is defined. If the cost function is a convex function in $R^N$, the corresponding epigraph set is also a convex set in $R^{N+1}$. The PESC method provides globally optimal solutions for total-variation (TV), filtered variation (FV), $L_1$, $L_2$, and entropic cost function based convex optimization problems. In this article, PESC-based denoising and compressive sensing algorithms are developed. Simulation examples are presented.
Submitted 10 February, 2014;
originally announced February 2014.