-
Estimating Visual Information From Audio Through Manifold Learning
Authors:
Fabrizio Pedersoli,
Dryden Wiebe,
Amin Banitalebi,
Yong Zhang,
George Tzanetakis,
Kwang Moo Yi
Abstract:
We propose a new framework for extracting visual information about a scene using only audio signals. Audio-based methods can overcome some of the limitations of vision-based methods: they do not require "line-of-sight", are robust to occlusions and changes in illumination, and can function as a backup in case vision/lidar sensors fail. Therefore, audio-based methods can be useful even for applications in which only visual information is of interest. Our framework is based on Manifold Learning and consists of two steps. First, we train a Vector-Quantized Variational Auto-Encoder to learn the data manifold of the particular visual modality we are interested in. Second, we train an Audio Transformation network to map multi-channel audio signals to the latent representation of the corresponding visual sample. We show that our method is able to produce meaningful images from audio using a publicly available audio/visual dataset. In particular, we consider the prediction of the following visual modalities from audio: depth and semantic segmentation. We hope the findings of our work can facilitate further research in visual information extraction from audio. Code is available at: https://github.com/ubc-vision/audio_manifold.
Submitted 13 September, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
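The first step in the abstract above relies on vector quantization: each encoder output is replaced by its nearest entry in a learned codebook. The following is a minimal sketch of that lookup only, assuming a tiny hand-written codebook and Euclidean distance. The names `quantize`, `codebook`, and `latents` are hypothetical and are not taken from the authors' repository, and no training is shown.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each latent vector to the index and value of its nearest codebook entry."""
    indices = []
    for z in latents:
        # Pick the codebook index with the smallest distance to z.
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(nearest)
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Illustrative values only (assumed, not from the paper).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7)]
indices, quantized = quantize(latents, codebook)
print(indices)
```

In a setup like the one described, a second network could be trained to predict such codebook entries from audio, but how the authors implement that mapping is not specified in the abstract.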
-
A Technique Based on Chaos for Brain Computer Interfacing
Authors:
A. Banitalebi,
S. K. Setarehdan,
G. A. Hossein-Zadeh
Abstract:
A user of a Brain Computer Interface (BCI) system must be able to control external computer devices with brain activity. Although the proof of concept was given decades ago, the reliable translation of user intent into device control commands is still a major challenge. There are problems associated with the classification of different BCI tasks. In this paper we propose the use of chaotic indices of the BCI. We use the largest Lyapunov exponent, mutual information, correlation dimension, and minimum embedding dimension as features for the classification of EEG signals released by BCI Competition IV. A multi-layer perceptron classifier and a KM-SVM (a support vector machine classifier based on k-means clustering) are used for classification, yielding an accuracy of 95.5% for discrimination between two motor imagery tasks.
Submitted 12 March, 2018;
originally announced March 2018.
-
Robust LSB Watermarking Optimized for Local Structural Similarity
Authors:
Amin Banitalebi,
Said Nader-Esfahani,
Alireza Nasiri Avanaki
Abstract:
Growth of the Internet and networked multimedia systems has emphasized the need for copyright protection of media such as images, audio clips, and videos. Digital watermarking is today extensively used for many applications such as authentication of ownership or identification of illegal copies. A digital watermark is an invisible or possibly visible structure added to the original media (known as the asset). Images are treated as a communication channel when they are subject to a watermark embedding procedure, so when embedding a digital watermark in an image, the capacity of the channel should be considered. There is a trade-off between imperceptibility, robustness, and capacity when embedding a watermark in an asset. In the case of image watermarks, it is reasonable that the watermarking algorithm should depend on the content and structure of the image. Conventionally, mean squared error (MSE) has been used as a common distortion measure to assess the quality of images. Newly developed quality metrics propose distortion measures that are based on the human visual system (HVS). These metrics show that MSE is not based on the HVS and lacks accuracy when dealing with perceptually important signals such as images and videos. SSIM, or structural similarity, is a state-of-the-art HVS-based image quality criterion that has recently been of much interest. In this paper we propose a robust least significant bit (LSB) watermarking scheme which is optimized for structural similarity. The watermark is embedded into a host image through an adaptive algorithm. Various attacks were examined on the embedding approach, and simulation results showed that the watermarked sequence can be extracted with acceptable accuracy after all attacks.
Submitted 13 March, 2018;
originally announced March 2018.
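For readers unfamiliar with LSB watermarking, the sketch below shows the plain, non-adaptive version: each watermark bit overwrites the least significant bit of one 8-bit pixel. It does not implement the SSIM-optimized adaptive embedding that the abstract proposes. The function names `embed_lsb` and `extract_lsb` and the sample pixel values are assumptions made for illustration.

```python
from typing import List


def embed_lsb(pixels: List[int], bits: List[int]) -> List[int]:
    """Write each watermark bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the host signal")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(pixels: List[int], n_bits: int) -> List[int]:
    """Read the watermark back from the lowest bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Illustrative 8-bit grayscale values and watermark bits (assumed).
host = [120, 37, 255, 0, 64, 199, 18, 77]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(host, watermark)
recovered = extract_lsb(marked, len(watermark))
max_change = max(abs(a - b) for a, b in zip(host, marked))
print(marked, recovered, max_change)
```

Because only the lowest bit changes, each pixel moves by at most one gray level, which is why plain LSB embedding is nearly imperceptible but also fragile. An adaptive scheme would additionally choose where and how to embed based on image structure.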
-
Exploring the Distributed Video Coding in a Quality Assessment Context
Authors:
A. Banitalebi,
H. R. Tohidypour
Abstract:
In the prevailing approach to video coding, the encoder has the task of exploiting both spatial and temporal redundancies present in the video sequence, which is a complex procedure. As a result, almost all video encoders have five to ten times more complexity than their decoders. In a video compression process, one of the main tasks at the encoder side is motion estimation, which extracts the temporal correlation between frames. Distributed video coding (DVC) proposes an approach that can lead to low-complexity encoders and higher-complexity decoders. DVC is a new paradigm in video compression based on the information-theoretic ideas of the Slepian-Wolf and Wyner-Ziv theorems. Wyner-Ziv coding is naturally robust against transmission errors and can be used for joint source and channel coding. Side information is one of the key components of the Wyner-Ziv decoder. Better side information generation will result in better functionality of the Wyner-Ziv coder. In this paper we propose a new method that can generate side information with better quality and thus better compression. We have used HVS (human visual system) based image quality metrics as our quality criterion. The motion estimation we used in the decoder is modified according to these metrics so that we can obtain finer side information. The motion compensation is optimized for perceptual quality metrics and leads to better side information generation compared to the conventional MSE (mean squared error) or SAD (sum of absolute difference) based motion compensation currently used in the literature. Better motion compensation means better compression.
Submitted 13 March, 2018;
originally announced March 2018.
-
A Perceptual Based Motion Compensation Technique for Video Coding
Authors:
Amin Banitalebi,
Said Nader-Esfahani,
Alireza Nasiri Avanaki
Abstract:
Motion estimation is one of the important procedures in all video encoders. Most of the complexity of the video coder depends on the complexity of the motion estimation step. The original motion estimation algorithm has considerable complexity, and therefore many improvements have been proposed to enhance the basic version of motion estimation. The basic idea of many of these works was to optimize some distortion function based on mean squared error (MSE) or sum of absolute difference (SAD) in block matching. However, it has been shown that these metrics do not reflect quality as it is perceived; that is, they are not compatible with the human visual system (HVS). In this paper we explore the usage of image quality metrics in video coding, and more specifically in motion estimation. We have utilized perceptual image quality metrics instead of MSE or SAD in block-based motion estimation. Three different metrics have been used: structural similarity (SSIM), complex wavelet structural similarity (CW-SSIM), and visual information fidelity (VIF). Experimental results showed that usage of these quality criteria can improve the compression rate while the quality remains fixed, and thus yields better quality in the coded video at the same bit budget.
Submitted 12 March, 2018;
originally announced March 2018.
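The block matching that the abstract above builds on can be sketched as a full search over a small window using SAD as the cost. This is the conventional baseline, not the perceptual variant the paper proposes. The `cost` parameter is a hypothetical hook showing where a perceptual metric could be swapped in, and all names and the toy frames are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]


def sad(block_a: Frame, block_b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame: Frame, top: int, left: int, size: int) -> Frame:
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(cur: Frame, ref: Frame, top: int, left: int, size: int,
                       search: int,
                       cost: Callable[[Frame, Frame], float] = sad
                       ) -> Optional[Tuple[float, int, int]]:
    """Full search: return (cost, dy, dx) of the best match in the reference frame."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            c = cost(target, get_block(ref, y, x, size))
            if best is None or c < best[0]:
                best = (c, dy, dx)
    return best


# Toy frames (assumed): the current frame is the reference shifted right by one pixel.
ref_frame = [[10 * y + x for x in range(6)] for y in range(6)]
cur_frame = [[ref_frame[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]

result = best_motion_vector(cur_frame, ref_frame, top=2, left=2, size=2, search=2)
print(result)
```

To experiment with a perceptual criterion, one could pass a function that returns a dissimilarity (for example, one minus a similarity score) as `cost`; the search loop itself would not need to change.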