Search | arXiv e-print repository

DNN-Compressed Domain Visual Recognition with Feature Adaptation

Abstract: Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned w… ▽ More Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework for performing visual recognition using the compressed-domain latent representation at varying bit-rates. We propose a novel feature adaptation module integrating a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. Also, we design an adaptation training strategy to utilize the pretrained pixel-domain weights. For comparison, in addition to the performance results that are obtained using our proposed latent-based compressed-domain method, we also present performance results using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The obtained performance results show that our proposed compressed-domain classification model can distinctly outperform the existing compressed-domain classification models, and that it can also yield similar accuracy results with a much higher computational efficiency as compared to the pixel-domain models that are trained using fully decoded images. △ Less

Submitted 26 July, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

arXiv:2104.10065 [pdf, other]

Learning-based Compression for Material and Texture Recognition

Authors: Yingpeng Deng, Lina J. Karam

Abstract: Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned w… ▽ More Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain. Such a characteristic has been incorporated as part of the scope and requirements of the new emerging JPEG-AI standard. In our work, we adopt the learning-based JPEG-AI framework for performing material and texture recognition using the compressed-domain latent representation at varing bit-rates. For comparison, performance results are presented using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The obtained performance results show that even though decoded images can degrade the classification performance of the model trained with original images, retraining the model with decoded images will largely reduce the performance gap for the adopted texture dataset. It is also shown that the compressed-domain classification can yield a competitive performance in terms of Top-1 and Top-5 accuracy while using a smaller reduced-complexity classification model. △ Less

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2011.11957 [pdf, other]

Towards Imperceptible Universal Attacks on Texture Recognition

Authors: Yingpeng Deng, Lina J. Karam

Abstract: Although deep neural networks (DNNs) have been shown to be susceptible to image-agnostic adversarial attacks on natural image classification problems, the effects of such attacks on DNN-based texture recognition have yet to be explored. As part of our work, we find that limiting the perturbation's $l_p$ norm in the spatial domain may not be a suitable way to restrict the perceptibility of universa… ▽ More Although deep neural networks (DNNs) have been shown to be susceptible to image-agnostic adversarial attacks on natural image classification problems, the effects of such attacks on DNN-based texture recognition have yet to be explored. As part of our work, we find that limiting the perturbation's $l_p$ norm in the spatial domain may not be a suitable way to restrict the perceptibility of universal adversarial perturbations for texture images. Based on the fact that human perception is affected by local visual frequency characteristics, we propose a frequency-tuned universal attack method to compute universal perturbations in the frequency domain. Our experiments indicate that our proposed method can produce less perceptible perturbations yet with a similar or higher white-box fooling rates on various DNN texture classifiers and texture datasets as compared to existing universal attack techniques. We also demonstrate that our approach can improve the attack robustness against defended models as well as the cross-dataset transferability for texture recognition problems. △ Less

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2010.01506 [pdf, other]

A Study for Universal Adversarial Attacks on Texture Recognition

Authors: Yingpeng Deng, Lina J. Karam

Abstract: Given the outstanding progress that convolutional neural networks (CNNs) have made on natural image classification and object recognition problems, it is shown that deep learning methods can achieve very good recognition performance on many texture datasets. However, while CNNs for natural image classification/object recognition tasks have been revealed to be highly vulnerable to various types of… ▽ More Given the outstanding progress that convolutional neural networks (CNNs) have made on natural image classification and object recognition problems, it is shown that deep learning methods can achieve very good recognition performance on many texture datasets. However, while CNNs for natural image classification/object recognition tasks have been revealed to be highly vulnerable to various types of adversarial attack methods, the robustness of deep learning methods for texture recognition is yet to be examined. In our paper, we show that there exist small image-agnostic/univesal perturbations that can fool the deep learning models with more than 80\% of testing fooling rates on all tested texture datasets. The computed perturbations using various attack methods on the tested datasets are generally quasi-imperceptible, containing structured patterns with low, middle and high frequency components. △ Less

Submitted 4 October, 2020; originally announced October 2020.

arXiv:2003.05549 [pdf, other]

Frequency-Tuned Universal Adversarial Attacks

Authors: Yingpeng Deng, Lina J. Karam

Abstract: Researchers have shown that the predictions of a convolutional neural network (CNN) for an image set can be severely distorted by one single image-agnostic perturbation, or universal perturbation, usually with an empirically fixed threshold in the spatial domain to restrict its perceivability. However, by considering the human perception, we propose to adopt JND thresholds to guide the perceivabil… ▽ More Researchers have shown that the predictions of a convolutional neural network (CNN) for an image set can be severely distorted by one single image-agnostic perturbation, or universal perturbation, usually with an empirically fixed threshold in the spatial domain to restrict its perceivability. However, by considering the human perception, we propose to adopt JND thresholds to guide the perceivability of universal adversarial perturbations. Based on this, we propose a frequency-tuned universal attack method to compute universal perturbations and show that our method can realize a good balance between perceivability and effectiveness in terms of fooling rate by adapting the perturbations to the local frequency content. Compared with existing universal adversarial attack techniques, our frequency-tuned attack method can achieve cutting-edge quantitative results. We demonstrate that our approach can significantly improve the performance of the baseline on both white-box and black-box attacks. △ Less

Submitted 9 June, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

arXiv:1912.01707 [pdf, other]

doi 10.1109/TIP.2021.3124155

It GAN DO Better: GAN-based Detection of Objects on Images with Varying Quality

Authors: Charan D. Prakash, Lina J. Karam

Abstract: In this paper, we propose in our novel generative framework the use of Generative Adversarial Networks (GANs) to generate features that provide robustness for object detection on reduced quality images. The proposed GAN-based Detection of Objects (GAN-DO) framework is not restricted to any particular architecture and can be generalized to several deep neural network (DNN) based architectures. The… ▽ More In this paper, we propose in our novel generative framework the use of Generative Adversarial Networks (GANs) to generate features that provide robustness for object detection on reduced quality images. The proposed GAN-based Detection of Objects (GAN-DO) framework is not restricted to any particular architecture and can be generalized to several deep neural network (DNN) based architectures. The resulting deep neural network maintains the exact architecture as the selected baseline model without adding to the model parameter complexity or inference speed. We first evaluate the effect of image quality not only on the object classification but also on the object bounding box regression. We then test the models resulting from our proposed GAN-DO framework, using two state-of-the-art object detection architectures as the baseline models. We also evaluate the effect of the number of re-trained parameters in the generator of GAN-DO on the accuracy of the final trained model. Performance results provided using GAN-DO on object detection datasets establish an improved robustness to varying image quality and a higher mAP compared to the existing approaches. △ Less

Submitted 3 December, 2019; originally announced December 2019.

Journal ref: Published in the IEEE Transactions on Image Processing, Vol. 30, 2021, pages 9220-9230

arXiv:1906.03444 [pdf, other]

doi 10.1109/CVPR42600.2020.00079

Defending Against Universal Attacks Through Selective Feature Regeneration

Authors: Tejas Borkar, Felix Heide, Lina Karam

Abstract: Deep neural network (DNN) predictions have been shown to be vulnerable to carefully crafted adversarial perturbations. Specifically, image-agnostic (universal adversarial) perturbations added to any image can fool a target network into making erroneous predictions. Departing from existing defense strategies that work mostly in the image domain, we present a novel defense which operates in the DNN… ▽ More Deep neural network (DNN) predictions have been shown to be vulnerable to carefully crafted adversarial perturbations. Specifically, image-agnostic (universal adversarial) perturbations added to any image can fool a target network into making erroneous predictions. Departing from existing defense strategies that work mostly in the image domain, we present a novel defense which operates in the DNN feature domain and effectively defends against such universal perturbations. Our approach identifies pre-trained convolutional features that are most vulnerable to adversarial noise and deploys trainable feature regeneration units which transform these DNN filter activations into resilient features that are robust to universal perturbations. Regenerating only the top 50% adversarially susceptible activations in at most 6 DNN layers and leaving all remaining DNN activations unchanged, we outperform existing defense strategies across different network architectures by more than 10% in restored accuracy. We show that without any additional modification, our defense trained on ImageNet with one type of universal attack examples effectively defends against other types of unseen universal attacks. △ Less

Submitted 10 June, 2020; v1 submitted 8 June, 2019; originally announced June 2019.

Comments: CVPR 2020. Code: https://github.com/tsborkar/Selective-feature-regeneration Webpage: https://www.cs.princeton.edu/~fheide/SelectiveFeatureRegeneration/

Journal ref: CVPR 2020

arXiv:1804.08020 [pdf, other]

Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients

Authors: S. Alireza Golestaneh, Lina Karam

Abstract: Perceptual quality assessment for synthesized textures is a challenging task. In this paper, we propose a training-free reduced-reference (RR) objective quality assessment method that quantifies the perceived quality of synthesized textures. The proposed reduced-reference synthesized texture quality assessment metric is based on measuring the spatial and statistical attributes of the texture image… ▽ More Perceptual quality assessment for synthesized textures is a challenging task. In this paper, we propose a training-free reduced-reference (RR) objective quality assessment method that quantifies the perceived quality of synthesized textures. The proposed reduced-reference synthesized texture quality assessment metric is based on measuring the spatial and statistical attributes of the texture image using both image- and gradient-based wavelet coefficients at multiple scales. Performance evaluations on two synthesized texture databases demonstrate that our proposed RR synthesized texture quality metric significantly outperforms both full-reference and RR state-of-the-art quality metrics in predicting the perceived visual quality of the synthesized textures. △ Less

Submitted 26 April, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

arXiv:1803.04103 [pdf, other]

Full Reference Objective Quality Assessment for Reconstructed Background Images

Authors: Aditee Shrotre, Lina Karam

Abstract: With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating th… ▽ More With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating the quality of the reconstructed background images. Though these quality assessment methods have been widely used in the past, their performance in evaluating the perceived quality of the reconstructed background image has not been verified. In this work, we discuss the shortcomings in existing metrics and propose a full reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales using a probability summation model to predict the perceived quality in the reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two different datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with human subjective ratings. The correlation results show that the proposed RBQI outperforms all the existing approaches. Additionally, the constructed datasets and the corresponding subjective scores provide a benchmark to evaluate the performance of future metrics that are developed to evaluate the perceived quality of reconstructed background images. △ Less

Submitted 11 April, 2018; v1 submitted 11 March, 2018; originally announced March 2018.

Comments: Associated source code: https://github.com/ashrotre/RBQI, Associated Database: https://drive.google.com/drive/folders/1bg8YRPIBcxpKIF9BIPisULPBPcA5x-Bk?usp=sharing (Email for permissions at: ashrotre<at>asu<dot>edu)

arXiv:1801.02684 [pdf]

Generative Sensing: Transforming Unreliable Sensor Data for Reliable Recognition

Authors: Lina Karam, Tejas Borkar, Yu Cao, Junseok Chae

Abstract: This paper introduces a deep learning enabled generative sensing framework which integrates low-end sensors with computational intelligence to attain a high recognition accuracy on par with that attained with high-end sensors. The proposed generative sensing framework aims at transforming low-end, low-quality sensor data into higher quality sensor data in terms of achieved classification accuracy.… ▽ More This paper introduces a deep learning enabled generative sensing framework which integrates low-end sensors with computational intelligence to attain a high recognition accuracy on par with that attained with high-end sensors. The proposed generative sensing framework aims at transforming low-end, low-quality sensor data into higher quality sensor data in terms of achieved classification accuracy. The low-end data can be transformed into higher quality data of the same modality or into data of another modality. Different from existing methods for image generation, the proposed framework is based on discriminative models and targets to maximize the recognition accuracy rather than a similarity measure. This is achieved through the introduction of selective feature regeneration in a deep neural network (DNN). The proposed generative sensing will essentially transform low-quality sensor data into high-quality information for robust perception. Results are presented to illustrate the performance of the proposed framework. △ Less

Submitted 8 January, 2018; originally announced January 2018.

Comments: 5 pages, Submitted to IEEE MIPR 2018

ACM Class: I.2; I.2.10

arXiv:1710.04744 [pdf, other]

Can the early human visual system compete with Deep Neural Networks?

Authors: Samuel Dodge, Lina Karam

Abstract: We study and compare the human visual system and state-of-the-art deep neural networks on classification of distorted images. Different from previous works, we limit the display time to 100ms to test only the early mechanisms of the human visual system, without allowing time for any eye movements or other higher level processes. Our findings show that the human visual system still outperforms mode… ▽ More We study and compare the human visual system and state-of-the-art deep neural networks on classification of distorted images. Different from previous works, we limit the display time to 100ms to test only the early mechanisms of the human visual system, without allowing time for any eye movements or other higher level processes. Our findings show that the human visual system still outperforms modern deep neural networks under blurry and noisy images. These findings motivate future research into develo** more robust deep networks. △ Less

Submitted 12 October, 2017; originally announced October 2017.

Comments: Accepted as an oral paper at the Mutual Benefits of Cognitive and Computer Vision Workshop (held in conjunction with ICCV2017)

arXiv:1708.00169 [pdf, other]

doi 10.1109/TIP.2016.2577498

A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions

Authors: Milind S. Gide, Lina J. Karam

Abstract: With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated by using performance evaluation metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerabl… ▽ More With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated by using performance evaluation metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, there are notable problems in them. In this work, we discuss shortcomings in existing metrics through illustrative examples and propose a new metric that uses local weights based on fixation density which overcomes these flaws. To compare the performance of our proposed metric at assessing the quality of saliency prediction with other existing metrics, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance with corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide an insight into which of the existing metrics and future metrics are better at estimating the quality of saliency prediction and can be used as a benchmark. △ Less

Submitted 1 August, 2017; originally announced August 2017.

Journal ref: IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3852-3861, Aug. 2016

arXiv:1705.02498 [pdf, other]

A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions

Authors: Samuel Dodge, Lina Karam

Abstract: Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is st… ▽ More Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is still much lower than human performance on distorted images. We additionally find that there is little correlation in errors between DNNs and human subjects. This could be an indication that the internal representation of images are different between DNNs and the human visual system. These comparisons with human performance could be used to guide future development of more robust DNNs. △ Less

Submitted 6 May, 2017; originally announced May 2017.

arXiv:1705.02406 [pdf, other]

DeepCorrect: Correcting DNN models against Image Distortions

Authors: Tejas Borkar, Lina Karam

Abstract: In recent years, the widespread use of deep neural networks (DNNs) has facilitated great improvements in performance for computer vision tasks like image classification and object recognition. In most realistic computer vision applications, an input image undergoes some form of image distortion such as blur and additive noise during image acquisition or transmission. Deep networks trained on prist… ▽ More In recent years, the widespread use of deep neural networks (DNNs) has facilitated great improvements in performance for computer vision tasks like image classification and object recognition. In most realistic computer vision applications, an input image undergoes some form of image distortion such as blur and additive noise during image acquisition or transmission. Deep networks trained on pristine images perform poorly when tested on such distortions. In this paper, we evaluate the effect of image distortions like Gaussian blur and additive noise on the activations of pre-trained convolutional filters. We propose a metric to identify the most noise susceptible convolutional filters and rank them in order of the highest gain in classification accuracy upon correction. In our proposed approach called DeepCorrect, we apply small stacks of convolutional layers with residual connections, at the output of these ranked filters and train them to correct the worst distortion affected filter activations, whilst leaving the rest of the pre-trained filter outputs in the network unchanged. Performance results show that applying DeepCorrect models for common vision tasks like image classification (ImageNet), object recognition (Caltech-101, Caltech-256) and scene classification (SUN-397), significantly improves the robustness of DNNs against distorted images and outperforms other alternative approaches.. △ Less

Submitted 8 April, 2019; v1 submitted 5 May, 2017; originally announced May 2017.

Comments: Accepted to IEEE Transactions on Image Processing, April 2019. For associated code, see https://github.com/tsborkar/DeepCorrect

arXiv:1703.08119 [pdf, other]

Quality Resilient Deep Neural Networks

Authors: Samuel Dodge, Lina Karam

Abstract: We study deep neural networks for classification of images with quality distortions. We first show that networks fine-tuned on distorted data greatly outperform the original networks when tested on distorted data. However, fine-tuned networks perform poorly on quality distortions that they have not been trained for. We propose a mixture of experts ensemble method that is robust to different types… ▽ More We study deep neural networks for classification of images with quality distortions. We first show that networks fine-tuned on distorted data greatly outperform the original networks when tested on distorted data. However, fine-tuned networks perform poorly on quality distortions that they have not been trained for. We propose a mixture of experts ensemble method that is robust to different types of distortions. The "experts" in our model are trained on a particular type of distortion. The output of the model is a weighted sum of the expert models, where the weights are determined by a separate gating network. The gating network is trained to predict optimal weights for a particular distortion type and level. During testing, the network is blind to the distortion level and type, yet can still assign appropriate weights to the expert models. We additionally investigate weight sharing methods for the mixture model and show that improved performance can be achieved with a large reduction in the number of unique network parameters. △ Less

Submitted 23 March, 2017; originally announced March 2017.

arXiv:1703.07478 [pdf, other]

Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes

Authors: S. Alireza Golestaneh, Lina J. Karam

Abstract: The detection of spatially-varying blur without having any information about the blur type is a challenging task. In this paper, we propose a novel effective approach to address the blur detection problem from a single image without requiring any knowledge about the blur type, level, or camera settings. Our approach computes blur detection maps based on a novel High-frequency multiscale Fusion and… ▽ More The detection of spatially-varying blur without having any information about the blur type is a challenging task. In this paper, we propose a novel effective approach to address the blur detection problem from a single image without requiring any knowledge about the blur type, level, or camera settings. Our approach computes blur detection maps based on a novel High-frequency multiscale Fusion and Sort Transform (HiFST) of gradient magnitudes. The evaluations of the proposed approach on a diverse set of blurry images with different blur types, levels, and contents demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods qualitatively and quantitatively. △ Less

Submitted 11 April, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

Comments: Accepted to CVPR 2017

arXiv:1702.00372 [pdf, other]

doi 10.1109/TIP.2018.2834826

Visual Saliency Prediction Using a Mixture of Deep Neural Networks

Authors: Samuel Dodge, Lina Karam

Abstract: Visual saliency models have recently begun to incorporate deep learning to achieve predictive capacity much greater than previous unsupervised methods. However, most existing models predict saliency using local mechanisms limited to the receptive field of the network. We propose a model that incorporates global scene semantic information in addition to local information gathered by a convolutional… ▽ More Visual saliency models have recently begun to incorporate deep learning to achieve predictive capacity much greater than previous unsupervised methods. However, most existing models predict saliency using local mechanisms limited to the receptive field of the network. We propose a model that incorporates global scene semantic information in addition to local information gathered by a convolutional neural network. Our model is formulated as a mixture of experts. Each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' output, with weights determined by a separate gating network. This gating network is guided by global scene information to predict weights. The expert networks and the gating network are trained simultaneously in an end-to-end manner. We show that our mixture formulation leads to improvement in performance over an otherwise identical non-mixture model that does not incorporate global scene information. △ Less

Submitted 1 February, 2017; originally announced February 2017.

arXiv:1604.04004 [pdf, other]

Understanding How Image Quality Affects Deep Neural Networks

Authors: Samuel Dodge, Lina Karam

Abstract: Image quality is an important practical challenge that is often overlooked in the design of machine vision systems. Commonly, machine vision systems are trained and tested on high quality image datasets, yet in practical applications the input images can not be assumed to be of high quality. Recently, deep neural networks have obtained state-of-the-art performance on many machine vision tasks. In… ▽ More Image quality is an important practical challenge that is often overlooked in the design of machine vision systems. Commonly, machine vision systems are trained and tested on high quality image datasets, yet in practical applications the input images can not be assumed to be of high quality. Recently, deep neural networks have obtained state-of-the-art performance on many machine vision tasks. In this paper we provide an evaluation of 4 state-of-the-art deep neural network models for image classification under quality distortions. We consider five types of quality distortions: blur, noise, contrast, JPEG, and JPEG2000 compression. We show that the existing networks are susceptible to these quality distortions, particularly to blur and noise. These results enable future work in develo** deep neural networks that are more invariant to quality distortions. △ Less

Submitted 21 April, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

Comments: Final version will appear in IEEE Xplore in the Proceedings of the Conference on the Quality of Multimedia Experience (QoMEX), June 6-8, 2016

arXiv:1604.03882 [pdf, other]

The Effect of Distortions on the Prediction of Visual Attention

Authors: Milind S. Gide, Samuel F. Dodge, Lina J. Karam

Abstract: Existing saliency models have been designed and evaluated for predicting the saliency in distortion-free images. However, in practice, the image quality is affected by a host of factors at several stages of the image processing pipeline such as acquisition, compression and transmission. Several studies have explored the effect of distortion on human visual attention; however, none of them have con… ▽ More Existing saliency models have been designed and evaluated for predicting the saliency in distortion-free images. However, in practice, the image quality is affected by a host of factors at several stages of the image processing pipeline such as acquisition, compression and transmission. Several studies have explored the effect of distortion on human visual attention; however, none of them have considered the performance of visual saliency models in the presence of distortion. Furthermore, given that one potential application of visual saliency prediction is to aid pooling of objective visual quality metrics, it is important to compare the performance of existing saliency models on distorted images. In this paper, we evaluate several state-of-the-art visual attention models over different databases consisting of distorted images with various types of distortions such as blur, noise and compression with varying levels of distortion severity. This paper also introduces new improved performance evaluation metrics that are shown to overcome shortcomings in existing performance metrics. We find that the performance of most models improves with moderate and high levels of distortions as compared to the near distortion-free case. In addition, model performance is also found to decrease with an increase in image complexity. △ Less

Submitted 13 April, 2016; originally announced April 2016.

Comments: 14 pages, 2 column, 14 figures

arXiv:1311.5834 [pdf, ps, other]

Traffic and Statistical Multiplexing Characterization of 3D Video Representation Formats (Extended Version)

Authors: Akshay Pulipaka, Patrick Seeling, Martin Reisslein, Lina J. Karam

Abstract: The network transport of 3D video, which contains two views of a video scene, poses significant challenges due to the increased video data compared to conventional single-view video. Addressing these challenges requires a thorough understanding of the traffic and multiplexing characteristics of the different representation formats of 3D video. We examine the average bitrate-distortion (RD) and bit… ▽ More The network transport of 3D video, which contains two views of a video scene, poses significant challenges due to the increased video data compared to conventional single-view video. Addressing these challenges requires a thorough understanding of the traffic and multiplexing characteristics of the different representation formats of 3D video. We examine the average bitrate-distortion (RD) and bitrate variability-distortion (VD) characteristics of three main representation formats. Specifically, we compare multiview video (MV) representation and encoding, frame sequential (FS) representation, and side-by-side (SBS) representation, whereby conventional single-view encoding is employed for the FS and SBS representations. Our results for long 3D videos in full HD format indicate that the MV representation and encoding achieves the highest RD efficiency, while exhibiting the highest bitrate variabilities. We examine the impact of these bitrate variabilities on network transport through extensive statistical multiplexing simulations. We find that when multiplexing a small number of streams, the MV and FS representations require the same bandwidth. However, when multiplexing a large number of streams or smoothing traffic, the MV representation and encoding reduces the bandwidth requirement relative to the FS representation. △ Less

Submitted 22 November, 2013; originally announced November 2013.

arXiv:1307.5702 [pdf, other]

Is Bottom-Up Attention Useful for Scene Recognition?

Authors: Samuel F. Dodge, Lina J. Karam

Abstract: The human visual system employs a selective attention mechanism to understand the visual world in an eficient manner. In this paper, we show how computational models of this mechanism can be exploited for the computer vision application of scene recognition. First, we consider saliency weighting and saliency pruning, and provide a comparison of the performance of different attention models in thes… ▽ More The human visual system employs a selective attention mechanism to understand the visual world in an eficient manner. In this paper, we show how computational models of this mechanism can be exploited for the computer vision application of scene recognition. First, we consider saliency weighting and saliency pruning, and provide a comparison of the performance of different attention models in these approaches in terms of classification accuracy. Pruning can achieve a high degree of computational savings without significantly sacrificing classification accuracy. In saliency weighting, however, we found that classification performance does not improve. In addition, we present a new method to incorporate salient and non-salient regions for improved classification accuracy. We treat the salient and non-salient regions separately and combine them using Multiple Kernel Learning. We evaluate our approach using the UIUC sports dataset and find that with a small training size, our method improves upon the classification accuracy of the baseline bag of features approach. △ Less

Submitted 22 July, 2013; originally announced July 2013.

Report number: ISACS/2013/04

Showing 1–21 of 21 results for author: Karam, L