Search | arXiv e-print repository

Learned Compression of Encoding Distributions

Abstract: The entropy bottleneck introduced by Ballé et al. is a common component used in many learned compression models. It encodes a transformed latent representation using a static distribution whose parameters are learned during training. However, the actual distribution of the latent data may vary wildly across different inputs. The static distribution attempts to encompass all possible input distributions, thus fitting none of them particularly well. This unfortunate phenomenon, sometimes known as the amortization gap, results in suboptimal compression. To address this issue, we propose a method that dynamically adapts the encoding distribution to match the latent data distribution for a specific input. First, our model estimates a better encoding distribution for a given input. This distribution is then compressed and transmitted as an additional side-information bitstream. Finally, the decoder reconstructs the encoding distribution and uses it to decompress the corresponding latent data. Our method achieves a Bjøntegaard-Delta (BD)-rate gain of -7.10% on the Kodak test dataset when applied to the standard fully-factorized architecture. Furthermore, considering computational complexity, the transform used by our method is an order of magnitude cheaper in terms of Multiply-Accumulate (MAC) operations compared to related side-information methods such as the scale hyperprior.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures, IEEE ICIP 2024
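
The amortization gap described in this abstract can be illustrated with a small calculation. The sketch below is not the paper's model: it assumes a hypothetical three-symbol alphabet, a made-up static distribution `static_pmf`, and a smoothed histogram as a stand-in for the input-adapted distribution. It compares ideal code lengths only and ignores the cost of transmitting the adapted distribution as side information, which a real codec would have to pay.

```python
import math
from collections import Counter

def code_length_bits(symbols, pmf):
    """Ideal code length (bits) of `symbols` under the encoding distribution `pmf`."""
    return sum(-math.log2(pmf[s]) for s in symbols)

def empirical_pmf(symbols, alphabet, eps=1e-3):
    """Smoothed histogram of `symbols`; a stand-in for an input-adapted distribution."""
    counts = Counter(symbols)
    total = len(symbols) + eps * len(alphabet)
    return {a: (counts[a] + eps) / total for a in alphabet}

alphabet = [-1, 0, 1]
static_pmf = {-1: 0.25, 0: 0.5, 1: 0.25}  # hypothetical static distribution
latent = [0, 0, 0, 0, 0, 0, 1, 0]         # hypothetical quantized latent for one input

static_bits = code_length_bits(latent, static_pmf)
adapted_bits = code_length_bits(latent, empirical_pmf(latent, alphabet))
gap_bits = static_bits - adapted_bits     # bits wasted by the static distribution
```

Adaptation only pays off when `gap_bits` exceeds the size of the side-information bitstream, which is why the abstract emphasizes compressing the encoding distribution itself.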

arXiv:2405.19453 [pdf, other]

Optimizing Split Points for Error-Resilient SplitFed Learning

Authors: Chamani Shiranthika, Parvaneh Saeedi, Ivan V. Bajić

Abstract: Recent advancements in decentralized learning, such as Federated Learning (FL), Split Learning (SL), and Split Federated Learning (SplitFed), have expanded the potential of machine learning. SplitFed aims to minimize the computational burden on individual clients in FL and parallelize SL while maintaining privacy. This study investigates the resilience of SplitFed to packet loss at model split points. It explores various parameter aggregation strategies of SplitFed by examining the impact of splitting the model at different points (either shallow split or deep split) on the final global model performance. The experiments, conducted on a human embryo image segmentation task, reveal a statistically significant advantage of a deeper split point.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted for poster presentation at the Women in Computer Vision (WiCV) workshop in CVPR 2024

arXiv:2405.12456 [pdf, other]

Mutual Information Analysis in Multimodal Learning Systems

Authors: Hadi Hadizadeh, S. Faegheh Yeganli, Bahador Rashidi, Ivan V. Bajić

Abstract: In recent years, there has been a significant increase in applications of multimodal signal processing and analysis, largely driven by the increased availability of multimodal datasets and the rapid progress in multimodal learning systems. Well-known examples include autonomous vehicles, audiovisual generative systems, vision-language systems, and so on. Such systems integrate multiple signal modalities: text, speech, images, video, LiDAR, etc., to perform various tasks. A key issue for understanding such systems is the relationship between various modalities and how it impacts task performance. In this paper, we employ the concept of mutual information (MI) to gain insight into this issue. Taking advantage of the recent progress in entropy modeling and estimation, we develop a system called InfoMeter to estimate MI between modalities in a multimodal learning system. We then apply InfoMeter to analyze a multimodal 3D object detection system over a large-scale dataset for autonomous driving. Our experiments on this system suggest that a lower MI between modalities is beneficial for detection accuracy. This new insight may facilitate improvements in the development of future multimodal learning systems.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 6 pages, 7 figures, IEEE MIPR 2024

arXiv:2405.09077 [pdf, other]

Compressive Feature Selection for Remote Visual Multi-Task Inference

Authors: Saeed Ranjbar Alvar, Ivan V. Bajić

Abstract: Deep models produce a number of features in each internal layer. A key problem in applications such as feature compression for remote inference is determining how important each feature is for the task(s) performed by the model. The problem is especially challenging in the case of multi-task inference, where the same feature may carry different importance for different tasks. In this paper, we examine how effective mutual information (MI) between a feature and a model's task output is as a measure of the feature's importance for that task. Experiments involving hard selection and soft selection (unequal compression) based on MI are carried out to compare the MI-based method with alternative approaches. Multi-objective analysis is provided to offer further insight.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 6 pages, 8 figures, IEEE ICME Workshop on Coding for Machines
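
As a rough illustration of MI-based hard selection, the sketch below ranks features by a plug-in MI estimate computed from paired discrete samples. It is a toy under stated assumptions, not the paper's estimator: the feature names and sample values are hypothetical, and real deep features are continuous and high-dimensional, so they would need quantization or a learned entropy model first.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical binarized task output and three hypothetical binarized features.
task_output = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "f_informative": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the task output
    "f_noisy":       [0, 1, 1, 1, 0, 1, 0, 0],  # agrees with it in 6 of 8 samples
    "f_irrelevant":  [0, 1, 0, 1, 0, 1, 1, 0],  # empirically independent of it
}

scores = {name: mutual_information(f, task_output) for name, f in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
selected = ranked[:2]  # hard selection: keep the two highest-MI features
```

Soft selection, in the sense of unequal compression, could reuse the same `scores` to allocate more bits to higher-MI features instead of discarding the rest.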

arXiv:2402.12532 [pdf, other]

Scalable Human-Machine Point Cloud Compression

Authors: Mateen Ulhaq, Ivan V. Bajić

Abstract: Due to the limited computational capabilities of edge devices, deep learning inference can be quite expensive. One remedy is to compress and transmit point cloud data over the network for server-side processing. Unfortunately, this approach can be sensitive to network factors, including available bitrate. Luckily, the bitrate requirements can be reduced without sacrificing inference accuracy by using a machine task-specialized codec. In this paper, we present a scalable codec for point-cloud data that is specialized for the machine task of classification, while also providing a mechanism for human viewing. In the proposed scalable codec, the "base" bitstream supports the machine task, and an "enhancement" bitstream may be used for better input reconstruction performance for human viewing. We base our architecture on PointNet++, and test its efficacy on the ModelNet40 dataset. We show significant improvements over prior non-specialized codecs.

Submitted 23 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures, 2024 Picture Coding Symposium (PCS)

arXiv:2308.05959 [pdf, other]

Learned Point Cloud Compression for Classification

Authors: Mateen Ulhaq, Ivan V. Bajić

Abstract: Deep learning is increasingly being used to perform machine vision tasks such as classification, object detection, and segmentation on 3D point cloud data. However, deep learning inference is computationally expensive. The limited computational capabilities of end devices thus necessitate a codec for transmitting point cloud data over the network for server-side processing. Such a codec must be lightweight and capable of achieving high compression ratios without sacrificing accuracy. Motivated by this, we present a novel point cloud codec that is highly specialized for the machine task of classification. Our codec, based on PointNet, achieves a significantly better rate-accuracy trade-off in comparison to alternative methods. In particular, it achieves a 94% reduction in BD-bitrate over non-specialized codecs on the ModelNet40 dataset. For low-resource end devices, we also propose two lightweight configurations of our encoder that achieve similar BD-bitrate reductions of 93% and 92% with 3% and 5% drops in top-1 accuracy, while consuming only 0.470 and 0.048 encoder-side kMACs/point, respectively. Our codec demonstrates the potential of specialized codecs for machine analysis of point clouds, and provides a basis for extension to more complex tasks and datasets in the future.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 6 pages, 4 figures, IEEE MMSP 2023

arXiv:2307.13851 [pdf, other]

SplitFed resilience to packet loss: Where to split, that is the question

Authors: Chamani Shiranthika, Zahra Hafezi Kafshgari, Parvaneh Saeedi, Ivan V. Bajić

Abstract: Decentralized machine learning has broadened its scope recently with the invention of Federated Learning (FL), Split Learning (SL), and their hybrids like Split Federated Learning (SplitFed or SFL). The goal of SFL is to reduce the computational power required by each client in FL and parallelize SL while maintaining privacy. This paper investigates the robustness of SFL against packet loss on communication links. The performance of various SFL aggregation strategies is examined by splitting the model at two points -- shallow split and deep split -- and testing whether the split point makes a statistically significant difference to the accuracy of the final model. Experiments are carried out on a segmentation model for human embryo images and indicate the statistically significant advantage of a deeper split point.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 10 pages, 4 figures, MICCAI 2023 Workshop on Distributed, Collaborative and Federated Learning

arXiv:2307.08978 [pdf, other]

Learned Scalable Video Coding For Humans and Machines

Authors: Hadi Hadizadeh, Ivan V. Bajić

Abstract: Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce the first end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer. We will provide the implementation of the proposed system at www.github.com upon completion of the review process.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 14 pages, 16 figures

arXiv:2307.02430 [pdf, other]

Base Layer Efficiency in Scalable Human-Machine Coding

Authors: Yalda Foroutan, Alon Harell, Anderson de Andrade, Ivan V. Bajić

Abstract: A basic premise in scalable human-machine coding is that the base layer is intended for automated machine analysis and is therefore more compressible than the same content would be for human viewing. Use cases for such coding include video surveillance and traffic monitoring, where the majority of the content will never be seen by humans. Therefore, base layer efficiency is of paramount importance because the system would most frequently operate at the base-layer rate. In this paper, we analyze the coding efficiency of the base layer in a state-of-the-art scalable human-machine image codec, and show that it can be improved. In particular, we demonstrate that gains of 20-40% in BD-Rate compared to the currently best results on object detection and instance segmentation are possible.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 5 pages, 6 figures, IEEE ICIP 2023

arXiv:2307.01846 [pdf, other]

Grad-FEC: Unequal Loss Protection of Deep Features in Collaborative Intelligence

Authors: Korcan Uyanik, S. Faegheh Yeganli, Ivan V. Bajić

Abstract: Collaborative intelligence (CI) involves dividing an artificial intelligence (AI) model into two parts: front-end, to be deployed on an edge device, and back-end, to be deployed in the cloud. The deep feature tensors produced by the front-end are transmitted to the cloud through a communication channel, which may be subject to packet loss. To address this issue, in this paper, we propose a novel approach to enhance the resilience of the CI system in the presence of packet loss through Unequal Loss Protection (ULP). The proposed ULP approach involves a feature importance estimator, which estimates the importance of feature packets produced by the front-end, and then selectively applies Forward Error Correction (FEC) codes to protect important packets. Experimental results demonstrate that the proposed approach can significantly improve the reliability and robustness of the CI system in the presence of packet loss.

Submitted 4 July, 2023; originally announced July 2023.

Comments: 5 pages, 6 figures, IEEE ICIP 2023
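
The idea of protecting only the important packets can be sketched as follows. This is a simplified illustration, not the Grad-FEC scheme: the importance scores are hypothetical inputs (the abstract's importance estimator is not reproduced), the FEC code is replaced by a single XOR parity packet that can repair one loss among the protected packets, and all packets are assumed to have equal length.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def protect(packets, importance, k):
    """Pick the k most important packets and build one XOR parity packet over them."""
    order = sorted(range(len(packets)), key=lambda i: importance[i], reverse=True)
    protected = sorted(order[:k])
    parity = bytes(len(packets[0]))  # all-zero packet of the same length
    for i in protected:
        parity = xor_bytes(parity, packets[i])
    return protected, parity

def recover(received, protected, parity):
    """Rebuild a single lost protected packet (marked None); otherwise return input unchanged."""
    missing = [i for i in protected if received[i] is None]
    if len(missing) != 1:
        return list(received)
    rebuilt = parity
    for i in protected:
        if i != missing[0]:
            rebuilt = xor_bytes(rebuilt, received[i])
    out = list(received)
    out[missing[0]] = rebuilt
    return out

# Hypothetical feature packets and importance scores.
packets = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b", b"\xff\x00"]
importance = [0.9, 0.1, 0.7, 0.2]
protected, parity = protect(packets, importance, k=2)

# Packets 0 (protected) and 1 (unprotected) are lost in transit.
received = [None, None, packets[2], packets[3]]
repaired = recover(received, protected, parity)
```

Here the important packet 0 is rebuilt while the unimportant packet 1 stays lost, which is the trade-off unequal protection makes: redundancy is spent only where a loss would hurt the back-end most.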

arXiv:2307.00309 [pdf, other]

Adversarial Attacks and Defenses on 3D Point Cloud Classification: A Survey

Authors: Hanieh Naderi, Ivan V. Bajić

Abstract: Deep learning has successfully solved a wide range of tasks in 2D vision as a dominant AI technique. Recently, deep learning on 3D point clouds is becoming increasingly popular for addressing various tasks in this field. Despite remarkable achievements, deep learning algorithms are vulnerable to adversarial attacks. These attacks are imperceptible to the human eye but can easily fool deep neural networks in the testing and deployment stage. To encourage future research, this survey summarizes the current progress on adversarial attack and defense techniques on point cloud classification. This paper first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes adversarial example generation methods in recent years. Additionally, it provides an overview of defense strategies, organized into data-focused and model-focused methods. Finally, it presents several current challenges and potential future research directions in this domain.

Submitted 1 December, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

arXiv:2305.17295 [pdf, other]

Rate-Distortion Theory in Coding for Machines and its Application

Authors: Alon Harell, Yalda Foroutan, Nilesh Ahuja, Parual Datta, Bhavya Kanzariya, V. Srinivasa Somayazulu, Omesh Tickoo, Anderson de Andrade, Ivan V. Bajic

Abstract: Recent years have seen a tremendous growth in both the capability and popularity of automatic machine analysis of images and video. As a result, a growing need for efficient compression methods optimized for machine vision, rather than human vision, has emerged. To meet this growing demand, several methods have been developed for image and video coding for machines. Unfortunately, while there is a substantial body of knowledge regarding rate-distortion theory for human vision, the same cannot be said of machine analysis. In this paper, we extend the current rate-distortion theory for machines, providing insight into important design considerations of machine-vision codecs. We then utilize this newfound understanding to improve several methods for learnable image coding for machines. Our proposed methods achieve state-of-the-art rate-distortion performance on several computer vision tasks such as classification, instance segmentation, and object detection.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.10453 [pdf, other]

VVC+M: Plug and Play Scalable Image Coding for Humans and Machines

Authors: Alon Harell, Yalda Foroutan, Ivan V. Bajic

Abstract: Compression for machines is an emerging field, where inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. Often performed by jointly optimizing the compression scheme for both machine task and human perception, this results in sub-optimal rate-distortion (RD) performance for the machine side. We focus on the case of images, proposing to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme. Using our approach we improve an existing scalable codec to achieve superior RD performance on the machine task, while remaining competitive for human perception. Moreover, our approach can be trained post-hoc for any given ICM scheme, and without creating a coupling between the quality of the machine analysis and human vision.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.02562 [pdf, ps, other]

Conditional and Residual Methods in Scalable Coding for Humans and Machines

Authors: Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Bajić

Abstract: We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.

Submitted 4 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: IEEE ICME Workshop on Coding for Machines, Brisbane, Australia, 2023

arXiv:2304.14976 [pdf, other]

Quality-Adaptive Split-Federated Learning for Segmenting Medical Images with Inaccurate Annotations

Authors: Zahra Hafezi Kafshgari, Chamani Shiranthika, Parvaneh Saeedi, Ivan V. Bajić

Abstract: SplitFed Learning, a combination of Federated and Split Learning (FL and SL), is one of the most recent developments in the decentralized machine learning domain. In SplitFed learning, a model is trained by clients and a server collaboratively. For image segmentation, labels are created at each client independently and, therefore, are subject to clients' bias, inaccuracies, and inconsistencies. In this paper, we propose a data quality-based adaptive averaging strategy for SplitFed learning, called QA-SplitFed, to cope with the variation of annotated ground truth (GT) quality over multiple clients. The proposed method is compared against five state-of-the-art model averaging methods on the task of learning human embryo image segmentation. Our experiments show that all five baseline methods fail to maintain accuracy as the number of corrupted clients increases. QA-SplitFed, however, copes effectively with corruption as long as there is at least one uncorrupted client.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 5 pages, 4 figures, IEEE International Symposium on Biomedical Imaging (ISBI) 2023
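
A quality-weighted model average can be sketched in a few lines. The abstract does not say how QA-SplitFed derives its weights, so the quality scores below are hypothetical inputs, and the function `weighted_average` is an illustration of weighted aggregation in general rather than the paper's rule. Models are represented as flat lists of parameters.

```python
def weighted_average(client_params, quality_scores):
    """Average client parameter vectors, weighting each client by its quality score."""
    total = sum(quality_scores)
    if total <= 0:
        raise ValueError("at least one client needs a positive quality score")
    n_params = len(client_params[0])
    return [
        sum(q * params[j] for q, params in zip(quality_scores, client_params)) / total
        for j in range(n_params)
    ]

# Hypothetical parameters from three clients; the third was trained on corrupted labels.
clients = [[1.0, 2.0], [1.2, 1.8], [9.0, -5.0]]
quality = [1.0, 1.0, 0.0]            # hypothetical scores; 0 excludes a client entirely

robust = weighted_average(clients, quality)
plain = weighted_average(clients, [1.0, 1.0, 1.0])  # equal weights, for comparison
```

With equal weights the corrupted client drags the average far from the two consistent clients, while the quality-weighted average stays close to them.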

arXiv:2210.14164 [pdf, other]

No-Box Attacks on 3D Point Cloud Classification

Authors: Hanieh Naderi, Chinthaka Dinesh, Ivan V. Bajic, Shohreh Kasaei

Abstract: Adversarial attacks pose serious challenges for deep neural network (DNN)-based analysis of various input signals. In the case of 3D point clouds, methods have been developed to identify points that play a key role in network decision, and these become crucial in generating existing adversarial attacks. For example, a saliency map approach is a popular method for identifying adversarial drop points, whose removal would significantly impact the network decision. Generally, methods for identifying adversarial points rely on the access to the DNN model itself to determine which points are critically important for the model's decision. This paper aims to provide a novel viewpoint on this problem, where adversarial points can be predicted without access to the target DNN model, which is referred to as a "no-box" attack. To this end, we define 14 point cloud features and use multiple linear regression to examine whether these features can be used for adversarial point prediction, and which combination of features is best suited for this purpose. Experiments show that a suitable combination of features is able to predict adversarial points of four different networks -- PointNet, PointNet++, DGCNN, and PointConv -- significantly better than a random guess and comparable to white-box attacks. Additionally, we show that no-box attack is transferable to unseen models. The results also provide further insight into DNNs for point cloud classification, by showing which features play key roles in their decision-making process.

Submitted 27 January, 2024; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 10 pages, 6 figures

arXiv:2210.00727 [pdf, other]

Privacy-Preserving Feature Coding for Machines

Authors: Bardia Azizian, Ivan V. Bajić

Abstract: Automated machine vision pipelines do not need the exact visual content to perform their tasks. Therefore, there is a potential to remove private information from the data without significantly affecting the machine vision accuracy. We present a novel method to create a privacy-preserving latent representation of an image that could be used by a downstream machine vision model. This latent representation is constructed using adversarial training to prevent accurate reconstruction of the input while preserving the task accuracy. Specifically, we split a Deep Neural Network (DNN) model and insert an autoencoder whose purpose is both to reduce the dimensionality and to remove information relevant to input reconstruction while minimizing the impact on task accuracy. Our results show that input reconstruction ability can be reduced by about 0.8 dB at the equivalent task accuracy, with degradation concentrated near the edges, which is important for privacy. At the same time, 30% bit savings are achieved compared to coding the features directly.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, Picture Coding Symposium (PCS) 2022

arXiv:2209.11694 [pdf, other]

Rate-Distortion in Image Coding for Machines

Authors: Alon Harell, Anderson De Andrade, Ivan V. Bajic

Abstract: In recent years, there has been a sharp increase in transmission of images to remote servers specifically for the purpose of computer vision. In many applications, such as surveillance, images are mostly transmitted for automated analysis, and rarely seen by humans. Using traditional compression for this scenario has been shown to be inefficient in terms of bit-rate, likely due to the focus on human based distortion metrics. Thus, it is important to create specific image coding methods for joint use by humans and machines. One way to create the machine side of such a codec is to perform feature matching of some intermediate layer in a Deep Neural Network performing the machine task. In this work, we explore the effects of the layer choice used in training a learnable codec for humans and machines. We prove, using the data processing inequality, that matching features from deeper layers is preferable in the sense of rate-distortion. Next, we confirm our findings empirically by re-training an existing model for scalable human-machine coding. In our experiments we show the trade-off between the human and machine sides of such a scalable model, and discuss the benefit of using deeper layers for training in that regard.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2208.08726 [pdf, other]

Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect Alignment

Authors: Chinthaka Dinesh, Gene Cheung, Saghar Bagheri, Ivan V. Bajic

Abstract: A basic premise in graph signal processing (GSP) is that a graph encoding pairwise (anti-)correlations of the targeted signal as edge weights is exploited for graph filtering. However, existing fast graph sampling schemes are designed and tested only for positive graphs describing positive correlations. In this paper, we show that for datasets with strong inherent anti-correlations, a suitable graph contains both positive and negative edge weights. In response, we propose a linear-time signed graph sampling method centered on the concept of balanced signed graphs. Specifically, given an empirical covariance data matrix $\bar{\bf{C}}$, we first learn a sparse inverse matrix (graph Laplacian) $\mathcal{L}$ corresponding to a signed graph $\mathcal{G}$. We define the eigenvectors of Laplacian $\mathcal{L}_B$ for a balanced signed graph $\mathcal{G}_B$ -- approximating $\mathcal{G}$ via edge weight augmentation -- as graph frequency components. Next, we choose samples to minimize the low-pass filter reconstruction error in two steps. We first align all Gershgorin disc left-ends of Laplacian $\mathcal{L}_B$ at smallest eigenvalue $\lambda_{\min}(\mathcal{L}_B)$ via similarity transform $\mathcal{L}_p = \mathbf{S} \mathcal{L}_B \mathbf{S}^{-1}$, leveraging a recent linear algebra theorem called Gershgorin disc perfect alignment (GDPA). We then perform sampling on $\mathcal{L}_p$ using a previous fast Gershgorin disc alignment sampling (GDAS) scheme. Experimental results show that our signed graph sampling method outperformed existing fast sampling schemes noticeably on various datasets.

Submitted 15 January, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2103.06153

arXiv:2208.05641 [pdf, other]

doi 10.1145/3552437.3555705

Towards Automated Key-Point Detection in Images with Partial Pool View

Authors: T. J. Woinoski, I. V. Bajic

Abstract: Sports analytics has been an up-and-coming field of research among professional sporting organizations and academic institutions alike. With the insurgence and collection of athlete data, the primary goal of such analysis is to improve athletes' performance in a measurable and quantifiable manner. This work is aimed at alleviating some of the challenges encountered in the collection of adequate swimming data. Past works on this subject have shown that the detection and tracking of swimmers is feasible, but not without challenges. Among these challenges are pool localization and determining the relative positions of the swimmers relative to the pool. This work presents two contributions towards solving these challenges. First, we present a pool model with invariant key-points relevant for swimming analytics. Second, we study the detectability of such key-points in images with partial pool view, which are challenging but also quite common in swimming race videos.

Submitted 11 August, 2022; originally announced August 2022.

Journal ref: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports (MMSports '22), October 10, 2022, Lisboa, Portugal

arXiv:2208.02512 [pdf, other]

Scalable Video Coding for Humans and Machines

Authors: Hyomin Choi, Ivan V. Bajić

Abstract: Video content is watched not only by humans, but increasingly also by machines. For example, machine learning models analyze surveillance video for security and traffic monitoring, search through YouTube videos for inappropriate content, and so on. In this paper, we propose a scalable video coding framework that supports machine vision (specifically, object detection) through its base layer bitstream and human vision via its enhancement layer bitstream. The proposed framework includes components from both conventional and Deep Neural Network (DNN)-based video coding. The results show that on object detection, the proposed framework achieves 13-19% bit savings compared to state-of-the-art video codecs, while remaining competitive in terms of MS-SSIM on the human vision task.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 6 pages, 5 figures, IEEE MMSP 2022

arXiv:2206.01365 [pdf, other]

doi 10.1109/MMUL.2015.59

Adversarial Attacks on Human Vision

Authors: Victor A. Mateescu, Ivan V. Bajić

Abstract: This article presents an introduction to visual attention retargeting, its connection to visual saliency, the challenges associated with it, and ideas for how it can be approached. The difficulty of attention retargeting as a saliency inversion problem lies in the lack of one-to-one mapping between saliency and the image domain, in addition to the possible negative impact of saliency alterations on image aesthetics. A few approaches from recent literature to solve this challenging problem are reviewed, and several suggestions for future development are presented.

Submitted 2 June, 2022; originally announced June 2022.

Comments: 21 pages, 8 figures, 1 table

Journal ref: Extended version of IEEE MultiMedia, vol. 23, no. 1, pp. 82-91, Jan.-Mar. 2016

arXiv:2205.01874 [pdf, other]

Joint Image Compression and Denoising via Latent-Space Scalability

Authors: Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, Ivan V. Bajić

Abstract: When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that noise itself needs to be coded, in addition to the clean image itself. In this paper, we present a learning-based image compression framework where image denoising and compression are performed jointly. The latent space of the image codec is organized in a scalable manner such that the clean image can be decoded from a subset of the latent space (the base layer), while the noisy image is decoded from the full latent space at a higher rate. Using a subset of the latent space for the denoised image allows denoising to be carried out at a lower rate. Besides providing a scalable representation of the noisy input image, performing denoising jointly with compression makes intuitive sense because noise is hard to compress; hence, compressibility is one of the criteria that may help distinguish noise from the signal. The proposed codec is compared against established compression and denoising benchmarks, and the experiments reveal considerable bitrate savings compared to a cascade combination of a state-of-the-art codec and a state-of-the-art denoiser.

Submitted 4 September, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2205.01724 [pdf, other]

License Plate Privacy in Collaborative Visual Analysis of Traffic Scenes

Authors: Saeed Ranjbar Alvar, Korcan Uyanik, Ivan V. Bajić

Abstract: Traffic scene analysis is important for emerging technologies such as smart traffic management and autonomous vehicles. However, such analysis also poses potential privacy threats. For example, a system that can recognize license plates may construct patterns of behavior of the corresponding vehicles' owners and use that for various illegal purposes. In this paper we present a system that enables traffic scene analysis while at the same time preserving license plate privacy. The system is based on a multi-task model whose latent space is selectively compressed depending on the amount of information the specific features carry about analysis tasks and private information. Effectiveness of the proposed method is illustrated by experiments on the Cityscapes dataset, for which we also provide license plate annotations.

Submitted 3 May, 2022; originally announced May 2022.

Comments: submitted to IEEE MIPR'22

arXiv:2202.00892 [pdf, other]

Does Video Compression Impact Tracking Accuracy?

Authors: Takehiro Tanaka, Alon Harell, Ivan V. Bajić

Abstract: Everyone "knows" that compressing a video will degrade the accuracy of object tracking. Yet, a literature search on this topic reveals that there is very little documented evidence for this presumed fact. Part of the reason is that, until recently, there were no object tracking datasets for uncompressed video, which made studying the effects of compression on tracking accuracy difficult. In this paper, using a recently published dataset that contains tracking annotations for uncompressed videos, we examined the degradation of tracking accuracy due to video compression using rigorous statistical methods. Specifically, we examined the impact of quantization parameter (QP) and motion search range (MSR) on Multiple Object Tracking Accuracy (MOTA). The results show that QP impacts MOTA at the 95% confidence level, while there is insufficient evidence to claim that MSR impacts MOTA. Moreover, regression analysis allows us to derive a quantitative relationship between MOTA and QP for the specific tracker used in the experiments.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 5 pages, 6 figures, 3 tables, IEEE International Symposium on Circuits and Systems (ISCAS) 2022

arXiv:2201.12773 [pdf, other]

Practical Noise Simulation for RGB Images

Authors: Saeed Ranjbar Alvar, Ivan V. Bajić

Abstract: This document describes a noise generator that simulates realistic noise found in smartphone cameras. The generator simulates Poissonian-Gaussian noise whose parameters have been estimated on the Smartphone Image Denoising Dataset (SIDD). The generator is available online, and is currently being used in compressed-domain denoising exploration experiments in JPEG AI.

Submitted 30 January, 2022; originally announced January 2022.

Comments: Reference paper for the code

arXiv:2112.14934 [pdf, other]

SFU-HW-Tracks-v1: Object Tracking Dataset on Raw Video Sequences

Authors: Takehiro Tanaka, Hyomin Choi, Ivan V. Bajić

Abstract: We present a dataset that contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. Ground-truth annotations for 13 sequences were prepared and released as the dataset called SFU-HW-Tracks-v1. For each video frame, ground truth annotations include object class ID, object ID, and bounding box location and its dimensions. The dataset can be used to evaluate object tracking performance on uncompressed video sequences and study the relationship between video compression and object tracking.

Submitted 30 December, 2021; originally announced December 2021.

Comments: 4 pages, 3 figures, submitted to Data in Brief

arXiv:2112.00794 [pdf, other]

DFTS2: Simulating Deep Feature Transmission Over Packet Loss Channels

Authors: Ashiv Dhondea, Robert A. Cohen, Ivan V. Bajić

Abstract: In edge-cloud collaborative intelligence (CI), an unreliable transmission channel exists in the information path of the AI model performing the inference. It is important to be able to simulate the performance of the CI system across an imperfect channel in order to understand system behavior and develop appropriate error control strategies. In this paper we present a simulation framework called DFTS2, which enables researchers to define the components of the CI system in TensorFlow 2, select a packet-based channel model with various parameters, and simulate system behavior under various channel conditions and error/loss control strategies. Using DFTS2, we also present the most comprehensive study to date of the packet loss concealment methods for collaborative image classification models.

Submitted 1 December, 2021; originally announced December 2021.

Comments: 6 pages, 4 figures, IEEE Conference on Visual Communications and Image Processing (VCIP) 2021

arXiv:2106.05531 [pdf, other]

CALTeC: Content-Adaptive Linear Tensor Completion for Collaborative Intelligence

Authors: Ashiv Dhondea, Robert A. Cohen, Ivan V. Bajić

Abstract: In collaborative intelligence, an artificial intelligence (AI) model is typically split between an edge device and the cloud. Feature tensors produced by the edge sub-model are sent to the cloud via an imperfect communication channel. At the cloud side, parts of the feature tensor may be missing due to packet loss. In this paper we propose a method called Content-Adaptive Linear Tensor Completion (CALTeC) to recover the missing feature data. The proposed method is fast, data-adaptive, does not require pre-training, and produces better results than existing methods for tensor data recovery in collaborative intelligence.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 5 pages, 4 figures, accepted for presentation at IEEE ICIP 2021

arXiv:2105.10341 [pdf, other]

Error Resilient Collaborative Intelligence via Low-Rank Tensor Completion

Authors: Lior Bragilevsky, Ivan V. Bajić

Abstract: In the race to bring Artificial Intelligence (AI) to the edge, collaborative intelligence has emerged as a promising way to lighten the computation load on edge devices that run applications based on Deep Neural Networks (DNNs). Typically, a deep model is split at a certain layer into edge and cloud sub-models. The deep feature tensor produced by the edge sub-model is transmitted to the cloud, where the remaining computationally intensive workload is performed by the cloud sub-model. The communication channel between the edge and cloud is imperfect, which will result in missing data in the deep feature tensor received at the cloud side. In this study, we examine the effectiveness of four low-rank tensor completion methods in recovering missing data in the deep feature tensor. We consider both sparse tensors, such as those produced by the VGG16 model, as well as non-sparse tensors, such as those produced by the ResNet34 model. We study tensor completion effectiveness in both complexity-constrained and unconstrained scenarios.

Submitted 20 May, 2021; originally announced May 2021.

Comments: 2 pages, 1 figure, extended abstract for a poster at IEEE Communication Theory Workshop (CTW) 2020 (moved to 2021)

arXiv:2105.07102 [pdf, other]

doi 10.1109/OJCAS.2021.3072884

Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence

Authors: Robert A. Cohen, Hyomin Choi, Ivan V. Bajić

Abstract: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of ReLU and leaky-ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications.

Submitted 14 May, 2021; originally announced May 2021.

Comments: Accepted for publication in IEEE Open Journal of Circuits and Systems

Journal ref: IEEE Open Journal of Circuits and Systems, vol. 2, 13 May 2021, pp. 350-362

arXiv:2105.06002 [pdf, other]

doi 10.1109/ICME46284.2020.9102797

Lightweight compression of neural network feature tensors for collaborative intelligence

Authors: Robert A. Cohen, Hyomin Choi, Ivan V. Bajić

Abstract: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device, and the remainder of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer, while having a low complexity suitable for edge devices and not requiring any retraining. We also present a modified entropy-constrained quantizer design algorithm optimized for clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding a layer's activations in split neural networks for edge/cloud applications.

Submitted 12 May, 2021; originally announced May 2021.

Comments: Accepted for publication in IEEE ICME 2020

Journal ref: 2020 IEEE International Conference on Multimedia and Expo (ICME)

arXiv:2104.12056 [pdf, other]

Swimmer Stroke Rate Estimation From Overhead Race Video

Authors: Timothy Woinoski, Ivan V. Bajić

Abstract: In this work, we propose a swimming analytics system for automatically determining swimmer stroke rates from overhead race video (ORV). General ORV is defined as any footage of swimmers in competition, taken for the purposes of viewing or analysis. Examples of this are footage from live streams, broadcasts, or specialized camera equipment, with or without camera motion. These are the most typical forms of swimming competition footage. We detail how to create a system that will automatically collect swimmer stroke rates in any competition, given the video of the competition of interest. With this information, better systems can be created and additions to our analytics system can be proposed to automatically extract other swimming metrics of interest.

Submitted 20 May, 2021; v1 submitted 25 April, 2021; originally announced April 2021.

Comments: 6 pages, 4 figures, to be presented at the IEEE ICME Workshop on Artificial Intelligence in Sports (AI-Sports), July 2021

arXiv:2102.06841 [pdf, other]

Collaborative Intelligence: Challenges and Opportunities

Authors: Ivan V. Bajić, Weisi Lin, Yonghong Tian

Abstract: This paper presents an overview of the emerging area of collaborative intelligence (CI). Our goal is to raise awareness in the signal processing community of the challenges and opportunities in this area of growing importance, where key developments are expected to come from signal processing and related disciplines. The paper surveys the current state of the art in CI, with special emphasis on signal processing-related challenges in feature compression, error resilience, privacy, and system-level design.

Submitted 12 February, 2021; originally announced February 2021.

Comments: 5 pages, 2 figures, accepted for presentation at IEEE ICASSP 2021

arXiv:2102.04018 [pdf, other]

Analysis of Latent-Space Motion for Collaborative Intelligence

Authors: Mateen Ulhaq, Ivan V. Bajić

Abstract: When the input to a deep neural network (DNN) is a video signal, a sequence of feature tensors is produced at the intermediate layers of the model. If neighboring frames of the input video are related through motion, a natural question is, "what is the relationship between the corresponding feature tensors?" By analyzing the effect of common DNN operations on optical flow, we show that the motion present in each channel of a feature tensor is approximately equal to the scaled version of the input motion. The analysis is validated through experiments utilizing common motion models. These results will be useful in collaborative intelligence applications where sequences of feature tensors need to be compressed or further analyzed.

Submitted 8 February, 2021; originally announced February 2021.

Comments: 6 pages, 6 figures, extended version of an IEEE ICASSP 2021 paper

arXiv:2102.00142 [pdf, other]

Latent-Space Inpainting for Packet Loss Concealment in Collaborative Object Detection

Authors: Ivan V. Bajić

Abstract: Edge devices, such as cameras and mobile units, are increasingly capable of performing sophisticated computation in addition to their traditional roles in sensing and communicating signals. The focus of this paper is on collaborative object detection, where deep features computed on the edge device from input images are transmitted to the cloud for further processing. We consider the impact of packet loss on the transmitted features and examine several ways for recovering the missing data. In particular, through theory and experiments, we show that methods for image inpainting based on partial differential equations work well for the recovery of missing features in the latent space. The obtained results represent the new state of the art for missing data recovery in collaborative object detection.

Submitted 29 January, 2021; originally announced February 2021.

Comments: Extended version of the paper "Latent Space Inpainting for Loss-Resilient Collaborative Object Detection," to be presented at the IEEE International Conference on Communications (ICC), Montreal, Canada, June 14-23, 2021

arXiv:2101.08427 [pdf, other]

Analysis of Information Flow Through U-Nets

Authors: Suemin Lee, Ivan V. Bajić

Abstract: Deep Neural Networks (DNNs) have become ubiquitous in medical image processing and analysis. Among them, U-Nets are very popular in various image segmentation tasks. Yet, little is known about how information flows through these networks and whether they are indeed properly designed for the tasks they are being proposed for. In this paper, we employ information-theoretic tools in order to gain insight into information flow through U-Nets. In particular, we show how mutual information between input/output and an intermediate layer can be a useful tool to understand information flow through various portions of a U-Net, assess its architectural efficiency, and even propose more efficient designs.

Submitted 2 April, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

arXiv:2009.12430 [pdf, other]

doi 10.1109/TIP.2021.3060875

Pareto-Optimal Bit Allocation for Collaborative Intelligence

Authors: Saeed Ranjbar Alvar, Ivan V. Bajić

Abstract: In recent studies, collaborative intelligence (CI) has emerged as a promising framework for deployment of Artificial Intelligence (AI)-based services on mobile/edge devices. In CI, the AI model (a deep neural network) is split between the edge and the cloud, and intermediate features are sent from the edge sub-model to the cloud sub-model. In this paper, we study bit allocation for feature coding in multi-stream CI systems. We model task distortion as a function of rate using convex surfaces similar to those found in distortion-rate theory. Using such models, we are able to provide closed-form bit allocation solutions for single-task systems and scalarized multi-task systems. Moreover, we provide analytical characterization of the full Pareto set for 2-stream k-task systems, and bounds on the Pareto set for 3-stream 2-task systems. Analytical results are examined on a variety of DNN models from the literature to demonstrate wide applicability of the results.

Submitted 29 April, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

Journal ref: IEEE Trans. Image Processing, vol. 30, pp. 3348-3361, Feb. 2021

arXiv:2009.07756 [pdf, ps, other]

Exploring Bayesian Surprise to Prevent Overfitting and to Predict Model Performance in Non-Intrusive Load Monitoring

Authors: Richard Jones, Christoph Klemenjak, Stephen Makonin, Ivan V. Bajic

Abstract: Non-Intrusive Load Monitoring (NILM) is a field of research focused on segregating constituent electrical loads in a system based only on their aggregated signal. Significant computational resources and research time are spent training models, often using as much data as possible, perhaps driven by the preconception that more data equates to more accurate models and better performing algorithms. When has enough prior training been done? When has a NILM algorithm encountered new, unseen data? This work applies the notion of Bayesian surprise to answer these questions which are important for both supervised and unsupervised algorithms. We quantify the degree of surprise between the predictive distribution (termed postdictive surprise), as well as the transitional probabilities (termed transitional surprise), before and after a window of observations. We compare the performance of several benchmark NILM algorithms supported by NILMTK, in order to establish a useful threshold on the two combined measures of surprise. We validate the use of transitional surprise by exploring the performance of a popular Hidden Markov Model as a function of surprise threshold. Finally, we explore the use of a surprise threshold as a regularization technique to avoid overfitting in cross-dataset performance. Although the generality of the specific surprise threshold discussed herein may be suspect without further testing, this work provides clear evidence that a point of diminishing returns of model performance with respect to dataset size exists. This has implications for future model development, dataset acquisition, as well as aiding in model flexibility during deployment.

Submitted 16 September, 2020; originally announced September 2020.

arXiv:2007.13645 [pdf, other]

PowerGAN: Synthesizing Appliance Power Signatures Using Generative Adversarial Networks

Authors: Alon Harell, Richard Jones, Stephen Makonin, Ivan V. Bajic

Abstract: Non-intrusive load monitoring (NILM) allows users and energy providers to gain insight into home appliance electricity consumption using only the building's smart meter. Most current techniques for NILM are trained using significant amounts of labeled appliances power data. The collection of such data is challenging, making data a major bottleneck in creating well generalizing NILM solutions. To help mitigate the data limitations, we present the first truly synthetic appliance power signature generator. Our solution, PowerGAN, is based on conditional, progressively growing, 1-D Wasserstein generative adversarial network (GAN). Using PowerGAN, we are able to synthesise truly random and realistic appliance power data signatures. We evaluate the samples generated by PowerGAN in a qualitative way as well as numerically by using traditional GAN evaluation methods such as the Inception score.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2003.03092 [pdf, other]

doi 10.1109/TMM.2020.2975420

Soft Video Multicasting Using Adaptive Compressed Sensing

Authors: Hadi Hadizadeh, Ivan V. Bajic

Abstract: Recently, soft video multicasting has gained a lot of attention, especially in broadcast and mobile scenarios where the bit rate supported by the channel may differ across receivers, and may vary quickly over time. Unlike the conventional designs that force the source to use a single bit rate according to the receiver with the worst channel quality, soft video delivery schemes transmit the video such that the video quality at each receiver is commensurate with its specific instantaneous channel quality. In this paper, we present a soft video multicasting system using an adaptive block-based compressed sensing (BCS) method. The proposed system consists of an encoder, a transmission system, and a decoder. At the encoder side, each block in each frame of the input video is adaptively sampled with a rate that depends on the texture complexity and visual saliency of the block. The obtained BCS samples are then placed into several packets, and the packets are transmitted via a channel-aware OFDM (orthogonal frequency division multiplexing) transmission system with a number of subchannels. At the decoder side, the received BCS samples are first used to build an initial approximation of the transmitted frame. To further improve the reconstruction quality, an iterative BCS reconstruction algorithm is then proposed that uses an adaptive transform and an adaptive soft-thresholding operator, which exploits the temporal similarity between adjacent frames to achieve better reconstruction quality. The extensive objective and subjective experimental results indicate the superiority of the proposed system over the state-of-the-art soft video multicasting systems.

Submitted 6 March, 2020; originally announced March 2020.

arXiv:2002.07048 [pdf, other]

Bit Allocation for Multi-Task Collaborative Intelligence

Authors: Saeed Ranjbar Alvar, Ivan V. Bajić

Abstract: Recent studies have shown that collaborative intelligence (CI) is a promising framework for deployment of Artificial Intelligence (AI)-based services on mobile devices. In CI, a deep neural network is split between the mobile device and the cloud. Deep features obtained at the mobile are compressed and transferred to the cloud to complete the inference. So far, the methods in the literature focused on transferring a single deep feature tensor from the mobile to the cloud. Such methods are not applicable to some recent, high-performance networks with multiple branches and skip connections. In this paper, we propose the first bit allocation method for multi-stream, multi-task CI. We first establish a model for the joint distortion of the multiple tasks as a function of the bit rates assigned to different deep feature tensors. Then, using the proposed model, we solve the rate-distortion optimization problem under a total rate constraint to obtain the best rate allocation among the tensors to be transferred. Experimental results illustrate the efficacy of the proposed scheme compared to several alternative bit allocation methods.

Submitted 13 February, 2020; originally announced February 2020.

Comments: Accepted for publication ICASSP'20

arXiv:2002.07036 [pdf, other]

Back-and-Forth prediction for deep tensor compression

Authors: Hyomin Choi, Robert A. Cohen, Ivan V. Bajic

Abstract: Recent AI applications such as Collaborative Intelligence with neural networks involve transferring deep feature tensors between various computing devices. This necessitates tensor compression in order to optimize the usage of bandwidth-constrained channels between devices. In this paper we present a prediction scheme called Back-and-Forth (BaF) prediction, developed for deep feature tensors, which allows us to dramatically reduce tensor size and improve its compressibility. Our experiments with a state-of-the-art object detector demonstrate that the proposed method allows us to significantly reduce the number of bits needed for compressing feature tensors extracted from deep within the model, with negligible degradation of the detection performance and without requiring any retraining of the network weights. We achieve a 62% and 75% reduction in tensor size while keeping the loss in accuracy of the network to less than 1% and 2%, respectively.

Submitted 13 February, 2020; originally announced February 2020.

Comments: Accepted for publication in IEEE ICASSP'20

arXiv:2002.00157 [pdf, other]

Shared Mobile-Cloud Inference for Collaborative Intelligence

Authors: Mateen Ulhaq, Ivan V. Bajić

Abstract: As AI applications for mobile devices become more prevalent, there is an increasing need for faster execution and lower energy consumption for neural model inference. Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud. However, cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency. In addition, cloud-only inference requires the input data (images, audio) to be fully transferred to the cloud, creating concerns about potential privacy breaches. We demonstrate an alternative approach: shared mobile-cloud inference. Partial inference is performed on the mobile in order to reduce the dimensionality of the input data and arrive at a compact feature tensor, which is a latent space representation of the input signal. The feature tensor is then transmitted to the server for further inference. This strategy can improve inference latency, energy consumption, and network bandwidth usage, as well as provide privacy protection, because the original signal never leaves the mobile. Further performance gain can be achieved by compressing the feature tensor before its transmission.

Submitted 1 February, 2020; originally announced February 2020.

Comments: 5 pages, 3 figures

arXiv:2001.04433 [pdf, other]

Towards Automated Swimming Analytics Using Deep Neural Networks

Authors: Timothy Woinoski, Alon Harell, Ivan V. Bajic

Abstract: Methods for creating a system to automate the collection of swimming analytics on a pool-wide scale are considered in this paper. There has not been much work on swimmer tracking or the creation of a swimmer database for machine learning purposes. Consequently, methods for collecting swimmer data from videos of swim competitions are explored and analyzed. The result is a guide to the creation of a comprehensive collection of swimming data suitable for training swimmer detection and tracking systems. With this database in place, systems can then be created to automate the collection of swimming analytics.

Submitted 13 January, 2020; originally announced January 2020.

arXiv:1906.11942 [pdf]

Datasets for Face and Object Detection in Fisheye Images

Authors: Jianglin Fu, Ivan V. Bajic, Rodney G. Vaughan

Abstract: We present two new fisheye image datasets for training face and object detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway.

Submitted 27 June, 2019; originally announced June 2019.

arXiv:1902.05179 [pdf, other]

Multi-task learning with compressible features for Collaborative Intelligence

Authors: Saeed Ranjbar Alvar, Ivan V. Bajić

Abstract: A promising way to deploy Artificial Intelligence (AI)-based services on mobile devices is to run a part of the AI model (a deep neural network) on the mobile itself, and the rest in the cloud. This is sometimes referred to as collaborative intelligence. In this framework, intermediate features from the deep network need to be transmitted to the cloud for further processing. We study the case where such features are used for multiple purposes in the cloud (multi-tasking) and where they need to be compressible in order to allow efficient transmission to the cloud. To this end, we introduce a new loss function that encourages feature compressibility while improving system performance on multiple tasks. Experimental results show that with the compression-friendly loss, one can achieve around 20% bitrate reduction without sacrificing the performance on several vision-related tasks.

Submitted 15 May, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

arXiv:1902.02777 [pdf, other]

FDDB-360: Face Detection in 360-degree Fisheye Images

Authors: Jianglin Fu, Saeed Ranjbar Alvar, Ivan V. Bajic, Rodney G. Vaughan

Abstract: 360-degree cameras offer the possibility to cover a large area, for example an entire room, without using multiple distributed vision sensors. However, geometric distortions introduced by their lenses make computer vision problems more challenging. In this paper we address face detection in 360-degree fisheye images. We show how a face detector trained on regular images can be re-trained for this purpose, and we also provide a 360-degree fisheye-like version of the popular FDDB face detection dataset, which we call FDDB-360.

Submitted 7 February, 2019; originally announced February 2019.

arXiv:1901.00062 [pdf, other]

Deep Frame Prediction for Video Coding

Authors: Hyomin Choi, Ivan V. Bajic

Abstract: We propose a novel frame prediction method using a deep neural network (DNN), with the goal of improving video coding efficiency. The proposed DNN makes use of decoded frames, at both encoder and decoder, to predict textures of the current coding block. Unlike conventional inter-prediction, the proposed method does not require any motion information to be transferred between the encoder and the decoder. Still, both uni-directional and bi-directional prediction are possible using the proposed DNN, which is enabled by the use of the temporal index channel, in addition to color channels. In this study, we developed a jointly trained DNN for both uni- and bi-directional prediction, as well as separate networks for uni- and bi-directional prediction, and compared the efficacy of both approaches. The proposed DNNs were compared with the conventional motion-compensated prediction in the latest video coding standard, HEVC, in terms of BD-Bitrate. The experiments show that the proposed joint DNN (for both uni- and bi-directional prediction) reduces the luminance bitrate by about 4.4%, 2.4%, and 2.3% in the Low delay P, Low delay, and Random access configurations, respectively. In addition, using the separately trained DNNs brings further bit savings of about 0.3%-0.5%.

Submitted 20 June, 2019; v1 submitted 31 December, 2018; originally announced January 2019.

Comments: This paper is accepted by IEEE Transactions on Circuits and Systems for Video Technology in 2019

arXiv:1805.00107 [pdf, other]

MV-YOLO: Motion Vector-aided Tracking by Semantic Object Detection

Authors: Saeed Ranjbar Alvar, Ivan V. Bajić

Abstract: Object tracking is the cornerstone of many visual analytics systems. While considerable progress has been made in this area in recent years, robust, efficient, and accurate tracking in real-world video remains a challenge. In this paper, we present a hybrid tracker that leverages motion information from the compressed video stream and a general-purpose semantic object detector acting on decoded frames to construct a fast and efficient tracking engine. The proposed approach is compared with several well-known recent trackers on the OTB tracking dataset. The results indicate advantages of the proposed method in terms of speed and/or accuracy. Other desirable features of the proposed method are its simplicity and deployment efficiency, which stems from the fact that it reuses the resources and information that may already exist in the system for other reasons.

Submitted 15 June, 2018; v1 submitted 30 April, 2018; originally announced May 2018.

Showing 1–50 of 57 results for author: Bajić, I V