Search | arXiv e-print repository

Saliency Diversified Deep Ensemble for Robustness to Adversaries

Authors: Alex Bogun, Dimche Kostadinov, Damian Borth

Abstract: Deep learning models have shown incredible performance on numerous image recognition, classification, and reconstruction tasks. Although very appealing and valuable due to their predictive capabilities, one common threat remains challenging to resolve. A specifically trained attacker can introduce malicious input perturbations to fool the network, thus causing potentially harmful mispredictions. M… ▽ More Deep learning models have shown incredible performance on numerous image recognition, classification, and reconstruction tasks. Although very appealing and valuable due to their predictive capabilities, one common threat remains challenging to resolve. A specifically trained attacker can introduce malicious input perturbations to fool the network, thus causing potentially harmful mispredictions. Moreover, these attacks can succeed when the adversary has full access to the target model (white-box) and even when such access is limited (black-box setting). The ensemble of models can protect against such attacks but might be brittle under shared vulnerabilities in its members (attack transferability). To that end, this work proposes a novel diversity-promoting learning approach for the deep ensembles. The idea is to promote saliency map diversity (SMD) on ensemble members to prevent the attacker from targeting all ensemble members at once by introducing an additional term in our learning objective. During training, this helps us minimize the alignment between model saliencies to reduce shared member vulnerabilities and, thus, increase ensemble robustness to adversaries. We empirically show a reduced transferability between ensemble members and improved performance compared to the state-of-the-art ensemble defense against medium and high strength white-box attacks. In addition, we demonstrate that our approach combined with existing methods outperforms state-of-the-art ensemble algorithms for defense under white-box and black-box attacks. △ Less

Submitted 7 December, 2021; originally announced December 2021.

Comments: Accepted to AAAI Workshop on Adversarial Machine Learning and Beyond 2022

ACM Class: I.2.0

arXiv:2110.15288 [pdf, other]

Hyper-Representations: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction

Authors: Konstantin Schürholt, Dimche Kostadinov, Damian Borth

Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn hyper-representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architectu… ▽ More Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn hyper-representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings. △ Less

Submitted 14 December, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Comments: Published at 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia. 31 Pages, 14 figures

arXiv:2110.14422 [pdf]

Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Authors: Shijun Wang, Dimche Kostadinov, Damian Borth

Abstract: Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research topic as it enables a range of applications like voice customizing, animation production, and others. Recent work in this area made progress with disentanglement methods that separate utterance content and speaker characteristics from speech audio recordings. However, many of these methods are subject… ▽ More Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research topic as it enables a range of applications like voice customizing, animation production, and others. Recent work in this area made progress with disentanglement methods that separate utterance content and speaker characteristics from speech audio recordings. However, many of these methods are subject to the leakage of prosody (e.g., pitch, volume), causing the speaker voice in the synthesized speech to be different from the desired target speakers. To prevent this issue, we propose a novel self-supervised approach that effectively learns disentangled pitch and volume representations that can represent the prosody styles of different speakers. We then use the learned prosodic representations as conditional information to train and enhance our VC model for zero-shot conversion. In our experiments, we show that our prosody representations are disentangled and rich in prosody information. Moreover, we demonstrate that the addition of our prosody representations improves our VC performance and surpasses state-of-the-art zero-shot VC performances. △ Less

Submitted 31 May, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Published in: 2022 International Joint Conference on Neural Networks (IJCNN)

arXiv:2102.04274 [pdf, other]

Privacy-Preserving Near Neighbor Search via Sparse Coding with Ambiguation

Authors: Behrooz Razeghi, Sohrab Ferdowsi, Dimche Kostadinov, Flavio. P. Calmon, Slava Voloshynovskiy

Abstract: In this paper, we propose a framework for privacy-preserving approximate near neighbor search via stochastic sparsifying encoding. The core of the framework relies on sparse coding with ambiguation (SCA) mechanism that introduces the notion of inherent shared secrecy based on the support intersection of sparse codes. This approach is `fairness-aware', in the sense that any point in the neighborhoo… ▽ More In this paper, we propose a framework for privacy-preserving approximate near neighbor search via stochastic sparsifying encoding. The core of the framework relies on sparse coding with ambiguation (SCA) mechanism that introduces the notion of inherent shared secrecy based on the support intersection of sparse codes. This approach is `fairness-aware', in the sense that any point in the neighborhood has an equiprobable chance to be chosen. Our approach can be applied to raw data, latent representation of autoencoders, and aggregated local descriptors. The proposed method is tested on both synthetic i.i.d data and real large-scale image databases. △ Less

Submitted 8 February, 2021; originally announced February 2021.

Comments: To be presented at 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

arXiv:2009.11044 [pdf, other]

Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation

Authors: Dimche Kostadinov, Davide Scaramuzza

Abstract: Event-based cameras record an asynchronous stream of per-pixel brightness changes. As such, they have numerous advantages over the standard frame-based cameras, including high temporal resolution, high dynamic range, and no motion blur. Due to the asynchronous nature, efficient learning of compact representation for event data is challenging. While it remains not explored the extent to which the s… ▽ More Event-based cameras record an asynchronous stream of per-pixel brightness changes. As such, they have numerous advantages over the standard frame-based cameras, including high temporal resolution, high dynamic range, and no motion blur. Due to the asynchronous nature, efficient learning of compact representation for event data is challenging. While it remains not explored the extent to which the spatial and temporal event "information" is useful for pattern recognition tasks. In this paper, we focus on single-layer architectures. We analyze the performance of two general problem formulations: the direct and the inverse, for unsupervised feature learning from local event data (local volumes of events described in space-time). We identify and show the main advantages of each approach. Theoretically, we analyze guarantees for an optimal solution, possibility for asynchronous, parallel parameter update, and the computational complexity. We present numerical experiments for object recognition. We evaluate the solution under the direct and the inverse problem and give a comparison with the state-of-the-art methods. Our empirical results highlight the advantages of both approaches for representation learning from event data. We show improvements of up to 9 % in the recognition accuracy compared to the state-of-the-art methods from the same class of methods. △ Less

Submitted 30 September, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

Journal ref: IAPR IEEE/Computer Society International Conference on Pattern Recognition (ICPR), Milan, 2021

arXiv:2008.02532 [pdf, other]

Online Weight-adaptive Nonlinear Model Predictive Control

Authors: Dimche Kostadinov, Davide Scaramuzza

Abstract: Nonlinear Model Predictive Control (NMPC) is a powerful and widely used technique for nonlinear dynamic process control under constraints. In NMPC, the state and control weights of the corresponding state and control costs are commonly selected based on human-expert knowledge, which usually reflects the acceptable stability in practice. Although broadly used, this approach might not be optimal for… ▽ More Nonlinear Model Predictive Control (NMPC) is a powerful and widely used technique for nonlinear dynamic process control under constraints. In NMPC, the state and control weights of the corresponding state and control costs are commonly selected based on human-expert knowledge, which usually reflects the acceptable stability in practice. Although broadly used, this approach might not be optimal for the execution of a trajectory with the lowest positional error and sufficiently "smooth" changes in the predicted controls. Furthermore, NMPC with an online weight update strategy for fast, agile, and precise unmanned aerial vehicle navigation, has not been studied extensively. To this end, we propose a novel control problem formulation that allows online updates of the state and control weights. As a solution, we present an algorithm that consists of two alternating stages: (i) state and command variable prediction and (ii) weights update. We present a numerical evaluation with a comparison and analysis of different trade-offs for the problem of quadrotor navigation. Our computer simulation results show improvements of up to 70% in the accuracy of the executed trajectory compared to the standard solution of NMPC with fixed weights. △ Less

Submitted 6 August, 2020; originally announced August 2020.

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 2020

arXiv:1902.00016 [pdf, other]

Network Parameter Learning Using Nonlinear Transforms, Local Representation Goals and Local Propagation Constraints

Authors: Dimche Kostadinov, Behrooz Razdehi, Slava Voloshynovskiy

Abstract: In this paper, we introduce a novel concept for learning of the parameters in a neural network. Our idea is grounded on modeling a learning problem that addresses a trade-off between (i) satisfying local objectives at each node and (ii) achieving desired data propagation through the network under (iii) local propagation constraints. We consider two types of nonlinear transforms which describe the… ▽ More In this paper, we introduce a novel concept for learning of the parameters in a neural network. Our idea is grounded on modeling a learning problem that addresses a trade-off between (i) satisfying local objectives at each node and (ii) achieving desired data propagation through the network under (iii) local propagation constraints. We consider two types of nonlinear transforms which describe the network representations. One of the nonlinear transforms serves as activation function. The other one enables a locally adjusted, deviation corrective components to be included in the update of the network weights in order to enable attaining target specific representations at the last network node. Our learning principle not only provides insight into the understanding and the interpretation of the learning dynamics, but it offers theoretical guarantees over decoupled and parallel parameter estimation strategy that enables learning in synchronous and asynchronous mode. Numerical experiments validate the potential of our approach on image recognition task. The preliminary results show advantages in comparison to the state-of-the-art methods, w.r.t. the learning time and the network size while having competitive recognition accuracy. △ Less

Submitted 31 January, 2019; originally announced February 2019.

Comments: arXiv admin note: text overlap with arXiv:1805.07802

arXiv:1901.10760 [pdf, ps, other]

Clustering with Jointly Learned Nonlinear Transforms Over Discriminating Min-Max Similarity/Dissimilarity Assignment

Authors: Dimche Kostadinov, Behrooz Razeghi, Taras Holotyak, Slava Voloshynovskiy

Abstract: This paper presents a novel clustering concept that is based on jointly learned nonlinear transforms (NTs) with priors on the information loss and the discrimination. We introduce a clustering principle that is based on evaluation of a parametric min-max measure for the discriminative prior. The decomposition of the prior measure allows to break down the assignment into two steps. In the first ste… ▽ More This paper presents a novel clustering concept that is based on jointly learned nonlinear transforms (NTs) with priors on the information loss and the discrimination. We introduce a clustering principle that is based on evaluation of a parametric min-max measure for the discriminative prior. The decomposition of the prior measure allows to break down the assignment into two steps. In the first step, we apply NTs to a data point in order to produce candidate NT representations. In the second step, we preform the actual assignment by evaluating the parametric measure over the candidate NT representations. Numerical experiments on image clustering task validate the potential of the proposed approach. The evaluation shows advantages in comparison to the state-of-the-art clustering methods. △ Less

Submitted 30 January, 2019; originally announced January 2019.

arXiv:1806.08658 [pdf, ps, other]

Privacy-Preserving Identification via Layered Sparse Code Design: Distributed Servers and Multiple Access Authorization

Authors: Behrooz Razeghi, Slava Voloshynovskiy, Sohrab Ferdowsi, Dimche Kostadinov

Abstract: We propose a new computationally efficient privacy-preserving identification framework based on layered sparse coding. The key idea of the proposed framework is a sparsifying transform learning with ambiguization, which consists of a trained linear map, a component-wise nonlinearity and a privacy amplification. We introduce a practical identification framework, which consists of two phases: public… ▽ More We propose a new computationally efficient privacy-preserving identification framework based on layered sparse coding. The key idea of the proposed framework is a sparsifying transform learning with ambiguization, which consists of a trained linear map, a component-wise nonlinearity and a privacy amplification. We introduce a practical identification framework, which consists of two phases: public and private identification. The public untrusted server provides the fast search service based on the sparse privacy protected codebook stored at its side. The private trusted server or the local client application performs the refined accurate similarity search using the results of the public search and the layered sparse codebooks stored at its side. The private search is performed in the decoded domain and also the accuracy of private search is chosen based on the authorization level of the client. The efficiency of the proposed method is in computational complexity of encoding, decoding, "encryption" (ambiguization) and "decryption" (purification) as well as storage complexity of the codebooks. △ Less

Submitted 7 June, 2018; originally announced June 2018.

Comments: EUSIPCO 2018

arXiv:1805.07802 [pdf, other]

Network Learning with Local Propagation

Authors: Dimche Kostadinov, Behrooz Razeghi, Sohrab Ferdowsi, Slava Voloshynovskiy

Abstract: This paper presents a locally decoupled network parameter learning with local propagation. Three elements are taken into account: (i) sets of nonlinear transforms that describe the representations at all nodes, (ii) a local objective at each node related to the corresponding local representation goal, and (iii) a local propagation model that relates the nonlinear error vectors at each node with th… ▽ More This paper presents a locally decoupled network parameter learning with local propagation. Three elements are taken into account: (i) sets of nonlinear transforms that describe the representations at all nodes, (ii) a local objective at each node related to the corresponding local representation goal, and (iii) a local propagation model that relates the nonlinear error vectors at each node with the goal error vectors from the directly connected nodes. The modeling concepts (i), (ii) and (iii) offer several advantages, including (a) a unified learning principle for any network that is represented as a graph, (b) understanding and interpretation of the local and the global learning dynamics, (c) decoupled and parallel parameter learning, (d) a possibility for learning in infinitely long, multi-path and multi-goal networks. Numerical experiments validate the potential of the learning principle. The preliminary results show advantages in comparison to the state-of-the-art methods, w.r.t. the learning time and the network size while having comparable recognition accuracy. △ Less

Submitted 20 May, 2018; originally announced May 2018.

Comments: preprint, a similar version submitted to NIPS 2018

arXiv:1710.11510 [pdf, other]

A multi-layer network based on Sparse Ternary Codes for universal vector compression

Authors: Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov

Abstract: We present the multi-layer extension of the Sparse Ternary Codes (STC) for fast similarity search where we focus on the reconstruction of the database vectors from the ternary codes. To consider the trade-offs between the compactness of the STC and the quality of the reconstructed vectors, we study the rate-distortion behavior of these codes under different setups. We show that a single-layer code… ▽ More We present the multi-layer extension of the Sparse Ternary Codes (STC) for fast similarity search where we focus on the reconstruction of the database vectors from the ternary codes. To consider the trade-offs between the compactness of the STC and the quality of the reconstructed vectors, we study the rate-distortion behavior of these codes under different setups. We show that a single-layer code cannot achieve satisfactory results at high rates. Therefore, we extend the concept of STC to multiple layers and design the ML-STC, a codebook-free system that successively refines the reconstruction of the residuals of previous layers. While the ML-STC keeps the sparse ternary structure of the single-layer STC and hence is suitable for fast similarity search in large-scale databases, we show its superior rate-distortion performance on both model-based synthetic data and public large-scale databases, as compared to several binary hashing methods. △ Less

Submitted 31 October, 2017; originally announced October 2017.

Comments: Submitted to ICASSP 2018

arXiv:1709.10297 [pdf, other]

Privacy Preserving Identification Using Sparse Approximation with Ambiguization

Authors: Behrooz Razeghi, Slava Voloshynovskiy, Dimche Kostadinov, Olga Taran

Abstract: In this paper, we consider a privacy preserving encoding framework for identification applications covering biometrics, physical object security and the Internet of Things (IoT). The proposed framework is based on a sparsifying transform, which consists of a trained linear map, an element-wise nonlinearity, and privacy amplification. The sparsifying transform and privacy amplification are not symm… ▽ More In this paper, we consider a privacy preserving encoding framework for identification applications covering biometrics, physical object security and the Internet of Things (IoT). The proposed framework is based on a sparsifying transform, which consists of a trained linear map, an element-wise nonlinearity, and privacy amplification. The sparsifying transform and privacy amplification are not symmetric for the data owner and data user. We demonstrate that the proposed approach is closely related to sparse ternary codes (STC), a recent information-theoretic concept proposed for fast approximate nearest neighbor (ANN) search in high dimensional feature spaces that being machine learning in nature also offers significant benefits in comparison to sparse approximation and binary embedding approaches. We demonstrate that the privacy of the database outsourced to a server as well as the privacy of the data user are preserved at a low computational cost, storage and communication burdens. △ Less

Submitted 29 September, 2017; originally announced September 2017.

Comments: submitted to WIFS 2017

arXiv:1707.02194 [pdf, other]

A multi-layer image representation using Regularized Residual Quantization: application to compression and denoising

Authors: Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov

Abstract: A learning-based framework for representation of domain-specific images is proposed where joint compression and denoising can be done using a VQ-based multi-layer network. While it learns to compress the images from a training set, the compression performance is very well generalized on images from a test set. Moreover, when fed with noisy versions of the test set, since it has priors from clean i… ▽ More A learning-based framework for representation of domain-specific images is proposed where joint compression and denoising can be done using a VQ-based multi-layer network. While it learns to compress the images from a training set, the compression performance is very well generalized on images from a test set. Moreover, when fed with noisy versions of the test set, since it has priors from clean images, the network also efficiently denoises the test images during the reconstruction. The proposed framework is a regularized version of the Residual Quantization (RQ) where at each stage, the quantization error from the previous stage is further quantized. Instead of codebook learning from the k-means which over-trains for high-dimensional vectors, we show that only generating the codewords from a random, but properly regularized distribution suffices to compress the images globally and without the need to resort to patch-based division of images. The experiments are done on the \textit{CroppedYale-B} set of facial images and the method is compared with the JPEG-2000 codec for compression and BM3D for denoising, showing promising results. △ Less

Submitted 7 July, 2017; originally announced July 2017.

Comments: At the International Conference on Image Processing 2017 (ICIP'17), Bei**g, China

arXiv:1705.00522 [pdf, other]

Regularized Residual Quantization: a multi-layer sparse dictionary learning approach

Authors: Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov

Abstract: The Residual Quantization (RQ) framework is revisited where the quantization distortion is being successively reduced in multi-layers. Inspired by the reverse-water-filling paradigm in rate-distortion theory, an efficient regularization on the variances of the codewords is introduced which allows to extend the RQ for very large numbers of layers and also for high dimensional data, without getting… ▽ More The Residual Quantization (RQ) framework is revisited where the quantization distortion is being successively reduced in multi-layers. Inspired by the reverse-water-filling paradigm in rate-distortion theory, an efficient regularization on the variances of the codewords is introduced which allows to extend the RQ for very large numbers of layers and also for high dimensional data, without getting over-trained. The proposed Regularized Residual Quantization (RRQ) results in multi-layer dictionaries which are additionally sparse, thanks to the soft-thresholding nature of the regularization when applied to variance-decaying data which can arise from de-correlating transformations applied to correlated data. Furthermore, we also propose a general-purpose pre-processing for natural images which makes them suitable for such quantization. The RRQ framework is first tested on synthetic variance-decaying data to show its efficiency in quantization of high-dimensional data. Next, we use the RRQ in super-resolution of a database of facial images where it is shown that low-resolution facial images from the test set quantized with codebooks trained on high-resolution images from the training set show relevant high-frequency content when reconstructed with those codebooks. △ Less

Submitted 1 May, 2017; originally announced May 2017.

Comments: To be presented at SPARS 2017, Lisbon, Portugal

arXiv:1701.07675 [pdf, other]

Sparse Ternary Codes for similarity search have higher coding gain than dense binary codes

Authors: Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov, Taras Holotyak

Abstract: This paper addresses the problem of Approximate Nearest Neighbor (ANN) search in pattern recognition where feature vectors in a database are encoded as compact codes in order to speed-up the similarity search in large-scale databases. Considering the ANN problem from an information-theoretic perspective, we interpret it as an encoding, which maps the original feature vectors to a less entropic spa… ▽ More This paper addresses the problem of Approximate Nearest Neighbor (ANN) search in pattern recognition where feature vectors in a database are encoded as compact codes in order to speed-up the similarity search in large-scale databases. Considering the ANN problem from an information-theoretic perspective, we interpret it as an encoding, which maps the original feature vectors to a less entropic sparse representation while requiring them to be as informative as possible. We then define the coding gain for ANN search using information-theoretic measures. We next show that the classical approach to this problem, which consists of binarization of the projected vectors is sub-optimal. Instead, a properly designed ternary encoding achieves higher coding gains and lower complexity. △ Less

Submitted 25 April, 2017; v1 submitted 26 January, 2017; originally announced January 2017.

Comments: Accepted at 2017 IEEE International Symposium on Information Theory (ISIT'17)

arXiv:1506.03998 [pdf, ps, other]

Sparse Multi-layer Image Approximation: Facial Image Compression

Authors: Sohrab Ferdowsi, Svyatoslav Voloshynovskiy, Dimche Kostadinov

Abstract: We propose a scheme for multi-layer representation of images. The problem is first treated from an information-theoretic viewpoint where we analyze the behavior of different sources of information under a multi-layer data compression framework and compare it with a single-stage (shallow) structure. We then consider the image data as the source of information and link the proposed representation sc… ▽ More We propose a scheme for multi-layer representation of images. The problem is first treated from an information-theoretic viewpoint where we analyze the behavior of different sources of information under a multi-layer data compression framework and compare it with a single-stage (shallow) structure. We then consider the image data as the source of information and link the proposed representation scheme to the problem of multi-layer dictionary learning for visual data. For the current work we focus on the problem of image compression for a special class of images where we report a considerable performance boost in terms of PSNR at high compression ratios in comparison with the JPEG2000 codec. △ Less

Submitted 12 June, 2015; originally announced June 2015.

Comments: Submitted to the MLSP 2015

Showing 1–16 of 16 results for author: Kostadinov, D