Search | arXiv e-print repository

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

Authors: Bac Nguyen, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville

Abstract: Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adaptation of the model to downstream tasks leads to undesirable degradation for OOD data. In this work, we introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge in the pre-trained model. SAFT only updates a small subset of important parameters whose gradient magnitude is large, while keeping the other parameters frozen. SAFT is straightforward to implement and conceptually simple. Extensive experiments show that with only 0.1% of the model parameters, SAFT can significantly improve the performance of CLIP. It consistently outperforms baseline methods across several benchmarks. On the few-shot learning benchmark of ImageNet and its variants, SAFT gives a gain of 5.15% on average over the conventional fine-tuning method in OOD settings.

Submitted 3 July, 2024; originally announced July 2024.
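
Note on the SAFT entry above: the abstract describes updating only the parameters whose gradient magnitude is large and freezing the rest. A minimal Python sketch of that idea follows. The function names, the flat-list representation of parameters, the plain SGD update, and the choice to select the top fraction by absolute gradient are illustrative assumptions, not the authors' implementation.

```python
def select_sparse_mask(grads, fraction):
    """Return the indices of the `fraction` of parameters with the largest |gradient|.

    Hypothetical helper: the abstract only says that parameters with large
    gradient magnitude are selected; the top-k rule here is an assumption.
    """
    k = max(1, int(len(grads) * fraction))
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    return set(ranked[:k])


def masked_sgd_step(params, grads, mask, lr):
    """Apply an SGD step to the selected parameters only; all others stay frozen."""
    return [
        p - lr * g if i in mask else p
        for i, (p, g) in enumerate(zip(params, grads))
    ]


params = [1.0, 2.0, 3.0, 4.0]
grads = [0.1, -5.0, 0.2, 0.3]
mask = select_sparse_mask(grads, fraction=0.25)   # keeps one of four parameters
updated = masked_sgd_step(params, grads, mask, lr=0.1)
```

In this toy run only the second parameter, which has the largest absolute gradient, is changed; the other three keep their pre-trained values.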

arXiv:2404.00675 [pdf, other]

LLM meets Vision-Language Models for Zero-Shot One-Class Classification

Authors: Yassir Bendou, Giulia Lioi, Bastien Pasdeloup, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux, Vincent Gripon

Abstract: We consider the problem of zero-shot one-class visual classification, extending traditional one-class classification to scenarios where only the label of the target class is available. This method aims to discriminate between positive and negative query samples without requiring examples from the target class. We propose a two-step solution that first queries large language models for visually confusing objects and then relies on vision-language pre-trained models (e.g., CLIP) to perform classification. By adapting large-scale vision benchmarks, we demonstrate the ability of the proposed method to outperform adapted off-the-shelf alternatives in this setting. Namely, we propose a realistic benchmark where negative query samples are drawn from the same original dataset as positive ones, including a granularity-controlled version of iNaturalist, where negative samples are at a fixed distance in the taxonomy tree from the positive ones. To our knowledge, we are the first to demonstrate the ability to discriminate a single category from other semantically related ones using only its label.

Submitted 27 May, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2401.11311 [pdf, other]

A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models

Authors: Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux

Abstract: In recent years, the rapid evolution of computer vision has seen the emergence of various foundation models, each tailored to specific data types and tasks. In this study, we explore the adaptation of these models for few-shot semantic segmentation. Specifically, we conduct a comprehensive comparative analysis of four prominent foundation models: DINO V2, Segment Anything, CLIP, Masked AutoEncoders, and a straightforward ResNet50 pre-trained on the COCO dataset. We also include 5 adaptation methods, ranging from linear probing to fine-tuning. Our findings show that DINO V2 outperforms other models by a large margin, across various datasets and adaptation methods. On the other hand, adaptation methods provide little discrepancy in the obtained results, suggesting that a simple linear probing can compete with advanced, more computationally intensive, alternatives.

Submitted 2 April, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

arXiv:2311.14544 [pdf, other]

Inferring Latent Class Statistics from Text for Robust Visual Few-Shot Learning

Authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Giulia Lioi, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene

Abstract: In the realm of few-shot learning, foundation models like CLIP have proven effective but exhibit limitations in cross-domain robustness, especially in few-shot settings. Recent works add text as an extra modality to enhance the performance of these models. Most of these approaches treat text as an auxiliary modality without fully exploring its potential to elucidate the underlying class visual features distribution. In this paper, we present a novel approach that leverages text-derived statistics to predict the mean and covariance of the visual feature distribution for each class. This predictive framework enriches the latent space, yielding more robust and generalizable few-shot learning models. We demonstrate the efficacy of incorporating both mean and covariance statistics in improving few-shot classification performance across various datasets. Our method shows that we can use text to predict the mean and covariance of the distribution, offering promising improvements in few-shot learning scenarios.

Submitted 24 November, 2023; originally announced November 2023.

Comments: R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

arXiv:2309.03974 [pdf, other]

DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation

Authors: Pau Mulet Arabi, Alec Flowers, Lukas Mauch, Fabien Cardinaux

Abstract: Computing gradients of an expectation with respect to the distributional parameters of a discrete distribution is a problem arising in many fields of science and engineering. Typically, this problem is tackled using Reinforce, which frames the problem of gradient estimation as a Monte Carlo simulation. Unfortunately, the Reinforce estimator is especially sensitive to discrepancies between the true probability distribution and the drawn samples, a common issue in low sampling regimes that results in inaccurate gradient estimates. In this paper, we introduce DBsurf, a Reinforce-based estimator for discrete distributions that uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution. To assess the performance of our estimator, we subject it to a diverse set of tasks. Among existing estimators, DBsurf attains the lowest variance in a least squares problem commonly used in the literature for benchmarking. Furthermore, DBsurf achieves the best results for training variational auto-encoders (VAE) across different datasets and sampling setups. Finally, we apply DBsurf to build a simple and efficient Neural Architecture Search (NAS) algorithm with state-of-the-art performance.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 22 pages, 7 figures

ACM Class: I.2.0
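
Note on the DBsurf entry above: the abstract refers to the Reinforce (score-function) estimator and its sensitivity to mismatch between the drawn samples and the true distribution. The Python sketch below shows the plain Reinforce estimator for a categorical distribution parametrized by softmax logits, next to the exact gradient. It does not implement DBsurf's sampling procedure, which the abstract does not specify; all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, f, samples):
    """Monte Carlo estimate of d/d(logits) E_{x ~ softmax(logits)}[f(x)]."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for x in samples:
        for k in range(len(logits)):
            # For a softmax categorical, d log p(x) / d logit_k = 1[k == x] - p_k.
            score = (1.0 if k == x else 0.0) - probs[k]
            grad[k] += f(x) * score / len(samples)
    return grad


def exact_gradient(logits, f):
    """Exact gradient, obtained by summing over every outcome."""
    probs = softmax(logits)
    n = len(logits)
    return [
        sum(probs[x] * f(x) * ((1.0 if k == x else 0.0) - probs[k]) for x in range(n))
        for k in range(n)
    ]


logits = [0.0, 0.0]                  # uniform distribution over {0, 1}
reward = lambda x: float(x)          # f(x) = x
skewed_samples = [0, 1, 1, 1]        # empirical frequencies 1/4 and 3/4, not 1/2 and 1/2
estimate = reinforce_gradient(logits, reward, skewed_samples)
exact = exact_gradient(logits, reward)
```

With these four samples the estimate is [-0.375, 0.375], whereas the exact gradient is [-0.25, 0.25]: the gap comes entirely from the sample frequencies not matching the true probabilities, which is the discrepancy the abstract refers to.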

arXiv:2306.01442 [pdf, other]

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

Authors: Fabian Kögel, Bac Nguyen, Fabien Cardinaux

Abstract: State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech. For expressive speech datasets, however, we observe characteristic audio distortions. We demonstrate that such artefacts are introduced to the vocoder reconstruction by over-smooth mel-spectrogram predictions, which are induced by the choice of mean-squared-error (MSE) loss for training the mel-spectrogram decoder. With MSE loss, FastSpeech 2 is limited to learning conditional averages of the training distribution, which might not lie close to a natural sample if the distribution still appears multimodal after all conditioning signals. To alleviate this problem, we introduce TVC-GMM, a mixture model of Trivariate-Chain Gaussian distributions, to model the residual multimodality. TVC-GMM reduces spectrogram smoothness and improves perceptual audio quality, in particular for expressive datasets, as shown by both objective and subjective evaluation.

Submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted at INTERSPEECH 2023

arXiv:2303.03717 [pdf, other]

Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation

Authors: Bac Nguyen, Stefan Uhlich, Fabien Cardinaux

Abstract: Self-supervised learning (SSL) has recently shown remarkable results in closing the gap between supervised and unsupervised learning. The idea is to learn robust features that are invariant to distortions of the input data. Despite its success, this idea can suffer from a collapsing issue where the network produces a constant representation. To this end, we introduce SELFIE, a novel Self-supervised Learning approach for audio representation via Feature Diversity and Decorrelation. SELFIE avoids the collapsing issue by ensuring that the representation (i) maintains a high diversity among embeddings and (ii) decorrelates the dependencies between dimensions. SELFIE is pre-trained on the large-scale AudioSet dataset and its embeddings are validated on nine audio downstream tasks, including speech, music, and sound event recognition. Experimental results show that SELFIE outperforms existing SSL methods in several tasks.

Submitted 7 March, 2023; originally announced March 2023.

Comments: ICASSP 2023

arXiv:2212.06461 [pdf, ps, other]

A Statistical Model for Predicting Generalization in Few-Shot Classification

Authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux, Ghouthi Boukli Hacene, Javier Alonso Garcia

Abstract: The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We empirically show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.

Submitted 28 March, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2205.12918 [pdf, other]

A Low Memory Footprint Quantized Neural Network for Depth Completion of Very Sparse Time-of-Flight Depth Maps

Authors: Xiaowen Jiang, Valerio Cambareri, Gianluca Agresti, Cynthia Ifeyinwa Ugwu, Adriano Simonetto, Fabien Cardinaux, Pietro Zanuttigh

Abstract: Sparse active illumination enables precise time-of-flight depth sensing as it maximizes signal-to-noise ratio for low power budgets. However, depth completion is required to produce dense depth maps for 3D perception. We address this task with realistic illumination and sensor resolution constraints by simulating ToF datasets for indoor 3D perception with challenging sparsity levels. We propose a quantized convolutional encoder-decoder network for this task. Our model achieves optimal depth map quality by means of input pre-processing and carefully tuned training with a geometry-preserving loss function. We also achieve a low memory footprint for weights and activations by means of mixed precision quantization-at-training techniques. The resulting quantized models are comparable to the state of the art in terms of quality, but they require very low GPU times and achieve up to 14-fold memory size reduction for the weights w.r.t. their floating point counterpart with minimal impact on quality metrics.

Submitted 25 May, 2022; originally announced May 2022.

Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2022. Presented at the 5th Efficient Deep Learning for Computer Vision Workshop

arXiv:2203.11049 [pdf, other]

AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling

Authors: Bac Nguyen, Fabien Cardinaux, Stefan Uhlich

Abstract: Parallel text-to-speech (TTS) models have recently enabled fast and highly-natural speech synthesis. However, they typically require external alignment models, which are not necessarily optimized for the decoder as they are not jointly trained. In this paper, we propose a differentiable duration method for learning monotonic alignments between input and output sequences. Our method is based on a soft-duration mechanism that optimizes a stochastic process in expectation. Using this differentiable duration method, we introduce AutoTTS, a direct text-to-waveform speech synthesis model. AutoTTS enables high-fidelity speech synthesis through a combination of adversarial training and matching the total ground-truth duration. Experimental results show that our model obtains competitive results while enjoying a much simpler training pipeline. Audio samples are available online.

Submitted 7 March, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: ICASSP 2023

arXiv:2106.00992 [pdf, other]

NVC-Net: End-to-End Adversarial Voice Conversion

Authors: Bac Nguyen, Fabien Cardinaux

Abstract: Voice conversion has gained increasing popularity in many applications of speech synthesis. The idea is to change the voice identity from one speaker into another while keeping the linguistic content unchanged. Many voice conversion approaches rely on the use of a vocoder to reconstruct the speech from acoustic features, and as a consequence, the speech quality heavily depends on such a vocoder. In this paper, we propose NVC-Net, an end-to-end adversarial network, which performs voice conversion directly on the raw audio waveform of arbitrary length. By disentangling the speaker identity from the speech content, NVC-Net is able to perform non-parallel traditional many-to-many voice conversion as well as zero-shot voice conversion from a short utterance of an unseen target speaker. Importantly, NVC-Net is non-autoregressive and fully convolutional, achieving fast inference. Our model is capable of producing samples at a rate of more than 3600 kHz on an NVIDIA V100 GPU, being orders of magnitude faster than state-of-the-art methods under the same hardware configurations. Objective and subjective evaluations on non-parallel many-to-many voice conversion tasks show that NVC-Net obtains competitive results with significantly fewer parameters.

Submitted 2 June, 2021; originally announced June 2021.

arXiv:2103.13322 [pdf, other]

DNN Quantization with Attention

Authors: Ghouthi Boukli Hacene, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux

Abstract: Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop in accuracy, in particular when we apply it to complex learning tasks or lightweight DNN architectures. In this paper, we propose a training procedure that relaxes the low-bit quantization. We call this procedure DNN Quantization with Attention (DQA). The relaxation is achieved by using a learnable linear combination of high, medium and low-bit quantizations. Our learning procedure converges step by step to a low-bit quantization using an attention mechanism with temperature scheduling. In experiments, our approach outperforms other low-bit quantization techniques on various object recognition benchmarks such as CIFAR10, CIFAR100 and ImageNet ILSVRC 2012, achieves almost the same accuracy as a full precision DNN, and considerably reduces the accuracy drop when quantizing lightweight DNN architectures.

Submitted 24 March, 2021; originally announced March 2021.
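
Note on the DQA entry above: the abstract describes a learnable linear combination of high, medium and low-bit quantizations, mixed by an attention mechanism with temperature scheduling. The Python sketch below illustrates one way such a relaxation could look for a single scalar weight. The symmetric uniform quantizer, the bitwidths (8, 4, 2), the softmax-over-logits form of the attention, and all names are assumptions made for illustration; the abstract does not give these details.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax; a lower temperature gives a sharper distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_quantize(w, bits, w_max=1.0):
    """Symmetric uniform quantizer on [-w_max, w_max] (an assumed quantizer model)."""
    levels = 2 ** (bits - 1) - 1
    step = w_max / levels
    clipped = max(-w_max, min(w_max, w))
    return round(clipped / step) * step


def relaxed_quantize(w, logits, temperature, bitwidths=(8, 4, 2)):
    """Attention-weighted mix of the quantizations of w at several bitwidths."""
    attention = softmax(logits, temperature)
    return sum(a * uniform_quantize(w, b) for a, b in zip(attention, bitwidths))


logits = [0.0, 0.0, 1.0]     # hypothetical learned importance scores, low-bit favoured
warm = relaxed_quantize(0.3, logits, temperature=10.0)   # soft blend of all three
cold = relaxed_quantize(0.3, logits, temperature=0.01)   # nearly pure 2-bit value
```

As the temperature decreases, the attention weights approach a one-hot vector, so the mixture collapses onto the single quantizer with the largest logit. In this sketch that is the 2-bit quantizer only because the example logits favour it.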

arXiv:2102.06725 [pdf, other]

Neural Network Libraries: A Deep Learning Framework Designed from Engineers' Perspectives

Authors: Takuya Narihira, Javier Alonsogarcia, Fabien Cardinaux, Akio Hayakawa, Masato Ishii, Kazunori Iwaki, Thomas Kemp, Yoshiyuki Kobayashi, Lukas Mauch, Akira Nakamura, Yukio Obuchi, Andrew Shin, Kenji Suzuki, Stephen Tiedmann, Stefan Uhlich, Takuya Yashima, Kazuki Yoshiyama

Abstract: While there exist a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation in distributed settings, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from engineers' perspective, with emphasis on usability and compatibility as its core design principles. We elaborate on each of our design principles and its merits, and validate our attempts via experiments.

Submitted 21 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: https://nnabla.org

arXiv:2011.12043 [pdf, other]

Efficient Sampling for Predictor-Based Neural Architecture Search

Authors: Lukas Mauch, Stephen Tiedemann, Javier Alonso Garcia, Bac Nguyen Cong, Kazuki Yoshiyama, Fabien Cardinaux, Thomas Kemp

Abstract: Recently, predictor-based algorithms emerged as a promising approach for neural architecture search (NAS). For NAS, we typically have to calculate the validation accuracy of a large number of Deep Neural Networks (DNNs), which is computationally complex. Predictor-based NAS algorithms address this problem. They train a proxy model that can infer the validation accuracy of DNNs directly from their network structure. During optimization, the proxy can be used to narrow down the number of architectures for which the true validation accuracy must be computed, which makes predictor-based algorithms sample efficient. Usually, we compute the proxy for all DNNs in the network search space and pick those that maximize the proxy as candidates for optimization. However, that is intractable in practice, because the search spaces are often very large and contain billions of network architectures. The contributions of this paper are threefold: 1) We define a sample efficiency gain to compare different predictor-based NAS algorithms. 2) We conduct experiments on the NASBench-101 dataset and show that the sample efficiency of predictor-based algorithms decreases dramatically if the proxy is only computed for a subset of the search space. 3) We show that if we choose the subset of the search space on which the proxy is evaluated in a smart way, the sample efficiency of the original predictor-based algorithm that has access to the full search space can be regained. This is an important step to make predictor-based NAS algorithms useful in practice.

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2005.07810 [pdf, other]

Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

Authors: Mohammad Asif Khan, Fabien Cardinaux, Stefan Uhlich, Marc Ferras, Asja Fischer

Abstract: In recent years, generative adversarial network (GAN) based models have been successfully applied for unsupervised speech-to-speech conversion. The rich compact harmonic view of the magnitude spectrogram is considered a suitable choice for training these models with audio data. To reconstruct the speech signal, first a magnitude spectrogram is generated by the neural network, which is then utilized by methods like the Griffin-Lim algorithm to reconstruct a phase spectrogram. This procedure bears the problem that the generated magnitude spectrogram may not be consistent, which is required for finding a phase such that the full spectrogram has a natural-sounding speech waveform. In this work, we approach this problem by proposing a condition encouraging spectrogram consistency during the adversarial training procedure. We demonstrate our approach on the task of translating the voice of a male speaker to that of a female speaker, and vice versa. Our experimental results on the Librispeech corpus show that the model trained with the TF consistency provides a perceptually better quality of speech-to-speech conversion.

Submitted 18 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:1911.04951 [pdf, ps, other]

doi 10.1109/JSTSP.2020.3005030

Iteratively Training Look-Up Tables for Network Quantization

Authors: Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso Garcia, Lukas Mauch, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

Abstract: Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word length of the network parameters or remove weights from the network if they are not needed. In this article, we discuss a general framework for network reduction which we call "Look-Up Table Quantization" (LUT-Q). For each layer, we learn a value dictionary and an assignment matrix to represent the network weights. We propose a special solver which combines gradient descent and a one-step k-means update to learn both the value dictionaries and assignment matrices iteratively. This method is very flexible: by constraining the value dictionary, many different reduction problems such as non-uniform network quantization, training of multiplierless networks, network pruning or simultaneous quantization and pruning can be implemented without changing the solver. This flexibility of the LUT-Q method allows us to use the same method to train networks for different hardware capabilities.

Submitted 12 November, 2019; originally announced November 2019.

Comments: Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
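
Note on the LUT-Q entry above: the abstract mentions a solver that combines gradient descent with a one-step k-means update over a value dictionary and an assignment matrix. The Python sketch below shows only what a single k-means step of that kind could look like for one layer's weights, flattened into a list. The function names and the nearest-value assignment rule are illustrative assumptions; the gradient-descent part of the solver and any dictionary constraints are omitted.

```python
def kmeans_step(weights, dictionary):
    """One k-means iteration: reassign weights, then recompute dictionary values.

    Returns the updated dictionary and the assignment (one dictionary index
    per weight). Entries with no assigned weights keep their old value.
    """
    # Assignment step: each weight points at its nearest dictionary value.
    assignment = [
        min(range(len(dictionary)), key=lambda k: abs(w - dictionary[k]))
        for w in weights
    ]
    # Update step: each dictionary value becomes the mean of its assigned weights.
    new_dictionary = []
    for k, old_value in enumerate(dictionary):
        members = [w for w, a in zip(weights, assignment) if a == k]
        new_dictionary.append(sum(members) / len(members) if members else old_value)
    return new_dictionary, assignment


def quantized_weights(dictionary, assignment):
    """Look up the quantized weight for every position."""
    return [dictionary[a] for a in assignment]


weights = [-1.0, -0.8, 0.1, 0.9, 1.1]     # full-precision weights of a toy layer
dictionary = [-1.0, 0.0, 1.0]             # three dictionary values
dictionary, assignment = kmeans_step(weights, dictionary)
layer_weights = quantized_weights(dictionary, assignment)
```

Storing only the small dictionary plus one index per weight is what reduces the memory footprint. Constraining the dictionary (for example, fixing one entry at zero for pruning) would be an extra step not shown here.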

arXiv:1905.11452 [pdf]

Mixed Precision DNNs: All you need is a good parametrization

Authors: Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

Abstract: Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with homogeneous bitwidth for the same size constraint. Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable. Differentiable quantization with straight-through gradients allows learning the quantizer's parameters using gradient methods. We show that a suitable parametrization of the quantizer is the key to achieving stable training and a good final performance. Specifically, we propose to parametrize the quantizer with the step size and dynamic range. The bitwidth can then be inferred from them. Other parametrizations, which explicitly use the bitwidth, consistently perform worse. We confirm our findings with experiments on CIFAR-10 and ImageNet and we obtain mixed precision DNNs with learned quantization parameters, achieving state-of-the-art performance.

Submitted 22 May, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: International Conference on Learning Representations (ICLR) 2020; Source code at https://github.com/sony/ai-research-code
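
Note on the mixed-precision entry above: the abstract states that the bitwidth can be inferred from the step size and the dynamic range. The Python sketch below shows how that inference works under an assumed quantizer model, namely a uniform quantizer whose largest representable magnitude equals the dynamic range and whose levels are integer multiples of the step size. The function name and the signed/unsigned convention are assumptions, not taken from the abstract.

```python
import math


def infer_bitwidth(step_size, dynamic_range, signed=True):
    """Bits needed by a uniform quantizer with the given step size and range.

    Assumed model: levels are 0, step, 2*step, ..., dynamic_range, so there are
    dynamic_range / step_size + 1 non-negative levels. A signed quantizer
    needs one additional bit for the sign.
    """
    magnitude_levels = dynamic_range / step_size + 1
    bits = math.ceil(math.log2(magnitude_levels))
    return bits + 1 if signed else bits


signed_bits = infer_bitwidth(step_size=0.125, dynamic_range=0.875)
unsigned_bits = infer_bitwidth(step_size=0.125, dynamic_range=0.875, signed=False)
```

Here the range spans 7 steps, giving 8 magnitude levels, so the unsigned quantizer needs 3 bits and the signed one 4. Halving the step size at the same range adds one bit, which shows how a learned step size and range jointly determine precision.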

arXiv:1811.05355 [pdf, ps, other]

Iteratively Training Look-Up Tables for Network Quantization

Authors: Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso García, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

Abstract: Operating deep neural networks on devices with limited resources requires the reduction of their memory footprints and computational requirements. In this paper we introduce a training method, called look-up table quantization, LUT-Q, which learns a dictionary and assigns each weight to one of the dictionary's values. We show that this method is very flexible and that many other techniques can be seen as special cases of LUT-Q. For example, we can constrain the dictionary trained with LUT-Q to generate networks with pruned weight matrices or restrict the dictionary to powers-of-two to avoid the need for multiplications. In order to obtain fully multiplier-less networks, we also introduce a multiplier-less version of batch normalization. Extensive experiments on image recognition and object detection tasks show that LUT-Q consistently achieves better performance than other methods with the same quantization bitwidth.

Submitted 13 November, 2018; originally announced November 2018.

Comments: NIPS 2018 workshop on Compact Deep Neural Networks with industrial applications

arXiv:1807.02710 [pdf, other]

Improving DNN-based Music Source Separation using Phase Features

Authors: Joachim Muth, Stefan Uhlich, Nathanael Perraudin, Thomas Kemp, Fabien Cardinaux, Yuki Mitsufuji

Abstract: Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT phase and amplitude, we conjecture that derivatives of the phase are a good feature representation as opposed to the raw phase. We verify this conjecture experimentally and propose a new DNN architecture which combines amplitude and phase. This joint approach achieves a better signal-to-distortion ratio on the DSD100 dataset for all instruments compared to a network that uses only amplitude features. The bass instrument in particular benefits from the phase information.

Submitted 16 July, 2018; v1 submitted 7 July, 2018; originally announced July 2018.

Comments: 7 pages, 9 figures, Joint Workshop on Machine Learning for Music at ICML, IJCAI/ECAI and AAMAS, 2018

arXiv:1411.0314 [pdf, other]

doi 10.1103/PhysRevE.91.032302

Structure of marginally jammed polydisperse packings of frictionless spheres

Authors: Chi Zhang, Cathal B. O'Donovan, Eric I. Corwin, Frédéric Cardinaux, Thomas G. Mason, Matthias E. Möbius, Frank Scheffold

Abstract: We model the packing structure of a marginally jammed bulk ensemble of polydisperse spheres using an extended granocentric model explicitly taking into account rattlers. This leads to a relationship between the characteristic parameters of the packing, such as the mean number of neighbors and the fraction of rattlers, and the radial distribution function g(r). We find excellent agreement between the model predictions for g(r) and packing simulations as well as experiments on jammed emulsion droplets. The observed quantitative agreement opens the path towards a full structural characterization of jammed particle systems for imaging and scattering experiments.

Submitted 2 November, 2014; originally announced November 2014.

Comments: submitted

Journal ref: Phys. Rev. E 91, 032302 (2015)

arXiv:1408.4181 [pdf, ps, other]

doi 10.1103/PhysRevE.90.060301

Accounting for inertia effects to access the high-frequency microrheology of viscoelastic fluids

Authors: P. Domínguez-García, Frédéric Cardinaux, Elena Bertseva, László Forró, Frank Scheffold, Sylvia Jeney

Abstract: We study the Brownian motion of microbeads immersed in water and in a viscoelastic wormlike micelle solution by optical trapping interferometry and diffusing wave spectroscopy. Through the mean-square displacement obtained from both techniques, we deduce the mechanical properties of the fluids at high frequencies by explicitly accounting for inertia effects of the particle and the surrounding fluid at short time scales. For wormlike micelle solutions, we recover the 3/4 scaling exponent for the loss modulus over two decades in frequency as predicted by the theory for semiflexible polymers.

Submitted 3 December, 2014; v1 submitted 18 August, 2014; originally announced August 2014.

Journal ref: Physical Review E 90, 060301(R), 2014

arXiv:1305.5182 [pdf, other]

doi 10.1088/0953-8984/25/50/502101

Linear and nonlinear rheology of dense emulsions: Identifying the glass and jamming regimes

Authors: Frank Scheffold, Frédéric Cardinaux, Thomas G. Mason

Abstract: We discuss the linear and non-linear rheology of concentrated (sub)microscale emulsions, amorphous disordered solids composed of repulsive and deformable soft colloidal spheres. Based on recent results from simulation and theory, we derive quantitative predictions for the dependences of the elastic shear modulus and the yield stress on the droplet volume fraction. The remarkable agreement with experiments we observe supports the scenario that the repulsive glass and the jammed state can be clearly identified in the rheology of soft spheres at finite temperature while crossing continuously from a liquid to a highly compressed yet disordered solid.

Submitted 22 May, 2013; originally announced May 2013.

Comments: submitted

Journal ref: J. Phys.: Condens. Matter 25, 502101 (2013)

arXiv:1209.3362 [pdf]

doi 10.1063/1.4755747

Quasi-real-time analysis of dynamic near field scattering data using a graphics processing unit

Authors: Giovanni Cerchiari, Fabrizio Croccolo, Frédéric Cardinaux, Frank Scheffold

Abstract: We present an implementation of the analysis of dynamic near field scattering (NFS) data using a graphics processing unit (GPU). We introduce an optimized data management scheme thereby limiting the number of operations required. Overall, we reduce the processing time from hours to minutes, for typical experimental conditions. Previously the limiting step in such experiments, the processing time is now comparable to the data acquisition time. Our approach is applicable to various dynamic NFS methods, including shadowgraph, Schlieren and differential dynamic microscopy.

Submitted 15 September, 2012; originally announced September 2012.

Comments: accepted for publication in Review of Scientific Instruments (Note), supplementary material not included

Journal ref: Rev. Sci. Instrum. 83, 106101 (2012)

arXiv:1112.2510 [pdf, ps, other]

doi 10.1063/1.3673442

Effect of glycerol and dimethyl sulfoxide on the phase behavior of lysozyme: Theory and experiments

Authors: Christoph Gögelein, Dana Wagner, Frédéric Cardinaux, Gerhard Nägele, Stefan U. Egelhaaf

Abstract: Salt, glycerol and dimethyl sulfoxide (DMSO) are used to modify the properties of protein solutions. We experimentally determined the effect of these additives on the phase behavior of lysozyme solutions. Upon the addition of glycerol and DMSO, the fluid-solid transition and the gas-liquid coexistence curve (binodal) shift to lower temperatures and the gap between them increases. The experimentally observed trends are consistent with our theoretical predictions based on the thermodynamic perturbation theory (TPT) and the Derjaguin-Landau-Verwey-Overbeek (DLVO) model for the lysozyme-lysozyme pair interactions. The values of the parameters describing the interactions, namely the refractive indices, dielectric constants, Hamaker constant and cut-off length, are extracted from literature or are experimentally determined by independent experiments, including static light scattering to determine the second virial coefficient. We observe that both glycerol and DMSO render the potential more repulsive, while sodium chloride reduces the repulsion.

Submitted 23 December, 2011; v1 submitted 12 December, 2011; originally announced December 2011.

Comments: Manuscript accepted for publication in The Journal of Chemical Physics

Journal ref: The Journal of Chemical Physics 136: 015102, 2012

arXiv:1101.4447 [pdf, other]

doi 10.1039/c0sm01175d

Phase separation and dynamical arrest for particles interacting with mixed potentials--the case of globular proteins revisited

Authors: Thomas Gibaud, Frederic Cardinaux, Johan Bergenholtz, Anna Stradner, Peter Schurtenberger

Abstract: We examine the applicability of the extended law of corresponding states (ELCS) to equilibrium and non-equilibrium features of the state diagram of the globular protein lysozyme. We provide compelling evidence that the ELCS correctly reproduces the location of the binodal for different ionic strengths, but fails in describing the location of the arrest line. We subsequently use Mode Coupling Theory (MCT) to gain additional insight into the origin of these observations. We demonstrate that while the critical point and the connected binodal and spinodal are governed by the integral features of the interaction potential described by the normalized second virial coefficient, the arrest line is mainly determined by the attractive well depth or bond strength. This article is published in Soft Matter. The reference is: DOI: 10.1039/c0sm01175d

Submitted 24 January, 2011; originally announced January 2011.

Journal ref: Soft Matter 7, 857 (2011)

arXiv:0902.0310 [pdf]

doi 10.1103/PhysRevLett.99.118301

Interplay between Spinodal Decomposition and Glass Formation in Proteins Exhibiting Short-Range Attractions

Authors: Frederic Cardinaux, Thomas Gibaud, Anna Stradner, Peter Schurtenberger

Abstract: We investigate the competition between spinodal decomposition and dynamical arrest using aqueous solutions of the globular protein lysozyme as a model system for colloids with short-range attractions. We show that quenches below a temperature Ta lead to gel formation as a result of a local arrest of the protein-dense phase during spinodal decomposition. The rheological properties of these gels allow us to use centrifugation experiments to determine the local densities of both phases and to precisely locate the gel boundary and the attractive glass line close to and within the unstable region of the phase diagram.

Submitted 2 February, 2009; originally announced February 2009.

Journal ref: PRL 99, 118301 (2007)

arXiv:0812.0952 [pdf, ps, other]

Rheology, Structure and Dynamics of Colloid-Polymer Mixtures: from Liquids to Gels

Authors: M. Laurati, G. Petekidis, N. Koumakis, F. Cardinaux, A. B. Schofield, J. M. Brader, M. Fuchs, S. U. Egelhaaf

Abstract: We investigated the viscoelastic properties of colloid-polymer mixtures at intermediate colloid volume fraction and varying polymer concentrations, thereby tuning the attractive interactions. Within the examined range of polymer concentrations, the samples ranged from fluids to gels. Already in the liquid phase the viscoelastic properties significantly changed when approaching the gelation boundary, indicating the formation of clusters and transient networks. This is supported by an increasing correlation length of the density fluctuations, observed by static light scattering and microscopy. At the same time, the correlation function determined by dynamic light scattering completely decays, indicating the absence of dynamical arrest. Upon increasing the polymer concentration beyond the gelation boundary, the rheological properties changed qualitatively again, now they are consistent with the formation of colloidal gels. Our experimental results, namely the location of the gelation boundary as well as the elastic (storage) and viscous (loss) moduli, are compared to different theoretical models. These include consideration of the escape time as well as predictions for the viscoelastic moduli based on scaling relations and Mode Coupling Theories (MCT).

Submitted 4 December, 2008; originally announced December 2008.

Comments: 34 pages, 11 figures, To be submitted to JCP

arXiv:cond-mat/0607264 [pdf, ps, other]

doi 10.1209/0295-5075/77/48004

Modeling Equilibrium Clusters in Lysozyme Solutions

Authors: Frédéric Cardinaux, Anna Stradner, Peter Schurtenberger, Francesco Sciortino, Emanuela Zaccarelli

Abstract: We present a combined experimental and numerical study of the equilibrium cluster formation in globular protein solutions under no-added salt conditions. We show that a cluster phase emerges as a result of a competition between a long-range screened Coulomb repulsion and a short-range attraction. A simple effective potential, in which only depth and width of the attractive part of the potential are optimized, accounts in a remarkable way for the wavevector dependence of the X-ray scattering structure factor.

Submitted 4 August, 2006; v1 submitted 11 July, 2006; originally announced July 2006.

Comments: 4 pages, 4 figures

Journal ref: Europhys. Lett. 77, 48004 (2007)

arXiv:cond-mat/0509637 [pdf, ps, other]

doi 10.1103/PhysRevE.73.011413

Multi-speckle diffusing wave spectroscopy with a single mode detection scheme

Authors: P. Zakharov, F. Cardinaux, F. Scheffold

Abstract: We present a detection scheme for diffusing wave spectroscopy (DWS) based on a two cell geometry that allows efficient ensemble averaging. This is achieved by putting a fast rotating diffuser in the optical path between laser and sample. We show that the recorded (multi-speckle) correlation echoes provide an ensemble averaged signal that does not require additional time averaging. We find the performance of our experimental scheme comparable or even superior to camera based multi-speckle techniques that rely on direct spatial averaging. Furthermore, combined with traditional two-cell DWS, the full intensity autocorrelation function can be measured with a single experimental setup covering more than 10 decades in correlation time.

Submitted 27 September, 2005; v1 submitted 26 September, 2005; originally announced September 2005.

Comments: Submitted to PRE

Showing 1–29 of 29 results for author: Cardinaux, F