Boosting Vision-Language Models with Transduction

Authors: Maxime Zanella, Benoît Gérin, Ismail Ben Ayed

Abstract: Transduction is a powerful paradigm that leverages the structure of unlabeled data to boost predictive accuracy. We present TransCLIP, a novel and computationally efficient transductive approach designed for Vision-Language Models (VLMs). TransCLIP is applicable as a plug-and-play module on top of popular inductive zero- and few-shot models, consistently improving their performance. Our new objective function can be viewed as a regularized maximum-likelihood estimation, constrained by a KL divergence penalty that integrates the text-encoder knowledge and guides the transductive learning process. We further derive an iterative Block Majorize-Minimize (BMM) procedure for optimizing our objective, with guaranteed convergence and decoupled sample-assignment updates, yielding computationally efficient transduction for large-scale datasets. We report comprehensive evaluations, comparisons, and ablation studies that demonstrate: (i) Transduction can greatly enhance the generalization capabilities of inductive pretrained zero- and few-shot VLMs; (ii) TransCLIP substantially outperforms standard transductive few-shot learning methods relying solely on vision features, notably due to the KL-based language constraint.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.18541

Low-Rank Few-Shot Adaptation of Vision-Language Models

Authors: Maxime Zanella, Ismail Ben Ayed

Abstract: Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking the recent advances in Parameter-Efficient Fine-Tuning (PEFT). Furthermore, existing few-shot learning methods for VLMs often rely on heavy training procedures and/or carefully chosen, task-specific hyper-parameters, which might impede their applicability. In response, we introduce Low-Rank Adaptation (LoRA) in few-shot learning for VLMs, and show its potential on 11 datasets, in comparison to current state-of-the-art prompt- and adapter-based approaches. Surprisingly, our simple CLIP-LoRA method exhibits substantial improvements, while reducing the training times and keeping the same hyper-parameters in all the target tasks, i.e., across all the datasets and numbers of shots. Certainly, our surprising results do not dismiss the potential of prompt-learning and adapter-based research. However, we believe that our strong baseline could be used to evaluate progress in these emergent subjects in few-shot VLMs.

Submitted 1 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.02266

On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?

Authors: Maxime Zanella, Ismail Ben Ayed

Abstract: The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly, test-time augmentation, which utilizes multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts toward test-time prompt tuning. In contrast, we introduce a robust MeanShift for Test-time Augmentation (MTA), which surpasses prompt-based methods without requiring this intensive training procedure. This positions MTA as an ideal solution for both standalone and API-based applications. Additionally, our method does not rely on ad hoc rules (e.g., confidence threshold) used in some previous test-time augmentation techniques to filter the augmented views. Instead, MTA incorporates a quality assessment variable for each view, termed the inlierness score, directly into its optimization process. This score is jointly optimized with a density mode seeking process, leading to an efficient training- and hyperparameter-free approach. We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency. Deployed easily as a plug-and-play module on top of zero-shot models and state-of-the-art few-shot methods, MTA shows systematic and consistent improvements.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2211.10119

Mixture Domain Adaptation to Improve Semantic Segmentation in Real-World Surveillance

Authors: Sébastien Piérard, Anthony Cioppa, Anaïs Halin, Renaud Vandeghen, Maxime Zanella, Benoît Macq, Saïd Mahmoudi, Marc Van Droogenbroeck

Abstract: Various tasks encountered in real-world surveillance can be addressed by determining posteriors (e.g. by Bayesian inference or machine learning), based on which critical decisions must be taken. However, the surveillance domain (acquisition device, operating conditions, etc.) is often unknown, which prevents any possibility of scene-specific optimization. In this paper, we define a probabilistic framework and present a formal proof of an algorithm for the unsupervised many-to-infinity domain adaptation of posteriors. Our proposed algorithm is applicable when the probability measure associated with the target domain is a convex combination of the probability measures of the source domains. It makes use of source models and a domain discriminator model trained off-line to compute posteriors adapted on the fly to the target domain. Finally, we show the effectiveness of our algorithm for the task of semantic segmentation in real-world surveillance. The code is publicly available at https://github.com/rvandeghen/MDA.

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2211.05226

A kinetic approach to consensus-based segmentation of biomedical images

Authors: Raffaella Fiamma Cabini, Anna Pichiecchio, Alessandro Lascialfari, Silvia Figini, Mattia Zanella

Abstract: In this work, we apply a kinetic version of a bounded confidence consensus model to biomedical segmentation problems. In the presented approach, time-dependent information on the microscopic state of each particle/pixel includes its space position and a feature representing a static characteristic of the system, i.e. the gray level of each pixel. From the introduced microscopic model we derive a kinetic formulation of the model. The large time behavior of the system is then computed with the aid of a surrogate Fokker-Planck approach that can be obtained in the quasi-invariant scaling. We exploit the computational efficiency of direct simulation Monte Carlo methods for the obtained Boltzmann-type description of the problem for parameter identification tasks. Based on a suitable loss function measuring the distance between the ground truth segmentation mask and the evaluated mask, we minimize the introduced segmentation metric for a relevant set of 2D gray-scale images. Applications to biomedical segmentation concentrate on different imaging research contexts.

Submitted 14 June, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: 29 pages, 13 figures

arXiv:2210.12456

Abstract Interpretation-Based Feature Importance for SVMs

Authors: Abhinandan Pal, Francesco Ranzato, Caterina Urban, Marco Zanella

Abstract: We propose a symbolic representation for support vector machines (SVMs) by means of abstract interpretation, a well-known and successful technique for designing and implementing static program analyses. We leverage this abstraction in two ways: (1) to enhance the interpretability of SVMs by deriving a novel feature importance measure, called abstract feature importance (AFI), that does not depend in any way on a given dataset or the accuracy of the SVM and is very fast to compute, and (2) for verifying stability, notably individual fairness, of SVMs and producing concrete counterexamples when the verification fails. We implemented our approach and we empirically demonstrated its effectiveness on SVMs based on linear and non-linear (polynomial and radial basis function) kernels. Our experimental results show that, independently of the accuracy of the SVM, our AFI measure correlates much more strongly with the stability of the SVM to feature perturbations than feature importance measures widely available in machine learning software such as permutation feature importance. It thus gives better insight into the trustworthiness of SVMs.

Submitted 22 October, 2022; originally announced October 2022.

arXiv:2101.00909

Fair Training of Decision Tree Classifiers

Authors: Francesco Ranzato, Caterina Urban, Marco Zanella

Abstract: We study the problem of formally verifying individual fairness of decision tree ensembles, as well as training tree models which maximize both accuracy and individual fairness. In our approach, fairness verification and fairness-aware training both rely on a notion of stability of a classification model, which is a variant of standard robustness under input perturbations used in adversarial machine learning. Our verification and training methods leverage abstract interpretation, a well established technique for static program analysis which is able to automatically infer assertions about stability properties of decision trees. By relying on a tool for adversarial training of decision trees, our fairness-aware learning method has been implemented and experimentally evaluated on the reference datasets used to assess fairness properties. The experimental results show that our approach is able to train tree models exhibiting a high degree of individual fairness w.r.t. the natural state-of-the-art CART trees and random forests. Moreover, as a by-product, these fair decision trees turn out to be significantly compact, thus enhancing the interpretability of their fairness properties.

Submitted 4 January, 2021; originally announced January 2021.

arXiv:2012.11352

Genetic Adversarial Training of Decision Trees

Authors: Francesco Ranzato, Marco Zanella

Abstract: We put forward a novel learning methodology for ensembles of decision trees based on a genetic algorithm which is able to train a decision tree for maximizing both its accuracy and its robustness to adversarial perturbations. This learning algorithm internally leverages a complete formal verification technique for robustness properties of decision trees based on abstract interpretation, a well known static program analysis technique. We implemented this genetic adversarial training algorithm in a tool called Meta-Silvae (MS) and we experimentally evaluated it on some reference datasets used in adversarial training. The experimental results show that MS is able to train robust models that compete with and often improve on the current state-of-the-art of adversarial training of decision trees while being much more compact and therefore interpretable and efficient tree models.

Submitted 21 December, 2020; originally announced December 2020.

arXiv:1904.11803

Robustness Verification of Support Vector Machines

Authors: Francesco Ranzato, Marco Zanella

Abstract: We study the problem of formally verifying the robustness to adversarial examples of support vector machines (SVMs), a major machine learning model for classification and regression tasks. Following a recent stream of works on formal robustness verification of (deep) neural networks, our approach relies on a sound abstract version of a given SVM classifier to be used for checking its robustness. This methodology is parametric on a given numerical abstraction of real values and, analogously to the case of neural networks, needs neither abstract least upper bounds nor widening operators on this abstraction. The standard interval domain provides a simple instantiation of our abstraction technique, which is enhanced with the domain of reduced affine forms, which is an efficient abstraction of the zonotope abstract domain. This robustness verification technique has been fully implemented and experimentally evaluated on SVMs based on linear and nonlinear (polynomial and radial basis function) kernels, which have been trained on the popular MNIST dataset of images and on the recent and more challenging Fashion-MNIST dataset. The experimental results of our prototype SVM robustness verifier appear to be encouraging: this automated verification is fast, scalable and shows significantly high percentages of provable robustness on the test set of MNIST, in particular compared to the analogous provable robustness of neural networks.

Submitted 26 April, 2019; originally announced April 2019.

arXiv:1805.01892

doi 10.1103/PhysRevE.98.022315

Opinion modeling on social media and marketing aspects

Authors: G. Toscani, A. Tosin, M. Zanella

Abstract: We introduce and discuss kinetic models of opinion formation on social networks in which the distribution function depends on both the opinion and the connectivity of the agents. The opinion formation model is subsequently coupled with a kinetic model describing the spreading of popularity of a product on the web through a social network. Numerical experiments on the underlying kinetic models show a good qualitative agreement with some measured trends of hashtags on social media websites and illustrate how companies can take advantage of the network structure to obtain at best the advertisement of their products.

Submitted 4 May, 2018; originally announced May 2018.

MSC Class: 35Q20; 35Q84; 35Q91; 82B21; 91D30

Journal ref: Phys. Rev. E 98, 022315 (2018)

arXiv:1604.00421

Opinion dynamics over complex networks: kinetic modeling and numerical methods

Authors: Giacomo Albi, Lorenzo Pareschi, Mattia Zanella

Abstract: In this paper we consider the modeling of opinion dynamics over time-dependent large-scale networks. A kinetic description of the agents' distribution over the evolving network is considered which combines an opinion update based on binary interactions between agents with a dynamic creation and removal process of new connections. The number of connections of each agent influences the spreading of opinions in the network, and the way connections are created is in turn influenced by the agents' opinions. The evolution of the network of connections is studied by showing that its asymptotic behavior is consistent both with Poisson distributions and truncated power-laws. In order to study the large time behavior of the opinion dynamics, a mean field description is derived which allows one to compute exact stationary solutions in some simplified situations. Numerical methods which are capable of correctly describing the large time behavior of the system are also introduced and discussed. Finally, several numerical examples showing the influence of the agents' number of connections in the opinion dynamics are reported.

Submitted 1 April, 2016; originally announced April 2016.

Showing 1–11 of 11 results for author: Zanella, M