-
Domain Aligned CLIP for Few-shot Classification
Authors:
Muhammad Waleed Gondal,
Jochen Gast,
Inigo Alonso Ruiz,
Richard Droste,
Tommaso Macri,
Suren Kumar,
Luitpold Staudigl
Abstract:
Large vision-language representation learning models such as CLIP have demonstrated impressive zero-shot transfer to downstream tasks, largely owing to inter-modal (image-text) alignment learned via contrastive objectives. This downstream performance can be further enhanced by full-scale fine-tuning, which is often compute-intensive, requires large amounts of labelled data, and can reduce out-of-distribution (OOD) robustness. Furthermore, sole reliance on inter-modal alignment might overlook the rich information embedded within each individual modality. In this work, we introduce a sample-efficient domain adaptation strategy for CLIP, termed Domain Aligned CLIP (DAC), which improves both intra-modal (image-image) and inter-modal alignment on target distributions without fine-tuning the main model. For intra-modal alignment, we introduce a lightweight adapter that is specifically trained with an intra-modal contrastive objective. To improve inter-modal alignment, we introduce a simple framework to modulate the precomputed class text embeddings. The proposed few-shot fine-tuning framework is computationally efficient, robust to distribution shifts, and does not alter CLIP's parameters. We evaluate DAC on 11 widely used image classification benchmarks, where it consistently improves on strong baselines in 16-shot classification by about 2.3%, and demonstrate competitive performance on 4 OOD robustness benchmarks.
Submitted 15 November, 2023;
originally announced November 2023.
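The abstract names an intra-modal (image-image) contrastive objective for the adapter but does not give its exact form. As a rough illustration only, the sketch below implements a generic supervised-contrastive (InfoNCE-style) loss in which few-shot images sharing a class label are positives and all other images in the batch are negatives. The function names, the temperature `tau`, and the choice of this particular loss are assumptions for illustration, not the DAC authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def intra_modal_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    tau: float = 0.1,
) -> float:
    """Hypothetical supervised-contrastive loss over image embeddings.

    Samples with the same label are positives; every other sample in the
    batch is a negative. This is an assumed stand-in, not DAC's exact loss.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    if n < 2:
        return 0.0
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(n) if j != i}
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Mean negative log-probability assigned to the anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In a DAC-like setting, `embeddings` would presumably be the adapter's outputs on frozen CLIP image features of the few-shot training images, so only the adapter's weights receive gradients and CLIP itself stays untouched. Embeddings that cluster by class yield a lower loss than embeddings whose labels are mixed across clusters.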
-
Classifying Tweet Sentiment Using the Hidden State and Attention Matrix of a Fine-tuned BERTweet Model
Authors:
Tommaso Macrì,
Freya Murphy,
Yunfan Zou,
Yves Zumbach
Abstract:
This paper introduces a study on tweet sentiment classification. Our task is to classify a tweet as either positive or negative. We approach the problem in two steps: embedding and classification. Our baseline methods include several combinations of traditional embedding methods and classification algorithms. Furthermore, we explore the current state-of-the-art tweet analysis model, BERTweet, and propose a novel approach in which features, inspired by an empirical study of the tweets, are engineered from the model's hidden states and attention matrices. Using a multi-layer perceptron trained with a high dropout rate for classification, our proposed approach achieves a validation accuracy of 0.9111.
Submitted 29 September, 2021;
originally announced September 2021.
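The abstract does not specify which features are engineered from BERTweet's hidden states and attention matrices. The sketch below shows one hypothetical construction under stated assumptions: `hidden` is a single layer's token-by-dimension hidden-state matrix, and `attention` is that layer's token-by-token attention matrix already averaged over heads. (With Hugging Face `transformers`, such tensors can be requested by passing `output_hidden_states=True` and `output_attentions=True` to the model.) The function name and the three pooled views are illustrative choices, not the authors' features.

```python
from typing import List

Matrix = List[List[float]]


def engineer_features(hidden: Matrix, attention: Matrix) -> List[float]:
    """Hypothetical tweet feature vector built from one layer's outputs.

    hidden:    T x d hidden states (one row per token).
    attention: T x T attention weights, assumed already averaged over heads.
    Returns the concatenation [first-token state, mean state, attention-pooled state].
    """
    t = len(hidden)
    d = len(hidden[0])
    # Hidden state of the first (sentence-start) token.
    first_token = list(hidden[0])
    # Mean of the hidden states over all tokens.
    mean_state = [sum(hidden[i][k] for i in range(t)) / t for k in range(d)]
    # Weight each token's state by how much the first token attends to it.
    weights = attention[0]
    attn_pooled = [sum(weights[i] * hidden[i][k] for i in range(t)) for k in range(d)]
    return first_token + mean_state + attn_pooled
```

The resulting fixed-length vector (here 3·d values) could then serve as input to a multi-layer perceptron classifier, matching the two-step embed-then-classify structure the abstract describes.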
-
Segmentation of Roads in Satellite Images using specially modified U-Net CNNs
Authors:
Jonas Bokstaller,
Yihang She,
Zhehan Fu,
Tommaso Macrì
Abstract:
The image classification problem has been extensively investigated by the research community, both with classical computer vision algorithms and with neural networks. The aim of this paper is to build an image classifier for satellite images of urban scenes that identifies the regions of each image in which a road is located, separating them from the rest. Unlike conventional computer vision algorithms, convolutional neural networks (CNNs) provide accurate and reliable results on this task. Our novel approach uses a sliding window to extract patches from the whole image, data augmentation to generate more training/testing data, and, lastly, a series of specially modified U-Net CNNs. The proposed technique outperforms all other baselines tested in terms of mean F-score.
Submitted 29 September, 2021;
originally announced September 2021.
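Two pieces of the pipeline in the abstract above are generic enough to sketch: sliding-window patch extraction and the mean F-score used for evaluation. The code below is a minimal illustration, assuming images and binary road masks (1 = road) are given as 2D lists; the window size, stride, border handling, and per-image averaging are assumptions, since the abstract does not state them.

```python
from typing import Iterator, List, Sequence, Tuple

Grid = List[List[int]]


def sliding_window(image: Sequence[Sequence[int]], size: int, stride: int) -> Iterator[Tuple[int, int, Grid]]:
    """Yield (row, col, patch) for each size x size window at multiples of stride.

    Windows that would extend past the image border are skipped (an assumption).
    """
    h, w = len(image), len(image[0])
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, [list(row[c:c + size]) for row in image[r:r + size]]


def f_score(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """F1 score of a binary road mask against the ground-truth mask."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    # Convention chosen here: no road predicted and none present counts as perfect agreement.
    return 2 * tp / denom if denom else 1.0


def mean_f_score(pairs: Sequence[Tuple[Grid, Grid]]) -> float:
    """Average F1 over a non-empty list of (prediction, ground truth) pairs."""
    return sum(f_score(p, t) for p, t in pairs) / len(pairs)
```

A stride smaller than the window size produces overlapping patches (a 4×4 image yields 4 patches with size 2 and stride 2, but 9 with stride 1), which is one way a sliding window multiplies the amount of training data before any augmentation is applied.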