Showing 1–14 of 14 results for author: Massiceti, D

  1. arXiv:2406.04236  [pdf, other]

    cs.CV

    Understanding Information Storage and Transfer in Multi-modal Large Language Models

    Authors: Samyadeep Basu, Martin Grayson, Cecily Morrison, Besmira Nushi, Soheil Feizi, Daniela Massiceti

    Abstract: Understanding the mechanisms of information storage and transfer in Transformer-based models is important for driving model understanding progress. Recent work has studied these mechanisms for Large Language Models (LLMs), revealing insights on how information is stored in a model's parameters and how information flows to and from these parameters in response to specific prompts. However, these st…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 20 pages

  2. arXiv:2311.17315  [pdf, other]

    cs.CV

    Explaining CLIP's performance disparities on data from blind/low vision users

    Authors: Daniela Massiceti, Camilla Longden, Agnieszka Słowik, Samuel Wills, Martin Grayson, Cecily Morrison

    Abstract: Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classific…

    Submitted 25 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted at 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  3. arXiv:2310.02426  [pdf, other]

    cs.CV

    EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

    Authors: Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi

    Abstract: A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol, however, does not exist to compare methods across different types of fine-grained edits. To address this gap, we introduce EditVal, a standardized benchmark fo…

    Submitted 3 October, 2023; originally announced October 2023.

  4. arXiv:2308.02866  [pdf, other]

    cs.CV cs.LG

    NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation

    Authors: Jianfeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz

    Abstract: Semi-supervised semantic segmentation involves assigning pixel-wise labels to unlabeled images at training time. This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost. Current approaches to semi-supervised semantic segmentation work by predicting pseudo-labels for each pixel from a class-wise probability distribution output by…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Appears at ICML 2023. Source code is available at: https://github.com/Jianf-Wang/NP-SemiSeg

  5. arXiv:2307.09233  [pdf, other]

    cs.CV

    Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP

    Authors: Samyadeep Basu, Shell Xu Hu, Maziar Sanjabi, Daniela Massiceti, Soheil Feizi

    Abstract: Image-text contrastive models like CLIP have wide applications in zero-shot classification, image-text retrieval, and transfer learning. However, they often struggle on compositional visio-linguistic tasks (e.g., attribute-binding or object-relationships) where their performance is no better than random chance. To address this, we introduce SDS-CLIP, a lightweight and sample-efficient distillation…

    Submitted 1 July, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Short paper

  6. arXiv:2304.01917  [pdf, other]

    cs.CV

    Strong Baselines for Parameter Efficient Few-Shot Fine-tuning

    Authors: Samyadeep Basu, Daniela Massiceti, Shell Xu Hu, Soheil Feizi

    Abstract: Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase on a set of base classes. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the d…

    Submitted 4 April, 2023; originally announced April 2023.

  7. arXiv:2207.01066  [pdf, other]

    cs.LG cs.CV

    NP-Match: When Neural Processes meet Semi-Supervised Learning

    Authors: Jianfeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou

    Abstract: Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: To appear at ICML 2022. The source code is at https://github.com/Jianf-Wang/NP-Match

  8. arXiv:2107.01105  [pdf, other]

    stat.ML cs.LG

    Memory Efficient Meta-Learning with Large Images

    Authors: John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner

    Abstract: Meta learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing th…

    Submitted 26 October, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  9. ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition

    Authors: Daniela Massiceti, Luisa Zintgraf, John Bronskill, Lida Theodorou, Matthew Tobias Harris, Edward Cutrell, Cecily Morrison, Katja Hofmann, Simone Stumpf

    Abstract: Object recognition has made great advances in the last decade, but predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variatio…

    Submitted 8 October, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  10. arXiv:2004.09272  [pdf, other]

    cs.CV cs.CL

    A Revised Generative Evaluation of Visual Dialogue

    Authors: Daniela Massiceti, Viveka Kulharia, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr

    Abstract: Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge. The current evaluation scheme of the VisDial dataset computes the ranks of ground-truth answers in predefined candidate sets, which Massiceti et al. (2018) show can be susceptible to the exploitation of dataset biases. This scheme also does little to account for…

    Submitted 24 April, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: 16 pages, 5 figures

  11. arXiv:1812.06417  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Dialogue without Vision or Dialogue

    Authors: Daniela Massiceti, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr

    Abstract: We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue - a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli. To do so, we develop an embarrassingly simple method based on Canonical Correlation Analysis (CCA) that, on the standard dataset, achieves near state-of-the-art performance on mean ra…

    Submitted 22 October, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

    Comments: 2018 NeurIPS Workshop on Critiquing and Correcting Trends in Machine Learning

  12. arXiv:1802.03803  [pdf, other]

    cs.CV

    FlipDial: A Generative Model for Two-Way Visual Dialogue

    Authors: Daniela Massiceti, N. Siddharth, Puneet K. Dokania, Philip H. S. Torr

    Abstract: We present FlipDial, a generative model for visual dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FlipDial learns both to answer questions and put forward questions, capable of generating entire sequences of dialogue (question-answer pairs) which…

    Submitted 3 April, 2018; v1 submitted 11 February, 2018; originally announced February 2018.

  13. arXiv:1612.02101  [pdf, other]

    cs.CV

    Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation

    Authors: Qinbin Hou, Puneet Kumar Dokania, Daniela Massiceti, Yunchao Wei, Ming-Ming Cheng, Philip Torr

    Abstract: We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels which specify the object classes present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-s…

    Submitted 9 April, 2017; v1 submitted 6 December, 2016; originally announced December 2016.

  14. arXiv:1609.05797  [pdf, other]

    cs.CV cs.RO

    Random Forests versus Neural Networks - What's Best for Camera Localization?

    Authors: Daniela Massiceti, Alexander Krull, Eric Brachmann, Carsten Rother, Philip H. S. Torr

    Abstract: This work addresses the task of camera localization in a known 3D scene given a single input RGB image. State-of-the-art approaches accomplish this in two steps: firstly, regressing for every pixel in the image its 3D scene coordinate and subsequently, using these coordinates to estimate the final 6D camera pose via RANSAC. To solve the first step, Random Forests (RFs) are typically used. On the o…

    Submitted 13 July, 2017; v1 submitted 19 September, 2016; originally announced September 2016.

    Comments: 8 pages, 4 figures