Search | arXiv e-print repository

Explaining CLIP's performance disparities on data from blind/low vision users

Authors: Daniela Massiceti, Camilla Longden, Agnieszka Słowik, Samuel Wills, Martin Grayson, Cecily Morrison

Abstract: Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classific… ▽ More Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower on average for images captured by BLV users than web-crawled images. This disparity stems from CLIP's sensitivities to 1) image content (e.g. not recognizing disability objects as well as other objects); 2) image quality (e.g. not being robust to lighting variation); and 3) text content (e.g. not recognizing objects described by tactile adjectives as well as visual ones). We delve deeper with a textual analysis of three common pre-training datasets: LAION-400M, LAION-2B and DataComp-1B, showing that disability content is rarely mentioned. We then provide three examples that illustrate how the performance disparities extend to three downstream models underpinned by CLIP: OWL-ViT, CLIPSeg and DALL-E2. We find that few-shot learning with as few as 5 images can mitigate CLIP's quality-of-service disparities for BLV users in some scenarios, which we discuss alongside a set of other possible mitigations. △ Less

Submitted 25 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Accepted at 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2110.15277

A Novel Sleep Stage Classification Using CNN Generated by an Efficient Neural Architecture Search with a New Data Processing Trick

Authors: Yu Xue, Ziming Yuan, Adam Slowik

Abstract: With the development of automatic sleep stage classification (ASSC) techniques, many classical methods such as k-means, decision tree, and SVM have been used in automatic sleep stage classification. However, few methods explore deep learning on ASSC. Meanwhile, most deep learning methods require extensive expertise and suffer from a mass of handcrafted steps which are time-consuming especially whe… ▽ More With the development of automatic sleep stage classification (ASSC) techniques, many classical methods such as k-means, decision tree, and SVM have been used in automatic sleep stage classification. However, few methods explore deep learning on ASSC. Meanwhile, most deep learning methods require extensive expertise and suffer from a mass of handcrafted steps which are time-consuming especially when dealing with multi-classification tasks. In this paper, we propose an efficient five-sleep-stage classification method using convolutional neural networks (CNNs) with a novel data processing trick and we design neural architecture search (NAS) technique based on genetic algorithm (GA), NAS-G, to search for the best CNN architecture. Firstly, we attach each kernel with an adaptive coefficient to enhance the signal processing of the inputs. This can enhance the propagation of informative features and suppress the propagation of useless features in the early stage of the network. Then, we make full use of GA's heuristic search and the advantage of no need for the gradient to search for the best architecture of CNN. This can achieve a CNN with better performance than a handcrafted one in a large search space at the minimum cost. We verify the convergence of our data processing trick and compare the performance of traditional CNNs before and after using our trick. Meanwhile, we compare the performance between the CNN generated through NAS-G and the traditional CNNs with our trick. The experiments demonstrate that the convergence of CNNs with data processing trick is faster than without data processing trick and the CNN with data processing trick generated by NAS-G outperforms the handcrafted counterparts that use the data processing trick too. △ Less

Submitted 19 March, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: We need to withdraw this article due to a conflict of interest

arXiv:2106.09467 [pdf, other]

Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation

Authors: Agnieszka Słowik, Léon Bottou

Abstract: Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and ethnic groups. Given the importance of bias mitigati… ▽ More Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and ethnic groups. Given the importance of bias mitigation in machine learning, the topic leads to contentious debates on how to ensure fairness in practice (data bias versus algorithmic bias). Distributionally Robust Optimization (DRO) seemingly addresses this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset. The results cover finite and infinite number of training distributions, as well as convex and non-convex loss functions. We show that neither DRO nor curating the training set should be construed as a complete solution for bias mitigation: in the same way that there is no universally robust training set, there is no universal way to setup a DRO problem and ensure a socially acceptable set of results. We then leverage these insights to provide a mininal set of practical recommendations for addressing bias with DRO. Finally, we discuss ramifications of our results in other related applications of DRO, using an example of adversarial robustness. Our results show that there is merit to both the algorithm-focused and the data-focused side of the bias debate, as long as arguments in favor of these positions are precisely qualified and backed by relevant mathematics known today. △ Less

Submitted 17 June, 2021; originally announced June 2021.

arXiv:2102.10867 [pdf, other]

Linear unit-tests for invariance discovery

Authors: Benjamin Aubin, Agnieszka Słowik, Martin Arjovsky, Leon Bottou, David Lopez-Paz

Abstract: There is an increasing interest in algorithms to learn invariant correlations across training environments. A big share of the current proposals find theoretical support in the causality literature but, how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems -- unit tests -- to evaluate different types of out-of-distribution generalization in a p… ▽ More There is an increasing interest in algorithms to learn invariant correlations across training environments. A big share of the current proposals find theoretical support in the causality literature but, how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems -- unit tests -- to evaluate different types of out-of-distribution generalization in a precise manner. Following initial experiments, none of the three recently proposed alternatives passes all tests. By providing the code to automatically replicate all the results in this manuscript (https://www.github.com/facebookresearch/InvarianceUnitTests), we hope that our unit tests become a standard step**stone for researchers in out-of-distribution generalization. △ Less

Submitted 22 February, 2021; originally announced February 2021.

Comments: 5 pages, Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems

arXiv:2012.02875 [pdf, other]

Inductive Bias and Language Expressivity in Emergent Communication

Authors: Shangmin Guo, Yi Ren, Agnieszka Słowik, Kory Mathewson

Abstract: Referential games and reconstruction games are the most common game types for studying emergent languages. We investigate how the type of the language game affects the emergent language in terms of: i) language compositionality and ii) transfer of an emergent language to a task different from its origin, which we refer to as language expressivity. With empirical experiments on a handcrafted symbol… ▽ More Referential games and reconstruction games are the most common game types for studying emergent languages. We investigate how the type of the language game affects the emergent language in terms of: i) language compositionality and ii) transfer of an emergent language to a task different from its origin, which we refer to as language expressivity. With empirical experiments on a handcrafted symbolic dataset, we show that languages emerged from different games have different compositionality and further different expressivity. △ Less

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2010.08012 [pdf, other]

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

Authors: Alex Lamb, Anirudh Goyal, Agnieszka Słowik, Michael Mozer, Philippe Beaudoin, Yoshua Bengio

Abstract: Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods whi… ▽ More Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning. △ Less

Submitted 15 October, 2020; originally announced October 2020.

arXiv:2002.01335 [pdf, other]

Structural Inductive Biases in Emergent Communication

Authors: Agnieszka Słowik, Abhinav Gupta, William L. Hamilton, Mateja Jamnik, Sean B. Holden, Christopher Pal

Abstract: In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence. We investigate the impact of representation learning in artificial agents by develo** graph referential games. We empirically show that agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models, whic… ▽ More In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence. We investigate the impact of representation learning in artificial agents by develo** graph referential games. We empirically show that agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models, which allows them to systematically generalize to new combinations of familiar features. △ Less

Submitted 27 July, 2021; v1 submitted 4 February, 2020; originally announced February 2020.

Comments: The first two authors contributed equally. Poster presented at CogSci 2021

arXiv:2001.09063 [pdf, other]

Towards Graph Representation Learning in Emergent Communication

Authors: Agnieszka Słowik, Abhinav Gupta, William L. Hamilton, Mateja Jamnik, Sean B. Holden

Abstract: Recent findings in neuroscience suggest that the human brain represents information in a geometric structure (for instance, through conceptual spaces). In order to communicate, we flatten the complex representation of entities and their attributes into a single word or a sentence. In this paper we use graph convolutional networks to support the evolution of language and cooperation in multi-agent… ▽ More Recent findings in neuroscience suggest that the human brain represents information in a geometric structure (for instance, through conceptual spaces). In order to communicate, we flatten the complex representation of entities and their attributes into a single word or a sentence. In this paper we use graph convolutional networks to support the evolution of language and cooperation in multi-agent systems. Motivated by an image-based referential game, we propose a graph referential game with varying degrees of complexity, and we provide strong baseline models that exhibit desirable properties in terms of language emergence and cooperation. We show that the emerged communication protocol is robust, that the agents uncover the true factors of variation in the game, and that they learn to generalize beyond the samples encountered during training. △ Less

Submitted 4 February, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: The first two authors contributed equally. Accepted at the Reinforcement Learning in Games workshop at AAAI 2020

arXiv:1910.06283 [pdf, other]

Monkey Optimization System with Active Membranes: A New Meta-heuristic Optimization System

Authors: Moustafa Zein, Aboul Ella Hassanien, Ammar Adl, Adam Slowik

Abstract: Optimization techniques, used to get the optimal solution in search spaces, have not solved the time-consuming problem. The objective of this study is to tackle the sequential processing problem in Monkey Algorithm and simulating the natural parallel behavior of monkeys. Therefore, a P system with active membranes is constructed by providing a codification for Monkey Algorithm within the context o… ▽ More Optimization techniques, used to get the optimal solution in search spaces, have not solved the time-consuming problem. The objective of this study is to tackle the sequential processing problem in Monkey Algorithm and simulating the natural parallel behavior of monkeys. Therefore, a P system with active membranes is constructed by providing a codification for Monkey Algorithm within the context of a cell-like P system, defining accordingly the elements of the model - membrane structure, objects, rules and the behavior of it. The proposed algorithm has modeled the natural behavior of climb process using separate membranes, rather than the original algorithm. Moreover, it introduced the membrane migration process to select the best solution and the time stamp was added as an additional stop** criterion to control the timing of the algorithm. The results indicate a substantial solution for the time consumption problem, significant representation of the natural behavior of monkeys, and considerable chance to reach the best solution in the context of meta-heuristics purpose. In addition, experiments use the commonly used benchmark functions to test the performance of the algorithm as well as the expected time of the proposed P Monkey optimization algorithm and the traditional Monkey Algorithm running on population size. The unit times are calculated based on the complexity of algorithms, where P Monkey takes a time unit to fire rule(s) over a population size n; as soon as, Monkey Algorithm takes a time unit to run a step every mathematical equation over a population size. △ Less

Submitted 30 September, 2019; originally announced October 2019.

arXiv:1909.09137 [pdf, other]

Bayesian Optimisation with Gaussian Processes for Premise Selection

Authors: Agnieszka Słowik, Chaitanya Mangla, Mateja Jamnik, Sean B. Holden, Lawrence C. Paulson

Abstract: Heuristics in theorem provers are often parameterised. Modern theorem provers such as Vampire utilise a wide array of heuristics to control the search space explosion, thereby requiring optimisation of a large set of parameters. An exhaustive search in this multi-dimensional parameter space is intractable in most cases, yet the performance of the provers is highly dependent on the parameter assign… ▽ More Heuristics in theorem provers are often parameterised. Modern theorem provers such as Vampire utilise a wide array of heuristics to control the search space explosion, thereby requiring optimisation of a large set of parameters. An exhaustive search in this multi-dimensional parameter space is intractable in most cases, yet the performance of the provers is highly dependent on the parameter assignment. In this work, we introduce a principled probablistic framework for heuristics optimisation in theorem provers. We present results using a heuristic for premise selection and The Archive of Formal Proofs (AFP) as a case study. △ Less

Submitted 18 September, 2019; originally announced September 2019.

arXiv:1909.05310 [pdf, other]

Spatial Graph Convolutional Networks

Authors: Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, Łukasz Maziarka

Abstract: Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remed… ▽ More Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose Spatial Graph Convolutional Network (SGCN) which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalization of both GCNs and Convolutional Neural Networks (CNNs), (iii) benefits from augmentation which further improves the performance and assures invariance with respect to the desired properties. Empirically, SGCN outperforms state-of-the-art graph-based methods on image classification and chemical tasks. △ Less

Submitted 2 July, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

arXiv:1907.12380 [pdf, other]

Completing partial recipes using item-based collaborative filtering to recommend ingredients

Authors: Paula Fermín Cueto, Meeke Roet, Agnieszka Słowik

Abstract: Increased public interest in healthy lifestyles has motivated the study of algorithms that encourage people to follow a healthy diet. Applying collaborative filtering to build recommendation systems in domains where only implicit feedback is available is also a rapidly growing research area. In this report we combine these two trends by develo** a recommendation system to suggest ingredients tha… ▽ More Increased public interest in healthy lifestyles has motivated the study of algorithms that encourage people to follow a healthy diet. Applying collaborative filtering to build recommendation systems in domains where only implicit feedback is available is also a rapidly growing research area. In this report we combine these two trends by develo** a recommendation system to suggest ingredients that can be added to a partial recipe. We implement the item-based collaborative filtering algorithm using a high-dimensional, sparse dataset of recipes, which inherently contains only implicit feedback. We explore the effect of different similarity measures and dimensionality reduction on the quality of the recommendations, and find that our best method achieves a recall@10 of circa 40%. △ Less

Submitted 4 February, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: The authors contributed equally

arXiv:1811.00410 [pdf, other]

Dilated DenseNets for Relational Reasoning

Authors: Antreas Antoniou, Agnieszka Słowik, Elliot J. Crowley, Amos Storkey

Abstract: Despite their impressive performance in many tasks, deep neural networks often struggle at relational reasoning. This has recently been remedied with the introduction of a plug-in relational module that considers relations between pairs of objects. Unfortunately, this is combinatorially expensive. In this extended abstract, we show that a DenseNet incorporating dilated convolutions excels at relat… ▽ More Despite their impressive performance in many tasks, deep neural networks often struggle at relational reasoning. This has recently been remedied with the introduction of a plug-in relational module that considers relations between pairs of objects. Unfortunately, this is combinatorially expensive. In this extended abstract, we show that a DenseNet incorporating dilated convolutions excels at relational reasoning on the Sort-of-CLEVR dataset, allowing us to forgo this relational module and its associated expense. △ Less

Submitted 1 November, 2018; originally announced November 2018.

Comments: Extended Abstract

Showing 1–13 of 13 results for author: Słowik, A