-
Position: Stop Making Unscientific AGI Performance Claims
Authors:
Patrick Altmeyer,
Andrew M. Demetriou,
Antony Bartlett,
Cynthia C. S. Liem
Abstract:
Developments in the field of Artificial Intelligence (AI), and particularly large language models (LLMs), have created a 'perfect storm' for observing 'sparks' of Artificial General Intelligence (AGI) that are spurious. Like simpler models, LLMs distill meaningful representations in their latent embeddings that have been shown to correlate with external variables. Nonetheless, the correlation of s…
▽ More
Developments in the field of Artificial Intelligence (AI), and particularly large language models (LLMs), have created a 'perfect storm' for observing 'sparks' of Artificial General Intelligence (AGI) that are spurious. Like simpler models, LLMs distill meaningful representations in their latent embeddings that have been shown to correlate with external variables. Nonetheless, the correlation of such representations has often been linked to human-like intelligence in the latter but not the former. We probe models of varying complexity including random projections, matrix decompositions, deep autoencoders and transformers: all of them successfully distill information that can be used to predict latent or external variables and yet none of them have previously been linked to AGI. We argue and empirically demonstrate that the finding of meaningful patterns in latent spaces of models cannot be seen as evidence in favor of AGI. Additionally, we review literature from the social sciences that shows that humans are prone to seek such patterns and anthropomorphize. We conclude that both the methodological setup and common public image of AI are ideal for the misinterpretation that correlations between model representations and some variables of interest are 'caused' by the model's understanding of underlying 'ground truth' relationships. We, therefore, call for the academic community to exercise extra caution, and to be keenly aware of principles of academic integrity, in interpreting and communicating about AI research outcomes.
△ Less
Submitted 31 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals
Authors:
Patrick Altmeyer,
Mojtaba Farmanbar,
Arie van Deursen,
Cynthia C. S. Liem
Abstract:
Counterfactual explanations offer an intuitive and straightforward way to explain black-box models and offer algorithmic recourse to individuals. To address the need for plausible explanations, existing work has primarily relied on surrogate models to learn how the input data is distributed. This effectively reallocates the task of learning realistic explanations for the data from the model itself…
▽ More
Counterfactual explanations offer an intuitive and straightforward way to explain black-box models and offer algorithmic recourse to individuals. To address the need for plausible explanations, existing work has primarily relied on surrogate models to learn how the input data is distributed. This effectively reallocates the task of learning realistic explanations for the data from the model itself to the surrogate. Consequently, the generated explanations may seem plausible to humans but need not necessarily describe the behaviour of the black-box model faithfully. We formalise this notion of faithfulness through the introduction of a tailored evaluation metric and propose a novel algorithmic framework for generating Energy-Constrained Conformal Counterfactuals that are only as plausible as the model permits. Through extensive empirical studies, we demonstrate that ECCCo reconciles the need for faithfulness and plausibility. In particular, we show that for models with gradient access, it is possible to achieve state-of-the-art performance without the need for surrogate models. To do so, our framework relies solely on properties defining the black-box model itself by leveraging recent advances in energy-based modelling and conformal prediction. To our knowledge, this is the first venture in this direction for generating faithful counterfactual explanations. Thus, we anticipate that ECCCo can serve as a baseline for future research. We believe that our work opens avenues for researchers and practitioners seeking tools to better distinguish trustworthy from unreliable models.
△ Less
Submitted 17 December, 2023;
originally announced December 2023.
-
The Biased Journey of MSD_AUDIO.ZIP
Authors:
Haven Kim,
Keunwoo Choi,
Mateusz Modrzejewski,
Cynthia C. S. Liem
Abstract:
The equitable distribution of academic data is crucial for ensuring equal research opportunities, and ultimately further progress. Yet, due to the complexity of using the API for audio data that corresponds to the Million Song Dataset along with its misreporting (before 2016) and the discontinuation of this API (after 2016), access to this data has become restricted to those within certain affilia…
▽ More
The equitable distribution of academic data is crucial for ensuring equal research opportunities, and ultimately further progress. Yet, due to the complexity of using the API for audio data that corresponds to the Million Song Dataset along with its misreporting (before 2016) and the discontinuation of this API (after 2016), access to this data has become restricted to those within certain affiliations that are connected peer-to-peer. In this paper, we delve into this issue, drawing insights from the experiences of 22 individuals who either attempted to access the data or played a role in its creation. With this, we hope to initiate more critical dialogue and more thoughtful consideration with regard to access privilege in the MIR community.
△ Less
Submitted 1 December, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Endogenous Macrodynamics in Algorithmic Recourse
Authors:
Patrick Altmeyer,
Giovan Angela,
Aleksander Buszydlik,
Karol Dobiczek,
Arie van Deursen,
Cynthia C. S. Liem
Abstract:
Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research chal…
▽ More
Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research challenge. There has also been surprisingly little work on the related question of how the actual implementation of recourse by one individual may affect other individuals. Through this work, we aim to close that gap. We first show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework does not account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level. Through simulation experiments involving various state-of the-art counterfactual generators and several benchmark datasets, we generate large numbers of counterfactuals and study the resulting domain and model shifts. We find that the induced shifts are substantial enough to likely impede the applicability of Algorithmic Recourse in some situations. Fortunately, we find various strategies to mitigate these concerns. Our simulation framework for studying recourse dynamics is fast and opensourced.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
Explaining Black-Box Models through Counterfactuals
Authors:
Patrick Altmeyer,
Arie van Deursen,
Cynthia C. S. Liem
Abstract:
We present CounterfactualExplanations.jl: a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box models in Julia. CE explain how inputs into a model need to change to yield specific model predictions. Explanations that involve realistic and actionable changes can be used to provide AR: a set of proposed actions for individuals to change an undesirable…
▽ More
We present CounterfactualExplanations.jl: a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box models in Julia. CE explain how inputs into a model need to change to yield specific model predictions. Explanations that involve realistic and actionable changes can be used to provide AR: a set of proposed actions for individuals to change an undesirable outcome for the better. In this article, we discuss the usefulness of CE for Explainable Artificial Intelligence and demonstrate the functionality of our package. The package is straightforward to use and designed with a focus on customization and extensibility. We envision it to one day be the go-to place for explaining arbitrary predictive models in Julia through a diverse suite of counterfactual generators.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Registered Report : Perception of Other's Musical Preferences Based on Their Personal Values
Authors:
Sandy Manolios,
Catholijn M. Jonker,
Cynthia C. S. Liem
Abstract:
The present work is part of a research line seeking to uncover the mysteries of what lies behind people's musical preferences in order to provide better music recommendations. More specifically, it takes the angle of personal values. Personal values are what we as people strive for, and are a popular tool in marketing research to understand customer preferences for certain types of product. Theref…
▽ More
The present work is part of a research line seeking to uncover the mysteries of what lies behind people's musical preferences in order to provide better music recommendations. More specifically, it takes the angle of personal values. Personal values are what we as people strive for, and are a popular tool in marketing research to understand customer preferences for certain types of product. Therefore, it makes sense to explore their usefulness in the music domain. Based on a previous qualitative work using the Means-End theory, we designed a survey in an attempt to more quantitatively approach the relationship between personal values and musical preferences. We support our approach with a simulation study as a tool to improve the experimental procedure and decisions.
△ Less
Submitted 20 February, 2023;
originally announced February 2023.
-
Treat societally impactful scientific insights as open-source software artifacts
Authors:
Cynthia C. S. Liem,
Andrew M. Demetriou
Abstract:
So far, the relationship between open science and software engineering expertise has largely focused on the open release of software engineering research insights and reproducible artifacts, in the form of open-access papers, open data, and open-source tools and libraries. In this position paper, we draw attention to another perspective: scientific insight itself is a complex and collaborative art…
▽ More
So far, the relationship between open science and software engineering expertise has largely focused on the open release of software engineering research insights and reproducible artifacts, in the form of open-access papers, open data, and open-source tools and libraries. In this position paper, we draw attention to another perspective: scientific insight itself is a complex and collaborative artifact under continuous development and in need of continuous quality assurance, and as such, has many parallels to software artifacts. Considering current calls for more open, collaborative and reproducible science; increasing demands for public accountability on matters of scientific integrity and credibility; methodological challenges coming with transdisciplinary science; political and communication tensions when scientific insight on societally relevant topics is to be translated to policy; and struggles to incentivize and reward academics who truly want to move into these directions beyond traditional publishing habits and cultures, we make the parallels between the emerging open science requirements and concepts already well-known in (open-source) software engineering research more explicit. We argue that the societal impact of software engineering expertise can reach far beyond the software engineering research community, and call upon the community members to pro-actively help driving the necessary systems and cultural changes towards more open and accountable research.
△ Less
Submitted 11 February, 2023;
originally announced February 2023.
-
Hidden Author Bias in Book Recommendation
Authors:
Savvina Daniil,
Mirjam Cuper,
Cynthia C. S. Liem,
Jacco van Ossenbruggen,
Laura Hollink
Abstract:
Collaborative filtering algorithms have the advantage of not requiring sensitive user or item information to provide recommendations. However, they still suffer from fairness related issues, like popularity bias. In this work, we argue that popularity bias often leads to other biases that are not obvious when additional user or item information is not provided to the researcher. We examine our hyp…
▽ More
Collaborative filtering algorithms have the advantage of not requiring sensitive user or item information to provide recommendations. However, they still suffer from fairness related issues, like popularity bias. In this work, we argue that popularity bias often leads to other biases that are not obvious when additional user or item information is not provided to the researcher. We examine our hypothesis in the book recommendation case on a commonly used dataset with book ratings. We enrich it with author information using publicly available external sources. We find that popular books are mainly written by US citizens in the dataset, and that these books tend to be recommended disproportionally by popular collaborative filtering algorithms compared to the users' profiles. We conclude that the societal implications of popularity bias should be further examined by the scholar community.
△ Less
Submitted 8 September, 2022; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Social Inclusion in Curated Contexts: Insights from Museum Practices
Authors:
Han-Yin Huang,
Cynthia C. S. Liem
Abstract:
Artificial intelligence literature suggests that minority and fragile communities in society can be negatively impacted by machine learning algorithms due to inherent biases in the design process, which lead to socially exclusive decisions and policies. Faced with similar challenges in dealing with an increasingly diversified audience, the museum sector has seen changes in theory and practice, par…
▽ More
Artificial intelligence literature suggests that minority and fragile communities in society can be negatively impacted by machine learning algorithms due to inherent biases in the design process, which lead to socially exclusive decisions and policies. Faced with similar challenges in dealing with an increasingly diversified audience, the museum sector has seen changes in theory and practice, particularly in the areas of representation and meaning-making. While rarity and grandeur used to be at the centre stage of the early museum practices, folk life and museums' relationships with the diverse communities they serve become a widely integrated part of the contemporary practices. These changes address issues of diversity and accessibility in order to offer more socially inclusive services. Drawing on these changes and reflecting back on the AI world, we argue that the museum experience provides useful lessons for building AI with socially inclusive approaches, especially in situations in which both a collection and access to it will need to be curated or filtered, as frequently happens in search engines, recommender systems and digital libraries. We highlight three principles: (1) Instead of upholding the value of neutrality, practitioners are aware of the influences of their own backgrounds and those of others on their work. By not claiming to be neutral but practising cultural humility, the chances of addressing potential biases can be increased. (2) There should be room for situational interpretation beyond the stages of data collection and machine learning. Before applying models and predictions, the contexts in which relevant parties exist should be taken into account. (3) Community participation serves the needs of communities and has the added benefit of bringing practitioners and communities together.
△ Less
Submitted 10 May, 2022;
originally announced May 2022.
-
What Are We Really Testing in Mutation Testing for Machine Learning? A Critical Reflection
Authors:
Annibale Panichella,
Cynthia C. S. Liem
Abstract:
Mutation testing is a well-established technique for assessing a test suite's quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep learning (DL) in particular; researchers have proposed approaches, tools, and statistically sound heuristics to determine whether mutants in DL systems are killed or…
▽ More
Mutation testing is a well-established technique for assessing a test suite's quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep learning (DL) in particular; researchers have proposed approaches, tools, and statistically sound heuristics to determine whether mutants in DL systems are killed or not. However, as we will argue in this work, questions can be raised to what extent currently used mutation testing techniques in DL are actually in line with the classical interpretation of mutation testing. We observe that ML model development resembles a test-driven development (TDD) process, in which a training algorithm (`programmer') generates a model (program) that fits the data points (test data) to labels (implicit assertions), up to a certain threshold. However, considering proposed mutation testing techniques for ML systems under this TDD metaphor, in current approaches, the distinction between production and test code is blurry, and the realism of mutation operators can be challenged. We also consider the fundamental hypotheses underlying classical mutation testing: the competent programmer hypothesis and coupling effect hypothesis. As we will illustrate, these hypotheses do not trivially translate to ML system development, and more conscious and explicit sco** and concept map** will be needed to truly draw parallels. Based on our observations, we propose several action points for better alignment of mutation testing techniques for ML with paradigms and vocabularies of classical mutation testing.
△ Less
Submitted 1 March, 2021;
originally announced March 2021.
-
Run, Forest, Run? On Randomization and Reproducibility in Predictive Software Engineering
Authors:
Cynthia C. S. Liem,
Annibale Panichella
Abstract:
Machine learning (ML) has been widely used in the literature to automate software engineering tasks. However, ML outcomes may be sensitive to randomization in data sampling mechanisms and learning procedures. To understand whether and how researchers in SE address these threats, we surveyed 45 recent papers related to three predictive tasks: defect prediction (DP), predictive mutation testing (PMT…
▽ More
Machine learning (ML) has been widely used in the literature to automate software engineering tasks. However, ML outcomes may be sensitive to randomization in data sampling mechanisms and learning procedures. To understand whether and how researchers in SE address these threats, we surveyed 45 recent papers related to three predictive tasks: defect prediction (DP), predictive mutation testing (PMT), and code smell detection (CSD). We found that less than 50% of the surveyed papers address the threats related to randomized data sampling (via multiple repetitions); only 8% of the papers address the random nature of ML; and parameter values are rarely reported (only 18% of the papers). To assess the severity of these threats, we conducted an empirical study using 26 real-world datasets commonly considered for the three predictive tasks of interest, considering eight common supervised ML classifiers. We show that different data resamplings for 10-fold cross-validation lead to extreme variability in observed performance results. Furthermore, randomized ML methods also show non-negligible variability for different choices of random seeds. More worryingly, performance and variability are inconsistent for different implementations of the conceptually same ML method in different libraries, as also shown through multi-dataset pairwise comparison. To cope with these critical threats, we provide practical guidelines on how to validate, assess, and report the results of predictive methods.
△ Less
Submitted 15 December, 2020;
originally announced December 2020.
-
ReproducedPapers.org: Openly teaching and structuring machine learning reproducibility
Authors:
Burak Yildiz,
Hayley Hung,
Jesse H. Krijthe,
Cynthia C. S. Liem,
Marco Loog,
Gosia Migut,
Frans Oliehoek,
Annibale Panichella,
Przemyslaw Pawelczak,
Stjepan Picek,
Mathijs de Weerdt,
Jan van Gemert
Abstract:
We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility. We evaluate doing a reproduction project among students and the added value of an online reproduction repository among AI researchers. We use anonymous self-assessment surveys and obtained 144 responses. Results suggest that students who do a reproduction project place more val…
▽ More
We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility. We evaluate doing a reproduction project among students and the added value of an online reproduction repository among AI researchers. We use anonymous self-assessment surveys and obtained 144 responses. Results suggest that students who do a reproduction project place more value on scientific reproductions and become more critical thinkers. Students and AI researchers agree that our online reproduction repository is valuable.
△ Less
Submitted 1 December, 2020;
originally announced December 2020.
-
Are Nearby Neighbors Relatives?: Testing Deep Music Embeddings
Authors:
Jaehun Kim,
Julián Urbano,
Cynthia C. S. Liem,
Alan Hanjalic
Abstract:
Deep neural networks have frequently been used to directly learn representations useful for a given task from raw input data. In terms of overall performance metrics, machine learning solutions employing deep representations frequently have been reported to greatly outperform those using hand-crafted feature representations. At the same time, they may pick up on aspects that are predominant in the…
▽ More
Deep neural networks have frequently been used to directly learn representations useful for a given task from raw input data. In terms of overall performance metrics, machine learning solutions employing deep representations frequently have been reported to greatly outperform those using hand-crafted feature representations. At the same time, they may pick up on aspects that are predominant in the data, yet not actually meaningful or interpretable. In this paper, we therefore propose a systematic way to test the trustworthiness of deep music representations, considering musical semantics. The underlying assumption is that in case a deep representation is to be trusted, distance consistency between known related points should be maintained both in the input audio space and corresponding latent deep space. We generate known related points through semantically meaningful transformations, both considering imperceptible and graver transformations. Then, we examine within- and between-space distance consistencies, both considering audio space and latent embedded space, the latter either being a result of a conventional feature extractor or a deep encoder. We illustrate how our method, as a complement to task-specific performance, provides interpretable insight into what a network may have captured from training data signals.
△ Less
Submitted 17 October, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Transfer Learning of Artist Group Factors to Musical Genre Classification
Authors:
Jaehun Kim,
Minz Won,
Xavier Serra,
Cynthia C. S. Liem
Abstract:
The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we prop…
▽ More
The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information which will be used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which will allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the given FMA dataset, by experimenting with various kinds of transfer methods, including single-task transfer, multi-task transfer and finally multi-task learning.
△ Less
Submitted 12 January, 2019; v1 submitted 5 May, 2018;
originally announced May 2018.
-
One Deep Music Representation to Rule Them All? : A comparative analysis of different representation learning strategies
Authors:
Jaehun Kim,
Julián Urbano,
Cynthia C. S. Liem,
Alan Hanjalic
Abstract:
Inspired by the success of deploying deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective, but also efficient manner, deep transfer learning has become a common approach. In this approach, it is possible to reuse the output o…
▽ More
Inspired by the success of deploying deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective, but also efficient manner, deep transfer learning has become a common approach. In this approach, it is possible to reuse the output of a pre-trained neural network as the basis for a new learning task. The underlying hypothesis is that if the initial and new learning tasks show commonalities and are applied to the same type of input data (e.g. music audio), the generated deep representation of the data is also informative for the new task. Since, however, most of the networks used to generate deep representations are trained using a single initial learning source, their representation is unlikely to be informative for all possible future tasks. In this paper, we present the results of our investigation of what are the most important factors to generate deep representations for the data and learning tasks in the music domain. We conducted this investigation via an extensive empirical study that involves multiple learning sources, as well as multiple deep learning architectures with varying levels of information sharing between sources, in order to learn music representations. We then validate these representations considering multiple target datasets for evaluation. The results of our experiments yield several insights on how to approach the design of methods for learning widely deployable deep data representations in the music domain.
△ Less
Submitted 11 February, 2019; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos
Authors:
Hugo Jair Escalante,
Heysem Kaya,
Albert Ali Salah,
Sergio Escalera,
Yagmur Gucluturk,
Umut Guclu,
Xavier Baro,
Isabelle Guyon,
Julio Jacques Junior,
Meysam Madadi,
Stephane Ayache,
Evelyne Viegas,
Furkan Gurpinar,
Achmadnoer Sukma Wicaksana,
Cynthia C. S. Liem,
Marcel A. J. van Gerven,
Rob van Lier
Abstract:
Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis such as in health care applications. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in…
▽ More
Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis such as in health care applications. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of computer vision with an emphasis on looking at people tasks. Specifically, we review and study those mechanisms in the context of first impressions analysis. To the best of our knowledge, this is the first effort in this direction. Additionally, we describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, the evaluation protocol, and summarize the results of the challenge. Finally, derived from our study, we outline research opportunities that we foresee will be decisive in the near future for the development of the explainable computer vision field.
△ Less
Submitted 28 September, 2019; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Authors:
A. Bazzica,
J. C. van Gemert,
C. C. S. Liem,
A. Hanjalic
Abstract:
Acoustic events often have a visual counterpart. Knowledge of visual information can aid the understanding of complex auditory scenes, even when only a stereo mixdown is available in the audio domain, \eg identifying which musicians are playing in large musical ensembles. In this paper, we consider a vision-based approach to note onset detection. As a case study we focus on challenging, real-world…
▽ More
Acoustic events often have a visual counterpart. Knowledge of visual information can aid the understanding of complex auditory scenes, even when only a stereo mixdown is available in the audio domain, \eg identifying which musicians are playing in large musical ensembles. In this paper, we consider a vision-based approach to note onset detection. As a case study we focus on challenging, real-world clarinetist videos and carry out preliminary experiments on a 3D convolutional neural network based on multiple streams and purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5 hours of clarinetist videos together with cleaned annotations which include about 36,000 onsets and the coordinates for a number of salient points and regions of interest. By performing several training trials on our dataset, we learned that the problem is challenging. We found that the CNN model is highly sensitive to the optimization algorithm and hyper-parameters, and that treating the problem as binary classification may prevent the joint optimization of precision and recall. To encourage further research, we publicly share our dataset, annotations and all models and detail which issues we came across during our preliminary experiments.
△ Less
Submitted 28 June, 2017;
originally announced June 2017.