-
An Invitation to Universality in Physics, Computer Science, and Beyond
Authors:
Tomáš Gonda,
Gemma De les Coves
Abstract:
A universal Turing machine is a powerful concept - a single device can compute any function that is computable. A universal spin model, similarly, is a class of physical systems whose low energy behavior simulates that of any spin system. Our categorical framework for universality (arXiv:2307.06851) captures these and other examples of universality as instances. In this article, we present an accessible account thereof with a focus on its basic ingredients and ways to use it. Specifically, we show how to identify necessary conditions for universality, compare types of universality within each instance, and establish that universality and negation give rise to unreachability (such as uncomputability).
Submitted 24 June, 2024;
originally announced June 2024.
-
Stochastic Optimisation Framework using the Core Imaging Library and Synergistic Image Reconstruction Framework for PET Reconstruction
Authors:
Evangelos Papoutsellis,
Casper da Costa-Luis,
Daniel Deidda,
Claire Delplancke,
Margaret Duff,
Gemma Fardell,
Ashley Gillman,
Jakob S. Jørgensen,
Zeljko Kereta,
Evgueni Ovtchinnikov,
Edoardo Pasca,
Georg Schramm,
Kris Thielemans
Abstract:
We introduce a stochastic framework into the open-source Core Imaging Library (CIL) which enables easy development of stochastic algorithms. Five such algorithms from the literature are developed: Stochastic Gradient Descent, Stochastic Average Gradient (-Amélioré), and (Loopless) Stochastic Variance Reduced Gradient. We showcase the functionality of the framework with a comparative study against a deterministic algorithm on a simulated 2D PET dataset, with the use of the open-source Synergistic Image Reconstruction Framework. We observe that stochastic optimisation methods can converge in fewer passes of the data than a standard deterministic algorithm.
Submitted 21 June, 2024;
originally announced June 2024.
-
Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Authors:
Martina G. Vilas,
Federico Adolfi,
David Poeppel,
Gemma Roig
Abstract:
Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field: Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.
Submitted 3 June, 2024;
originally announced June 2024.
-
To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems
Authors:
Miquel Miró-Nicolau,
Gabriel Moyà-Alcover,
Antoni Jaume-i-Capó,
Manuel González-Hidalgo,
Maria Gemma Sempere Campello,
Juan Antonio Palmer Sancho
Abstract:
The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI) methods. These methods seek to enhance the trust of end-users in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel approach for measuring user trust in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this novel methodology, we conducted a case study in a realistic medical scenario: the use of an XAI system for the detection of pneumonia from X-ray images.
Submitted 9 May, 2024;
originally announced May 2024.
-
Learning Object Semantic Similarity with Self-Supervision
Authors:
Arthur Aubret,
Timothy Schaumlöffel,
Gemma Roig,
Jochen Triesch
Abstract:
Humans judge the similarity of two objects not just based on their visual appearance but also based on their semantic relatedness. However, it remains unclear how humans learn about semantic relationships between objects and categories. One important source of semantic knowledge is that semantically related objects frequently co-occur in the same context. For instance, forks and plates are perceived as similar, at least in part, because they are often experienced together in a "kitchen" or "eating" context. Here, we investigate whether a bio-inspired learning principle exploiting such co-occurrence statistics suffices to learn a semantically structured object representation de novo from raw visual or combined visual and linguistic input. To this end, we simulate temporal sequences of visual experience by binding together short video clips of real-world scenes showing objects in different contexts. A bio-inspired neural network model aligns close-in-time visual representations while also aligning visual and category label representations to simulate visuo-language alignment. Our results show that our model clusters object representations based on their context, e.g. kitchen or bedroom, in particular in high-level layers of the network, akin to humans. In contrast, lower-level layers tend to better reflect object identity or category. To achieve this, the model exploits two distinct strategies: the visuo-language alignment ensures that different objects of the same category are represented similarly, whereas the temporal alignment leverages that objects from the same context are frequently seen in succession to make their representations more similar. Overall, our work suggests temporal and visuo-language alignment as plausible computational principles for explaining the origins of certain forms of semantic knowledge in humans.
Submitted 19 April, 2024;
originally announced May 2024.
-
Positive Moments Forever: Undecidable and Decidable Cases
Authors:
Gemma De les Coves,
Joshua Graf,
Andreas Klingler,
Tim Netzer
Abstract:
Is there an algorithm to determine attributes such as positivity or non-zeroness of linear recurrence sequences? This long-standing question is known as Skolem's problem. In this paper, we study the complexity of an equivalent problem, namely the (generalized) moment membership problem for matrices. We show that this problem is decidable for orthogonal, unitary and real eigenvalue matrices, and undecidable for matrices over certain commutative and non-commutative polynomial rings. Our results imply that the positivity problem for simple unitary linear recurrence sequences is decidable, and is undecidable for linear recurrence sequences over the ring of commutative polynomials. As a byproduct, we prove a free version of Polya's theorem.
Submitted 23 April, 2024;
originally announced April 2024.
-
A process mining-based error correction approach to improve data quality of an IoT-sourced event log
Authors:
Mohsen Shirali,
Zahra Ahmadi,
Carlos Fernández-Llatas,
Jose-Luis Bayo-Monton,
Gemma Di Federico
Abstract:
Internet of Things (IoT) systems are vulnerable to data collection errors and these errors can significantly degrade the quality of collected data, impact data analysis and lead to inaccurate or distorted results. This article emphasizes the importance of evaluating data quality and errors before proceeding with analysis and considering the effectiveness of error correction methods for a smart home use case.
Submitted 18 April, 2024;
originally announced April 2024.
-
Twisted conjugacy in dihedral Artin groups II: Baumslag Solitar groups $\mathrm{BS}(n,n)$
Authors:
Gemma Crowe
Abstract:
In this second paper we solve the twisted conjugacy problem for even dihedral Artin groups, that is, groups with presentation $G(m) = \langle a,b \mid {}_{m}(a,b) = {}_{m}(b,a) \rangle$, where $m \geq 2$ is even, and $_{m}(a,b)$ is the word $abab\dots$ of length $m$. Similar to odd dihedral Artin groups, we prove orbit decidability for all subgroups $A \leq \mathrm{Aut}(G(m))$, which then implies that the conjugacy problem is solvable in extensions of even dihedral Artin groups.
Submitted 10 May, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Twisted conjugacy in dihedral Artin groups I: Torus Knot groups
Authors:
Gemma Crowe
Abstract:
In this paper we provide an alternative proof of a result by Juhász that the twisted conjugacy problem is solvable for odd dihedral Artin groups, that is, groups with presentation $G(m) = \langle a,b \; | \; _{m}(a,b) = {}_{m}(b,a) \rangle$, where $m\geq 3$ is odd, and $_{m}(a,b)$ is the word $abab \dots$ of length $m$. Our solution provides an implementable linear time algorithm, by considering an alternative group presentation to that of a torus knot group, and working with geodesic normal forms. An application of this result is that the conjugacy problem is solvable in extensions of odd dihedral Artin groups.
Submitted 10 May, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Gemma: Open Models Based on Gemini Research and Technology
Authors:
Gemma Team,
Thomas Mesnard,
Cassidy Hardin,
Robert Dadashi,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
Léonard Hussenot,
Pier Giuseppe Sessa,
Aakanksha Chowdhery,
Adam Roberts,
Aditya Barua,
Alex Botev,
Alex Castro-Ros,
Ambrose Slone,
Amélie Héliou,
Andrea Tacchetti,
Anna Bulanova,
Antonia Paterson,
Beth Tsai,
Bobak Shahriari, et al. (83 additional authors not shown)
Abstract:
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.
Submitted 16 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Attack Tree Generation via Process Mining
Authors:
Alyzia-Maria Konsta,
Gemma Di Federico,
Alberto Lluch Lafuente,
Andrea Burattin
Abstract:
Attack Trees are a graphical model of security used to study threat scenarios. While visually appealing and supported by solid theories and effective tools, one of their main drawbacks remains the amount of effort required by security experts to design them from scratch. This work aims to remedy this by providing a method for the automatic generation of Attack Trees from attack logs. The main original feature of our approach with respect to existing ones is the use of Process Mining algorithms to synthesize Attack Trees, which allow users to customize the way a set of logs is summarized as an Attack Tree, for example by discarding statistically irrelevant events. Our approach is supported by a prototype that, apart from the derivation and translation of the model, provides the user with an Attack Tree in the RisQFLan format, a tool used for quantitative risk modeling and analysis with Attack Trees. We illustrate our approach with the case study of attacks on a communication protocol, produced by a state-of-the-art protocol analyzer.
Submitted 19 February, 2024;
originally announced February 2024.
-
Different Algorithms (Might) Uncover Different Patterns: A Brain-Age Prediction Case Study
Authors:
Tobias Ettling,
Sari Saba-Sadiya,
Gemma Roig
Abstract:
Machine learning is a rapidly evolving field with a wide range of applications, including biological signal analysis, where novel algorithms often improve the state-of-the-art. However, robustness to algorithmic variability - measured by whether different algorithms consistently uncover similar findings - is seldom explored. In this paper we investigate whether established hypotheses in brain-age prediction from EEG research validate across algorithms. First, we surveyed literature and identified various features known to be informative for brain-age prediction. We employed diverse feature extraction techniques, processing steps, and models, and utilized the interpretative power of SHapley Additive exPlanations (SHAP) values to align our findings with the existing research in the field. Few of our models achieved state-of-the-art performance on the specific dataset we utilized. Moreover, analysis demonstrated that while most models do uncover similar patterns in the EEG signals, some variability could still be observed. Finally, a few prominent findings could only be validated using specific models. We conclude by suggesting remedies to the potential implications of this lack of robustness to model variability.
Submitted 8 February, 2024;
originally announced February 2024.
-
Non-Consensual Synthetic Intimate Imagery: Prevalence, Attitudes, and Knowledge in 10 Countries
Authors:
Rebecca Umbach,
Nicola Henry,
Gemma Beard,
Colleen Berryessa
Abstract:
Deepfake technologies have become ubiquitous, "democratizing" the ability to manipulate photos and videos. One popular use of deepfake technology is the creation of sexually explicit content, which can then be posted and shared widely on the internet. Drawing on a survey of over 16,000 respondents in 10 different countries, this article examines attitudes and behaviors related to "deepfake pornography" as a specific form of non-consensual synthetic intimate imagery (NSII). Our study found that deepfake pornography behaviors were considered harmful by respondents, despite nascent societal awareness. Regarding the prevalence of deepfake porn victimization and perpetration, 2.2% of all respondents indicated personal victimization, and 1.8% of all respondents indicated perpetration behaviors. Respondents from countries with specific legislation still reported perpetration and victimization experiences, suggesting NSII laws are inadequate to deter perpetration. Approaches to prevent and reduce harms may include digital literacy education, as well as enforced platform policies, practices, and tools which better detect, prevent, and respond to NSII content.
Submitted 13 February, 2024; v1 submitted 26 January, 2024;
originally announced February 2024.
-
Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information
Authors:
Giuseppe Cascavilla,
Gemma Catolino,
Mirella Sangiovanni
Abstract:
This work aims to extend previous work on the classification of illegal activities, in three steps. First, we created a heterogeneous dataset of 113995 onion sites and dark marketplaces. Then, we compared pre-trained transferable models, i.e., ULMFit (Universal Language Model Fine-tuning), BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa (Robustly optimized BERT approach) with a traditional text classification approach like LSTM (Long short-term memory) neural networks. Finally, we developed two illegal activities classification approaches, one for illicit content on the Dark Web and one for identifying the specific types of drugs. Results show that BERT was the best-performing approach, classifying the dark web's general content and the types of drugs with 96.08% and 91.98% accuracy, respectively.
Submitted 8 December, 2023;
originally announced December 2023.
-
Caregiver Talk Shapes Toddler Vision: A Computational Study of Dyadic Play
Authors:
Timothy Schaumlöffel,
Arthur Aubret,
Gemma Roig,
Jochen Triesch
Abstract:
Infants' ability to recognize and categorize objects develops gradually. The second year of life is marked by both the emergence of more semantic visual representations and a better understanding of word meaning. This suggests that language input may play an important role in shaping visual representations. However, even in suitable contexts for word learning like dyadic play sessions, caregivers' utterances are sparse and ambiguous, often referring to objects that are different from the one to which the child attends. Here, we systematically investigate to what extent caregivers' utterances can nevertheless enhance visual representations. For this we propose a computational model of visual representation learning during dyadic play. We introduce a synthetic dataset of ego-centric images perceived by a toddler-agent that moves and rotates toy objects in different parts of its home environment while hearing caregivers' utterances, modeled as captions. We propose to model toddlers' learning as simultaneously aligning representations for 1) close-in-time images and 2) co-occurring images and utterances. We show that utterances with statistics matching those of real caregivers give rise to representations supporting improved category recognition. Our analysis reveals that a small decrease/increase in object-relevant naming frequencies can drastically impact the learned representations. This affects the attention on object names within an utterance, which is required for efficient visuo-linguistic alignment. Overall, our results support the hypothesis that caregivers' naming utterances can improve toddlers' visual representations.
Submitted 17 January, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese
Authors:
Yunke He,
Xixian Liao,
Jialing Liang,
Gemma Boleda
Abstract:
Different speakers often produce different names for the same object or entity (e.g., "woman" vs. "tourist" for a female tourist). The reasons behind variation in naming are not well understood. We create a Language and Vision dataset for Mandarin Chinese that provides an average of 20 names for 1319 naturalistic images, and investigate how familiarity with a given kind of object relates to the degree of naming variation it triggers across subjects. We propose that familiarity influences naming variation in two competing ways: increasing familiarity can either expand vocabulary, leading to higher variation, or promote convergence on conventional names, thereby reducing variation. We find evidence for both factors being at play. Our study illustrates how computational resources can be used to address research questions in Cognitive Science.
Submitted 16 November, 2023;
originally announced November 2023.
-
Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization
Authors:
Prathmesh Bele,
Valay Bundele,
Avigyan Bhattacharya,
Ankit Jha,
Gemma Roig,
Biplab Banerjee
Abstract:
Single-source open-domain generalization (SS-ODG) addresses the challenge of labeled source domains with supervision during training and unlabeled novel target domains during testing. The target domain includes both known classes from the source domain and samples from previously unseen classes. Existing techniques for SS-ODG primarily focus on calibrating source-domain classifiers to identify open samples in the target domain. However, these methods struggle with visually fine-grained open-closed data, often misclassifying open samples as closed-set classes. Moreover, relying solely on a single source domain restricts the model's ability to generalize. To overcome these limitations, we propose a novel framework called SODG-Net that simultaneously synthesizes novel domains and generates pseudo-open samples using a learning-based objective, in contrast to the ad-hoc mixing strategies commonly found in the literature. Our approach enhances generalization by diversifying the styles of known class samples using a novel metric criterion and generates diverse pseudo-open samples to train a unified and confident multi-class classifier capable of handling both open and closed-set data. Extensive experimental evaluations conducted on multiple benchmarks consistently demonstrate the superior performance of SODG-Net compared to the literature.
Submitted 5 November, 2023;
originally announced November 2023.
-
Anthropomorphic Grasping with Neural Object Shape Completion
Authors:
Diego Hidalgo-Carvajal,
Hanzhi Chen,
Gemma C. Bettelani,
Jaesug Jung,
Melissa Zavaglia,
Laura Busse,
Abdeldjallil Naceri,
Stefan Leutenegger,
Sami Haddadin
Abstract:
The progressive prevalence of robots in human-suited environments has given rise to a myriad of object manipulation techniques, in which dexterity plays a paramount role. It is well-established that humans exhibit extraordinary dexterity when handling objects. Such dexterity seems to derive from a robust understanding of object properties (such as weight, size, and shape), as well as a remarkable capacity to interact with them. Hand postures commonly demonstrate the influence of specific regions on objects that need to be grasped, especially when objects are partially visible. In this work, we leverage human-like object understanding by reconstructing and completing their full geometry from partial observations, and manipulating them using a 7-DoF anthropomorphic robot hand. Our approach has significantly improved the grasping success rates of baselines with only partial reconstruction by nearly 30% and achieved over 150 successful grasps with three different object categories. This demonstrates our approach's consistent ability to predict and execute grasping postures based on the completed object shapes from various directions and positions in real-world scenarios. Our work opens up new possibilities for enhancing robotic applications that require precise grasping and manipulation skills of real-world reconstructed objects.
Submitted 9 November, 2023; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Analyzing Vision Transformers for Image Classification in Class Embedding Space
Authors:
Martina G. Vilas,
Timothy Schaumlöffel,
Gemma Roig
Abstract:
Despite the growing use of transformer models in computer vision, a mechanistic understanding of these networks is still needed. This work introduces a method to reverse-engineer Vision Transformers trained to solve image classification tasks. Inspired by previous research in NLP, we demonstrate how the inner representations at any level of the hierarchy can be projected onto the learned class embedding space to uncover how these networks build categorical representations for their predictions. We use our framework to show how image tokens develop class-specific representations that depend on attention mechanisms and contextual information, and give insights on how self-attention and MLP layers differentially contribute to this categorical composition. We additionally demonstrate that this method (1) can be used to determine the parts of an image that would be important for detecting the class of interest, and (2) exhibits significant advantages over traditional linear probing approaches. Taken together, our results position our proposed framework as a powerful tool for mechanistic interpretability and explainability research.
Submitted 29 October, 2023;
originally announced October 2023.
-
A directional regularization method for the limited-angle Helsinki Tomography Challenge using the Core Imaging Library (CIL)
Authors:
Jakob Sauer Jørgensen,
Evangelos Papoutsellis,
Laura Murgatroyd,
Gemma Fardell,
Edoardo Pasca
Abstract:
This article presents the algorithms developed by the Core Imaging Library (CIL) developer team for the Helsinki Tomography Challenge 2022. The challenge focused on reconstructing 2D phantom shapes from limited-angle computed tomography (CT) data. The CIL team designed and implemented five reconstruction methods using CIL (https://ccpi.ac.uk/cil/), an open-source Python package for tomographic imaging. The CIL team adopted a model-based reconstruction strategy, unique in this challenge, as all other teams relied on deep-learning techniques. The CIL algorithms showcased exceptional performance, with one algorithm securing third place in the competition. The best-performing algorithm employed careful CT data pre-processing and an optimization problem with single-sided directional total variation regularization combined with isotropic total variation and tailored lower and upper bounds. The reconstructions and segmentations achieved high quality for data with angular ranges down to 50 degrees, and in some cases acceptable performance even at 40 and 30 degrees. This study highlights the effectiveness of model-based approaches in limited-angle tomography and emphasizes the importance of proper algorithmic design leveraging available prior knowledge to overcome data limitations. Finally, this study highlights the flexibility of CIL for prototyping and comparison of different optimization methods.
Submitted 2 October, 2023;
originally announced October 2023.
-
A Framework for Universality in Physics, Computer Science, and Beyond
Authors:
Tomáš Gonda,
Tobias Reinhart,
Sebastian Stengele,
Gemma De les Coves
Abstract:
Turing machines and spin models share a notion of universality according to which some simulate all others. Is there a theory of universality that captures this notion? We set up a categorical framework for universality which includes as instances universal Turing machines, universal spin models, NP completeness, top of a preorder, denseness of a subset, and more. By identifying necessary conditions for universality, we show that universal spin models cannot be finite. We also characterize when universality can be distinguished from a trivial one and use it to show that universal Turing machines are non-trivial in this sense. Our framework allows not only to compare universalities within each instance, but also instances themselves. We leverage a Fixed Point Theorem inspired by a result of Lawvere to establish that universality and negation give rise to unreachability (such as uncomputability). As such, this work sets the basis for a unified approach to universality and invites the study of further examples within the framework.
Submitted 29 May, 2024; v1 submitted 30 June, 2023;
originally announced July 2023.
-
A Memristor-Inspired Computation for Epileptiform Signals in Spheroids
Authors:
Iván Díez de los Ríos,
John Wesley Ephraim,
Gemma Palazzolo,
Teresa Serrano-Gotarredona,
Gabriella Panuccio,
Bernabé Linares-Barranco
Abstract:
In this paper we present a memristor-inspired computational method for obtaining a type of running spectrogram or fingerprint of epileptiform activity generated by rodent hippocampal spheroids. It can be used to compute on the fly and with low computational cost an alert-level signal for epileptiform events onset. Here, we describe the computational method behind this fingerprint technique and illustrate it using epileptiform events recorded from hippocampal spheroids using a microelectrode array system.
Submitted 10 July, 2023;
originally announced July 2023.
-
DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer
Authors:
Dan Ruta,
Gemma Canet Tarrés,
Andrew Gilbert,
Eli Shechtman,
Nicholas Kolkin,
John Collomosse
Abstract:
Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some styles, especially in cases where the style is abstract or the primary concept of the style is in its deformed rendition of some content. With the recent introduction of diffusion models, such as Stable Diffusion, we can access far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer while enabling deformable style transfer, an elusive capability in previous models. We show how leveraging the priors of these models can expose new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.
Submitted 11 July, 2023; v1 submitted 9 July, 2023;
originally announced July 2023.
-
Unsupervised Segmentation of Fetal Brain MRI using Deep Learning Cascaded Registration
Authors:
Valentin Comte,
Mireia Alenya,
Andrea Urru,
Judith Recober,
Ayako Nakaki,
Francesca Crovetto,
Oscar Camara,
Eduard Gratacós,
Elisenda Eixarch,
Fàtima Crispi,
Gemma Piella,
Mario Ceresa,
Miguel A. González Ballester
Abstract:
Accurate segmentation of fetal brain magnetic resonance images is crucial for analyzing fetal brain development and detecting potential neurodevelopmental abnormalities. Traditional deep learning-based automatic segmentation, although effective, requires extensive training data with ground-truth labels, typically produced by clinicians through a time-consuming annotation process. To overcome this challenge, we propose a novel unsupervised segmentation method based on multi-atlas segmentation, that accurately segments multiple tissues without relying on labeled data for training. Our method employs a cascaded deep learning network for 3D image registration, which computes small, incremental deformations to the moving image to align it precisely with the fixed image. This cascaded network can then be used to register multiple annotated images with the image to be segmented, and combine the propagated labels to form a refined segmentation. Our experiments demonstrate that the proposed cascaded architecture outperforms the state-of-the-art registration methods that were tested. Furthermore, the derived segmentation method achieves similar performance and inference time to nnU-Net while only using a small subset of annotated data for the multi-atlas segmentation task and none for training the network. Our pipeline for registration and multi-atlas segmentation is publicly available at https://github.com/ValBcn/CasReg.
Submitted 7 July, 2023;
originally announced July 2023.
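The abstract above mentions combining labels propagated from several registered atlases into one segmentation. A minimal sketch of one common way to do this, voxel-wise majority voting, is given below. The fusion rule, the function name `majority_vote_fusion`, and the toy data are assumptions for illustration; the abstract does not state which fusion rule the authors use.

```python
import numpy as np


def majority_vote_fusion(propagated_labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Fuse propagated atlas labels voxel-wise by majority vote.

    propagated_labels: integer array of shape (num_atlases, *volume_shape),
    one label map per atlas after it has been warped onto the target image.
    """
    # Count, for every voxel, how many atlases voted for each class.
    votes = np.stack(
        [(propagated_labels == c).sum(axis=0) for c in range(num_classes)],
        axis=0,
    )
    # Pick the class with the most votes (ties go to the lowest class index).
    return votes.argmax(axis=0)


# Toy example: three hypothetical atlases voting on a 1x4 "volume".
atlas_labels = np.array([
    [[0, 1, 2, 1]],
    [[0, 1, 1, 1]],
    [[0, 2, 2, 0]],
])
fused = majority_vote_fusion(atlas_labels, num_classes=3)
print(fused)
```

Majority voting weights every atlas equally; schemes that weight atlases by local registration quality are a possible alternative, which this sketch does not attempt.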
-
Predictive and diagnosis models of stroke from hemodynamic signal monitoring
Authors:
Luis García-Terriza,
José L. Risco-Martín,
Gemma Reig Roselló,
José L. Ayala
Abstract:
This work presents a novel and promising approach to the clinical management of acute stroke. Using machine learning techniques, our research has succeeded in developing accurate diagnosis and prediction real-time models from hemodynamic data. These models are able to diagnose stroke subtype with 30 minutes of monitoring, to predict the exitus during the first 3 hours of monitoring, and to predict the stroke recurrence in just 15 minutes of monitoring. Patients with difficult access to a CT scan, and all patients that arrive at the stroke unit of a specialized hospital will benefit from these positive results. The results obtained from the real-time developed models are the following: stroke diagnosis around $98\%$ precision ($97.8\%$ Sensitivity, $99.5\%$ Specificity), exitus prediction with $99.8\%$ precision ($99.8\%$ Sens., $99.9\%$ Spec.) and $98\%$ precision predicting stroke recurrence ($98\%$ Sens., $99\%$ Spec.).
Submitted 30 May, 2023;
originally announced June 2023.
-
Run Like a Girl! Sports-Related Gender Bias in Language and Vision
Authors:
Sophia Harrison,
Eleonora Gualdoni,
Gemma Boleda
Abstract:
Gender bias in Language and Vision datasets and models has the potential to perpetuate harmful stereotypes and discrimination. We analyze gender bias in two Language and Vision datasets. Consistent with prior work, we find that both datasets underrepresent women, which promotes their invisibilization. Moreover, we hypothesize and find that a bias affects human naming choices for people playing sports: speakers produce names indicating the sport (e.g. 'tennis player' or 'surfer') more often when it is a man or a boy participating in the sport than when it is a woman or a girl, with an average of 46% vs. 35% of sports-related names for each gender. A computational model trained on these naming data reproduces the bias. We argue that both the data and the model result in representational harm against women.
Submitted 23 May, 2023;
originally announced May 2023.
-
ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer
Authors:
Dan Ruta,
Gemma Canet Tarres,
Alexander Black,
Andrew Gilbert,
John Collomosse
Abstract:
Representation learning aims to discover individual salient features of a domain in a compact and descriptive form that strongly identifies the unique characteristics of a given sample respective to its domain. Existing works in visual style representation literature have tried to disentangle style from content during training explicitly. A complete separation between these has yet to be fully achieved. Our paper aims to learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image. We use Neural Style Transfer (NST) to measure and drive the learning signal and achieve state-of-the-art representation learning on explicitly disentangled metrics. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics, encoding far less semantic information and achieving state-of-the-art accuracy in downstream multimodal applications.
Submitted 17 August, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
PARASOL: Parametric Style Control for Diffusion Image Synthesis
Authors:
Gemma Canet Tarrés,
Dan Ruta,
Tu Bui,
John Collomosse
Abstract:
We propose PARASOL, a multi-modal synthesis model that enables disentangled, parametric control of the visual style of the image by jointly conditioning synthesis on both content and a fine-grained visual style embedding. We train a latent diffusion model (LDM) using specific losses for each modality and adapt the classifier-free guidance for encouraging disentangled control over independent content and style modalities at inference time. We leverage auxiliary semantic and style-based search to create training triplets for supervision of the LDM, ensuring complementarity of content and style cues. PARASOL shows promise for enabling nuanced control over visual style in diffusion models for image creation and stylization, as well as generative search where text-based search results may be adapted to more closely match user intent by interpolating both content and style descriptors.
Submitted 1 May, 2024; v1 submitted 11 March, 2023;
originally announced March 2023.
-
The Algonauts Project 2023 Challenge: How the Human Brain Makes Sense of Natural Scenes
Authors:
A. T. Gifford,
B. Lahner,
S. Saba-Sadiya,
M. G. Vilas,
A. Lascelles,
A. Oliva,
K. Kay,
G. Roig,
R. M. Cichy
Abstract:
The sciences of biological and artificial intelligence are ever more intertwined. Neural computational principles inspire new intelligent machines, which are in turn used to advance theoretical understanding of the brain. To promote further exchange of ideas and collaboration between biological and artificial intelligence researchers, we introduce the 2023 installment of the Algonauts Project challenge: How the Human Brain Makes Sense of Natural Scenes (http://algonauts.csail.mit.edu). This installment prompts the fields of artificial and biological intelligence to come together towards building computational models of the visual brain using the largest and richest dataset of fMRI responses to visual scenes, the Natural Scenes Dataset (NSD). NSD provides high-quality fMRI responses to ~73,000 different naturalistic colored scenes, making it the ideal candidate for data-driven model building approaches promoted by the 2023 challenge. The challenge is open to all and makes results directly comparable and transparent through a public leaderboard automatically updated after each submission, thus allowing for rapid model development. We believe that the 2023 installment will spark symbiotic collaborations between biological and artificial intelligence scientists, leading to a deeper understanding of the brain through cutting-edge computational models and to novel ways of engineering artificial intelligent agents through inductive biases from biological systems.
Submitted 11 July, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Automated Routing of Droplets for DNA Storage on a Digital Microfluidics Platform
Authors:
Ajay Manicka,
Andrew Stephan,
Sriram Chari,
Gemma Mendonsa,
Peyton Okubo,
John Stolzberg-Schray,
Anil Reddy,
Marc Riedel
Abstract:
Technologies for sequencing (reading) and synthesizing (writing) DNA have progressed on a Moore's law-like trajectory over the last three decades. This has motivated the idea of using DNA for data storage. Theoretically, DNA-based storage systems could out-compete all existing forms of archival storage. However, a large gap exists between what is theoretically possible in terms of read and write speeds and what has been practically demonstrated with DNA. This paper introduces a novel approach to DNA storage, with automated assembly on a digital microfluidic biochip. This technology offers unprecedented parallelism in DNA assembly using a dual library of "symbols" and "linkers". An algorithmic solution is discussed for the problem of managing droplet traffic on the device, with prioritized three-dimensional "A*" routing. An overview is given of the software that was developed for routing a large number of droplets in parallel on the device, minimizing congestion and maximizing throughput.
Submitted 5 July, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Many bounded versions of undecidable problems are NP-hard
Authors:
Andreas Klingler,
Mirte van der Eyden,
Sebastian Stengele,
Tobias Reinhart,
Gemma De las Cuevas
Abstract:
Several physically inspired problems have been proven undecidable; examples are the spectral gap problem and the membership problem for quantum correlations. Most of these results rely on reductions from a handful of undecidable problems, such as the halting problem, the tiling problem, the Post correspondence problem or the matrix mortality problem. All these problems have a common property: they have an NP-hard bounded version. This work establishes a relation between undecidable unbounded problems and their bounded NP-hard versions. Specifically, we show that NP-hardness of a bounded version follows easily from the reduction of the unbounded problems. This leads to new and simpler proofs of the NP-hardness of bounded versions of the Post correspondence problem, the matrix mortality problem, the positivity of matrix product operators, the reachability problem, the tiling problem, and the ground state energy problem. This work sheds light on the intractability of problems in theoretical physics and on the computational consequences of bounding a parameter.
Submitted 15 March, 2023; v1 submitted 24 November, 2022;
originally announced November 2022.
-
Communication breakdown: On the low mutual intelligibility between human and neural captioning
Authors:
Roberto Dessì,
Eleonora Gualdoni,
Francesca Franzon,
Gemma Boleda,
Marco Baroni
Abstract:
We compare the 0-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced ImageCoDe data-set (Krojer et al., 2022) which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever has much higher performance when fed neural rather than human captions, despite the fact that the former, unlike the latter, were generated without awareness of the distractors that make the task hard. Even more remarkably, when the same neural captions are given to human subjects, their retrieval performance is almost at chance level. Our results thus add to the growing body of evidence that, even when the "language" of neural models resembles English, this superficial resemblance might be deeply misleading.
Submitted 27 April, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Inferring subhalo effective density slopes from strong lensing observations with neural likelihood-ratio estimation
Authors:
Gemma Zhang,
Siddharth Mishra-Sharma,
Cora Dvorkin
Abstract:
Strong gravitational lensing has emerged as a promising approach for probing dark matter models on sub-galactic scales. Recent work has proposed the subhalo effective density slope as a more reliable observable than the commonly used subhalo mass function. The subhalo effective density slope is a measurement independent of assumptions about the underlying density profile and can be inferred for individual subhalos through traditional sampling methods. To go beyond individual subhalo measurements, we leverage recent advances in machine learning and introduce a neural likelihood-ratio estimator to infer an effective density slope for populations of subhalos. We demonstrate that our method is capable of harnessing the statistical power of multiple subhalos (within and across multiple images) to distinguish between characteristics of different subhalo populations. The computational efficiency warranted by the neural likelihood-ratio estimator over traditional sampling enables statistical studies of dark matter perturbers and is particularly useful as we expect an influx of strong lensing systems from upcoming surveys.
Submitted 5 November, 2022; v1 submitted 29 August, 2022;
originally announced August 2022.
-
Net2Brain: A Toolbox to compare artificial vision models with human brain responses
Authors:
Domenic Bersch,
Kshitij Dwivedi,
Martina Vilas,
Radoslaw M. Cichy,
Gemma Roig
Abstract:
We introduce Net2Brain, a graphical and command-line user interface toolbox for comparing the representational spaces of artificial deep neural networks (DNNs) and human brain recordings. While different toolboxes facilitate only single functionalities or only focus on a small subset of supervised image classification models, Net2Brain allows the extraction of activations of more than 600 DNNs trained to perform a diverse range of vision-related tasks (e.g., semantic segmentation, depth estimation, action recognition), over both image and video datasets. The toolbox computes the representational dissimilarity matrices (RDMs) over those activations and compares them to brain recordings using representational similarity analysis (RSA) and weighted RSA, both in specific ROIs and with searchlight search. In addition, it is possible to add a new data set of stimuli and brain recordings to the toolbox for evaluation. We demonstrate the functionality and advantages of Net2Brain with an example showcasing how it can be used to test hypotheses of cognitive computational neuroscience.
Submitted 25 August, 2022; v1 submitted 20 August, 2022;
originally announced August 2022.
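For readers unfamiliar with the RDM and RSA steps named in the abstract above, the following is a minimal NumPy sketch of the general idea. It does not use or reproduce the Net2Brain API; all function names and the synthetic "model" and "brain" data are assumptions for illustration. The RDM uses 1 minus Pearson correlation, and the comparison uses a rank correlation over the upper triangles, with ties broken by order for simplicity.

```python
import numpy as np


def compute_rdm(activations: np.ndarray) -> np.ndarray:
    """RDM with entries 1 - Pearson correlation between stimulus patterns.

    activations: array of shape (num_stimuli, num_features).
    """
    return 1.0 - np.corrcoef(activations)


def upper_triangle(rdm: np.ndarray) -> np.ndarray:
    """Return the off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]


def ordinal_ranks(values: np.ndarray) -> np.ndarray:
    """Ordinal ranks; ties are broken by position (a simplification)."""
    return np.argsort(np.argsort(values)).astype(float)


def rsa_score(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Rank correlation between the upper triangles of two RDMs."""
    ranks_a = ordinal_ranks(upper_triangle(rdm_a))
    ranks_b = ordinal_ranks(upper_triangle(rdm_b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


# Synthetic stand-ins: 6 stimuli, 50 model features, 20 "voxels".
rng = np.random.default_rng(0)
model_activations = rng.normal(size=(6, 50))
brain_responses = model_activations @ rng.normal(size=(50, 20))

model_rdm = compute_rdm(model_activations)
brain_rdm = compute_rdm(brain_responses)
score = rsa_score(model_rdm, brain_rdm)
self_score = rsa_score(model_rdm, model_rdm)
print(f"model vs. brain: {score:.3f}, model vs. itself: {self_score:.3f}")
```

Only the upper triangle is compared because an RDM is symmetric with a zero diagonal, so the remaining entries carry no extra information.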
-
On the Adoption and Effects of Source Code Reuse on Defect Proneness and Maintenance Effort
Authors:
Giammaria Giordano,
Gerardo Festa,
Gemma Catolino,
Fabio Palomba,
Filomena Ferrucci,
Carmine Gravino
Abstract:
Context. Software reusability mechanisms, like inheritance and delegation in Object-Oriented programming, are widely recognized as key instruments of software design. These are used to reduce the risks of source code being affected by defects, as well as to reduce the effort required to maintain and evolve source code. Previous work has traditionally employed source code reuse metrics for prediction purposes, e.g., in the context of defect prediction. Objective. However, our research identifies two noticeable limitations of current literature. First, still little is known on the extent to which developers actually employ code reuse mechanisms over time. Second, it is still unclear how these mechanisms may contribute to explain defect-proneness and maintenance effort during software evolution. We aim at bridging this gap of knowledge, as an improved understanding of these aspects might provide insights into the actual support provided by these mechanisms, e.g., by suggesting whether and how to use them for prediction purposes. Method. We propose an exploratory study aiming at (1) assessing how developers use inheritance and delegation during software evolution; and (2) statistically analyzing the impact of inheritance and delegation on fault proneness and maintenance effort. The study will be conducted on the commits of 17 Java projects of the DEFECTS4J dataset.
Submitted 15 August, 2022;
originally announced August 2022.
-
Using Sentence Embeddings and Semantic Similarity for Seeking Consensus when Assessing Trustworthy AI
Authors:
Dennis Vetter,
Jesmin Jahan Tithi,
Magnus Westerlund,
Roberto V. Zicari,
Gemma Roig
Abstract:
Assessing the trustworthiness of artificial intelligence systems requires knowledge from many different disciplines. These disciplines do not necessarily share concepts between them and might use words with different meanings, or even use the same words differently. Additionally, experts from different disciplines might not be aware of specialized terms readily used in other disciplines. Therefore, a core challenge of the assessment process is to identify when experts from different disciplines talk about the same problem but use different terminologies. In other words, the problem is to group problem descriptions (a.k.a. issues) with the same semantic meaning but described using slightly different terminologies.
In this work, we show how we employed recent advances in natural language processing, namely sentence embeddings and semantic textual similarity, to support this identification process and to bridge communication gaps in interdisciplinary teams of experts assessing the trustworthiness of an artificial intelligence system used in healthcare.
Submitted 9 August, 2022;
originally announced August 2022.
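The grouping step described in the abstract above, clustering issue descriptions whose sentence embeddings are semantically close, can be illustrated with a small sketch. The embeddings below are hand-made toy vectors rather than output of a real sentence-embedding model, and the greedy threshold rule and the names `cosine_similarity_matrix` and `group_similar` are assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T


def group_similar(embeddings: np.ndarray, threshold: float) -> list[list[int]]:
    """Greedy grouping: each item joins the first group whose
    representative (its first member) is at least `threshold` similar."""
    sims = cosine_similarity_matrix(embeddings)
    groups: list[list[int]] = []
    for i in range(len(embeddings)):
        for group in groups:
            if sims[i, group[0]] >= threshold:
                group.append(i)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([i])
    return groups


# Toy stand-ins for sentence embeddings of four issue descriptions.
issue_embeddings = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
])
groups = group_similar(issue_embeddings, threshold=0.9)
print(groups)
```

The threshold controls how aggressively issues are merged; in practice it would need to be tuned against expert judgment.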
-
Unstructured Road Segmentation using Hypercolumn based Random Forests of Local experts
Authors:
Prassanna Ganesh Ravishankar,
Antonio M. Lopez,
Gemma M. Sanchez
Abstract:
Monocular vision based road detection methods are mostly based on machine learning methods, relying on classification and feature extraction accuracy, and suffer from appearance, illumination and weather changes. Traditional methods introduce the predictions into conditional random fields or Markov random fields models to improve the intermediate predictions based on structure. These methods are optimization based and therefore resource heavy and slow, making them unsuitable for real time applications. We propose a method to detect and segment roads with a random forest classifier of local experts with superpixel based machine-learned features. The random forest takes in machine learnt descriptors from a pre-trained convolutional neural network - VGG-16. The features are also pooled into their respective superpixels, allowing for local structure to be continuous. We compare our algorithm against Neural Network based methods and Traditional approaches (based on Hand-crafted features), on both Structured Road (CamVid and Kitti) and Unstructured Road Datasets. Finally, we introduce a Road Scene Dataset with 1000 annotated images, and verify that our algorithm works well in non-urban and rural road scenarios.
Submitted 23 July, 2022;
originally announced July 2022.
-
Job Offers Classifier using Neural Networks and Oversampling Methods
Authors:
Germán Ortiz,
Gemma Bel Enguix,
Helena Gómez-Adorno,
Iqra Ameer,
Grigori Sidorov
Abstract:
Both policy and research benefit from a better understanding of individuals' jobs. However, as large-scale administrative records are increasingly employed to represent labor market activity, new automatic methods to classify jobs will become necessary. We developed an automatic job offers classifier using a dataset collected from the largest job bank of Mexico, known as Bumeran (https://www.bumeran.com.mx/, last visited 19-01-2022). We applied machine learning algorithms such as Support Vector Machines, Naive-Bayes, Logistic Regression, Random Forest, and deep learning Long-Short Term Memory (LSTM). Using these algorithms, we trained multi-class models to classify job offers in one of the 23 classes (not uniformly distributed): Sales, Administration, Call Center, Technology, Trades, Human Resources, Logistics, Marketing, Health, Gastronomy, Financing, Secretary, Production, Engineering, Education, Design, Legal, Construction, Insurance, Communication, Management, Foreign Trade, and Mining. We used the SMOTE, Geometric-SMOTE, and ADASYN synthetic oversampling algorithms to handle imbalanced classes. The proposed convolutional neural network architecture achieved the best results when combined with the Geometric-SMOTE algorithm.
Submitted 3 July, 2022;
originally announced July 2022.
-
What do navigation agents learn about their environment?
Authors:
Kshitij Dwivedi,
Gemma Roig,
Aniruddha Kembhavi,
Roozbeh Mottaghi
Abstract:
Today's state of the art visual navigation agents typically consist of large deep learning models trained end to end. Such models offer little to no interpretability about the learned skills or the actions of the agent taken in response to its environment. While past works have explored interpreting deep learning models, little attention has been devoted to interpreting embodied AI systems, which often involve reasoning about the structure of the environment, target characteristics and the outcome of one's actions. In this paper, we introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents. We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment. We demonstrate interesting insights about navigation agents using iSEE, including the ability to encode reachable locations (to avoid obstacles), visibility of the target, progress from the initial spawn location as well as the dramatic effect on the behaviors of agents when we mask out critical individual neurons. The code is available at: https://github.com/allenai/iSEE
Submitted 16 June, 2022;
originally announced June 2022.
-
An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis
Authors:
Andrea Urru,
Ayako Nakaki,
Oualid Benkarim,
Francesca Crovetto,
Laura Segales,
Valentin Comte,
Nadine Hahner,
Elisenda Eixarch,
Eduard Gratacós,
Fàtima Crispi,
Gemma Piella,
Miguel A. González Ballester
Abstract:
The automatic segmentation of perinatal brain structures in magnetic resonance imaging (MRI) is of utmost importance for the study of brain growth and related complications. While different methods exist for adult and pediatric MRI data, there is a lack of automatic tools for the analysis of perinatal imaging. In this work, a new pipeline for fetal and neonatal segmentation has been developed. We also report the creation of two new fetal atlases, and their use within the pipeline for atlas-based segmentation, based on novel registration methods. The pipeline is also able to extract cortical and pial surfaces and compute features, such as curvature, thickness, sulcal depth, and local gyrification index. Results show that the introduction of the new templates together with our segmentation strategy leads to accurate results when compared to expert annotations, as well as better performance when compared to a reference pipeline (developing Human Connectome Project (dHCP)), for both early and late-onset fetal brains.
Submitted 16 May, 2022;
originally announced May 2022.
-
CoGS: Controllable Generation and Search from Sketch and Style
Authors:
Cusuh Ham,
Gemma Canet Tarres,
Tu Bui,
James Hays,
Zhe Lin,
John Collomosse
Abstract:
We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, enabling decoupled control over the structure and the appearance of the output. Coarse-grained control over object structure and appearance is enabled via an input sketch and an exemplar "style" conditioning image, which are passed to a transformer-based sketch and style encoder to generate a discrete codebook representation. We map the codebook representation into a metric space, enabling fine-grained control over selection and interpolation between multiple synthesis options before generating the image via a vector quantized GAN (VQGAN) decoder. Our framework thereby unifies search and synthesis tasks, in that a sketch and style pair may be used to run an initial synthesis which may be refined via combination with similar results in a search corpus to produce an image more closely matching the user's intent. We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, is capable of producing a diverse gamut of semantic content and appearance styles.
Submitted 20 July, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs
Authors:
Araceli Morales,
Antonio R. Porras,
Marius George Linguraru,
Gemma Piella,
Federico M. Sukno
Abstract:
We present a 3D face reconstruction system that aims at recovering the 3D facial geometry of babies from uncalibrated photographs, BabyNet. Since the 3D facial geometry of babies differs substantially from that of adults, baby-specific facial reconstruction systems are needed. BabyNet consists of two stages: 1) a 3D graph convolutional autoencoder learns a latent space of the baby 3D facial shape; and 2) a 2D encoder that maps photographs to the 3D latent space based on representative features extracted using transfer learning. In this way, using the pre-trained 3D decoder, we can recover a 3D face from 2D images. We evaluate BabyNet and show that 1) methods based on adult datasets cannot model the 3D facial geometry of babies, which proves the need for a baby-specific method, and 2) BabyNet outperforms classical model-fitting methods even when a baby-specific 3D morphable model, such as BabyFM, is used.
Submitted 11 March, 2022;
originally announced March 2022.
-
Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses
Authors:
Phoebe Chua,
Dimos Makris,
Dorien Herremans,
Gemma Roig,
Kat Agres
Abstract:
Although media content is increasingly produced, distributed, and consumed in multiple combinations of modalities, how individual modalities contribute to the perceived emotion of a media item remains poorly understood. In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media. The data were collected by presenting music videos to participants in three conditions: music, visual, and audiovisual. Participants annotated the music videos for valence and arousal over time, as well as the overall emotion conveyed. We present detailed descriptive statistics for key measures in the dataset and the results of feature importance analyses for each condition. Finally, we propose a novel transfer learning architecture to train Predictive models Augmented with Isolated modality Ratings (PAIR) and demonstrate the potential of isolated modality ratings for enhancing multimodal emotion recognition. Our results suggest that perceptions of arousal are influenced primarily by auditory information, while perceptions of valence are more subjective and can be influenced by both visual and auditory information. The dataset is made publicly available.
Submitted 19 February, 2022;
originally announced February 2022.
-
FRIDA -- Generative Feature Replay for Incremental Domain Adaptation
Authors:
Sayan Rakshit,
Anwesh Mohanty,
Ruchika Chavhan,
Biplab Banerjee,
Gemma Roig,
Subhasis Chaudhuri
Abstract:
We tackle the novel problem of incremental unsupervised domain adaptation (IDA) in this paper. We assume that a labeled source domain and different unlabeled target domains are incrementally observed, with the constraint that only data corresponding to the current domain is available at a time. The goal is to preserve the accuracies for all the past domains while generalizing well for the current domain. The IDA setup suffers due to the abrupt differences among the domains and the unavailability of past data, including the source domain. Inspired by the notion of generative feature replay, we propose a novel framework called Feature Replay based Incremental Domain Adaptation (FRIDA), which leverages a new incremental generative adversarial network (GAN) called domain-generic auxiliary classification GAN (DGAC-GAN) for producing domain-specific feature representations seamlessly. For domain alignment, we propose a simple extension of the popular domain adversarial neural network (DANN) called DANN-IB, which encourages discriminative domain-invariant and task-relevant feature learning. Experimental results on Office-Home, Office-CalTech, and DomainNet datasets confirm that FRIDA maintains a superior stability-plasticity trade-off compared with the literature.
Submitted 11 January, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
TinyML Platforms Benchmarking
Authors:
Anas Osman,
Usman Abid,
Luca Gemma,
Matteo Perotto,
Davide Brunelli
Abstract:
Recent advances in state-of-the-art ultra-low power embedded devices for machine learning (ML) have permitted a new class of products whose key features enable ML capabilities on microcontrollers with less than 1 mW power consumption (TinyML). TinyML provides a unique solution by aggregating and analyzing data at the edge on low-power embedded devices. However, we have only recently been able to run ML on microcontrollers, and the field is still in its infancy, which means that hardware, software, and research are changing extremely rapidly. Consequently, many TinyML frameworks have been developed for different platforms to facilitate the deployment of ML models and standardize the process. Therefore, in this paper, we focus on benchmarking two popular frameworks, TensorFlow Lite Micro (TFLM) on the Arduino Nano BLE and CUBE AI on the STM32-NucleoF401RE, to provide a standardized framework selection criterion for specific applications.
Submitted 30 November, 2021;
originally announced December 2021.
-
Identifiable Deep Generative Models via Sparse Decoding
Authors:
Gemma E. Moran,
Dhanya Sridhar,
Yixin Wang,
David M. Blei
Abstract:
We develop the sparse VAE for unsupervised representation learning on high-dimensional data. The sparse VAE learns a set of latent factors (representations) which summarize the associations in the observed data features. The underlying model is sparse in that each observed feature (i.e. each dimension of the data) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We prove such sparse deep generative models are identifiable: with infinite data, the true model parameters can be learned. (In contrast, most deep generative models are not identifiable.) We empirically study the sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.
Submitted 17 February, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution
Authors:
Laura Aina,
Xixian Liao,
Gemma Boleda,
Matthijs Westera
Abstract:
It is often posited that more predictable parts of a speaker's meaning tend to be made less explicit, for instance using shorter, less informative words. Studying these dynamics in the domain of referring expressions has proven difficult, with existing studies, both psycholinguistic and corpus-based, providing contradictory results. We test the hypothesis that speakers produce less informative referring expressions (e.g., pronouns vs. full noun phrases) when the context is more informative about the referent, using novel computational estimates of referent predictability. We obtain these estimates by training an existing coreference resolution system for English on a new task, masked coreference resolution, giving us a probability distribution over referents that is conditioned on the context but not the referring expression. The resulting system retains standard coreference resolution performance while yielding a better estimate of human-derived referent predictability than previous attempts. A statistical analysis of the relationship between model output and mention form supports the hypothesis that predictability affects the form of a mention, both its morphosyntactic type and its length.
Submitted 27 September, 2021;
originally announced September 2021.
-
How Computer Science Can Aid Forest Restoration
Authors:
Gemma Gordon,
Amelia Holcomb,
Tom Kelly,
Srinivasan Keshav,
Jon Ludlum,
Anil Madhavapeddy
Abstract:
The world faces two interlinked crises: climate change and loss of biodiversity. Forest restoration on degraded lands and surplus croplands can play a significant role both in sequestering carbon and in re-establishing biodiversity. There is a considerable body of research and practice that addresses forest restoration. However, there has been little work by computer scientists to bring powerful computational techniques to bear on this important area of work, perhaps due to a lack of awareness. In an attempt to bridge this gap, we present our vision of how techniques from computer science, broadly speaking, can aid current practice in forest restoration.
Submitted 12 August, 2021;
originally announced September 2021.
-
The academic motherload: Models of parenting engagement and the effect on academic productivity and performance
Authors:
Derrick G. E.,
Chen P-Y.,
van Leeuwen T.,
Lariviere V.,
Sugimoto C. R
Abstract:
Gender differences in research productivity are well documented, and have been mostly explained by access to parental leave and child-related responsibilities. Those explanations are based on the assumption that women take on the majority of childcare responsibilities, and take the same level of leave at the birth of a child. Changing social dynamics around parenting have seen fathers increasingly take an active role in parenting. This demands a more nuanced approach to understanding how parenting affects both men and women. Using a global survey of 11,226 academic parents, this study investigates the effect of parental engagement (Lead, Dual (shared), and Satellite parenting), and partner type, on measures of research productivity and impact for men and for women. It also analyzes the effect of different levels of parental leave on academic productivity. Results show that the parenting penalty for men and women is a function of the level of engagement in parenting activities. Men who serve in lead roles suffer similar penalties, but women are more likely to serve in lead parenting roles and to be more engaged across time and tasks. Taking a period of parental leave is associated with higher levels of productivity; however, the productivity advantage is lost at 6 months for the US sample and at 12 months for the non-US sample. These results suggest that parental engagement is a more powerful variable to explain gender differences in academic productivity than the mere existence of children, and that policies should take that factor into account.
Submitted 11 August, 2021;
originally announced August 2021.
-
Memory-aware curriculum federated learning for breast cancer classification
Authors:
Amelia Jiménez-Sánchez,
Mickael Tardy,
Miguel A. González Ballester,
Diana Mateus,
Gemma Piella
Abstract:
For early breast cancer detection, regular screening with mammography imaging is recommended. Routine examinations result in datasets with a predominant amount of negative samples. A potential solution to such class imbalance is joining forces across multiple institutions. Developing a collaborative computer-aided diagnosis system is challenging in different ways. Patient privacy and regulations need to be carefully respected. Data across institutions may be acquired from different devices or imaging protocols, leading to heterogeneous non-IID data. Also, for learning-based methods, new optimization strategies working on distributed data are required. Recently, federated learning has emerged as an effective tool for collaborative learning. In this setting, local models perform computation on their private data to update the global model. The order and the frequency of local updates influence the final global model. Hence, the order in which samples are locally presented to the optimizers plays an important role. In this work, we define a memory-aware curriculum learning method for the federated setting. Our curriculum controls the order of the training samples, paying special attention to those that are forgotten after the deployment of the global model. Our approach is combined with unsupervised domain adaptation to deal with domain shift while preserving data privacy. We evaluate our method with three clinical datasets from different vendors. Our results verify the effectiveness of federated adversarial learning for multi-site breast cancer classification. Moreover, we show that our proposed memory-aware curriculum method is beneficial to further improve classification performance. Our code is publicly available at: https://github.com/ameliajimenez/curriculum-federated-learning.
Submitted 6 January, 2023; v1 submitted 6 July, 2021;
originally announced July 2021.