-
Conformal Prediction for Natural Language Processing: A Survey
Authors:
Margarida M. Campos,
António Farinhas,
Chrysoula Zerva,
Mário A. T. Figueiredo,
André F. T. Martins
Abstract:
The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications. Conformal prediction is emerging as a theoretically sound and practically useful framework, combining flexibility with strong statistical guarantees. Its model-agnostic and distribution-free nature makes it particularly promising to address the current shortcomings of NLP systems that stem from the absence of uncertainty quantification. This paper provides a comprehensive survey of conformal prediction techniques, their guarantees, and existing applications in NLP, pointing to directions for future research and open challenges.
Submitted 3 May, 2024;
originally announced May 2024.
-
The sandwich problem for odd-hole-free and even-hole-free graphs
Authors:
Kathie Cameron,
Aristotelis Chaniotis,
Celina M. H. de Figueiredo,
Sophie Spirkl
Abstract:
For a property $\mathcal{P}$ of graphs, the $\mathcal{P}$-\textsc{Sandwich-Problem}, introduced by Golumbic and Shamir (1993), is the following: Given a pair of graphs $(G_1, G_2)$ on the same vertex set $V$, does there exist a graph $G$ such that $V(G)=V$, $E(G_{1})\subseteq E(G) \subseteq E(G_{2})$, and $G$ satisfies $\mathcal{P}$? A {\em hole} in a graph is an induced subgraph which is a cycle of length at least four. An odd (respectively even) hole is a hole of odd (respectively even) length. Given a class of graphs $\mathcal{C}$ and a graph $G$ we say that $G$ is {\em $\mathcal{C}$-free} if it contains no induced subgraph isomorphic to a member of $\mathcal{C}$. In this paper we prove that if $\mathcal{P}$ is the property of being odd-hole-free or the property of being even-hole-free, then the $\mathcal{P}$-\textsc{Sandwich-Problem} is NP-hard.
Submitted 16 April, 2024;
originally announced April 2024.
-
A Measure of Synergy based on Union Information
Authors:
André F. C. Gomes,
Mário A. T. Figueiredo
Abstract:
The partial information decomposition (PID) framework is concerned with decomposing the information that a set of (two or more) random variables (the sources) has about another variable (the target) into three types of information: unique, redundant, and synergistic. Classical information theory alone does not provide a unique way to decompose information in this manner and additional assumptions have to be made. One often overlooked way to achieve this decomposition is using a so-called measure of union information - which quantifies the information that is present in at least one of the sources - from which a synergy measure stems. In this paper, we introduce a new measure of union information based on adopting a communication channel perspective, compare it with existing measures, and study some of its properties. We also include a comprehensive critical review of characterizations of union information and synergy measures that have been proposed in the literature.
Submitted 25 March, 2024;
originally announced March 2024.
-
Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints
Authors:
Jean V. Alves,
Diogo Leitão,
Sérgio Jesus,
Marco O. P. Sampaio,
Javier Liébana,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key aspects of real-world systems that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type 1 and type 2 errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset and iii) not dealing with human work capacity constraints. To address these issues, we propose the deferral under cost and capacity constraints framework (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average 8.4% reduction in the misclassification cost.
Submitted 21 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Photonic-electronic spiking neuron with multi-modal and multi-wavelength excitatory and inhibitory operation for high-speed neuromorphic sensing and computing
Authors:
Weikang Zhang,
Matěj Hejda,
Qusay Raghib Ali Al-Taai,
Dafydd Owen-Newns,
Bruno Romeira,
José M. L. Figueiredo,
Joshua Robertson,
Edward Wasige,
Antonio Hurtado
Abstract:
We report a multi-modal spiking neuron that allows optical and electronic input and control, and wavelength-multiplexing operation, for use in novel high-speed neuromorphic sensing and computing functionalities. The photonic-electronic neuron is built with a micro-scale, nanostructure resonant tunnelling diode (RTD) with photodetection (PD) capability. Leveraging the advantageous intrinsic properties of this RTD-PD system, namely highly nonlinear characteristics, photo-sensitivity, light-induced I-V curve shift, and the ability to deliver excitable responses under electrical and optical inputs, we successfully achieve flexible neuromorphic spike activation and inhibition regimes through photonic-electrical control. We also demonstrate the ability of this RTD-PD spiking sensing-processing neuron to operate under the simultaneous arrival of multiple wavelength-multiplexed optical signals, due to its large photodetection spectral window (covering the 1310 and 1550 nm telecom wavelength bands). Our results highlight the potential of RTD photonic-electronic neurons to reproduce multiple key excitatory and inhibitory spiking regimes, at high speed (ns-rate spiking responses, with faster sub-ns regimes theoretically predicted) and low energy (requiring only ~10 mV and ~150 microW, electrical and optical input amplitudes, respectively), similar in nature to those commonly found in the biological neurons of the visual system and the brain. This work offers a highly promising approach for the realisation of high-speed, energy-efficient photonic-electronic spiking neurons and spiking neural networks, enabling multi-modal and multi-wavelength operation for sensing and information processing tasks. This work therefore paves the way for innovative high-speed, photonic-electronic, and spike-based neuromorphic sensing and computing systems and artificial intelligence hardware.
Submitted 6 March, 2024;
originally announced March 2024.
-
DiConStruct: Causal Concept-based Explanations through Black-Box Distillation
Authors:
Ricardo Moreira,
Jacopo Bono,
Mário Cardoso,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predictive task. Despite the rapid advances in AI explainability in recent years, as far as we know to date, no method fulfills these three properties. Indeed, mainstream methods for local concept explainability do not produce causal explanations and incur a trade-off between explainability and prediction performance. We present DiConStruct, an explanation method that is both concept-based and causal, with the goal of creating more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Because of this, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
Submitted 26 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
FiFAR: A Fraud Detection Dataset for Learning to Defer
Authors:
Jean V. Alves,
Diogo Leitão,
Sérgio Jesus,
Marco O. P. Sampaio,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud detection is a high-stakes setting where algorithms and human experts often work in tandem; however, there are no publicly available datasets for L2D concerning this important application of human-AI teaming. To fill this gap in L2D research, we introduce the Financial Fraud Alert Review Dataset (FiFAR), a synthetic bank account fraud detection dataset, containing the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence. We also provide a realistic definition of human work capacity constraints, an aspect of L2D systems that is often overlooked, allowing for extensive testing of assignment systems under real-world conditions. We use our dataset to develop a capacity-aware L2D method and rejection learning approach under realistic data availability conditions, and benchmark these baselines under an array of 300 distinct testing scenarios. We believe that this dataset will serve as a pivotal instrument in facilitating a systematic, rigorous, reproducible, and transparent evaluation and comparison of L2D methods, thereby fostering the development of more synergistic human-AI collaboration in decision-making systems. The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset
Submitted 20 December, 2023;
originally announced December 2023.
-
DeepThought: An Architecture for Autonomous Self-motivated Systems
Authors:
Arlindo L. Oliveira,
Tiago Domingos,
Mário Figueiredo,
Pedro U. Lima
Abstract:
The ability of large language models (LLMs) to engage in credible dialogues with humans, taking into account the training data and the context of the conversation, has raised discussions about their ability to exhibit intrinsic motivations, agency, or even some degree of consciousness. We argue that the internal architecture of LLMs and their finite and volatile state cannot support any of these properties. By combining insights from complementary learning systems, global neuronal workspace, and attention schema theories, we propose to integrate LLMs and other deep learning systems into an architecture for cognitive language agents able to exhibit properties akin to agency, self-motivation, and even some features of meta-cognition.
Submitted 14 November, 2023;
originally announced November 2023.
-
Orders between channels and implications for partial information decomposition
Authors:
André F. C. Gomes,
Mário A. T. Figueiredo
Abstract:
The partial information decomposition (PID) framework is concerned with decomposing the information that a set of random variables has with respect to a target variable into three types of components: redundant, synergistic, and unique. Classical information theory alone does not provide a unique way to decompose information in this manner and additional assumptions have to be made. Recently, Kolchinsky proposed a new general axiomatic approach to obtain measures of redundant information, based on choosing an order relation between information sources (equivalently, order between communication channels). In this paper, we exploit this approach to introduce three new measures of redundant information (and the resulting decompositions) based on well-known preorders between channels, thus contributing to the enrichment of the PID landscape. We relate the new decompositions to existing ones, study some of their properties, and provide examples illustrating their novelty. As a side result, we prove that any preorder that satisfies Kolchinsky's axioms yields a decomposition that meets the axioms originally introduced by Williams and Beer when they first proposed PID.
Submitted 14 July, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Optically-triggered deterministic spiking regimes in nanostructure resonant tunnelling diode-photodetectors
Authors:
Qusay Raghib Ali Al-Taai,
Matěj Hejda,
Weikang Zhang,
Bruno Romeira,
José M. L. Figueiredo,
Edward Wasige,
Antonio Hurtado
Abstract:
This work reports a nanostructure resonant tunnelling diode-photodetector (RTD-PD) device and demonstrates its operation as a controllable, optically-triggered excitable spike generator. The top contact layer of the device is designed with a nanopillar structure (500 nm in diameter) to restrain the injection current, thereby yielding lower-energy operation for spike generation. We demonstrate experimentally the deterministic optical triggering of controllable and repeatable neuron-like spike patterns in the nanostructure RTD-PDs. Moreover, we show the device's ability to deliver spiking responses when biased in both regions adjacent to the negative differential conductance (NDC) region, the so-called 'peak' and 'valley' points of the current-voltage ($I$-$V$) characteristic. This work also demonstrates experimentally key neuron-like dynamical features in the nanostructure RTD-PD, such as a well-defined threshold (in input optical intensity) for spike firing, as well as the presence of spike firing refractory time. The optoelectronic and chip-scale character of the proposed system together with the deterministic, repeatable and well controllable nature of the optically-elicited spiking responses render this nanostructure RTD-PD element as a highly promising solution for high-speed, energy-efficient optoelectronic artificial spiking neurons for novel light-enabled neuromorphic computing hardware.
Submitted 23 April, 2023;
originally announced April 2023.
-
Fairness-Aware Data Valuation for Supervised Learning
Authors:
José Pombal,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.
Submitted 29 March, 2023;
originally announced March 2023.
-
Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model
Authors:
Mário A. T. Figueiredo,
Catarina A. Oliveira
Abstract:
Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
Submitted 14 March, 2023;
originally announced March 2023.
-
ProBoost: a Boosting Method for Probabilistic Classifiers
Authors:
Fábio Mendonça,
Sheikh Shanawaz Mostafa,
Fernando Morgado-Dias,
Antonio G. Ravelo-García,
Mário A. T. Figueiredo
Abstract:
ProBoost, a new boosting algorithm for probabilistic classifiers, is proposed in this work. This algorithm uses the epistemic uncertainty of each training sample to determine the most challenging/uncertain ones; the relevance of these samples is then increased for the next weak learner, producing a sequence that progressively focuses on the samples found to have the highest uncertainty. In the end, the weak learners' outputs are combined into a weighted ensemble of classifiers. Three methods are proposed to manipulate the training set: undersampling, oversampling, and weighting the training samples according to the uncertainty estimated by the weak learners. Furthermore, two approaches are studied regarding the ensemble combination. The weak learner herein considered is a standard convolutional neural network, and the probabilistic models underlying the uncertainty estimation use either variational inference or Monte Carlo dropout. The experimental evaluation carried out on MNIST benchmark datasets shows that ProBoost yields a significant performance improvement. The results are further highlighted by assessing the relative achievable improvement, a metric proposed in this work, which shows that a model with only four weak learners leads to an improvement exceeding 12% in this metric (for either accuracy, sensitivity, or specificity), in comparison to the model learned without ProBoost.
Submitted 4 September, 2022;
originally announced September 2022.
-
Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions
Authors:
José Pombal,
André F. Cruz,
João Bravo,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial services, for example. The awareness of this problem has given rise to the field of Fair ML, which focuses on studying, measuring, and mitigating unfairness in algorithmic prediction, with respect to a set of protected groups (e.g., race or gender). However, the underlying causes for algorithmic unfairness still remain elusive, with researchers divided between blaming either the ML algorithms or the data they are trained on. In this work, we maintain that algorithmic unfairness stems from interactions between models and biases in the data, rather than from isolated contributions of either of them. To this end, we propose a taxonomy to characterize data bias and we study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings. On our real-world account-opening fraud use case, we find that each setting entails specific trade-offs, affecting fairness in expected value and variance -- the latter often going unnoticed. Moreover, we show how algorithms compare differently in terms of accuracy and fairness, depending on the biases affecting the data. Finally, we note that under specific data bias conditions, simple pre-processing interventions can successfully balance group-wise error rates, while the same techniques fail in more complex settings.
Submitted 13 July, 2022;
originally announced July 2022.
-
Human-AI Collaboration in Decision-Making: Beyond Learning to Defer
Authors:
Diogo Leitão,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements, such as the availability of predictions from humans for every instance or ground-truth labels that are independent from said humans. Furthermore, neither L2D nor alternative approaches tackle fundamental issues of deploying HAIC systems in real-world settings, such as capacity management or dealing with dynamic environments. In this paper, we aim to identify and review these and other limitations, pointing to where opportunities for future research in HAIC may lie.
Submitted 13 July, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction
Authors:
José Pombal,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been paid to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a real-world account opening fraud detection case study as an example, we study the dangers to both performance and fairness of two typical biases in performative prediction: distribution shifts, and the problem of selective labels.
Submitted 27 June, 2022;
originally announced June 2022.
-
Artificial optoelectronic spiking neuron based on a resonant tunnelling diode coupled to a vertical cavity surface emitting laser
Authors:
Matěj Hejda,
Ekaterina Malysheva,
Dafydd Owen-Newns,
Qusay Raghib Ali Al-Taai,
Weikang Zhang,
Ignacio Ortega-Piwonka,
Julien Javaloyes,
Edward Wasige,
Victor Dolores-Calzadilla,
José M. L. Figueiredo,
Bruno Romeira,
Antonio Hurtado
Abstract:
Excitable optoelectronic devices represent one of the key building blocks for implementation of artificial spiking neurons in neuromorphic (brain-inspired) photonic systems. This work introduces and experimentally investigates an opto-electro-optical (O/E/O) artificial neuron built with a resonant tunnelling diode (RTD) coupled to a photodetector as a receiver and a vertical cavity surface emitting laser as the transmitter. We demonstrate a well-defined excitability threshold, above which this neuron produces 100 ns optical spiking responses with a characteristic neural-like refractory period. We utilise its fan-in capability to perform in-device coincidence detection (logical AND) and exclusive logical OR (XOR) tasks. These results provide the first experimental validation of deterministic triggering and tasks in an RTD-based spiking optoelectronic neuron with both input and output optical (I/O) terminals. Furthermore, we also investigate in theory the prospects of the proposed system for its nanophotonic implementation with a monolithic design combining a nanoscale RTD element and a nanolaser, thereby demonstrating the potential of integrated RTD-based excitable nodes for low-footprint, high-speed optoelectronic spiking neurons in future neuromorphic photonic hardware.
△ Less
Submitted 22 June, 2022;
originally announced June 2022.
-
Differentiable Causal Discovery Under Latent Interventions
Authors:
Gonçalo R. A. Faria,
André F. T. Martins,
Mário A. T. Figueiredo
Abstract:
Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown. However, previous work assumes that the correspondence between samples and interventions is known, which is often unrealistic. We envision a scenario with an extensive dataset sampled from multiple intervention distributions and one observation distribution, but where we do not know which distribution originated each sample and how the intervention affected the system, \textit{i.e.}, interventions are entirely latent. We propose a method based on neural networks and variational inference that addresses this scenario by framing it as learning a shared causal graph among an infinite mixture (under a Dirichlet process prior) of intervention structural causal models. Experiments with synthetic and real data show that our approach and its semi-supervised variant are able to discover causal relations in this challenging scenario.
Submitted 4 March, 2022;
originally announced March 2022.
-
MaxCut on Permutation Graphs is NP-complete
Authors:
Celina M. H. de Figueiredo,
Alexsander A. de Melo,
Fabiano S. Oliveira,
Ana Silva
Abstract:
In this paper, we prove that the MaxCut problem is NP-complete on permutation graphs, settling a long-standing open problem that appeared in the 1985 column of the "Ongoing Guide to NP-completeness" by David S. Johnson.
Submitted 28 February, 2022;
originally announced February 2022.
-
Most direct product of graphs are Type 1
Authors:
Diane Castonguay,
Celina M. H. de Figueiredo,
Luis Antonio Kowada,
Caroline Reis Patrão,
Diana Sasaki
Abstract:
A \textit{$k$-total coloring} of a graph $G$ is an assignment of $k$ colors to its elements (vertices and edges) so that adjacent or incident elements have different colors. The total chromatic number is the smallest integer $k$ for which the graph $G$ has a $k$-total coloring. Clearly, this number is at least $Δ(G)+1$, where $Δ(G)$ is the maximum degree of $G$. When the lower bound is reached, the graph is said to be Type~1. The upper bound of $Δ(G)+2$ is a central problem that has been open for fifty years; it is verified for graphs with maximum degree 4 but not for regular graphs.
Most classified direct products of graphs are Type~1. The particular cases of the direct product of cycle graphs $C_m \times C_n$, for $m =3p, 5\ell$ and $8\ell$ with $p \geq 2$ and $\ell \geq 1$, and arbitrary $n \geq 3$, were previously known to be Type 1 and motivated the conjecture that, except for $C_4 \times C_4$, all direct products of cycle graphs $C_m \times C_n$ with $m,n \geq 3$ are Type 1.
We give a general pattern proving that all $C_m \times C_n$ are Type 1, except for $C_4 \times C_4$. Additionally, we investigate sufficient conditions to ensure that the direct product reaches the lower bound for the total chromatic number.
Submitted 27 October, 2021;
originally announced October 2021.
-
Classification of anomalous gait using Machine Learning techniques and embedded sensors
Authors:
T. R. D. Sa,
C. M. S. Figueiredo
Abstract:
Human gait can be a predictive factor for detecting pathologies that affect human locomotion, according to studies. In addition, it is known that a high investment is required to set up a traditional clinical infrastructure able to provide human gait examinations, making them unaffordable for economically vulnerable patients. In view of this scenario, this work proposes an accessible and modern solution composed of a wearable device, to acquire 3D-accelerometer and 3D-gyroscope measurements, and machine learning techniques to classify between distinct categories of induced gait disorders. To develop the proposed research, a dataset was created in which the target label takes 4 distinct and balanced categories of anomalous gait. The best performance (in terms of accuracy) on this dataset was achieved by applying the Principal Component Analysis algorithm followed by a Support Vector Machines classifier (94 \%). Further, an architecture based on a Feedforward Neural Network yielded even better results (96 \%). Finally, a computational performance comparison between the implemented models is also presented.
Submitted 8 October, 2021;
originally announced October 2021.
-
Sparse Continuous Distributions and Fenchel-Young Losses
Authors:
André F. T. Martins,
Marcos Treviso,
António Farinhas,
Pedro M. Q. Aguiar,
Mário A. T. Figueiredo,
Mathieu Blondel,
Vlad Niculae
Abstract:
Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fusedmax), has led to distributions with varying support.
This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $Ω$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $Ω$ is a Tsallis negentropy with parameter $α$, we obtain ``deformed exponential families,'' which include $α$-entmax and sparsemax ($α=2$) as particular cases. For quadratic energy functions, the resulting densities are $β$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $Ω$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $α\in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
Submitted 4 August, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
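The abstract above names sparsemax as the $α=2$ case of $α$-entmax on finite domains. As a rough illustration of how such a map produces distributions with varying support, here is a minimal Python sketch of the standard sort-and-threshold computation of sparsemax (the Euclidean projection of a score vector onto the probability simplex). The function name and the example scores are illustrative assumptions, not taken from the paper, and the sketch does not cover the continuous-domain constructions.

```python
def sparsemax(scores):
    """Project a list of real scores onto the probability simplex.

    Finite-domain sketch only: entries below the threshold tau get
    exactly zero probability, so the support can shrink.
    """
    z_sorted = sorted(scores, reverse=True)
    cumulative = 0.0
    support_size = 0
    support_sum = 0.0
    for j, z in enumerate(z_sorted, start=1):
        cumulative += z
        # Position j belongs to the support while 1 + j*z_(j) > sum of top-j scores.
        if 1.0 + j * z > cumulative:
            support_size = j
            support_sum = cumulative
    # Threshold chosen so that the clipped scores sum to one.
    tau = (support_sum - 1.0) / support_size
    return [max(z - tau, 0.0) for z in scores]


# Hypothetical scores: one dominant entry takes all the mass.
probs = sparsemax([2.0, 0.0, 0.0])
```

Unlike softmax, which assigns strictly positive probability to every entry, the output here contains exact zeros whenever some scores fall far enough below the largest one.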
-
Distributed Banach-Picard Iteration: Application to Distributed EM and Distributed PCA
Authors:
Francisco L. Andrade,
Mário A. T. Figueiredo,
João Xavier
Abstract:
In recent work, we proposed a distributed Banach-Picard iteration (DBPI) that allows a set of agents, linked by a communication network, to find a fixed point of a locally contractive (LC) map that is the average of individual maps held by said agents. In this work, we build upon the DBPI and its local linear convergence (LLC) guarantees to make several contributions. We show that Sanger's algorithm for principal component analysis (PCA) corresponds to the iteration of an LC map that can be written as the average of local maps, each map known to each agent holding a subset of the data. Similarly, we show that a variant of the expectation-maximization (EM) algorithm for parameter estimation from noisy and faulty measurements in a sensor network can be written as the iteration of an LC map that is the average of local maps, each available at just one node. Consequently, via the DBPI, we derive two distributed algorithms - distributed EM and distributed PCA - whose LLC guarantees follow from those that we proved for the DBPI. The verification of the LC condition for EM is challenging, as the underlying operator depends on random samples, thus the LC condition is of probabilistic nature.
Submitted 26 January, 2022; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Revising Johnson's table for the 21st century
Authors:
Celina M. H. de Figueiredo,
Alexsander A. de Melo,
Diana Sasaki,
Ana Silva
Abstract:
What does it mean today to study a problem from a computational point of view? We focus on parameterized complexity and on Column 16 "Graph Restrictions and Their Effect" of D. S. Johnson's Ongoing guide, where several puzzles were proposed in a summary table with 30 graph classes as rows and 11 problems as columns. Several of the 330 entries remain unclassified into Polynomial or NP-complete after 35 years. We provide a full dichotomy for the Steiner Tree column by proving that the problem is NP-complete when restricted to Undirected Path graphs. We revise Johnson's summary table according to the granularity provided by the parameterized complexity for NP-complete problems.
Submitted 29 April, 2021;
originally announced April 2021.
-
Distributed Banach-Picard Iteration for Locally Contractive Maps
Authors:
Francisco L. Andrade,
Mário A. T. Figueiredo,
João Xavier
Abstract:
The Banach-Picard iteration is widely used to find fixed points of locally contractive (LC) maps. This paper extends the Banach-Picard iteration to distributed settings; specifically, we assume the map of which the fixed point is sought to be the average of individual (not necessarily LC) maps held by a set of agents linked by a communication network. An additional difficulty is that the LC map is not assumed to come from an underlying optimization problem, which prevents exploiting strong global properties such as convexity or Lipschitzianity. Yet, we propose a distributed algorithm and prove its convergence, in fact showing that it maintains the linear rate of the standard Banach-Picard iteration for the average LC map. As another contribution, our proof imports tools from perturbation theory of linear operators, which, to the best of our knowledge, had not been used before in the theory of distributed computation.
Submitted 28 December, 2021; v1 submitted 31 March, 2021;
originally announced April 2021.
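For orientation, the standard (centralized) Banach-Picard iteration that this paper extends simply repeats $x_{k+1} = T(x_k)$ until the iterates stop moving. The Python sketch below applies it to a map that is the average of two local maps. The two scalar maps, the tolerance, and the function names are hypothetical choices made for illustration; this is not the distributed algorithm proposed in the paper, which has to cope with each agent seeing only its own map.

```python
import math


def banach_picard(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- T(x) from x0 until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


# Hypothetical local maps, one per agent (illustrative only).
local_maps = [
    lambda x: math.cos(x),
    lambda x: 0.5 * math.sin(x) + 1.0,
]


def average_map(x):
    """Average of the local maps; its derivative is bounded by about 0.56, so it is a contraction."""
    return sum(f(x) for f in local_maps) / len(local_maps)


fixed_point = banach_picard(average_map, x0=0.0)
```

Because the averaged map in this toy example is a contraction on the whole real line, the iteration converges linearly from any starting point; the paper's setting only assumes local contractivity of the average.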
-
Maximum cut on interval graphs of interval count four is NP-complete
Authors:
Celina M. H. de Figueiredo,
Alexsander A. de Melo,
Fabiano S. Oliveira,
Ana Silva
Abstract:
The computational complexity of the MaxCut problem restricted to interval graphs has been open since the 80's, being one of the problems proposed by Johnson in his Ongoing Guide to NP-completeness, and has been settled as NP-complete only recently by Adhikary, Bose, Mukherjee and Roy. On the other hand, many flawed proofs of polynomiality for MaxCut on the more restrictive class of unit/proper interval graphs (or graphs with interval count 1) have been presented over the years, and the classification of the problem is still unknown. In this paper, we present the first NP-completeness proof for MaxCut when restricted to interval graphs with bounded interval count, namely graphs with interval count 4.
Submitted 29 November, 2022; v1 submitted 17 December, 2020;
originally announced December 2020.
-
TimeSHAP: Explaining Recurrent Models through Sequence Perturbations
Authors:
João Bento,
Pedro Saleiro,
André F. Cruz,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions only contributes on average to 41% of the model's score; iv) notably high attribution to client's age, suggesting a potential discriminatory reasoning, later confirmed as higher false positive rates for older clients.
Submitted 26 June, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Control with adaptive Q-learning
Authors:
João Pedro Araújo,
Mário A. T. Figueiredo,
Miguel Ayala Botto
Abstract:
This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), in two classical control problems (Pendulum and Cartpole). AQL adaptively partitions the state-action space of a Markov decision process (MDP), while learning the control policy, i.e., the mapping from states to actions. The main difference between AQL and SPAQL is that the latter learns time-invariant policies, where the mapping from states to actions does not depend explicitly on the time step. This paper also proposes the SPAQL with terminal state (SPAQL-TS), an improved version of SPAQL tailored for the design of regulators for control problems. The time-invariant policies are shown to result in a better performance than the time-variant ones in both problems studied. These algorithms are particularly fitted to RL problems where the action space is finite, as is the case with the Cartpole problem. SPAQL-TS solves the OpenAI Gym Cartpole problem, while also displaying a higher sample efficiency than trust region policy optimization (TRPO), a standard RL algorithm for solving control tasks. Moreover, the policies learned by SPAQL are interpretable, while TRPO policies are typically encoded as neural networks, and therefore hard to interpret. Yielding interpretable policies while being sample-efficient are the major advantages of SPAQL.
Submitted 3 November, 2020;
originally announced November 2020.
-
Variational Mixture of Normalizing Flows
Authors:
Guilherme G. P. Freitas Pires,
Mário A. T. Figueiredo
Abstract:
In the past few years, deep generative models, such as generative adversarial networks \autocite{GAN}, variational autoencoders \autocite{vaepaper}, and their variants, have seen wide adoption for the task of modelling complex data distributions. In spite of the outstanding sample quality achieved by those early methods, they model the target distributions \emph{implicitly}, in the sense that the probability density functions induced by them are not explicitly accessible. This fact renders those methods unfit for tasks that require, for example, scoring new instances of data with the learned distributions. Normalizing flows have overcome this limitation by leveraging the change-of-variables formula for probability density functions, and by using transformations designed to have tractable and cheaply computable Jacobians. Although flexible, this framework lacked (until recently \autocites{semisuplearning_nflows, RAD}) a way to introduce discrete structure (such as the one found in mixtures) in the models it allows to construct, in an unsupervised scenario. The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model. This procedure is based on variational inference, and uses a variational posterior parameterized by a neural network. As will become clear, this model naturally lends itself to (multimodal) density estimation, semi-supervised learning, and clustering. The proposed model is illustrated on two synthetic datasets, as well as on a real-world dataset.
Keywords: Deep generative models, normalizing flows, variational inference, probabilistic modelling, mixture models.
Submitted 1 September, 2020;
originally announced September 2020.
-
Single-partition adaptive Q-learning
Authors:
João Pedro Araújo,
Mário Figueiredo,
Miguel Ayala Botto
Abstract:
This paper introduces single-partition adaptive Q-learning (SPAQL), an algorithm for model-free episodic reinforcement learning (RL), which adaptively partitions the state-action space of a Markov decision process (MDP), while simultaneously learning a time-invariant policy (i.e., the mapping from states to actions does not depend explicitly on the episode time step) for maximizing the cumulative reward. The trade-off between exploration and exploitation is handled by using a mixture of upper confidence bounds (UCB) and Boltzmann exploration during training, with a temperature parameter that is automatically tuned as training progresses. The algorithm is an improvement over adaptive Q-learning (AQL). It converges faster to the optimal solution, while also using fewer arms. Tests on episodes with a large number of time steps show that SPAQL has no problems scaling, unlike AQL. Based on this empirical evidence, we claim that SPAQL may have a higher sample efficiency than AQL, thus being a relevant contribution to the field of efficient model-free RL methods.
Submitted 13 July, 2020;
originally announced July 2020.
-
Equilibrium Propagation for Complete Directed Neural Networks
Authors:
Matilde Tristany Farinha,
Sérgio Pequito,
Pedro A. Santos,
Mário A. T. Figueiredo
Abstract:
Artificial neural networks, one of the most successful approaches to supervised learning, were originally inspired by their biological counterparts. However, the most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework. Specifically, we introduce: a new neuronal dynamics and learning rule for arbitrary network architectures; a sparsity-inducing method able to prune irrelevant connections; a dynamical-systems characterization of the models, using Lyapunov theory.
Submitted 17 June, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Sparse and Continuous Attention Mechanisms
Authors:
André F. T. Martins,
António Farinhas,
Marcos Treviso,
Vlad Niculae,
Pedro M. Q. Aguiar,
Mário A. T. Figueiredo
Abstract:
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and alpha-entmax), which have varying support, being able to assign zero probability to irrelevant categories. This paper expands that work in two directions: first, we extend alpha-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for alpha in {1,2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions.
Submitted 29 October, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
A multicenter study on radiomic features from T$_2$-weighted images of a customized MR pelvic phantom setting the basis for robust radiomic models in clinics
Authors:
Linda Bianchini,
Joao Santinha,
Nuno Loução,
Mario Figueiredo,
Francesca Botta,
Daniela Origgi,
Marta Cremonesi,
Enrico Cassano,
Nikolaos Papanikolaou,
Alessandro Lascialfari
Abstract:
In this study we investigated the repeatability and reproducibility of radiomic features extracted from MRI images and provide a workflow to identify robust features. 2D and 3D T$_2$-weighted images of a pelvic phantom were acquired on three scanners of two manufacturers and two magnetic field strengths. The repeatability and reproducibility of the radiomic features were assessed respectively by intraclass correlation coefficient (ICC) and concordance correlation coefficient (CCC), considering repeated acquisitions with or without phantom repositioning, and with different scanner/acquisition type, and acquisition parameters. The features showing ICC/CCC > 0.9 were selected, and their dependence on shape information (Spearman's $ρ$ > 0.8) was analyzed. They were classified for their ability to distinguish textures, after shuffling voxel intensities. From 944 2D features, 79.9% to 96.4% showed excellent repeatability in fixed position across all scanners. A much lower range (11.2% to 85.4%) was obtained after phantom repositioning. 3D extraction did not improve repeatability performance. Excellent reproducibility between scanners was observed in 4.6% to 15.6% of the features, at fixed imaging parameters. 82.4% to 94.9% of features showed excellent agreement when extracted from images acquired with TEs 5 ms apart (values decreased when increasing TE intervals) and 90.7% of the features exhibited excellent reproducibility for changes in TR. 2.0% of non-shape features were identified as providing only shape information. This study demonstrates that radiomic features are affected by specific MRI protocols. The use of our radiomic pelvic phantom allowed us to identify unreliable features for radiomic analysis on T$_2$-weighted images. This paper proposes a general workflow to identify repeatable, reproducible, and informative radiomic features, fundamental to ensure robustness of clinical studies.
Submitted 18 May, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Gravitational Wave Detection and Information Extraction via Neural Networks
Authors:
Gerson R. Santos,
Marcela P. Figueiredo,
Antonio de Pádua Santos,
Pavlos Protopapas,
Tiago A. E. Ferreira
Abstract:
The Laser Interferometer Gravitational-Wave Observatory (LIGO) was the first laboratory to measure gravitational waves. An exceptional experimental design was needed to measure distance changes much smaller than the radius of a proton. Likewise, the data analysis needed to confirm a detection and extract information is a tremendously hard task. Here, we present a computational procedure based on artificial neural networks to detect a gravitational wave event and extract its ring-down time from the LIGO data. With this proposal, it is possible to build a probabilistic thermometer for gravitational wave detection and obtain physical information about the astronomical body system that created the phenomenon. The ring-down time is determined directly from the data, without the need for numerical relativity techniques and high computational power.
Submitted 22 March, 2020;
originally announced March 2020.
-
A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints
Authors:
Marek Śmieja,
Łukasz Struski,
Mário A. T. Figueiredo
Abstract:
In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. On the other hand, being classification-based, our method solves only well-defined classification problems, rather than less well specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.
Submitted 18 January, 2020;
originally announced January 2020.
-
Conditional Random Fields as Recurrent Neural Networks for 3D Medical Imaging Segmentation
Authors:
Miguel Monteiro,
Mário A. T. Figueiredo,
Arlindo L. Oliveira
Abstract:
The Conditional Random Field as a Recurrent Neural Network layer is a recently proposed algorithm meant to be placed on top of an existing Fully-Convolutional Neural Network to improve the quality of semantic segmentation. In this paper, we test whether this algorithm, which was shown to improve semantic segmentation for 2D RGB images, is able to improve segmentation quality for 3D multi-modal medical images. We developed an implementation of the algorithm which works for any number of spatial dimensions, input/output image channels, and reference image channels. As far as we know, this is the first publicly available implementation of this sort. We tested the algorithm with two distinct 3D medical imaging datasets and concluded that the performance differences observed were not statistically significant. Finally, in the discussion section of the paper, we examine why this technique transfers poorly from natural images to medical images.
Submitted 19 July, 2018;
originally announced July 2018.
-
Image Restoration Using Conditional Random Fields and Scale Mixtures of Gaussians
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
This paper proposes a general framework for internal patch-based image restoration based on Conditional Random Fields (CRF). Unlike related models based on Markov Random Fields (MRF), our approach explicitly formulates the posterior distribution for the entire image. The potential functions are taken as proportional to the product of a likelihood and prior for each patch. By assuming identical parameters for similar patches, our approach can be classified as a model-based non-local method. For the prior term in the potential function of the CRF model, multivariate Gaussians and multivariate scale-mixture of Gaussians are considered, with the latter being a novel prior for image patches. Our results show that the proposed approach outperforms methods based on Gaussian mixture models for image denoising and state-of-the-art methods for image interpolation/inpainting.
Submitted 9 July, 2018;
originally announced July 2018.
-
External Patch-Based Image Restoration Using Importance Sampling
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
This paper introduces a new approach to patch-based image restoration based on external datasets and importance sampling. The Minimum Mean Squared Error (MMSE) estimate of the image patches, the computation of which requires solving a multidimensional (typically intractable) integral, is approximated using samples from an external dataset. The new method, which can be interpreted as a generalization of the external non-local means (NLM), uses self-normalized importance sampling to efficiently approximate the MMSE estimates. The use of self-normalized importance sampling endows the proposed method with great flexibility, namely regarding the statistical properties of the measurement noise. The effectiveness of the proposed method is shown in a series of experiments using both generic large-scale and class-specific external datasets.
Submitted 9 July, 2018;
originally announced July 2018.
-
Impulsive Noise Robust Sparse Recovery via Continuous Mixed Norm
Authors:
Amirhossein Javaheri,
Hadi Zayyani,
Mario A. T. Figueiredo,
Farrokh Marvasti
Abstract:
This paper investigates the problem of sparse signal recovery in the presence of additive impulsive noise. Heavy-tailed impulsive noise is well modelled by stable distributions. Since there is no explicit formulation for the probability density function of the $S\alpha S$ distribution, alternative approximations such as the Generalized Gaussian Distribution (GGD) are used, which impose $\ell_p$-norm fidelity on the residual error. In this paper, we exploit a Continuous Mixed Norm (CMN) for robust sparse recovery instead of the $\ell_p$-norm. We show that in blind conditions, i.e., in the case where the parameters of the noise distribution are unknown, incorporating CMN can lead to near-optimal recovery. We apply the Alternating Direction Method of Multipliers (ADMM) to solve the problem induced by utilizing CMN for robust sparse recovery. In this approach, CMN is replaced with a surrogate function and a Majorization-Minimization technique is incorporated to solve the problem. Simulation results confirm the efficiency of the proposed method compared to some recent algorithms in the literature for impulsive noise robust sparse recovery.
Submitted 12 April, 2018;
originally announced April 2018.
-
Poisson Image Denoising Using Best Linear Prediction: A Post-processing Framework
Authors:
Milad Niknejad,
Mario A. T. Figueiredo
Abstract:
In this paper, we address the problem of denoising images degraded by Poisson noise. We propose a new patch-based approach based on best linear prediction to estimate the underlying clean image. A simplified prediction formula is derived for Poisson observations, which requires the covariance matrix of the underlying clean patch. We use the assumption that similar patches in a neighborhood share the same covariance matrix, and we use off-the-shelf Poisson denoising methods in order to obtain an initial estimate of the covariance matrices. Our method can be seen as a post-processing step for Poisson denoising methods and the results show that it improves upon several Poisson denoising methods by relevant margins.
Submitted 1 March, 2018;
originally announced March 2018.
-
Scene-Adapted Plug-and-Play Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging
Authors:
Afonso M. Teodoro,
José M. Bioucas-Dias,
Mário A. T. Figueiredo
Abstract:
The recently proposed plug-and-play (PnP) framework allows leveraging recent developments in image denoising to tackle other, more involved, imaging inverse problems. In a PnP method, a black-box denoiser is plugged into an iterative algorithm, taking the place of a formal denoising step that corresponds to the proximity operator of some convex regularizer. While this approach offers flexibility and excellent performance, convergence of the resulting algorithm may be hard to analyze, as most state-of-the-art denoisers lack an explicit underlying objective function. In this paper, we propose a PnP approach where a scene-adapted prior (i.e., where the denoiser is targeted to the specific scene being imaged) is plugged into ADMM (alternating direction method of multipliers), and prove convergence of the resulting algorithm. Finally, we apply the proposed framework in two different imaging inverse problems: hyperspectral sharpening/fusion and image deblurring from blurred/noisy image pairs.
Submitted 2 January, 2018;
originally announced January 2018.
-
Blind image deblurring using class-adapted image priors
Authors:
Marina Ljubenović,
Mário A. T. Figueiredo
Abstract:
Blind image deblurring (BID) is an ill-posed inverse problem, usually addressed by imposing prior knowledge on the (unknown) image and on the blurring filter. Most of the work on BID has focused on natural images, using image priors based on statistical properties of generic natural images. However, in many applications, it is known that the image being recovered belongs to some specific class (e.g., text, face, fingerprints), and exploiting this knowledge allows obtaining more accurate priors. In this work, we propose a method where a Gaussian mixture model (GMM) is used to learn a class-adapted prior, by training on a dataset of clean images of that class. Experiments show the competitiveness of the proposed method in terms of restoration quality when dealing with images containing text, faces, or fingerprints. Additionally, experiments show that the proposed method is able to handle text images at high noise levels, outperforming state-of-the-art methods specifically designed for BID of text images.
Submitted 6 September, 2017;
originally announced September 2017.
-
Class-specific image denoising using importance sampling
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
In this paper, we propose a new image denoising method, tailored to specific classes of images, assuming that a dataset of clean images of the same class is available. Similarly to the non-local means (NLM) algorithm, the proposed method computes a weighted average of non-local patches, which we interpret under the importance sampling framework. This viewpoint introduces flexibility regarding the adopted priors, the noise statistics, and the computation of Bayesian estimates. The importance sampling viewpoint is exploited to approximate the minimum mean squared error (MMSE) patch estimates, using the true underlying prior on image patches. The estimates thus obtained converge to the true MMSE estimates, as the number of samples approaches infinity. Experimental results provide evidence that the proposed denoiser outperforms the state-of-the-art in the specific classes of face and text images.
Submitted 21 June, 2017;
originally announced June 2017.
-
Adaptive Consensus ADMM for Distributed Optimization
Authors:
Zheng Xu,
Gavin Taylor,
Hao Li,
Mario Figueiredo,
Xiaoming Yuan,
Tom Goldstein
Abstract:
The alternating direction method of multipliers (ADMM) is commonly used for distributed model fitting problems, but its performance and reliability depend strongly on user-defined penalty parameters. We study distributed ADMM methods that boost performance by using different fine-tuned algorithm parameters on each worker node. We present an O(1/k) convergence rate for adaptive ADMM methods with node-specific parameters, and propose adaptive consensus ADMM (ACADMM), which automatically tunes parameters without user oversight.
Submitted 20 June, 2017; v1 submitted 9 June, 2017;
originally announced June 2017.
-
Class-specific Poisson denoising by patch-based importance sampling
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
In this paper, we address the problem of recovering images degraded by Poisson noise, where the image is known to belong to a specific class. In the proposed method, a dataset of clean patches from images of the class of interest is clustered using multivariate Gaussian distributions. In order to recover the noisy image, each noisy patch is assigned to one of these distributions, and the corresponding minimum mean squared error (MMSE) estimate is obtained. We propose to use a self-normalized importance sampling approach, which is a method of the Monte Carlo family, both for determining the most likely distribution and for approximating the MMSE estimate of the clean patch. Experimental results show that our proposed method outperforms other methods for Poisson denoising in the low-SNR regime.
Submitted 9 June, 2017;
originally announced June 2017.
-
Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation
Authors:
Zheng Xu,
Mario A. T. Figueiredo,
Xiaoming Yuan,
Christoph Studer,
Tom Goldstein
Abstract:
Many modern computer vision and machine learning applications rely on solving difficult optimization problems that involve non-differentiable objective functions and constraints. The alternating direction method of multipliers (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a generalization of ADMM that often achieves better performance, but its efficiency depends strongly on algorithm parameters that must be chosen by an expert user. We propose an adaptive method that automatically tunes the key algorithm parameters to achieve optimal performance without user oversight. Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A detailed convergence analysis of ARADMM is provided, and numerical results on several applications demonstrate fast practical convergence.
Submitted 10 April, 2017;
originally announced April 2017.
-
Synthesis versus analysis in patch-based image priors
Authors:
Mario A. T. Figueiredo
Abstract:
In global models/priors (for example, using wavelet frames), there is a well-known analysis vs synthesis dichotomy in the way signal/image priors are formulated. In patch-based image models/priors, this dichotomy is also present in the choice of how each patch is modeled. This paper shows that there is another analysis vs synthesis dichotomy, in terms of how the whole image is related to the patches, and that all existing patch-based formulations that provide a global image prior belong to the analysis category. We then propose a synthesis formulation, where the image is explicitly modeled as being synthesized by additively combining a collection of independent patches. We formally establish that these analysis and synthesis formulations are not equivalent in general and that both formulations are compatible with analysis and synthesis formulations at the patch level. Finally, we present an instance of the alternating direction method of multipliers (ADMM) that can be used to perform image denoising under the proposed synthesis formulation, showing its computational feasibility. Rather than showing the superiority of the synthesis or analysis formulations, the contribution of this paper is to establish the existence of both alternatives, thus closing the corresponding gap in the field of patch-based image processing.
Submitted 20 February, 2017;
originally announced February 2017.
-
Scene-adapted plug-and-play algorithm with convergence guarantees
Authors:
Afonso M. Teodoro,
José M. Bioucas-Dias,
Mário A. T. Figueiredo
Abstract:
Recent frameworks, such as the so-called plug-and-play, allow us to leverage the developments in image denoising to tackle other, more involved, problems in image processing. As the name suggests, state-of-the-art denoisers are plugged into an iterative algorithm that alternates between a denoising step and the inversion of the observation operator. While these tools offer flexibility, the convergence of the resulting algorithm may be difficult to analyse. In this paper, we plug a state-of-the-art denoiser, based on a Gaussian mixture model, into the iterations of an alternating direction method of multipliers and prove that the algorithm is guaranteed to converge. Moreover, we build upon the concept of scene-adapted priors, where we learn a model targeted to the specific scene being imaged, and apply the proposed method to address the hyperspectral sharpening problem.
Submitted 8 November, 2017; v1 submitted 8 February, 2017;
originally announced February 2017.
-
An Empirical Study of ADMM for Nonconvex Problems
Authors:
Zheng Xu,
Soham De,
Mario Figueiredo,
Christoph Studer,
Tom Goldstein
Abstract:
The alternating direction method of multipliers (ADMM) is a common optimization tool for solving constrained and non-differentiable problems. We provide an empirical study of the practical performance of ADMM on several nonconvex applications, including l0-regularized linear regression, l0-regularized image denoising, phase retrieval, and eigenvector computation. Our experiments suggest that ADMM performs well on a broad class of nonconvex problems. Moreover, recently proposed adaptive ADMM methods, which automatically tune penalty parameters as the method runs, can improve algorithm efficiency and solution quality compared to ADMM with a non-tuned penalty.
Submitted 10 December, 2016;
originally announced December 2016.
-
Restoring STM images via Sparse Coding: noise and artifact removal
Authors:
João P. Oliveira,
Ana Bragança,
José Bioucas-Dias,
Mário Figueiredo,
Luís Alcácer,
Jorge Morgado,
Quirina Ferreira
Abstract:
In this article, we present a denoising algorithm to improve the interpretation and quality of scanning tunneling microscopy (STM) images. Given the high level of self-similarity of STM images, we propose a denoising algorithm by reformulating the true estimation problem as a sparse regression, often termed sparse coding. We introduce modifications to the algorithm to cope with the existence of artifacts, mainly dropouts, which appear in a structured way as consecutive line segments along the scanning direction. The resulting algorithm treats the artifacts as missing data, and the estimated values outperform those of algorithms that replace the outliers by local filtering. We provide code implementations for both Matlab and Gwyddion.
Submitted 11 October, 2016;
originally announced October 2016.