-
Conformal Prediction for Natural Language Processing: A Survey
Authors:
Margarida M. Campos,
António Farinhas,
Chrysoula Zerva,
Mário A. T. Figueiredo,
André F. T. Martins
Abstract:
The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications. Conformal prediction is emerging as a theoretically sound and practically useful framework, combining flexibility with strong statistical guarantees. Its model-agnostic and distribution-free nature makes it particularly promising to address the current shortcomings of NLP systems that stem from the absence of uncertainty quantification. This paper provides a comprehensive survey of conformal prediction techniques, their guarantees, and existing applications in NLP, pointing to directions for future research and open challenges.
Submitted 3 May, 2024;
originally announced May 2024.
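As an illustration of the kind of technique the survey above covers, here is a minimal sketch of split conformal prediction for classification. It is not taken from the paper: the function names, the nonconformity score (one minus the probability assigned to a label), and the toy numbers are all assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores: nonconformity scores of the true labels on a held-out
    calibration set (assumed here to be 1 - p_model(true label)).
    alpha: target miscoverage rate, e.g. 0.25 for 75% coverage.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be kept.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p does not exceed the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


# Hypothetical calibration scores and model output for one test input.
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
threshold = conformal_threshold(cal_scores, alpha=0.25)
labels = prediction_set({"pos": 0.7, "neu": 0.25, "neg": 0.05}, threshold)
```

Under the usual exchangeability assumption between calibration and test data, sets built this way contain the true label with probability at least 1 - alpha, regardless of the underlying model.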
-
A Measure of Synergy based on Union Information
Authors:
André F. C. Gomes,
Mário A. T. Figueiredo
Abstract:
The partial information decomposition (PID) framework is concerned with decomposing the information that a set of (two or more) random variables (the sources) has about another variable (the target) into three types of information: unique, redundant, and synergistic. Classical information theory alone does not provide a unique way to decompose information in this manner and additional assumptions have to be made. One often overlooked way to achieve this decomposition is using a so-called measure of union information - which quantifies the information that is present in at least one of the sources - from which a synergy measure stems. In this paper, we introduce a new measure of union information based on adopting a communication channel perspective, compare it with existing measures, and study some of its properties. We also include a comprehensive critical review of characterizations of union information and synergy measures that have been proposed in the literature.
Submitted 25 March, 2024;
originally announced March 2024.
-
Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints
Authors:
Jean V. Alves,
Diogo Leitão,
Sérgio Jesus,
Marco O. P. Sampaio,
Javier Liébana,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key aspects of real-world systems that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type 1 and type 2 errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset and iii) not dealing with human work capacity constraints. To address these issues, we propose the deferral under cost and capacity constraints framework (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average 8.4% reduction in the misclassification cost.
Submitted 21 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
DiConStruct: Causal Concept-based Explanations through Black-Box Distillation
Authors:
Ricardo Moreira,
Jacopo Bono,
Mário Cardoso,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predictive task. Despite the rapid advances in AI explainability in recent years, as far as we know to date, no method fulfills these three properties. Indeed, mainstream methods for local concept explainability do not produce causal explanations and incur a trade-off between explainability and prediction performance. We present DiConStruct, an explanation method that is both concept-based and causal, with the goal of creating more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Because of this, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
Submitted 26 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
FiFAR: A Fraud Detection Dataset for Learning to Defer
Authors:
Jean V. Alves,
Diogo Leitão,
Sérgio Jesus,
Marco O. P. Sampaio,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud detection is a high-stakes setting where algorithms and human experts often work in tandem; however, there are no publicly available datasets for L2D concerning this important application of human-AI teaming. To fill this gap in L2D research, we introduce the Financial Fraud Alert Review Dataset (FiFAR), a synthetic bank account fraud detection dataset, containing the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence. We also provide a realistic definition of human work capacity constraints, an aspect of L2D systems that is often overlooked, allowing for extensive testing of assignment systems under real-world conditions. We use our dataset to develop a capacity-aware L2D method and rejection learning approach under realistic data availability conditions, and benchmark these baselines under an array of 300 distinct testing scenarios. We believe that this dataset will serve as a pivotal instrument in facilitating a systematic, rigorous, reproducible, and transparent evaluation and comparison of L2D methods, thereby fostering the development of more synergistic human-AI collaboration in decision-making systems. The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset
Submitted 20 December, 2023;
originally announced December 2023.
-
Orders between channels and implications for partial information decomposition
Authors:
André F. C. Gomes,
Mário A. T. Figueiredo
Abstract:
The partial information decomposition (PID) framework is concerned with decomposing the information that a set of random variables has with respect to a target variable into three types of components: redundant, synergistic, and unique. Classical information theory alone does not provide a unique way to decompose information in this manner and additional assumptions have to be made. Recently, Kolchinsky proposed a new general axiomatic approach to obtain measures of redundant information, based on choosing an order relation between information sources (equivalently, order between communication channels). In this paper, we exploit this approach to introduce three new measures of redundant information (and the resulting decompositions) based on well-known preorders between channels, thus contributing to the enrichment of the PID landscape. We relate the new decompositions to existing ones, study some of their properties, and provide examples illustrating their novelty. As a side result, we prove that any preorder that satisfies Kolchinsky's axioms yields a decomposition that meets the axioms originally introduced by Williams and Beer when they first proposed the PID.
Submitted 14 July, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Fairness-Aware Data Valuation for Supervised Learning
Authors:
José Pombal,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.
Submitted 29 March, 2023;
originally announced March 2023.
-
Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model
Authors:
Mário A. T. Figueiredo,
Catarina A. Oliveira
Abstract:
Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
Submitted 14 March, 2023;
originally announced March 2023.
-
ProBoost: a Boosting Method for Probabilistic Classifiers
Authors:
Fábio Mendonça,
Sheikh Shanawaz Mostafa,
Fernando Morgado-Dias,
Antonio G. Ravelo-García,
Mário A. T. Figueiredo
Abstract:
ProBoost, a new boosting algorithm for probabilistic classifiers, is proposed in this work. This algorithm uses the epistemic uncertainty of each training sample to determine the most challenging/uncertain ones; the relevance of these samples is then increased for the next weak learner, producing a sequence that progressively focuses on the samples found to have the highest uncertainty. In the end, the weak learners' outputs are combined into a weighted ensemble of classifiers. Three methods are proposed to manipulate the training set: undersampling, oversampling, and weighting the training samples according to the uncertainty estimated by the weak learners. Furthermore, two approaches are studied regarding the ensemble combination. The weak learner herein considered is a standard convolutional neural network, and the probabilistic models underlying the uncertainty estimation use either variational inference or Monte Carlo dropout. The experimental evaluation carried out on MNIST benchmark datasets shows that ProBoost yields a significant performance improvement. The results are further highlighted by assessing the relative achievable improvement, a metric proposed in this work, which shows that a model with only four weak learners leads to an improvement exceeding 12% in this metric (for either accuracy, sensitivity, or specificity), in comparison to the model learned without ProBoost.
Submitted 4 September, 2022;
originally announced September 2022.
-
Aveiro Tech City Living Lab: A Communication, Sensing and Computing Platform for City Environments
Authors:
Pedro Rito,
Ana Almeida,
Andreia Figueiredo,
Christian Gomes,
Pedro Teixeira,
Rodrigo Rosmaninho,
Rui Lopes,
Duarte Dias,
Gonçalo Vítor,
Gonçalo Perna,
Miguel Silva,
Carlos Senna,
Duarte Raposo,
Miguel Luís,
Susana Sargento,
Arnaldo Oliveira,
Nuno Borges de Carvalho
Abstract:
This article presents the deployment and experimentation architecture of the Aveiro Tech City Living Lab (ATCLL) in Aveiro, Portugal. This platform comprises a large number of Internet-of-Things devices with communication, sensing and computing capabilities. The communication infrastructure, built on fiber and millimeter-wave (mmWave) links, integrates a multiprotocol communication network with radio terminals (WiFi, ITS-G5, C-V2X, 5G and LoRa(WAN)), spread throughout 44 connected points of access in the city. Additionally, public transportation has been equipped with communication and sensing units. All these points combine and interconnect a set of sensors, such as mobility (radars, lidars, video cameras) and environmental sensors. Combining edge computing and cloud management to deploy the services and manage the platform, and a data platform to gather and process the data, the living lab supports a wide range of services and applications: IoT, intelligent transportation systems and assisted driving, environmental monitoring, emergency and safety, among others. This article describes the architecture, implementation and deployment needed to make the overall platform work and to integrate researchers and citizens. Moreover, it showcases some examples of the performance metrics achieved in the city infrastructure, the data that can be collected, visualized and used to build services and applications for cities, and, finally, different use cases in the mobility and safety scenarios.
Submitted 25 July, 2022;
originally announced July 2022.
-
Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions
Authors:
José Pombal,
André F. Cruz,
João Bravo,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial services, for example. The awareness of this problem has given rise to the field of Fair ML, which focuses on studying, measuring, and mitigating unfairness in algorithmic prediction, with respect to a set of protected groups (e.g., race or gender). However, the underlying causes for algorithmic unfairness still remain elusive, with researchers divided between blaming either the ML algorithms or the data they are trained on. In this work, we maintain that algorithmic unfairness stems from interactions between models and biases in the data, rather than from isolated contributions of either of them. To this end, we propose a taxonomy to characterize data bias and we study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings. On our real-world account-opening fraud use case, we find that each setting entails specific trade-offs, affecting fairness in expected value and variance -- the latter often going unnoticed. Moreover, we show how algorithms compare differently in terms of accuracy and fairness, depending on the biases affecting the data. Finally, we note that under specific data bias conditions, simple pre-processing interventions can successfully balance group-wise error rates, while the same techniques fail in more complex settings.
Submitted 13 July, 2022;
originally announced July 2022.
-
Human-AI Collaboration in Decision-Making: Beyond Learning to Defer
Authors:
Diogo Leitão,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements, such as the availability of predictions from humans for every instance or ground-truth labels that are independent from said humans. Furthermore, neither L2D nor alternative approaches tackle fundamental issues of deploying HAIC systems in real-world settings, such as capacity management or dealing with dynamic environments. In this paper, we aim to identify and review these and other limitations, pointing to where opportunities for future research in HAIC may lie.
Submitted 13 July, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction
Authors:
José Pombal,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been paid to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a real-world account opening fraud detection case study as an example, we study the dangers to both performance and fairness of two typical biases in performative prediction: distribution shifts, and the problem of selective labels.
Submitted 27 June, 2022;
originally announced June 2022.
-
Differentiable Causal Discovery Under Latent Interventions
Authors:
Gonçalo R. A. Faria,
André F. T. Martins,
Mário A. T. Figueiredo
Abstract:
Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown. However, previous work assumes that the correspondence between samples and interventions is known, which is often unrealistic. We envision a scenario with an extensive dataset sampled from multiple intervention distributions and one observation distribution, but where we do not know which distribution originated each sample and how the intervention affected the system, i.e., interventions are entirely latent. We propose a method based on neural networks and variational inference that addresses this scenario by framing it as learning a shared causal graph among an infinite mixture (under a Dirichlet process prior) of intervention structural causal models. Experiments with synthetic and real data show that our approach and its semi-supervised variant are able to discover causal relations in this challenging scenario.
Submitted 4 March, 2022;
originally announced March 2022.
-
Sparse Continuous Distributions and Fenchel-Young Losses
Authors:
André F. T. Martins,
Marcos Treviso,
António Farinhas,
Pedro M. Q. Aguiar,
Mário A. T. Figueiredo,
Mathieu Blondel,
Vlad Niculae
Abstract:
Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fusedmax), has led to distributions with varying support.
This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $Ω$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $Ω$ is a Tsallis negentropy with parameter $α$, we obtain "deformed exponential families," which include $α$-entmax and sparsemax ($α=2$) as particular cases. For quadratic energy functions, the resulting densities are $β$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $Ω$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $α\in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
Submitted 4 August, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
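For readers unfamiliar with the finite-domain starting point mentioned in the abstract above, the following is a minimal sketch of sparsemax (the $α=2$ case), computed as the Euclidean projection of a score vector onto the probability simplex via the standard sort-and-threshold procedure. It only illustrates the discrete case; the paper's continuous-domain constructions are not reproduced here, and the function name and inputs are illustrative assumptions.

```python
def sparsemax(z):
    """Project the score vector z onto the probability simplex.

    Unlike softmax, the result can contain exact zeros, so the
    support of the output distribution varies with the input.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        # The support size is the largest j with 1 + j * z_(j) > sum of the top j scores.
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
    # Shift by the threshold tau and clip negative entries to zero.
    return [max(zi - tau, 0.0) for zi in z]


sparse_probs = sparsemax([1.0, 0.5, 0.0])  # third entry is exactly zero
```

With well-separated scores such as `[2.0, 1.0, 0.0]`, all of the mass goes to the top entry, which is the varying-support behaviour the abstract contrasts with the fixed support of softmax.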
-
Distributed Banach-Picard Iteration: Application to Distributed EM and Distributed PCA
Authors:
Francisco L. Andrade,
Mário A. T. Figueiredo,
João Xavier
Abstract:
In recent work, we proposed a distributed Banach-Picard iteration (DBPI) that allows a set of agents, linked by a communication network, to find a fixed point of a locally contractive (LC) map that is the average of individual maps held by said agents. In this work, we build upon the DBPI and its local linear convergence (LLC) guarantees to make several contributions. We show that Sanger's algorithm for principal component analysis (PCA) corresponds to the iteration of an LC map that can be written as the average of local maps, each map known to each agent holding a subset of the data. Similarly, we show that a variant of the expectation-maximization (EM) algorithm for parameter estimation from noisy and faulty measurements in a sensor network can be written as the iteration of an LC map that is the average of local maps, each available at just one node. Consequently, via the DBPI, we derive two distributed algorithms - distributed EM and distributed PCA - whose LLC guarantees follow from those that we proved for the DBPI. The verification of the LC condition for EM is challenging, as the underlying operator depends on random samples, thus the LC condition is of probabilistic nature.
Submitted 26 January, 2022; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Distributed Banach-Picard Iteration for Locally Contractive Maps
Authors:
Francisco L. Andrade,
Mário A. T. Figueiredo,
João Xavier
Abstract:
The Banach-Picard iteration is widely used to find fixed points of locally contractive (LC) maps. This paper extends the Banach-Picard iteration to distributed settings; specifically, we assume the map of which the fixed point is sought to be the average of individual (not necessarily LC) maps held by a set of agents linked by a communication network. An additional difficulty is that the LC map is not assumed to come from an underlying optimization problem, which prevents exploiting strong global properties such as convexity or Lipschitzianity. Yet, we propose a distributed algorithm and prove its convergence, in fact showing that it maintains the linear rate of the standard Banach-Picard iteration for the average LC map. As another contribution, our proof imports tools from perturbation theory of linear operators, which, to the best of our knowledge, had not been used before in the theory of distributed computation.
Submitted 28 December, 2021; v1 submitted 31 March, 2021;
originally announced April 2021.
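For reference, the following is a minimal sketch of the classical (centralized) Banach-Picard iteration applied to the average of several local maps. It is not the distributed algorithm proposed in the paper: the affine maps `T_i(x) = 0.5 * x + b_i`, the function names, and the tolerance are illustrative assumptions, and no communication network is modelled.

```python
import numpy as np

def banach_picard(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- T(x) until successive iterates differ by less than tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = T(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Hypothetical local maps, one per agent: T_i(x) = 0.5 * x + b_i.
offsets = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([2.0, 1.0])]
local_maps = [lambda x, b=b: 0.5 * x + b for b in offsets]

def average_map(x):
    """Average of the local maps; a contraction with factor 0.5 in this toy case."""
    return sum(T(x) for T in local_maps) / len(local_maps)

fixed_point = banach_picard(average_map, x0=np.zeros(2))
```

In this toy case the fixed point of the average map is twice the mean of the offsets. In the distributed setting described in the abstract, no single agent could evaluate `average_map` directly, since each agent holds only its own map.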
-
TimeSHAP: Explaining Recurrent Models through Sequence Perturbations
Authors:
João Bento,
Pedro Saleiro,
André F. Cruz,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions only contributes on average to 41% of the model's score; iv) notably high attribution to client's age, suggesting a potential discriminatory reasoning, later confirmed as higher false positive rates for older clients.
Submitted 26 June, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Control with adaptive Q-learning
Authors:
João Pedro Araújo,
Mário A. T. Figueiredo,
Miguel Ayala Botto
Abstract:
This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), in two classical control problems (Pendulum and Cartpole). AQL adaptively partitions the state-action space of a Markov decision process (MDP) while learning the control policy, i.e., the mapping from states to actions. The main difference between AQL and SPAQL is that the latter learns time-invariant policies, where the mapping from states to actions does not depend explicitly on the time step. This paper also proposes SPAQL with terminal state (SPAQL-TS), an improved version of SPAQL tailored for the design of regulators for control problems. The time-invariant policies are shown to result in better performance than the time-variant ones in both problems studied. These algorithms are particularly suited to RL problems where the action space is finite, as is the case with the Cartpole problem. SPAQL-TS solves the OpenAI Gym Cartpole problem, while also displaying a higher sample efficiency than trust region policy optimization (TRPO), a standard RL algorithm for solving control tasks. Moreover, the policies learned by SPAQL are interpretable, while TRPO policies are typically encoded as neural networks, and therefore hard to interpret. Interpretability and sample efficiency are the major advantages of SPAQL.
Submitted 3 November, 2020;
originally announced November 2020.
-
Variational Mixture of Normalizing Flows
Authors:
Guilherme G. P. Freitas Pires,
Mário A. T. Figueiredo
Abstract:
In the past few years, deep generative models, such as generative adversarial networks [GAN], variational autoencoders [vaepaper], and their variants, have seen wide adoption for the task of modelling complex data distributions. In spite of the outstanding sample quality achieved by those early methods, they model the target distributions implicitly, in the sense that the probability density functions induced by them are not explicitly accessible. This fact renders those methods unfit for tasks that require, for example, scoring new instances of data with the learned distributions. Normalizing flows have overcome this limitation by leveraging the change-of-variables formula for probability density functions, and by using transformations designed to have tractable and cheaply computable Jacobians. Although flexible, this framework lacked (until recently [semisuplearning_nflows, RAD]) a way to introduce discrete structure (such as the one found in mixtures) in the models it allows to construct, in an unsupervised scenario. The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model. This procedure is based on variational inference, and uses a variational posterior parameterized by a neural network. As will become clear, this model naturally lends itself to (multimodal) density estimation, semi-supervised learning, and clustering. The proposed model is illustrated on two synthetic datasets, as well as on a real-world dataset.
Keywords: Deep generative models, normalizing flows, variational inference, probabilistic modelling, mixture models.
Submitted 1 September, 2020;
originally announced September 2020.
-
Equilibrium Propagation for Complete Directed Neural Networks
Authors:
Matilde Tristany Farinha,
Sérgio Pequito,
Pedro A. Santos,
Mário A. T. Figueiredo
Abstract:
Artificial neural networks, one of the most successful approaches to supervised learning, were originally inspired by their biological counterparts. However, the most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework. Specifically, we introduce: a new neuronal dynamics and learning rule for arbitrary network architectures; a sparsity-inducing method able to prune irrelevant connections; a dynamical-systems characterization of the models, using Lyapunov theory.
Submitted 17 June, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Sparse and Continuous Attention Mechanisms
Authors:
André F. T. Martins,
António Farinhas,
Marcos Treviso,
Vlad Niculae,
Pedro M. Q. Aguiar,
Mário A. T. Figueiredo
Abstract:
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and alpha-entmax), which have varying support, being able to assign zero probability to irrelevant categories. This paper expands that work in two directions: first, we extend alpha-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for alpha in {1,2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions.
Submitted 29 October, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
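To make the contrast between softmax and its sparse alternatives concrete, here is a small sketch of sparsemax on a finite domain (the alpha = 2 case of alpha-entmax), computed with the standard sort-and-threshold projection onto the probability simplex. It illustrates only the finite-domain background mentioned in the abstract, not the continuous-domain attention mechanisms introduced in the paper; the function names and example scores are assumptions made for illustration.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                 # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum       # True for the coordinates kept nonzero
    k_max = k[support][-1]                      # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max     # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def softmax(z):
    """Standard softmax, shifted by the maximum for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([1.0, 2.0, 3.0])
p_sparse = sparsemax(scores)   # some entries can be exactly zero
p_dense = softmax(scores)      # every entry is strictly positive
```

Both outputs sum to one, but only sparsemax can assign exactly zero probability to low-scoring categories, which is the varying-support property the abstract refers to.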
-
A Proposed IoT Smart Trap using Computer Vision for Sustainable Pest Control in Coffee Culture
Authors:
Vitor Alexandre Campos Figueiredo,
Samuel Mafra,
Joel Rodrigues
Abstract:
The Internet of Things (IoT) is emerging as a multi-purpose technology with enormous potential for improving the quality of life in several areas. In particular, IoT has been applied in agriculture to make it more ecologically sustainable. For instance, electronic traps have the potential to perform pest control without any pesticide. In this paper, a smart trap with IoT capabilities that uses computer vision to identify the insect of interest is proposed. The solution includes 1) an embedded system with a camera, a GPS sensor, and motor actuators; 2) an IoT middleware acting as a database service provider; and 3) a Web application that presents data through a configurable heat map. A demonstration of the proposed solution is presented, and the main conclusions concern the insight it provides into pest concentration across the plantation and its viability as an alternative to traditional pesticide-based pest control.
Submitted 9 April, 2020;
originally announced April 2020.
-
A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints
Authors:
Marek Śmieja,
Łukasz Struski,
Mário A. T. Figueiredo
Abstract:
In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. On the other hand, being classification-based, our method solves only well-defined classification problems, rather than less well specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.
Submitted 18 January, 2020;
originally announced January 2020.
-
Conditional Random Fields as Recurrent Neural Networks for 3D Medical Imaging Segmentation
Authors:
Miguel Monteiro,
Mário A. T. Figueiredo,
Arlindo L. Oliveira
Abstract:
The Conditional Random Field as a Recurrent Neural Network layer is a recently proposed algorithm meant to be placed on top of an existing Fully-Convolutional Neural Network to improve the quality of semantic segmentation. In this paper, we test whether this algorithm, which was shown to improve semantic segmentation for 2D RGB images, is able to improve segmentation quality for 3D multi-modal medical images. We developed an implementation of the algorithm which works for any number of spatial dimensions, input/output image channels, and reference image channels. As far as we know, this is the first publicly available implementation of this sort. We tested the algorithm on two distinct 3D medical imaging datasets and concluded that the performance differences observed were not statistically significant. Finally, in the discussion section of the paper, we examine why this technique transfers poorly from natural images to medical images.
Submitted 19 July, 2018;
originally announced July 2018.
-
Image Restoration Using Conditional Random Fields and Scale Mixtures of Gaussians
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
This paper proposes a general framework for internal patch-based image restoration based on Conditional Random Fields (CRF). Unlike related models based on Markov Random Fields (MRF), our approach explicitly formulates the posterior distribution for the entire image. The potential functions are taken as proportional to the product of a likelihood and prior for each patch. By assuming identical parameters for similar patches, our approach can be classified as a model-based non-local method. For the prior term in the potential function of the CRF model, multivariate Gaussians and multivariate scale-mixture of Gaussians are considered, with the latter being a novel prior for image patches. Our results show that the proposed approach outperforms methods based on Gaussian mixture models for image denoising and state-of-the-art methods for image interpolation/inpainting.
Submitted 9 July, 2018;
originally announced July 2018.
-
External Patch-Based Image Restoration Using Importance Sampling
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
This paper introduces a new approach to patch-based image restoration based on external datasets and importance sampling. The Minimum Mean Squared Error (MMSE) estimate of the image patches, the computation of which requires solving a multidimensional (typically intractable) integral, is approximated using samples from an external dataset. The new method, which can be interpreted as a generalization of the external non-local means (NLM), uses self-normalized importance sampling to efficiently approximate the MMSE estimates. The use of self-normalized importance sampling endows the proposed method with great flexibility, namely regarding the statistical properties of the measurement noise. The effectiveness of the proposed method is shown in a series of experiments using both generic large-scale and class-specific external datasets.
Submitted 9 July, 2018;
originally announced July 2018.
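The idea of approximating an MMSE patch estimate with samples from an external dataset can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the noise is taken to be additive Gaussian with known standard deviation `sigma`, the external patches are treated as samples from the prior (so each importance weight is just the likelihood), and the toy data, the patch dimension, and the function name are hypothetical.

```python
import numpy as np

def snis_mmse_estimate(noisy_patch, external_patches, sigma):
    """Approximate E[x | y] with self-normalized importance sampling.

    The external patches are treated as samples from the prior, so each
    weight is the Gaussian likelihood p(y | x_i) up to a constant factor.
    """
    diffs = external_patches - noisy_patch                  # shape (n, d)
    log_w = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    log_w -= log_w.max()                                     # avoid underflow in exp
    w = np.exp(log_w)
    w /= w.sum()                                             # self-normalization
    return w @ external_patches                              # weighted patch average

rng = np.random.default_rng(0)
# Hypothetical "external dataset": 5000 patches of dimension 16 from a toy prior.
external = rng.normal(loc=0.5, scale=0.2, size=(5000, 16))
clean = rng.normal(loc=0.5, scale=0.2, size=16)
sigma = 0.3
noisy = clean + sigma * rng.normal(size=16)
estimate = snis_mmse_estimate(noisy, external, sigma)
```

Under the Gaussian assumption the weights take the familiar exponential form used in non-local means. Swapping in a different likelihood only changes the `log_w` line, which reflects the flexibility with respect to noise statistics that the abstract highlights.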
-
Impulsive Noise Robust Sparse Recovery via Continuous Mixed Norm
Authors:
Amirhossein Javaheri,
Hadi Zayyani,
Mario A. T. Figueiredo,
Farrokh Marvasti
Abstract:
This paper investigates the problem of sparse signal recovery in the presence of additive impulsive noise. Heavy-tailed impulsive noise is well modelled by stable distributions. Since there is no explicit formulation for the probability density function of the $S\alpha S$ distribution, alternative approximations such as the Generalized Gaussian Distribution (GGD) are used, which impose $\ell_p$-norm fidelity on the residual error. In this paper, we exploit a Continuous Mixed Norm (CMN) for robust sparse recovery instead of the $\ell_p$-norm. We show that in blind conditions, i.e., in the case where the parameters of the noise distribution are unknown, incorporating CMN can lead to near-optimal recovery. We apply the Alternating Direction Method of Multipliers (ADMM) to solve the problem induced by utilizing CMN for robust sparse recovery. In this approach, CMN is replaced with a surrogate function and a Majorization-Minimization technique is incorporated to solve the problem. Simulation results confirm the efficiency of the proposed method compared to some recent algorithms in the literature for impulsive noise robust sparse recovery.
Submitted 12 April, 2018;
originally announced April 2018.
-
Poisson Image Denoising Using Best Linear Prediction: A Post-processing Framework
Authors:
Milad Niknejad,
Mario A. T. Figueiredo
Abstract:
In this paper, we address the problem of denoising images degraded by Poisson noise. We propose a new patch-based approach based on best linear prediction to estimate the underlying clean image. A simplified prediction formula is derived for Poisson observations, which requires the covariance matrix of the underlying clean patch. We use the assumption that similar patches in a neighborhood share the same covariance matrix, and we use off-the-shelf Poisson denoising methods in order to obtain an initial estimate of the covariance matrices. Our method can be seen as a post-processing step for Poisson denoising methods and the results show that it improves upon several Poisson denoising methods by relevant margins.
Submitted 1 March, 2018;
originally announced March 2018.
-
Scene-Adapted Plug-and-Play Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging
Authors:
Afonso M. Teodoro,
José M. Bioucas-Dias,
Mário A. T. Figueiredo
Abstract:
The recently proposed plug-and-play (PnP) framework allows leveraging recent developments in image denoising to tackle other, more involved, imaging inverse problems. In a PnP method, a black-box denoiser is plugged into an iterative algorithm, taking the place of a formal denoising step that corresponds to the proximity operator of some convex regularizer. While this approach offers flexibility and excellent performance, convergence of the resulting algorithm may be hard to analyze, as most state-of-the-art denoisers lack an explicit underlying objective function. In this paper, we propose a PnP approach where a scene-adapted prior (i.e., where the denoiser is targeted to the specific scene being imaged) is plugged into ADMM (alternating direction method of multipliers), and prove convergence of the resulting algorithm. Finally, we apply the proposed framework in two different imaging inverse problems: hyperspectral sharpening/fusion and image deblurring from blurred/noisy image pairs.
Submitted 2 January, 2018;
originally announced January 2018.
-
Blind image deblurring using class-adapted image priors
Authors:
Marina Ljubenović,
Mário A. T. Figueiredo
Abstract:
Blind image deblurring (BID) is an ill-posed inverse problem, usually addressed by imposing prior knowledge on the (unknown) image and on the blurring filter. Most of the work on BID has focused on natural images, using image priors based on statistical properties of generic natural images. However, in many applications, it is known that the image being recovered belongs to some specific class (e.g., text, face, fingerprints), and exploiting this knowledge allows obtaining more accurate priors. In this work, we propose a method where a Gaussian mixture model (GMM) is used to learn a class-adapted prior, by training on a dataset of clean images of that class. Experiments show the competitiveness of the proposed method in terms of restoration quality when dealing with images containing text, faces, or fingerprints. Additionally, experiments show that the proposed method is able to handle text images at high noise levels, outperforming state-of-the-art methods specifically designed for BID of text images.
Submitted 6 September, 2017;
originally announced September 2017.
-
Class-specific image denoising using importance sampling
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
In this paper, we propose a new image denoising method, tailored to specific classes of images, assuming that a dataset of clean images of the same class is available. Similarly to the non-local means (NLM) algorithm, the proposed method computes a weighted average of non-local patches, which we interpret under the importance sampling framework. This viewpoint introduces flexibility regarding the adopted priors, the noise statistics, and the computation of Bayesian estimates. The importance sampling viewpoint is exploited to approximate the minimum mean squared error (MMSE) patch estimates, using the true underlying prior on image patches. The estimates thus obtained converge to the true MMSE estimates, as the number of samples approaches infinity. Experimental results provide evidence that the proposed denoiser outperforms the state-of-the-art in the specific classes of face and text images.
Submitted 21 June, 2017;
originally announced June 2017.
-
Class-specific Poisson denoising by patch-based importance sampling
Authors:
Milad Niknejad,
Jose M. Bioucas-Dias,
Mario A. T. Figueiredo
Abstract:
In this paper, we address the problem of recovering images degraded by Poisson noise, where the image is known to belong to a specific class. In the proposed method, a dataset of clean patches from images of the class of interest is clustered using multivariate Gaussian distributions. In order to recover the noisy image, each noisy patch is assigned to one of these distributions, and the corresponding minimum mean squared error (MMSE) estimate is obtained. We propose to use a self-normalized importance sampling approach, a method of the Monte Carlo family, both for determining the most likely distribution and for approximating the MMSE estimate of the clean patch. Experimental results show that our proposed method outperforms other methods for Poisson denoising in the low-SNR regime.
Submitted 9 June, 2017;
originally announced June 2017.
-
Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation
Authors:
Zheng Xu,
Mario A. T. Figueiredo,
Xiaoming Yuan,
Christoph Studer,
Tom Goldstein
Abstract:
Many modern computer vision and machine learning applications rely on solving difficult optimization problems that involve non-differentiable objective functions and constraints. The alternating direction method of multipliers (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a generalization of ADMM that often achieves better performance, but its efficiency depends strongly on algorithm parameters that must be chosen by an expert user. We propose an adaptive method that automatically tunes the key algorithm parameters to achieve optimal performance without user oversight. Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A detailed convergence analysis of ARADMM is provided, and numerical results on several applications demonstrate fast practical convergence.
Submitted 10 April, 2017;
originally announced April 2017.
-
Synthesis versus analysis in patch-based image priors
Authors:
Mario A. T. Figueiredo
Abstract:
In global models/priors (for example, using wavelet frames), there is a well-known analysis vs synthesis dichotomy in the way signal/image priors are formulated. In patch-based image models/priors, this dichotomy is also present in the choice of how each patch is modeled. This paper shows that there is another analysis vs synthesis dichotomy, in terms of how the whole image is related to the patches, and that all existing patch-based formulations that provide a global image prior belong to the analysis category. We then propose a synthesis formulation, where the image is explicitly modeled as being synthesized by additively combining a collection of independent patches. We formally establish that these analysis and synthesis formulations are not equivalent in general and that both formulations are compatible with analysis and synthesis formulations at the patch level. Finally, we present an instance of the alternating direction method of multipliers (ADMM) that can be used to perform image denoising under the proposed synthesis formulation, showing its computational feasibility. Rather than showing the superiority of the synthesis or analysis formulations, the contribution of this paper is to establish the existence of both alternatives, thus closing the corresponding gap in the field of patch-based image processing.
Submitted 20 February, 2017;
originally announced February 2017.
-
Scene-adapted plug-and-play algorithm with convergence guarantees
Authors:
Afonso M. Teodoro,
José M. Bioucas-Dias,
Mário A. T. Figueiredo
Abstract:
Recent frameworks, such as the so-called plug-and-play, allow us to leverage the developments in image denoising to tackle other, and more involved, problems in image processing. As the name suggests, state-of-the-art denoisers are plugged into an iterative algorithm that alternates between a denoising step and the inversion of the observation operator. While these tools offer flexibility, the convergence of the resulting algorithm may be difficult to analyse. In this paper, we plug a state-of-the-art denoiser, based on a Gaussian mixture model, in the iterations of an alternating direction method of multipliers and prove the algorithm is guaranteed to converge. Moreover, we build upon the concept of scene-adapted priors where we learn a model targeted to a specific scene being imaged, and apply the proposed method to address the hyperspectral sharpening problem.
Submitted 8 November, 2017; v1 submitted 8 February, 2017;
originally announced February 2017.
-
Adaptive ADMM with Spectral Penalty Parameter Selection
Authors:
Zheng Xu,
Mario A. T. Figueiredo,
Tom Goldstein
Abstract:
The alternating direction method of multipliers (ADMM) is a versatile tool for solving a wide range of constrained optimization problems, with differentiable or non-differentiable objective functions. Unfortunately, its performance is highly sensitive to a penalty parameter, which makes ADMM often unreliable and hard to automate for a non-expert user. We tackle this weakness of ADMM by proposing a method to adaptively tune the penalty parameters to achieve fast convergence. The resulting adaptive ADMM (AADMM) algorithm, inspired by the successful Barzilai-Borwein spectral method for gradient descent, yields fast convergence and relative insensitivity to the initial stepsize and problem scaling.
Submitted 19 July, 2017; v1 submitted 23 May, 2016;
originally announced May 2016.
-
Image Restoration with Locally Selected Class-Adapted Models
Authors:
Afonso M. Teodoro,
José M. Bioucas-Dias,
Mário A. T. Figueiredo
Abstract:
State-of-the-art algorithms for imaging inverse problems (namely deblurring and reconstruction) are typically iterative, involving a denoising operation as one of their steps. Using a state-of-the-art denoising method in this context is not trivial, and is the focus of current work. Recently, we have proposed to use a class-adapted denoiser (patch-based using Gaussian mixture models) in a so-called plug-and-play scheme, wherein a state-of-the-art denoiser is plugged into an iterative algorithm, leading to results that outperform the best general-purpose algorithms, when applied to an image of a known class (e.g. faces, text, brain MRI). In this paper, we extend that approach to handle situations where the image being processed is from one of a collection of possible classes or, more importantly, contains regions of different classes. More specifically, we propose a method to locally select one of a set of class-adapted Gaussian mixture patch priors, previously estimated from clean images of those classes. Our approach may be seen as simultaneously performing segmentation and restoration, thus contributing to bridging the gap between image restoration/reconstruction and analysis.
Submitted 2 August, 2016; v1 submitted 23 May, 2016;
originally announced May 2016.
-
Image Restoration and Reconstruction using Variable Splitting and Class-adapted Image Priors
Authors:
Afonso M. Teodoro,
José M. Bioucas-Dias,
Mário A. T. Figueiredo
Abstract:
This paper proposes using a Gaussian mixture model as a prior, for solving two image inverse problems, namely image deblurring and compressive imaging. We capitalize on the fact that variable splitting algorithms, like ADMM, are able to decouple the handling of the observation operator from that of the regularizer, and plug a state-of-the-art algorithm into the pure denoising step. Furthermore, we show that, when applied to a specific type of image, a Gaussian mixture model trained from a database of images of the same type is able to outperform current state-of-the-art methods.
Submitted 23 May, 2016; v1 submitted 12 February, 2016;
originally announced February 2016.
-
Uplink Performance Evaluation of Massive MU-MIMO Systems
Authors:
Felipe A. P. de Figueiredo,
Joao Paulo Miranda,
Fabricio L. Figueiredo,
Fabbryccio A. C. M. Cardoso
Abstract:
The present paper deals with an OFDM-based uplink within a multi-user MIMO (MU-MIMO) system where a massive MIMO approach is employed. In this context, the linear detectors Minimum Mean-Squared Error (MMSE), Zero Forcing (ZF) and Maximum Ratio Combining (MRC) are considered and assessed. This paper includes Bit Error Rate (BER) results for uncoded QPSK/OFDM transmissions through a flat Rayleigh fading channel under the assumption of perfect power control and channel estimation. BER results are obtained through Monte Carlo simulations. Performance results are discussed in detail and we confirm the achievable "massive MIMO" effects, even for a reduced complexity detection technique, when the number of receive antennas at the base station (BS) is much larger than the number of transmit antennas.
Submitted 7 March, 2015;
originally announced March 2015.
-
The Ordered Weighted $\ell_1$ Norm: Atomic Formulation, Projections, and Algorithms
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
The ordered weighted $\ell_1$ norm (OWL) was recently proposed, with two different motivations: its good statistical properties as a sparsity promoting regularizer; the fact that it generalizes the so-called {\it octagonal shrinkage and clustering algorithm for regression} (OSCAR), which has the ability to cluster/group regression variables that are highly correlated. This paper contains several contributions to the study and application of OWL regularization: the derivation of the atomic formulation of the OWL norm; the derivation of the dual of the OWL norm, based on its atomic formulation; a new and simpler derivation of the proximity operator of the OWL norm; an efficient scheme to compute the Euclidean projection onto an OWL ball; the instantiation of the conditional gradient (CG, also known as Frank-Wolfe) algorithm for linear regression problems under OWL regularization; the instantiation of accelerated projected gradient algorithms for the same class of problems. Finally, a set of experiments give evidence that accelerated projected gradient algorithms are considerably faster than CG, for the class of problems considered.
Submitted 10 April, 2015; v1 submitted 15 September, 2014;
originally announced September 2014.
-
Decreasing Weighted Sorted $\ell_1$ Regularization
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
We consider a new family of regularizers, termed {\it weighted sorted $\ell_1$ norms} (WSL1), which generalizes the recently introduced {\it octagonal shrinkage and clustering algorithm for regression} (OSCAR) and also contains the $\ell_1$ and $\ell_{\infty}$ norms as particular instances. We focus on a special case of the WSL1, the {\sl decreasing WSL1} (DWSL1), where the elements of the argument vector are sorted in non-increasing order and the weights are also non-increasing. In this paper, after showing that the DWSL1 is indeed a norm, we derive two key tools for its use as a regularizer: the dual norm and the Moreau proximity operator.
Submitted 11 April, 2014;
originally announced April 2014.
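As a rough illustration of the abstract above, the following sketch evaluates a weighted sorted $\ell_1$ value by pairing the weights with the magnitudes of the argument sorted in non-increasing order, and checks the two special cases the abstract mentions ($\ell_1$ and $\ell_{\infty}$). The function name `wsl1_norm` and the example vectors are illustrative assumptions, not taken from the paper.

```python
def wsl1_norm(x, w):
    """Weighted sorted l1 value: sum_i w[i] * |x|_[i],
    where |x|_[1] >= |x|_[2] >= ... are the sorted magnitudes of x."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, mags))


x = [0.5, -3.0, 2.0, 0.0]

# All weights equal to 1: the value reduces to the l1 norm.
l1_value = wsl1_norm(x, [1.0, 1.0, 1.0, 1.0])

# Weights (1, 0, ..., 0): only the largest magnitude survives (l_inf norm).
linf_value = wsl1_norm(x, [1.0, 0.0, 0.0, 0.0])
```

Both weight vectors are non-increasing, so they fall under the decreasing (DWSL1) case discussed in the abstract.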
-
Group-sparse Matrix Recovery
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
We apply the OSCAR (octagonal shrinkage and clustering algorithm for regression) in recovering group-sparse matrices (two-dimensional---2D---arrays) from compressive measurements. We propose a 2D version of OSCAR (2OSCAR) consisting of the $\ell_1$ norm and the pair-wise $\ell_{\infty}$ norm, which is convex but non-differentiable. We show that the proximity operator of 2OSCAR can be computed based on that of OSCAR. The 2OSCAR problem can thus be efficiently solved by state-of-the-art proximal splitting algorithms. Experiments on group-sparse 2D array recovery show that 2OSCAR regularization solved by the SpaRSA algorithm is the fastest choice, while the PADMM algorithm (with debiasing) yields the most accurate results.
Submitted 20 February, 2014;
originally announced February 2014.
-
Robust Binary Fused Compressive Sensing using Adaptive Outlier Pursuit
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
We propose a new method, {\it robust binary fused compressive sensing} (RoBFCS), to recover sparse piece-wise smooth signals from 1-bit compressive measurements. The proposed method is a modification of our previous {\it binary fused compressive sensing} (BFCS) algorithm, which is based on the {\it binary iterative hard thresholding} (BIHT) algorithm. As in BIHT, the data term of the objective function is a one-sided $\ell_1$ (or $\ell_2$) norm. Experiments show that the proposed algorithm is able to take advantage of the piece-wise smoothness of the original signal and detect sign flips and correct them, achieving more accurate recovery than BFCS and BIHT.
Submitted 20 March, 2014; v1 submitted 20 February, 2014;
originally announced February 2014.
-
Binary Fused Compressive Sensing: 1-Bit Compressive Sensing meets Group Sparsity
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
We propose a new method, {\it binary fused compressive sensing} (BFCS), to recover sparse piece-wise smooth signals from 1-bit compressive measurements. The proposed algorithm is a modification of the previous {\it binary iterative hard thresholding} (BIHT) algorithm, where, in addition to the sparsity constraint, the total-variation of the recovered signal is upper constrained. As in BIHT, the data term of the objective function is a one-sided $\ell_1$ (or $\ell_2$) norm. Experiments on the recovery of sparse piece-wise smooth signals show that the proposed algorithm is able to take advantage of the piece-wise smoothness of the original signal, achieving more accurate recovery than BIHT.
Submitted 20 February, 2014;
originally announced February 2014.
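For context, the BIHT baseline that the abstract above modifies can be sketched as follows. This is a minimal sketch, assuming the commonly cited BIHT update (a subgradient step on the one-sided $\ell_1$ data term followed by keeping the $K$ largest-magnitude entries); it does not include the total-variation constraint that BFCS adds. The function names, step size `tau`, iteration count, and test signal are all illustrative assumptions.

```python
import random


def sign(v):
    # 1-bit quantizer; zero is mapped to +1 by convention (an assumption).
    return 1.0 if v >= 0 else -1.0


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def biht(A, y, k, tau, iters=50):
    """Plain BIHT sketch: x <- H_k(x + (tau / 2) * A^T (y - sign(A x)))."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        resid = [y[i] - sign(sum(A[i][j] * x[j] for j in range(n)))
                 for i in range(m)]
        step = [sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x = hard_threshold([x[j] + 0.5 * tau * step[j] for j in range(n)], k)
    # 1-bit measurements lose amplitude, so return a unit-norm estimate.
    norm = sum(v * v for v in x) ** 0.5
    return [v / norm for v in x] if norm > 0 else x


rng = random.Random(0)
n, m, k = 20, 60, 3
x_true = [0.0] * n
x_true[2], x_true[7], x_true[11] = 1.0, -0.5, 0.8
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sign(sum(a * b for a, b in zip(row, x_true))) for row in A]
x_hat = biht(A, y, k, tau=1.0 / m)
```

The estimate `x_hat` has at most `k` nonzero entries by construction; how closely it matches `x_true` depends on the number of measurements and the step size.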
-
Exploiting Two-Dimensional Group Sparsity in 1-Bit Compressive Sensing
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
We propose a new approach, {\it two-dimensional fused binary compressive sensing} (2DFBCS), to recover 2D sparse piece-wise signals from 1-bit measurements, exploiting 2D group sparsity for 1-bit compressive sensing recovery. The proposed method is a modified 2D version of the previous {\it binary iterative hard thresholding} (2DBIHT) algorithm, where the objective function includes a 2D one-sided $\ell_1$ (or $\ell_2$) penalty function encouraging agreement with the observed data, an indicator function of $K$-sparsity, and a total variation (TV) or modified TV (MTV) constraint. The subgradient of the 2D one-sided $\ell_1$ (or $\ell_2$) penalty and the projection onto the $K$-sparsity and TV or MTV constraint can be computed efficiently, allowing the application of algorithms of the {\it forward-backward splitting} (a.k.a. {\it iterative shrinkage-thresholding}) family. Experiments on the recovery of 2D sparse piece-wise smooth signals show that the proposed approach is able to take advantage of the piece-wise smoothness of the original signal, achieving more accurate recovery than 2DBIHT. More specifically, 2DFBCS with the MTV and the $\ell_2$ penalty performs best amongst the algorithms tested.
Submitted 21 February, 2014; v1 submitted 20 February, 2014;
originally announced February 2014.
-
A novel sparsity and clustering regularization
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
We propose a novel SPARsity and Clustering (SPARC) regularizer, which is a modified version of the previous octagonal shrinkage and clustering algorithm for regression (OSCAR), in which the regularizer consists of a $K$-sparse constraint and a pair-wise $\ell_{\infty}$ norm restricted to the $K$ largest components in magnitude. The proposed regularizer is able to separably enforce $K$-sparsity and encourage the non-zeros to be equal in magnitude. Moreover, it can accurately group the features without shrinking their magnitude. In fact, SPARC is closely related to OSCAR, so that the proximity operator of the former can be efficiently computed based on that of the latter, allowing the use of proximal splitting algorithms to solve problems with SPARC regularization. Experiments on synthetic data and with benchmark breast cancer data show that SPARC is a competitive group-sparsity inducing regularizer for regression and classification.
Submitted 20 February, 2014; v1 submitted 18 October, 2013;
originally announced October 2013.
-
Solving OSCAR regularization problems by proximal splitting algorithms
Authors:
Xiangrong Zeng,
Mário A. T. Figueiredo
Abstract:
The OSCAR (octagonal shrinkage and clustering algorithm for regression) regularizer consists of an L_1 norm plus a pair-wise L_inf norm (responsible for its grouping behavior) and was proposed to encourage group sparsity in scenarios where the groups are a priori unknown. The OSCAR regularizer has a non-trivial proximity operator, which limits its applicability. We reformulate this regularizer as a weighted sorted L_1 norm, and propose its grouping proximity operator (GPO) and approximate proximity operator (APO), thus making state-of-the-art proximal splitting algorithms (PSAs) available to solve inverse problems with OSCAR regularization. The GPO is in fact the APO followed by additional grouping and averaging operations, which are costly in time and storage, which explains why algorithms with APO are much faster than those with GPO. Convergence of PSAs with GPO is guaranteed, since GPO is an exact proximity operator. Although convergence of PSAs with APO may not be guaranteed, we have experimentally found that APO behaves similarly to GPO when the regularization parameter of the pair-wise L_inf norm is set to an appropriately small value. Experiments on recovery of group-sparse signals (with unknown groups) show that PSAs with APO are very fast and accurate.
Submitted 27 September, 2013; v1 submitted 24 September, 2013;
originally announced September 2013.
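The reformulation mentioned in the abstract above can be checked numerically. The sketch below assumes the standard OSCAR form, $\lambda_1 \|x\|_1 + \lambda_2 \sum_{i<j} \max(|x_i|, |x_j|)$, and compares it with a weighted sorted L_1 expression whose weight on the $i$-th largest magnitude is $\lambda_1 + \lambda_2 (n - i)$. The function names and the test vector are illustrative assumptions; the GPO and APO operators from the paper are not implemented here.

```python
from itertools import combinations


def oscar_pairwise(x, lam1, lam2):
    """OSCAR value from its definition: l1 term plus pair-wise max term."""
    l1_term = sum(abs(v) for v in x)
    pair_term = sum(max(abs(a), abs(b)) for a, b in combinations(x, 2))
    return lam1 * l1_term + lam2 * pair_term


def oscar_sorted(x, lam1, lam2):
    """Same value written as a weighted sorted l1 norm."""
    n = len(x)
    mags = sorted((abs(v) for v in x), reverse=True)
    # With 0-based rank i, the i-th largest magnitude is the maximum
    # in exactly (n - 1 - i) of the pairs.
    return sum((lam1 + lam2 * (n - 1 - i)) * m for i, m in enumerate(mags))


x = [1.0, -4.0, 2.0, 0.5]
value_pairwise = oscar_pairwise(x, 0.5, 0.25)
value_sorted = oscar_sorted(x, 0.5, 0.25)
```

The pair-wise form costs O(n^2) operations while the sorted form costs O(n log n), which is one practical reason the sorted view is convenient.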
-
Alternating Directions Dual Decomposition
Authors:
Andre F. T. Martins,
Mario A. T. Figueiredo,
Pedro M. Q. Aguiar,
Noah A. Smith,
Eric P. Xing
Abstract:
We propose AD3, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs based on the alternating directions method of multipliers. Like dual decomposition algorithms, AD3 uses worker nodes to iteratively solve local subproblems and a controller node to combine these local solutions into a global update. The key characteristic of AD3 is that each local subproblem has a quadratic regularizer, leading to a faster consensus than subgradient-based dual decomposition, both theoretically and in practice. We provide closed-form solutions for these AD3 subproblems for binary pairwise factors and factors imposing first-order logic constraints. For arbitrary factors (large or combinatorial), we introduce an active set method which requires only an oracle for computing a local MAP configuration, making AD3 applicable to a wide range of problems. Experiments on synthetic and real-world problems show that AD3 compares favorably with the state-of-the-art.
Submitted 28 December, 2012;
originally announced December 2012.
-
Deconvolving Images with Unknown Boundaries Using the Alternating Direction Method of Multipliers
Authors:
Mariana S. C. Almeida,
Mário A. T. Figueiredo
Abstract:
The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for imaging inverse problems, namely deconvolution and reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by adopting a divide and conquer strategy, wherein a hard problem is split into simpler, efficiently solvable sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system), which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e., under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate their performance on non-periodic deblurring (with and without inpainting of interior pixels) under total-variation and frame-based regularization.
Submitted 7 March, 2013; v1 submitted 9 October, 2012;
originally announced October 2012.
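The remark in the abstract above about inverting a circulant operator in the Fourier domain can be illustrated in one dimension. The sketch below assumes a linear system of the form $(A^T A + \rho I) x = r$, typical of ADMM sub-problems, where $A$ is circular convolution with a real kernel `h`; since the DFT diagonalizes $A$, the solution is obtained by pointwise division by $|H_k|^2 + \rho$. A naive O(n^2) DFT is used only to keep the example free of dependencies (an FFT routine would be used in practice). All names and values are illustrative assumptions, and the masking approach for unknown boundaries proposed in the paper is not shown.

```python
import cmath


def dft(v):
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(V):
    n = len(V)
    return [sum(V[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circ_conv(h, x):
    """A x: circular convolution of x with kernel h."""
    n = len(x)
    return [sum(h[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]


def circ_corr(h, y):
    """A^T y: circular correlation, the transpose of circ_conv for real h."""
    n = len(y)
    return [sum(h[j] * y[(i + j) % n] for j in range(n)) for i in range(n)]


def solve_circulant_system(h, r, rho):
    """Solve (A^T A + rho I) x = r by pointwise division in the DFT domain."""
    H, R = dft(h), dft(r)
    X = [Rk / (abs(Hk) ** 2 + rho) for Hk, Rk in zip(H, R)]
    return [v.real for v in idft(X)]


h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]      # periodic blur kernel
x_true = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
rho = 0.1
r = [a + rho * b for a, b in zip(circ_corr(h, circ_conv(h, x_true)), x_true)]
x_hat = solve_circulant_system(h, r, rho)
```

Here `x_hat` should agree with `x_true` up to floating-point error, because `r` was built from `x_true` with the same operator.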