-
Anomaly Detection of Tabular Data Using LLMs
Authors:
Aodong Li,
Yunhan Zhao,
Chen Qiu,
Marius Kloft,
Padhraic Smyth,
Maja Rudolph,
Stephan Mandt
Abstract:
Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating their ability to identify low-density data regions. For LLMs that are not well aligned with anomaly detection and frequently output factual errors, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets and propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) showcase i) GPT-4 has on-par performance with the state-of-the-art transductive learning-based anomaly detection methods and ii) the efficacy of our synthetic dataset and fine-tuning strategy in aligning LLMs to this task.
Submitted 24 June, 2024;
originally announced June 2024.
-
Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction
Authors:
Aleix Lafita,
Ferran Gonzalez,
Mahmoud Hossam,
Paul Smyth,
Jacob Deasy,
Ari Allyn-Feuer,
Daniel Seaton,
Stephen Young
Abstract:
Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag experimental accuracy. Here, we present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays using a Normalised Log-odds Ratio (NLR) head. We find consistent improvements in a held-out protein test set, and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
Submitted 10 May, 2024;
originally announced May 2024.
-
Dynamic Conditional Optimal Transport through Simulation-Free Flows
Authors:
Gavin Kerrigan,
Giosue Migliorini,
Padhraic Smyth
Abstract:
We study the geometry of conditional optimal transport (COT) and prove a dynamical formulation which generalizes the Benamou-Brenier Theorem. Equipped with these tools, we propose a simulation-free flow-based method for conditional generative modeling. Our method couples an arbitrary source distribution to a specified target distribution through a triangular COT plan, and a conditional generative model is obtained by approximating the geodesic path of measures induced by this COT plan. Our theory and methods are applicable in infinite-dimensional settings, making them well suited for a wide class of Bayesian inverse problems. Empirically, we demonstrate that our method is competitive on several challenging conditional generation tasks, including an infinite-dimensional inverse problem.
Submitted 31 May, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
The Calibration Gap between Model and Human Confidence in Large Language Models
Authors:
Mark Steyvers,
Heliodoro Tejeda,
Aakriti Kumar,
Catarina Belem,
Sheer Karny,
Xinyue Hu,
Lukas Mayer,
Padhraic Smyth
Abstract:
For large language models (LLMs) to be trusted by humans they need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments, but the question remains of how well LLMs can communicate this internal model confidence to human users. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model. Through experiments involving multiple-choice questions, we systematically examine human users' ability to discern the reliability of LLM outputs. Our study focuses on two key areas: (1) assessing users' perception of true LLM confidence and (2) investigating the impact of tailored explanations on this perception. The research highlights that default explanations from LLMs often lead to user overestimation of both the model's confidence and its accuracy. By modifying the explanations to more accurately reflect the LLM's internal confidence, we observe a significant shift in user perception, aligning it more closely with the model's actual confidence levels. This adjustment in explanatory approach demonstrates potential for enhancing user trust and accuracy in assessing LLM outputs. The findings underscore the importance of transparent communication of confidence levels in LLMs, particularly in high-stakes applications where understanding the reliability of AI-generated information is essential.
Submitted 24 January, 2024;
originally announced January 2024.
-
Probabilistic Modeling for Sequences of Sets in Continuous-Time
Authors:
Yuxin Chang,
Alex Boyd,
Padhraic Smyth
Abstract:
Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.
Submitted 18 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Bayesian Online Learning for Consensus Prediction
Authors:
Sam Showalter,
Alex Boyd,
Padhraic Smyth,
Mark Steyvers
Abstract:
Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.
Submitted 12 December, 2023;
originally announced December 2023.
-
Functional Flow Matching
Authors:
Gavin Kerrigan,
Giosue Migliorini,
Padhraic Smyth
Abstract:
We propose Functional Flow Matching (FFM), a function-space generative model that generalizes the recently-introduced Flow Matching model to operate in infinite-dimensional spaces. Our approach works by first defining a path of probability measures that interpolates between a fixed Gaussian measure and the data distribution, followed by learning a vector field on the underlying space of functions that generates this path of measures. Our method does not rely on likelihoods or simulations, making it well-suited to the function space setting. We provide both a theoretical framework for building such models and an empirical evaluation of our techniques. We demonstrate through experiments on several real-world benchmarks that our proposed FFM method outperforms several recently proposed function-space generative models.
Submitted 5 December, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Capturing Humans' Mental Models of AI: An Item Response Theory Approach
Authors:
Markelle Kelly,
Aakriti Kumar,
Padhraic Smyth,
Mark Steyvers
Abstract:
Improving our understanding of how humans perceive AI teammates is an important foundation for our general understanding of human-AI teams. Extending relevant work from cognitive science, we propose a framework based on item response theory for modeling these perceptions. We apply this framework to real-world experiments, in which each participant works alongside another person or an AI agent in a question-answering setting, repeatedly assessing their teammate's performance. Using this experimental data, we demonstrate the use of our framework for testing research questions about people's perceptions of both AI agents and other people. We contrast mental models of AI teammates with those of human teammates as we characterize the dimensionality of these mental models, their development over time, and the influence of the participants' own self-perception. Our results indicate that people expect AI agents' performance to be significantly better on average than the performance of other humans, with less variation across different types of problems. We conclude with a discussion of the implications of these findings for human-AI interaction.
Submitted 15 May, 2023;
originally announced May 2023.
-
Zero-Shot Anomaly Detection via Batch Normalization
Authors:
Aodong Li,
Chen Qiu,
Marius Kloft,
Padhraic Smyth,
Maja Rudolph,
Stephan Mandt
Abstract:
Anomaly detection (AD) plays a crucial role in many safety-critical application domains. The challenge of adapting an anomaly detector to drift in the normal data distribution, especially when no training data is available for the "new normal," has led to the development of zero-shot AD techniques. In this paper, we propose a simple yet effective method called Adaptive Centered Representations (ACR) for zero-shot batch-level AD. Our approach trains off-the-shelf deep anomaly detectors (such as deep SVDD) to adapt to a set of inter-related training data distributions in combination with batch normalization, enabling automatic zero-shot generalization for unseen AD tasks. This simple recipe, batch normalization plus meta-training, is a highly effective and versatile tool. Our theoretical results guarantee the zero-shot generalization for unseen AD tasks; our empirical results demonstrate the first zero-shot AD results for tabular data and outperform existing methods in zero-shot anomaly detection and segmentation on image data from specialized domains. Code is at https://github.com/aodongli/zero-shot-ad-via-batch-norm
Submitted 7 November, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Deep Anomaly Detection under Labeling Budget Constraints
Authors:
Aodong Li,
Chen Qiu,
Marius Kloft,
Padhraic Smyth,
Stephan Mandt,
Maja Rudolph
Abstract:
Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.
Submitted 4 July, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Diffusion Generative Models in Infinite Dimensions
Authors:
Gavin Kerrigan,
Justin Ley,
Padhraic Smyth
Abstract:
Diffusion generative models have recently been applied to domains where the available data can be seen as a discretization of an underlying function, such as audio signals or time series. However, these models operate directly on the discretized data, and there are no semantics in the modeling process that relate the observed data to the underlying functional forms. We generalize diffusion models to operate directly in function space by developing the foundational theory for such models in terms of Gaussian measures on Hilbert spaces. A significant benefit of our function space point of view is that it allows us to explicitly specify the space of functions we are working in, leading us to develop methods for diffusion generative modeling in Sobolev spaces. Our approach allows us to perform both unconditional and conditional generation of function-valued data. We demonstrate our methods on several synthetic and real-world benchmarks.
Submitted 24 February, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Probabilistic Querying of Continuous-Time Event Sequences
Authors:
Alex Boyd,
Yuxin Chang,
Stephan Mandt,
Padhraic Smyth
Abstract:
Continuous-time event sequences, i.e., sequences consisting of continuous time stamps and associated event types ("marks"), are an important type of sequential data with many applications, e.g., in clinical medicine or user behavior modeling. Since these data are typically modeled autoregressively (e.g., using neural Hawkes processes or their classical counterparts), it is natural to ask questions about future scenarios such as "what kind of event will occur next" or "will an event of type $A$ occur before one of type $B$". Unfortunately, some of these queries are notoriously hard to address since current methods are limited to naive simulation, which can be highly inefficient. This paper introduces a new typology of query types and a framework for addressing them using importance sampling. Example queries include predicting the $n^\text{th}$ event type in a sequence and the hitting time distribution of one or more event types. We also leverage these findings further to be applicable for estimating general "$A$ before $B$" type of queries. We prove theoretically that our estimation method is effectively always better than naive simulation and show empirically based on three real-world datasets that it is on average 1,000 times more efficient than existing approaches.
Submitted 15 November, 2022;
originally announced November 2022.
-
Predictive Querying for Autoregressive Neural Sequence Models
Authors:
Alex Boyd,
Sam Showalter,
Stephan Mandt,
Padhraic Smyth
Abstract:
In reasoning about sequential events it is natural to pose probabilistic queries such as "when will event A occur next" or "what is the probability of A occurring before B", with applications in areas such as user modeling, medicine, and finance. However, with machine learning shifting towards neural autoregressive models such as RNNs and transformers, probabilistic querying has been largely restricted to simple cases such as next-event prediction. This is in part due to the fact that future querying involves marginalization over large path spaces, which is not straightforward to do efficiently in such models. In this paper we introduce a general typology for predictive queries in neural autoregressive sequence models and show that such queries can be systematically represented by sets of elementary building blocks. We leverage this typology to develop new query estimation methods based on beam search, importance sampling, and hybrids. Across four large-scale sequence datasets from different application domains, as well as for the GPT-2 language model, we demonstrate the ability to make query answering tractable for arbitrary queries in exponentially-large predictive path-spaces, and find clear differences in cost-accuracy tradeoffs between search and sampling methods.
Submitted 4 November, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Variable-Based Calibration for Machine Learning Classifiers
Authors:
Markelle Kelly,
Padhraic Smyth
Abstract:
The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant miscalibration as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based calibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.
Submitted 5 April, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
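As a point of reference for the score-based metric this abstract generalizes, the following is a minimal sketch of expected calibration error with equal-width confidence bins. It is the standard textbook form, not the authors' code; the function name, the argument names, and the default of 10 bins are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted confidence in [0, 1] for each prediction (assumed input)
    correct:     True/False flag per prediction, True if the prediction was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Map a confidence to its bin index; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

Because this score aggregates over all examples, a model can reach a low value while still being miscalibrated within subgroups defined by a feature, which is the gap the abstract describes.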
-
Fair Generalized Linear Models with a Convex Penalty
Authors:
Hyungrok Do,
Preston Putzel,
Axel Martin,
Padhraic Smyth,
Judy Zhong
Abstract:
Despite recent advances in algorithmic fairness, methodologies for achieving fairness with generalized linear models (GLMs) have yet to be explored in general, despite GLMs being widely used in practice. In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. We prove that for GLMs both criteria can be achieved via a convex penalty term based solely on the linear components of the GLM, thus permitting efficient optimization. We also derive theoretical properties for the resulting fair GLM estimator. To empirically demonstrate the efficacy of the proposed fair GLM, we compare it with other well-known fair prediction methods on an extensive set of benchmark datasets for binary classification and regression. In addition, we demonstrate that the fair GLM can generate fair predictions for a range of response variables, other than binary and continuous outcomes.
Submitted 17 June, 2022;
originally announced June 2022.
-
Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration
Authors:
Gavin Kerrigan,
Padhraic Smyth,
Mark Steyvers
Abstract:
An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor the model is perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.
Submitted 1 October, 2021; v1 submitted 29 September, 2021;
originally announced September 2021.
-
Automating Data Science: Prospects and Challenges
Authors:
Tijl De Bie,
Luc De Raedt,
José Hernández-Orallo,
Holger H. Hoos,
Padhraic Smyth,
Christopher K. I. Williams
Abstract:
Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process.
Key insights:
* Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
* Important parts of data science are already being automated, especially in the modeling stages, where techniques such as automated machine learning (AutoML) are gaining traction.
* Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.
Submitted 28 February, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Joint Fairness Model with Applications to Risk Predictions for Under-represented Populations
Authors:
Hyungrok Do,
Shinjini Nandi,
Preston Putzel,
Padhraic Smyth,
Judy Zhong
Abstract:
In data collection for predictive modeling, under-representation of certain groups, based on gender, race/ethnicity, or age, may yield less-accurate predictions for these groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Existing methods to achieve fairness in the machine learning literature typically build a single prediction model in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: i) fairness is often achieved by compromising accuracy for some groups; ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a Joint Fairness Model (JFM) approach for logistic regression models for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an Accelerated Smoothing Proximal Gradient Algorithm to solve the convex objective function, and present the key asymptotic properties of the JFM estimates. Through simulations, we demonstrate the efficacy of the JFM in achieving good prediction performance and across-group parity, in comparison with the single fairness model, group-separate model, and group-ignorant model, especially when the minority group's sample size is small. Finally, we demonstrate the utility of the JFM method in a real-world example to obtain fair risk predictions for under-represented older patients diagnosed with coronavirus disease 2019 (COVID-19).
Submitted 23 February, 2022; v1 submitted 10 May, 2021;
originally announced May 2021.
-
A Mask R-CNN approach to counting bacterial colony forming units in pharmaceutical development
Authors:
Tanguy Naets,
Maarten Huijsmans,
Paul Smyth,
Laurent Sorber,
Gaël de Lannoy
Abstract:
We present an application of the well-known Mask R-CNN approach to the counting of different types of bacterial colony forming units that were cultured in Petri dishes. Our model was made available to lab technicians in a modern SPA (Single-Page Application). Users can upload images of dishes, after which the Mask R-CNN model that was trained and tuned specifically for this task detects the number of BVG- and BVG+ colonies and displays these in an interactive interface for the user to verify. Users can then check the model's predictions, correct them if deemed necessary, and finally validate them. Our adapted Mask R-CNN model achieves a mean average precision (mAP) of 94% at an intersection-over-union (IoU) threshold of 50%. With these encouraging results, we see opportunities to bring the benefits of improved accuracy and time saved to related problems, such as generalising to other bacteria types and viral foci counting.
Submitted 9 March, 2021;
originally announced March 2021.
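The abstract above reports mAP at an IoU threshold of 50%. As a reminder of what that threshold means, here is a minimal sketch of the IoU test for two axis-aligned boxes. It is an illustration only, not the authors' evaluation code: the function names `box_iou` and `is_match` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions, and Mask R-CNN evaluation can equally be done on pixel masks with the same ratio.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as a hit when its IoU with the ground truth reaches the threshold."""
    return box_iou(pred, truth) >= threshold
```

Two 2x2 boxes that overlap on half their width share one third of their union, so they would not count as a match at the 50% threshold.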
-
Detecting and Adapting to Irregular Distribution Shifts in Bayesian Online Learning
Authors:
Aodong Li,
Alex Boyd,
Padhraic Smyth,
Stephan Mandt
Abstract:
We consider the problem of online learning in the presence of distribution shifts that occur at an unknown rate and of unknown intensity. We derive a new Bayesian online inference approach to simultaneously infer these distribution shifts and adapt the model to the detected changes by integrating ideas from change point detection, switching dynamical systems, and Bayesian online learning. Using a binary 'change variable,' we construct an informative prior such that--if a change is detected--the model partially erases the information of past model updates by tempering to facilitate adaptation to the new data distribution. Furthermore, the approach uses beam search to track multiple change-point hypotheses and selects the most probable one in hindsight. Our proposed method is model-agnostic, applicable in both supervised and unsupervised learning settings, suitable for an environment of concept drifts or covariate drifts, and yields improvements over state-of-the-art Bayesian online learning approaches.
Submitted 26 October, 2021; v1 submitted 15 December, 2020;
originally announced December 2020.
-
User-Dependent Neural Sequence Models for Continuous-Time Event Data
Authors:
Alex Boyd,
Robert Bamler,
Stephan Mandt,
Padhraic Smyth
Abstract:
Continuous-time event data are common in applications such as individual behavior data, financial transactions, and medical health records. Modeling such data can be very challenging, in particular for applications with many different types of events, since it requires a model to predict the event types as well as the time of occurrence. Recurrent neural networks that parameterize time-varying intensity functions are the current state-of-the-art for predictive modeling with such data. These models typically assume that all event sequences come from the same data distribution. However, in many applications event sequences are generated by different sources, or users, and their characteristics can be very different. In this paper, we extend the broad class of neural marked point process models to mixtures of latent embeddings, where each mixture component models the characteristic traits of a given user. Our approach relies on augmenting these models with a latent variable that encodes user characteristics, represented by a mixture model over user behavior that is trained via amortized variational inference. We evaluate our methods on four large real-world datasets and demonstrate systematic improvements from our approach over existing work for a variety of predictive metrics such as log-likelihood, next event ranking, and source-of-sequence identification.
Submitted 6 November, 2020;
originally announced November 2020.
-
Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference
Authors:
Disi Ji,
Padhraic Smyth,
Mark Steyvers
Abstract:
We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not.
Submitted 19 October, 2020;
originally announced October 2020.
-
Deep Learning to Detect Bacterial Colonies for the Production of Vaccines
Authors:
Thomas Beznik,
Paul Smyth,
Gaël de Lannoy,
John A. Lee
Abstract:
During the development of vaccines, bacterial colony forming units (CFUs) are counted in order to quantify the yield in the fermentation process. This manual task is time-consuming and error-prone. In this work we test multiple segmentation algorithms based on the U-Net CNN architecture and show that these offer robust, automated CFU counting. We show that the multiclass generalisation with a bespoke loss function allows distinguishing virulent and avirulent colonies with acceptable accuracy. While many possibilities are left to explore, our results show the potential of deep learning for separating and classifying bacterial colonies.
Submitted 2 September, 2020;
originally announced September 2020.
-
Zonally opposing shifts of the intertropical convergence zone in response to climate change
Authors:
Antonios Mamalakis,
James T. Randerson,
Jin-Yi Yu,
Michael S. Pritchard,
Gudrun Magnusdottir,
Padhraic Smyth,
Paul A. Levine,
Sungduk Yu,
Efi Foufoula-Georgiou
Abstract:
Future changes in the location of the intertropical convergence zone (ITCZ) due to climate change are of high interest since they could substantially alter precipitation patterns in the tropics and subtropics. Although models predict a future narrowing of the ITCZ during the 21st century in response to climate warming, uncertainties remain large regarding its future position, with most past work focusing on the zonal-mean ITCZ shifts. Here we use projections from 27 state-of-the-art climate models (CMIP6) to investigate future changes in ITCZ location as a function of longitude and season, in response to climate warming. We document a robust zonally opposing response of the ITCZ, with a northward shift over eastern Africa and the Indian Ocean, and a southward shift in the eastern Pacific and Atlantic Ocean by 2100, for the SSP3-7.0 scenario. Using a two-dimensional energetics framework, we find that the revealed ITCZ response is consistent with future changes in the divergent atmospheric energy transport over the tropics, and sector-mean shifts of the energy flux equator (EFE). The changes in the EFE appear to be the result of zonally opposing imbalances in the hemispheric atmospheric heating over the two sectors, consisting of increases in atmospheric heating over Eurasia and cooling over the Southern Ocean, which contrast with atmospheric cooling over the North Atlantic Ocean due to a model-projected weakening of the Atlantic meridional overturning circulation.
Submitted 1 July, 2020;
originally announced July 2020.
-
Active Bayesian Assessment for Black-Box Classifiers
Authors:
Disi Ji,
Robert L. Logan IV,
Padhraic Smyth,
Mark Steyvers
Abstract:
Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a critical need to both reliably assess the performance of these pre-trained models and to perform this assessment in a label-efficient manner (given that labels may be scarce and costly to collect). In this paper, we introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency. We begin by developing inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error. We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling, enabling better performance assessment with fewer labels. We demonstrate significant gains from our proposed active Bayesian approach via a series of systematic empirical experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets.
Submitted 15 March, 2021; v1 submitted 16 February, 2020;
originally announced February 2020.
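To make "quantifying uncertainty for accuracy" concrete, one standard choice is a Beta posterior over a classifier's accuracy given a handful of labeled outcomes. The sketch below assumes that conjugate Beta-Bernoulli model with a uniform Beta(1, 1) prior; the abstract does not state which model the paper uses, and the function name `accuracy_posterior` is hypothetical.

```python
def accuracy_posterior(correct, incorrect, prior_a=1.0, prior_b=1.0):
    """Beta posterior over accuracy after observing labeled outcomes.

    Assumes a Beta(prior_a, prior_b) prior and independent Bernoulli
    outcomes (an illustrative assumption, not the paper's stated model).
    Returns the posterior parameters together with its mean and variance.
    """
    a = prior_a + correct
    b = prior_b + incorrect
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return a, b, mean, variance


# Eight correct and two incorrect predictions under a uniform prior.
a, b, mean, variance = accuracy_posterior(correct=8, incorrect=2)
```

The posterior variance shrinks as labels accumulate, which is the kind of signal an active scheme could use to decide where the next label is most informative.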
-
Dropout as a Structured Shrinkage Prior
Authors:
Eric Nalisnick,
José Miguel Hernández-Lobato,
Padhraic Smyth
Abstract:
Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior 'automatic depth determination' as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.
Submitted 29 May, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Mondrian Processes for Flow Cytometry Analysis
Authors:
Disi Ji,
Eric Nalisnick,
Padhraic Smyth
Abstract:
Analysis of flow cytometry data is an essential tool for clinical diagnosis of hematological and immunological conditions. Current clinical workflows rely on a manual process called gating to classify cells into their canonical types. This dependence on human annotation limits the rate, reproducibility, and complexity of flow cytometry analysis. In this paper, we propose using Mondrian processes to perform automated gating by incorporating prior information of the kind used by gating technicians. The method segments cells into types via Bayesian nonparametric trees. Examining the posterior over trees allows for interpretable visualizations and uncertainty quantification - two vital qualities for implementation in clinical practice.
Submitted 28 November, 2017; v1 submitted 21 November, 2017;
originally announced November 2017.
-
Learning Approximately Objective Priors
Authors:
Eric Nalisnick,
Padhraic Smyth
Abstract:
Informative Bayesian priors are often difficult to elicit, and when this is the case, modelers usually turn to noninformative or objective priors. However, objective priors such as the Jeffreys and reference priors are not tractable to derive for many models of interest. We address this issue by proposing techniques for learning reference prior approximations: we select a parametric family and optimize a black-box lower bound on the reference prior objective to find the member of the family that serves as a good approximation. We experimentally demonstrate the method's effectiveness by recovering Jeffreys priors and learning the Variational Autoencoder's reference prior.
Submitted 4 August, 2017; v1 submitted 4 April, 2017;
originally announced April 2017.
-
Bayesian Non-Homogeneous Markov Models via Polya-Gamma Data Augmentation with Applications to Rainfall Modeling
Authors:
Tracy Holsclaw,
Arthur M. Greene,
Andrew W. Robertson,
Padhraic Smyth
Abstract:
Discrete-time hidden Markov models are a broadly useful class of latent-variable models with applications in areas such as speech recognition, bioinformatics, and climate data analysis. It is common in practice to introduce temporal non-homogeneity into such models by making the transition probabilities dependent on time-varying exogenous input variables via a multinomial logistic parametrization. We extend such models to introduce additional non-homogeneity into the emission distribution using a generalized linear model (GLM), with data augmentation for sampling-based inference. However, the presence of the logistic function in the state transition model significantly complicates parameter inference for the overall model, particularly in a Bayesian context. To address this we extend the recently-proposed Polya-Gamma data augmentation approach to handle non-homogeneous hidden Markov models (NHMMs), allowing the development of an efficient Markov chain Monte Carlo (MCMC) sampling scheme. We apply our model and inference scheme to 30 years of daily rainfall in India, leading to a number of insights into rainfall-related phenomena in the region. Our proposed approach allows for fully Bayesian analysis of relatively complex NHMMs on a scale that was not possible with previous methods. Software implementing the methods described in the paper is available via the R package NHMM.
Submitted 12 January, 2017; v1 submitted 11 January, 2017;
originally announced January 2017.
-
Stick-Breaking Variational Autoencoders
Authors:
Eric Nalisnick,
Padhraic Smyth
Abstract:
We extend Stochastic Gradient Variational Bayes to perform posterior inference for the weights of Stick-Breaking processes. This development allows us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian nonparametric version of the variational autoencoder that has a latent representation with stochastic dimensionality. We experimentally demonstrate that the SB-VAE, and a semi-supervised variant, learn highly discriminative latent representations that often outperform the Gaussian VAE's.
Submitted 3 April, 2017; v1 submitted 19 May, 2016;
originally announced May 2016.
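For readers unfamiliar with the stick-breaking construction named in the abstract above, the weights are built by repeatedly breaking off a fraction of what remains of a unit-length stick. The sketch below shows only that deterministic step; how the fractions are sampled or inferred in the SB-VAE is not covered here, and the function name `stick_breaking_weights` is an assumption.

```python
def stick_breaking_weights(fractions):
    """Turn break fractions v_1..v_K in [0, 1] into stick-breaking weights.

    weight_k = v_k * prod_{j<k} (1 - v_j). Also returns the length of
    stick left over after the K breaks.
    """
    weights = []
    remaining = 1.0
    for v in fractions:
        weights.append(v * remaining)  # break off a fraction of what is left
        remaining *= 1.0 - v           # shrink the remaining stick
    return weights, remaining


weights, leftover = stick_breaking_weights([0.5, 0.5, 0.5])
print(weights, leftover)  # [0.5, 0.25, 0.125] 0.125
```

The weights and the leftover always sum to one, so truncating after K breaks leaves an explicit remainder rather than silently renormalising.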
-
A Scale Mixture Perspective of Multiplicative Noise in Neural Networks
Authors:
Eric Nalisnick,
Anima Anandkumar,
Padhraic Smyth
Abstract:
Corrupting the input and hidden layers of deep neural networks (DNNs) with multiplicative noise, often drawn from the Bernoulli distribution (or 'dropout'), provides regularization that has significantly contributed to deep learning's success. However, understanding how multiplicative corruptions prevent overfitting has been difficult due to the complexity of a DNN's functional form. In this paper, we show that when a Gaussian prior is placed on a DNN's weights, applying multiplicative noise induces a Gaussian scale mixture, which can be reparameterized to circumvent the problematic likelihood function. Analysis can then proceed by using a type-II maximum likelihood procedure to derive a closed-form expression revealing how regularization evolves as a function of the network's weights. Results show that multiplicative noise forces weights to become either sparse or invariant to rescaling. We find our analysis has implications for model compression as it naturally reveals a weight pruning rule that starkly contrasts with the commonly used signal-to-noise ratio (SNR). While the SNR prunes weights with large variances, seeing them as noisy, our approach recognizes their robustness and retains them. We empirically demonstrate our approach has a strong advantage over the SNR heuristic and is competitive to retraining with soft targets produced from a teacher model.
Submitted 10 June, 2015;
originally announced June 2015.
-
Bayesian Detection of Changepoints in Finite-State Markov Chains for Multiple Sequences
Authors:
Petter Arnesen,
Tracy Holsclaw,
Padhraic Smyth
Abstract:
We consider the analysis of sets of categorical sequences consisting of piecewise homogeneous Markov segments. The sequences are assumed to be governed by a common underlying process with segments occurring in the same order for each sequence. Segments are defined by a set of unobserved changepoints where the positions and number of changepoints can vary from sequence to sequence. We propose a Bayesian framework for analyzing such data, placing priors on the locations of the changepoints and on the transition matrices and using Markov chain Monte Carlo (MCMC) techniques to obtain posterior samples given the data. Experimental results using simulated data illustrate how the methodology can be used for inference of posterior distributions for parameters and changepoints, as well as the ability to handle considerable variability in the locations of the changepoints across different sequences. We also investigate the application of the approach to sequential data from two applications involving monsoonal rainfall patterns and branching patterns in trees.
Submitted 7 April, 2015; v1 submitted 3 April, 2015;
originally announced April 2015.
-
Hot Swapping for Online Adaptation of Optimization Hyperparameters
Authors:
Kevin Bache,
Dennis DeCoste,
Padhraic Smyth
Abstract:
We describe a general framework for online adaptation of optimization hyperparameters by 'hot swapping' their values during learning. We investigate this approach in the context of adaptive learning rate selection using an explore-exploit strategy from the multi-armed bandit literature. Experiments on a benchmark neural network show that the hot swapping approach leads to consistently better solutions compared to well-known alternatives such as AdaDelta and stochastic gradient with exhaustive hyperparameter search.
Submitted 13 April, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
-
Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (2013)
Authors:
Ann Nicholson,
Padhraic Smyth
Abstract:
This is the Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, which was held in Bellevue, WA, August 11-15, 2013.
Submitted 27 August, 2014; v1 submitted 30 September, 2013;
originally announced September 2013.
-
Scalar masses in general N=2 gauged supergravity theories
Authors:
Francesca Catino,
Claudio A. Scrucca,
Paul Smyth
Abstract:
We readdress the question of whether any universal upper bound exists on the square mass m^2 of the lightest scalar around a supersymmetry breaking vacuum in generic N=2 gauged supergravity theories for a given gravitino mass m_3/2 and cosmological constant V. We review the known bounds which apply to theories with restricted matter content from a new perspective. We then extend these results to theories with both hyper and vector multiplets and a gauging involving only one generator, for which we show that such a bound exists for both V>0 and V<0. We finally argue that there is no bound for the same theories with a gauging involving two or more generators. These results imply that in N=2 supergravity theories metastable de Sitter vacua with V<<m^2_3/2 can only arise if at least two isometries are gauged, while those with V>>m^2_3/2 can also arise when a single isometry is gauged.
Submitted 18 January, 2014; v1 submitted 6 September, 2013;
originally announced September 2013.
-
Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation
Authors:
James Foulds,
Levi Boyles,
Christopher Dubois,
Padhraic Smyth,
Max Welling
Abstract:
In the internet era there has been an explosion in the amount of digital text information available, leading to difficulties of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state of the art method. We show connections between collapsed variational Bayesian inference and MAP estimation for LDA, and leverage these connections to prove convergence properties of the proposed algorithm. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than the previous method. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
Submitted 10 May, 2013;
originally announced May 2013.
-
The rigid limit of N=2 supergravity
Authors:
Bobby E. Gunara,
Jan Louis,
Paul Smyth,
Luca Tripodi,
Roberto Valandro
Abstract:
In this paper we review the rigid limit of N=2 supergravity coupled to vector and hypermultiplets. In particular we show how the respective scalar field spaces reduce to their global counterparts. In the hypermultiplet sector we focus on the relation between the local and rigid c-map.
Submitted 8 May, 2013;
originally announced May 2013.
-
Simple metastable de Sitter vacua in N=2 gauged supergravity
Authors:
Francesca Catino,
Claudio A. Scrucca,
Paul Smyth
Abstract:
We construct a simple class of N=2 gauged supergravity theories that admit metastable de Sitter vacua, generalizing the recent work done in the context of rigid supersymmetry. The setup involves one hypermultiplet and one vector multiplet spanning suitably curved quaternionic-Kahler and special-Kahler geometries, with an Abelian gauging based on a single triholomorphic isometry, but neither Fayet-Iliopoulos terms nor non-Abelian gauge symmetries. We construct the most general model of this type and show that in such a situation the possibility of achieving metastable supersymmetry breaking vacua crucially depends on the value of the cosmological constant V relative to the gravitino mass squared m_{3/2}^2 in Planck units. In particular, focusing on de Sitter vacua with positive V, we show that metastability is only possible when V >= 2.17 m_{3/2}^2. We also derive an upper bound on the lightest scalar mass in this kind of model relative to the gravitino mass m_{3/2} as a function of the cosmological constant V, and discuss its physical implications.
Submitted 26 April, 2013; v1 submitted 7 February, 2013;
originally announced February 2013.
-
Probabilistic Models for Query Approximation with Large Sparse Binary Datasets
Authors:
Dmitry Y. Pavlov,
Heikki Mannila,
Padhraic Smyth
Abstract:
Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing words are just three typical examples. Real-time query selectivity estimation (the problem of estimating the number of rows in the data satisfying a given predicate) is an important practical problem for such databases.
We investigate the application of probabilistic models to this problem. In particular, we study a Markov random field (MRF) approach based on frequent sets and maximum entropy, and compare it to the independence model and the Chow-Liu tree model. We find that the MRF model provides substantially more accurate probability estimates than the other methods but is more expensive from a computational and memory viewpoint. To alleviate the computational requirements we show how one can apply bucket elimination and clique tree approaches to take advantage of structure in the models and in the queries. We provide experimental results on two large real-world transaction datasets.
Submitted 16 January, 2013;
originally announced January 2013.
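The independence model mentioned in the abstract above is the simplest of the compared baselines: it estimates the selectivity of a conjunctive query by multiplying per-attribute marginal frequencies. A minimal sketch follows, assuming each record is a Python set of the attributes that are "on"; the function names `independence_estimate` and `exact_count` and the toy data are illustrative, not taken from the paper.

```python
def independence_estimate(rows, query_items):
    """Estimate how many rows contain every item in query_items,
    treating the attributes as mutually independent."""
    n = len(rows)
    if n == 0:
        return 0.0
    estimate = float(n)
    for item in query_items:
        # Marginal frequency of this single attribute.
        marginal = sum(1 for row in rows if item in row) / n
        estimate *= marginal
    return estimate


def exact_count(rows, query_items):
    """Ground truth: scan the data and count the rows satisfying the query."""
    return sum(1 for row in rows if all(item in row for item in query_items))


rows = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
print(independence_estimate(rows, ["a", "b"]), exact_count(rows, ["a", "b"]))  # 2.25 2
```

Only the marginals need to be stored, which is why this baseline is cheap; the gap between the estimate and the exact count is what richer models such as the MRF approach aim to close.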
-
Electrically gauged N=4 supergravities in D=4 with N=2 vacua
Authors:
Christoph Horst,
Jan Louis,
Paul Smyth
Abstract:
We study N=2 vacua in spontaneously broken N=4 electrically gauged supergravities in four space-time dimensions. We argue that the classification of all such solutions amounts to solving a system of purely algebraic equations. We then explicitly construct a special class of consistent N=2 solutions and study their properties. In particular we find that the spectrum assembles in N=2 massless or BPS supermultiplets. We show that (modulo U(1) factors) arbitrary unbroken gauge groups can be realized provided that the number of N=4 vector multiplets is large enough. Below the scale of partial supersymmetry breaking we calculate the relevant terms of the low-energy effective action and argue that the special Kahler manifold for vector multiplets is completely determined, up to its dimension, and lies in the unique series of special Kahler product manifolds.
Submitted 15 January, 2013; v1 submitted 19 December, 2012;
originally announced December 2012.
-
Probabilistic models for joint clustering and time-warping of multidimensional curves
Authors:
Darya Chudova,
Scott Gaffney,
Padhraic Smyth
Abstract:
In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based on a generative mixture model that allows non-linear time warping of the observed curves relative to the mean curves within the clusters. We also allow for arbitrary discrete-valued translation of the time axis, random real-valued offsets of the measured curves, and additive measurement noise. The resulting model can be viewed as a dynamic Bayesian network with a special transition structure that allows effective inference and learning. The Expectation-Maximization (EM) algorithm can be used to simultaneously recover both the curve models for each cluster, and the most likely time warping, translation, offset, and cluster membership for each curve. We demonstrate how Bayesian estimation methods improve the results for smaller sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, and show that the DBN models provide systematic improvements in predictive power over competing approaches.
Submitted 19 October, 2012;
originally announced December 2012.
-
Metastable spontaneous breaking of N=2 supersymmetry
Authors:
Benoit Legeret,
Claudio A. Scrucca,
Paul Smyth
Abstract:
We show that contrary to the common lore it is possible to spontaneously break N=2 supersymmetry even in simple theories without constant Fayet-Iliopoulos terms. We consider the most general N=2 supersymmetric theory with one hypermultiplet and one vector multiplet without Fayet-Iliopoulos terms, and show that metastable supersymmetry breaking vacua can arise if both the hyper-Kahler and the special-Kahler geometries are suitably curved. We then also prove that while all the scalars can be massive, the lightest one is always lighter than the vector boson. Finally, we argue that these results also directly imply that metastable de Sitter vacua can exist in N=2 supergravity theories with Abelian gaugings and no Fayet-Iliopoulos terms, again contrary to common lore, at least if the cosmological constant is sufficiently large.
Submitted 26 April, 2013; v1 submitted 30 November, 2012;
originally announced November 2012.
-
Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges
Authors:
Michael J. Bannister,
Christopher DuBois,
David Eppstein,
Padhraic Smyth
Abstract:
We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.
Submitted 25 September, 2012;
originally announced September 2012.
-
Metastable de Sitter vacua in N=2 to N=1 truncated supergravity
Authors:
Francesca Catino,
Claudio A. Scrucca,
Paul Smyth
Abstract:
We study the possibility of achieving metastable de Sitter vacua in general N=2 to N=1 truncated supergravities without vector multiplets, and compare with the situations arising in N=2 theories with only hypermultiplets and N=1 theories with only chiral multiplets. In N=2 theories based on a quaternionic manifold and a graviphoton gauging, de Sitter vacua are necessarily unstable, as a result of the peculiar properties of the geometry. In N=1 theories based on a Kahler manifold and a superpotential, de Sitter vacua can instead be metastable provided the geometry satisfies some constraint and the superpotential can be freely adjusted. In N=2 to N=1 truncations, the crucial requirement is then that the tachyon of the mother theory be projected out from the daughter theory, so that the original unstable vacuum is projected to a metastable vacuum. We study the circumstances under which this may happen and derive general constraints for metastability on the geometry and the gauging. We then study in full detail the simplest case of quaternionic manifolds of dimension four with at least one isometry, for which there exists a general parametrization, and study two types of truncations defining Kahler submanifolds of dimension two. As an application, we finally discuss the case of the universal hypermultiplet of N=2 superstrings and its truncations to the dilaton chiral multiplet of N=1 superstrings. We argue that de Sitter vacua in such theories are necessarily unstable in weakly coupled situations, while they can in principle be metastable in strongly coupled regimes.
Submitted 5 September, 2012;
originally announced September 2012.
-
Hierarchical Models for Relational Event Sequences
Authors:
Christopher DuBois,
Carter T. Butts,
Daniel McFarland,
Padhraic Smyth
Abstract:
Interaction within small groups can often be represented as a sequence of events, where each event involves a sender and a recipient. Recent methods for modeling network data in continuous time model the rate at which individuals interact conditioned on the previous history of events as well as actor covariates. We present a hierarchical extension for modeling multiple such sequences, facilitating inferences about event-level dynamics and their variation across sequences. The hierarchical approach allows one to share information across sequences in a principled manner; we illustrate the efficacy of such sharing through a set of prediction experiments. After discussing methods for adequacy checking and model selection for this class of models, the method is illustrated with an analysis of high school classroom dynamics.
Submitted 31 July, 2012;
originally announced July 2012.
-
The Author-Topic Model for Authors and Documents
Authors:
Michal Rosen-Zvi,
Thomas Griffiths,
Mark Steyvers,
Padhraic Smyth
Abstract:
We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
Submitted 11 July, 2012;
originally announced July 2012.
-
Modeling Waveform Shapes with Random Effects Segmental Hidden Markov Models
Authors:
Seyoung Kim,
Padhraic Smyth,
Stefan Luther
Abstract:
In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech recognition) with the addition of random effects to the generative model. The random effects component of the model handles shape variability across different waveforms within a general class of waveforms of similar shape. We show that this probabilistic model provides a unified framework for learning these models from sets of waveform data as well as parsing, classification, and prediction of new waveforms. We derive a computationally efficient EM algorithm to fit the model on multiple waveforms, and introduce a scoring method that evaluates a test waveform based on its shape. Results on two real-world data sets demonstrate that the random effects methodology leads to improved accuracy (compared to alternative approaches) on classification and segmentation of real-world waveforms.
Submitted 11 July, 2012;
originally announced July 2012.
-
Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series
Authors:
Sergey Kirshner,
Padhraic Smyth,
Andrew Robertson
Abstract:
We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables. Conditional Chow-Liu tree models are introduced, as an extension to standard Chow-Liu trees, for modeling conditional rather than joint densities. We describe learning algorithms for such models and show how they can be used to learn parsimonious representations for the output distributions in hidden Markov models. These models are applied to the important problem of simulating and forecasting daily precipitation occurrence for networks of rain stations. To demonstrate the effectiveness of the models, we compare their performance versus a number of alternatives using historical precipitation data from Southwestern Australia and the Western United States. We illustrate how the structure and parameters of the models can be used to provide an improved meteorological interpretation of such data.
Submitted 11 July, 2012;
originally announced July 2012.
-
Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation
Authors:
Ian Porteous,
Alexander T. Ihler,
Padhraic Smyth,
Max Welling
Abstract:
Nonparametric Bayesian approaches to clustering, information retrieval, language modeling and object recognition have recently shown great promise as a new paradigm for unsupervised data analysis. Most contributions have focused on the Dirichlet process mixture models or extensions thereof for which efficient Gibbs samplers exist. In this paper we explore Gibbs samplers for infinite complexity mixture models in the stick breaking representation. The advantage of this representation is improved modeling flexibility. For instance, one can design the prior distribution over cluster sizes or couple multiple infinite mixture models (e.g. over time) at the level of their parameters (i.e. the dependent Dirichlet process model). However, Gibbs samplers for infinite mixture models (as recently introduced in the statistics literature) seem to mix poorly over cluster labels. Among other issues, this can have the adverse effect that labels for the same cluster in coupled mixture models are mixed up. We introduce additional moves in these samplers to improve mixing over cluster labels and to bring clusters into correspondence. An application to modeling of storm trajectories is used to illustrate these ideas.
Submitted 27 June, 2012;
originally announced June 2012.
-
On Smoothing and Inference for Topic Models
Authors:
Arthur Asuncion,
Max Welling,
Padhraic Smyth,
Yee Whye Teh
Abstract:
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Submitted 9 May, 2012;
originally announced May 2012.