Search | arXiv e-print repository

Anomaly Detection of Tabular Data Using LLMs

Authors: Aodong Li, Yunhan Zhao, Chen Qiu, Marius Kloft, Padhraic Smyth, Maja Rudolph, Stephan Mandt

Abstract: Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating their ability to identify low-density data regions. For LLMs that are not well aligned with anomaly detection and frequently output factual errors, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets and propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) showcase i) GPT-4 has on-par performance with the state-of-the-art transductive learning-based anomaly detection methods and ii) the efficacy of our synthetic dataset and fine-tuning strategy in aligning LLMs to this task.

Submitted 24 June, 2024; originally announced June 2024.

Comments: accepted at the Anomaly Detection with Foundation Models workshop

arXiv:2405.06729 [pdf, other]

Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction

Authors: Aleix Lafita, Ferran Gonzalez, Mahmoud Hossam, Paul Smyth, Jacob Deasy, Ari Allyn-Feuer, Daniel Seaton, Stephen Young

Abstract: Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag experimental accuracy. Here, we present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays using a Normalised Log-odds Ratio (NLR) head. We find consistent improvements in a held-out protein test set, and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Machine Learning for Genomics Explorations workshop at ICLR 2024

arXiv:2404.04240 [pdf, other]

Dynamic Conditional Optimal Transport through Simulation-Free Flows

Authors: Gavin Kerrigan, Giosue Migliorini, Padhraic Smyth

Abstract: We study the geometry of conditional optimal transport (COT) and prove a dynamical formulation which generalizes the Benamou-Brenier Theorem. Equipped with these tools, we propose a simulation-free flow-based method for conditional generative modeling. Our method couples an arbitrary source distribution to a specified target distribution through a triangular COT plan, and a conditional generative model is obtained by approximating the geodesic path of measures induced by this COT plan. Our theory and methods are applicable in infinite-dimensional settings, making them well suited for a wide class of Bayesian inverse problems. Empirically, we demonstrate that our method is competitive on several challenging conditional generation tasks, including an infinite-dimensional inverse problem.

Submitted 31 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2401.13835 [pdf, other]

The Calibration Gap between Model and Human Confidence in Large Language Models

Authors: Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer Karny, Xinyue Hu, Lukas Mayer, Padhraic Smyth

Abstract: For large language models (LLMs) to be trusted by humans they need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments, but the question remains of how well LLMs can communicate this internal model confidence to human users. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model. Through experiments involving multiple-choice questions, we systematically examine human users' ability to discern the reliability of LLM outputs. Our study focuses on two key areas: (1) assessing users' perception of true LLM confidence and (2) investigating the impact of tailored explanations on this perception. The research highlights that default explanations from LLMs often lead to user overestimation of both the model's confidence and its accuracy. By modifying the explanations to more accurately reflect the LLM's internal confidence, we observe a significant shift in user perception, aligning it more closely with the model's actual confidence levels. This adjustment in explanatory approach demonstrates potential for enhancing user trust and accuracy in assessing LLM outputs. The findings underscore the importance of transparent communication of confidence levels in LLMs, particularly in high-stakes applications where understanding the reliability of AI-generated information is essential.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 27 pages, 10 figures

arXiv:2312.15045 [pdf, other]

Probabilistic Modeling for Sequences of Sets in Continuous-Time

Authors: Yuxin Chang, Alex Boyd, Padhraic Smyth

Abstract: Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.

Submitted 18 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Oral presentation at AISTATS 2024

arXiv:2312.07679 [pdf, other]

Bayesian Online Learning for Consensus Prediction

Authors: Sam Showalter, Alex Boyd, Padhraic Smyth, Mark Steyvers

Abstract: Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2305.17209 [pdf, other]

Functional Flow Matching

Authors: Gavin Kerrigan, Giosue Migliorini, Padhraic Smyth

Abstract: We propose Functional Flow Matching (FFM), a function-space generative model that generalizes the recently-introduced Flow Matching model to operate in infinite-dimensional spaces. Our approach works by first defining a path of probability measures that interpolates between a fixed Gaussian measure and the data distribution, followed by learning a vector field on the underlying space of functions that generates this path of measures. Our method does not rely on likelihoods or simulations, making it well-suited to the function space setting. We provide both a theoretical framework for building such models and an empirical evaluation of our techniques. We demonstrate through experiments on several real-world benchmarks that our proposed FFM method outperforms several recently proposed function-space generative models.

Submitted 5 December, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.09064 [pdf, other]

doi 10.1145/3593013.3594111

Capturing Humans' Mental Models of AI: An Item Response Theory Approach

Authors: Markelle Kelly, Aakriti Kumar, Padhraic Smyth, Mark Steyvers

Abstract: Improving our understanding of how humans perceive AI teammates is an important foundation for our general understanding of human-AI teams. Extending relevant work from cognitive science, we propose a framework based on item response theory for modeling these perceptions. We apply this framework to real-world experiments, in which each participant works alongside another person or an AI agent in a question-answering setting, repeatedly assessing their teammate's performance. Using this experimental data, we demonstrate the use of our framework for testing research questions about people's perceptions of both AI agents and other people. We contrast mental models of AI teammates with those of human teammates as we characterize the dimensionality of these mental models, their development over time, and the influence of the participants' own self-perception. Our results indicate that people expect AI agents' performance to be significantly better on average than the performance of other humans, with less variation across different types of problems. We conclude with a discussion of the implications of these findings for human-AI interaction.

Submitted 15 May, 2023; originally announced May 2023.

Comments: FAccT 2023

arXiv:2302.07849 [pdf, other]

Zero-Shot Anomaly Detection via Batch Normalization

Authors: Aodong Li, Chen Qiu, Marius Kloft, Padhraic Smyth, Maja Rudolph, Stephan Mandt

Abstract: Anomaly detection (AD) plays a crucial role in many safety-critical application domains. The challenge of adapting an anomaly detector to drift in the normal data distribution, especially when no training data is available for the "new normal," has led to the development of zero-shot AD techniques. In this paper, we propose a simple yet effective method called Adaptive Centered Representations (ACR) for zero-shot batch-level AD. Our approach trains off-the-shelf deep anomaly detectors (such as deep SVDD) to adapt to a set of inter-related training data distributions in combination with batch normalization, enabling automatic zero-shot generalization for unseen AD tasks. This simple recipe, batch normalization plus meta-training, is a highly effective and versatile tool. Our theoretical results guarantee the zero-shot generalization for unseen AD tasks; our empirical results demonstrate the first zero-shot AD results for tabular data and outperform existing methods in zero-shot anomaly detection and segmentation on image data from specialized domains. Code is at https://github.com/aodongli/zero-shot-ad-via-batch-norm

Submitted 7 November, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: accepted at NeurIPS 2023

arXiv:2302.07832 [pdf, other]

Deep Anomaly Detection under Labeling Budget Constraints

Authors: Aodong Li, Chen Qiu, Marius Kloft, Padhraic Smyth, Stephan Mandt, Maja Rudolph

Abstract: Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.

Submitted 4 July, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: ICML 2023

arXiv:2212.00886 [pdf, other]

Diffusion Generative Models in Infinite Dimensions

Authors: Gavin Kerrigan, Justin Ley, Padhraic Smyth

Abstract: Diffusion generative models have recently been applied to domains where the available data can be seen as a discretization of an underlying function, such as audio signals or time series. However, these models operate directly on the discretized data, and there are no semantics in the modeling process that relate the observed data to the underlying functional forms. We generalize diffusion models to operate directly in function space by developing the foundational theory for such models in terms of Gaussian measures on Hilbert spaces. A significant benefit of our function space point of view is that it allows us to explicitly specify the space of functions we are working in, leading us to develop methods for diffusion generative modeling in Sobolev spaces. Our approach allows us to perform both unconditional and conditional generation of function-valued data. We demonstrate our methods on several synthetic and real-world benchmarks.

Submitted 24 February, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

arXiv:2211.08499 [pdf, other]

Probabilistic Querying of Continuous-Time Event Sequences

Authors: Alex Boyd, Yuxin Chang, Stephan Mandt, Padhraic Smyth

Abstract: Continuous-time event sequences, i.e., sequences consisting of continuous time stamps and associated event types ("marks"), are an important type of sequential data with many applications, e.g., in clinical medicine or user behavior modeling. Since these data are typically modeled autoregressively (e.g., using neural Hawkes processes or their classical counterparts), it is natural to ask questions about future scenarios such as "what kind of event will occur next" or "will an event of type $A$ occur before one of type $B$". Unfortunately, some of these queries are notoriously hard to address since current methods are limited to naive simulation, which can be highly inefficient. This paper introduces a new typology of query types and a framework for addressing them using importance sampling. Example queries include predicting the $n^\text{th}$ event type in a sequence and the hitting time distribution of one or more event types. We also leverage these findings further to be applicable for estimating general "$A$ before $B$" type of queries. We prove theoretically that our estimation method is effectively always better than naive simulation and show empirically based on three real-world datasets that it is on average 1,000 times more efficient than existing approaches.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2210.06464 [pdf, other]

Predictive Querying for Autoregressive Neural Sequence Models

Authors: Alex Boyd, Sam Showalter, Stephan Mandt, Padhraic Smyth

Abstract: In reasoning about sequential events it is natural to pose probabilistic queries such as "when will event A occur next" or "what is the probability of A occurring before B", with applications in areas such as user modeling, medicine, and finance. However, with machine learning shifting towards neural autoregressive models such as RNNs and transformers, probabilistic querying has been largely restricted to simple cases such as next-event prediction. This is in part due to the fact that future querying involves marginalization over large path spaces, which is not straightforward to do efficiently in such models. In this paper we introduce a general typology for predictive queries in neural autoregressive sequence models and show that such queries can be systematically represented by sets of elementary building blocks. We leverage this typology to develop new query estimation methods based on beam search, importance sampling, and hybrids. Across four large-scale sequence datasets from different application domains, as well as for the GPT-2 language model, we demonstrate the ability to make query answering tractable for arbitrary queries in exponentially-large predictive path-spaces, and find clear differences in cost-accuracy tradeoffs between search and sampling methods.

Submitted 4 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Oral Presentation at the Intl. Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2209.15154 [pdf, other]

Variable-Based Calibration for Machine Learning Classifiers

Authors: Markelle Kelly, Padhraic Smyth

Abstract: The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant miscalibration as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based calibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.

Submitted 5 April, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

arXiv:2206.09076 [pdf, other]

Fair Generalized Linear Models with a Convex Penalty

Authors: Hyungrok Do, Preston Putzel, Axel Martin, Padhraic Smyth, Judy Zhong

Abstract: Despite recent advances in algorithmic fairness, methodologies for achieving fairness with generalized linear models (GLMs) have yet to be explored in general, despite GLMs being widely used in practice. In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. We prove that for GLMs both criteria can be achieved via a convex penalty term based solely on the linear components of the GLM, thus permitting efficient optimization. We also derive theoretical properties for the resulting fair GLM estimator. To empirically demonstrate the efficacy of the proposed fair GLM, we compare it with other well-known fair prediction methods on an extensive set of benchmark datasets for binary classification and regression. In addition, we demonstrate that the fair GLM can generate fair predictions for a range of response variables, other than binary and continuous outcomes.

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted for publication in ICML 2022

arXiv:2109.14591 [pdf, other]

Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration

Authors: Gavin Kerrigan, Padhraic Smyth, Mark Steyvers

Abstract: An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor the model is perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.

Submitted 1 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: NeurIPS 2021

arXiv:2105.05699 [pdf, other]

doi 10.1145/3495256

Automating Data Science: Prospects and Challenges

Authors: Tijl De Bie, Luc De Raedt, José Hernández-Orallo, Holger H. Hoos, Padhraic Smyth, Christopher K. I. Williams

Abstract: Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process. Key insights:
* Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
* Important parts of data science are already being automated, especially in the modeling stages, where techniques such as automated machine learning (AutoML) are gaining traction.
* Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.

Submitted 28 February, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: 19 pages, 3 figures. v1 accepted for publication (April 2021) in Communications of the ACM

Journal ref: Communications of the ACM 65(3) 76-87 (2022)

arXiv:2105.04648 [pdf, other]

doi 10.1111/biom.13632

Joint Fairness Model with Applications to Risk Predictions for Under-represented Populations

Authors: Hyungrok Do, Shinjini Nandi, Preston Putzel, Padhraic Smyth, Judy Zhong

Abstract: In data collection for predictive modeling, under-representation of certain groups, based on gender, race/ethnicity, or age, may yield less-accurate predictions for these groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Existing methods to achieve fairness in the machine learning literature typically build a single prediction model in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: i) fairness is often achieved by compromising accuracy for some groups; ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a Joint Fairness Model (JFM) approach for logistic regression models for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an Accelerated Smoothing Proximal Gradient Algorithm to solve the convex objective function, and present the key asymptotic properties of the JFM estimates. Through simulations, we demonstrate the efficacy of the JFM in achieving good prediction performance and across-group parity, in comparison with the single fairness model, group-separate model, and group-ignorant model, especially when the minority group's sample size is small. Finally, we demonstrate the utility of the JFM method in a real-world example to obtain fair risk predictions for under-represented older patients diagnosed with coronavirus disease 2019 (COVID-19).

Submitted 23 February, 2022; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: 34 pages, 4 figures, 1 table

arXiv:2103.05337 [pdf, other]

A Mask R-CNN approach to counting bacterial colony forming units in pharmaceutical development

Authors: Tanguy Naets, Maarten Huijsmans, Paul Smyth, Laurent Sorber, Gaël de Lannoy

Abstract: We present an application of the well-known Mask R-CNN approach to the counting of different types of bacterial colony forming units that were cultured in Petri dishes. Our model was made available to lab technicians in a modern SPA (Single-Page Application). Users can upload images of dishes, after which the Mask R-CNN model that was trained and tuned specifically for this task detects the number of BVG- and BVG+ colonies and displays these in an interactive interface for the user to verify. Users can then check the model's predictions, correct them if deemed necessary, and finally validate them. Our adapted Mask R-CNN model achieves a mean average precision (mAP) of 94% at an intersection-over-union (IoU) threshold of 50%. With these encouraging results, we see opportunities to bring the benefits of improved accuracy and time saved to related problems, such as generalising to other bacteria types and viral foci counting.

Submitted 9 March, 2021; originally announced March 2021.

Comments: 9 pages, 3 pdf figures. Extended version of poster presented at ESANN 2020 (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning)
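
The intersection-over-union (IoU) threshold quoted in the abstract above can be illustrated with a minimal sketch for axis-aligned bounding boxes. This is not the authors' code: the function name `box_iou` and the `(x1, y1, x2, y2)` box convention are assumptions made for illustration, and a Mask R-CNN evaluation may well compute IoU on segmentation masks rather than on boxes.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x1 <= x2 and y1 <= y2.
    """
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0
```

Under this convention, a predicted colony would count as a match at the 50% threshold when `box_iou(predicted, reference) >= 0.5`.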

arXiv:2012.08101 [pdf, other]

Detecting and Adapting to Irregular Distribution Shifts in Bayesian Online Learning

Authors: Aodong Li, Alex Boyd, Padhraic Smyth, Stephan Mandt

Abstract: We consider the problem of online learning in the presence of distribution shifts that occur at an unknown rate and of unknown intensity. We derive a new Bayesian online inference approach to simultaneously infer these distribution shifts and adapt the model to the detected changes by integrating ideas from change point detection, switching dynamical systems, and Bayesian online learning. Using a binary 'change variable,' we construct an informative prior such that--if a change is detected--the model partially erases the information of past model updates by tempering to facilitate adaptation to the new data distribution. Furthermore, the approach uses beam search to track multiple change-point hypotheses and selects the most probable one in hindsight. Our proposed method is model-agnostic, applicable in both supervised and unsupervised learning settings, suitable for an environment of concept drifts or covariate drifts, and yields improvements over state-of-the-art Bayesian online learning approaches.

Submitted 26 October, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: Published version, Neural Information Processing Systems 2021

arXiv:2011.03231 [pdf, other]

User-Dependent Neural Sequence Models for Continuous-Time Event Data

Authors: Alex Boyd, Robert Bamler, Stephan Mandt, Padhraic Smyth

Abstract: Continuous-time event data are common in applications such as individual behavior data, financial transactions, and medical health records. Modeling such data can be very challenging, in particular for applications with many different types of events, since it requires a model to predict the event types as well as the time of occurrence. Recurrent neural networks that parameterize time-varying intensity functions are the current state-of-the-art for predictive modeling with such data. These models typically assume that all event sequences come from the same data distribution. However, in many applications event sequences are generated by different sources, or users, and their characteristics can be very different. In this paper, we extend the broad class of neural marked point process models to mixtures of latent embeddings, where each mixture component models the characteristic traits of a given user. Our approach relies on augmenting these models with a latent variable that encodes user characteristics, represented by a mixture model over user behavior that is trained via amortized variational inference. We evaluate our methods on four large real-world datasets and demonstrate systematic improvements from our approach over existing work for a variety of predictive metrics such as log-likelihood, next event ranking, and source-of-sequence identification.

Submitted 6 November, 2020; originally announced November 2020.

Comments: Accepted at NeurIPS 2020

arXiv:2010.09851 [pdf, other]

Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

Authors: Disi Ji, Padhraic Smyth, Mark Steyvers

Abstract: We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not.

Submitted 19 October, 2020; originally announced October 2020.

Comments: 27 pages

arXiv:2009.00926 [pdf, other]

Deep Learning to Detect Bacterial Colonies for the Production of Vaccines

Authors: Thomas Beznik, Paul Smyth, Gaël de Lannoy, John A. Lee

Abstract: During the development of vaccines, bacterial colony forming units (CFUs) are counted in order to quantify the yield in the fermentation process. This manual task is time-consuming and error-prone. In this work we test multiple segmentation algorithms based on the U-Net CNN architecture and show that these offer robust, automated CFU counting. We show that the multiclass generalisation with a bespoke loss function allows distinguishing virulent and avirulent colonies with acceptable accuracy. While many possibilities are left to explore, our results show the potential of deep learning for separating and classifying bacterial colonies.

Submitted 2 September, 2020; originally announced September 2020.

Comments: 6 pages, 2 figures, accepted at ESANN 2020 (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning)

arXiv:2007.00239 [pdf]

doi 10.1038/s41558-020-00963-x

Zonally opposing shifts of the intertropical convergence zone in response to climate change

Authors: Antonios Mamalakis, James T. Randerson, Jin-Yi Yu, Michael S. Pritchard, Gudrun Magnusdottir, Padhraic Smyth, Paul A. Levine, Sungduk Yu, Efi Foufoula-Georgiou

Abstract: Future changes in the location of the intertropical convergence zone (ITCZ) due to climate change are of high interest since they could substantially alter precipitation patterns in the tropics and subtropics. Although models predict a future narrowing of the ITCZ during the 21st century in response to climate warming, uncertainties remain large regarding its future position, with most past work focusing on the zonal-mean ITCZ shifts. Here we use projections from 27 state-of-the-art climate models (CMIP6) to investigate future changes in ITCZ location as a function of longitude and season, in response to climate warming. We document a robust zonally opposing response of the ITCZ, with a northward shift over eastern Africa and the Indian Ocean, and a southward shift in the eastern Pacific and Atlantic Ocean by 2100, for the SSP3-7.0 scenario. Using a two-dimensional energetics framework, we find that the revealed ITCZ response is consistent with future changes in the divergent atmospheric energy transport over the tropics, and sector-mean shifts of the energy flux equator (EFE). The changes in the EFE appear to be the result of zonally opposing imbalances in the hemispheric atmospheric heating over the two sectors, consisting of increases in atmospheric heating over Eurasia and cooling over the Southern Ocean, which contrast with atmospheric cooling over the North Atlantic Ocean due to a model-projected weakening of the Atlantic meridional overturning circulation.

Submitted 1 July, 2020; originally announced July 2020.

Journal ref: Nature Climate Change 2021

arXiv:2002.06532 [pdf, other]

Active Bayesian Assessment for Black-Box Classifiers

Authors: Disi Ji, Robert L. Logan IV, Padhraic Smyth, Mark Steyvers

Abstract: Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a critical need to both reliably assess the performance of these pre-trained models and to perform this assessment in a label-efficient manner (given that labels may be scarce and costly to collect). In this paper, we introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency. We begin by developing inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error. We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling, enabling better performance assessment with fewer labels. We demonstrate significant gains from our proposed active Bayesian approach via a series of systematic empirical experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets.

Submitted 15 March, 2021; v1 submitted 16 February, 2020; originally announced February 2020.

arXiv:1810.04045 [pdf, other]

Dropout as a Structured Shrinkage Prior

Authors: Eric Nalisnick, José Miguel Hernández-Lobato, Padhraic Smyth

Abstract: Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior 'automatic depth determination' as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.

Submitted 29 May, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: ICML 2019

arXiv:1711.07673 [pdf, other]

Mondrian Processes for Flow Cytometry Analysis

Authors: Disi Ji, Eric Nalisnick, Padhraic Smyth

Abstract: Analysis of flow cytometry data is an essential tool for clinical diagnosis of hematological and immunological conditions. Current clinical workflows rely on a manual process called gating to classify cells into their canonical types. This dependence on human annotation limits the rate, reproducibility, and complexity of flow cytometry analysis. In this paper, we propose using Mondrian processes to perform automated gating by incorporating prior information of the kind used by gating technicians. The method segments cells into types via Bayesian nonparametric trees. Examining the posterior over trees allows for interpretable visualizations and uncertainty quantification - two vital qualities for implementation in clinical practice.

Submitted 28 November, 2017; v1 submitted 21 November, 2017; originally announced November 2017.

Comments: 7 pages, 4 figures, NIPS workshop ML4H: Machine Learning for Health 2017, Long Beach, CA, USA

arXiv:1704.01168 [pdf, other]

Learning Approximately Objective Priors

Authors: Eric Nalisnick, Padhraic Smyth

Abstract: Informative Bayesian priors are often difficult to elicit, and when this is the case, modelers usually turn to noninformative or objective priors. However, objective priors such as the Jeffreys and reference priors are not tractable to derive for many models of interest. We address this issue by proposing techniques for learning reference prior approximations: we select a parametric family and optimize a black-box lower bound on the reference prior objective to find the member of the family that serves as a good approximation. We experimentally demonstrate the method's effectiveness by recovering Jeffreys priors and learning the Variational Autoencoder's reference prior.

Submitted 4 August, 2017; v1 submitted 4 April, 2017; originally announced April 2017.

Comments: UAI 2017

arXiv:1701.02856 [pdf, other]

Bayesian Non-Homogeneous Markov Models via Polya-Gamma Data Augmentation with Applications to Rainfall Modeling

Authors: Tracy Holsclaw, Arthur M. Greene, Andrew W. Robertson, Padhraic Smyth

Abstract: Discrete-time hidden Markov models are a broadly useful class of latent-variable models with applications in areas such as speech recognition, bioinformatics, and climate data analysis. It is common in practice to introduce temporal non-homogeneity into such models by making the transition probabilities dependent on time-varying exogenous input variables via a multinomial logistic parametrization. We extend such models to introduce additional non-homogeneity into the emission distribution using a generalized linear model (GLM), with data augmentation for sampling-based inference. However, the presence of the logistic function in the state transition model significantly complicates parameter inference for the overall model, particularly in a Bayesian context. To address this we extend the recently-proposed Polya-Gamma data augmentation approach to handle non-homogeneous hidden Markov models (NHMMs), allowing the development of an efficient Markov chain Monte Carlo (MCMC) sampling scheme. We apply our model and inference scheme to 30 years of daily rainfall in India, leading to a number of insights into rainfall-related phenomena in the region. Our proposed approach allows for fully Bayesian analysis of relatively complex NHMMs on a scale that was not possible with previous methods. Software implementing the methods described in the paper is available via the R package NHMM.

Submitted 12 January, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

Comments: 40 pages, 26 figures

arXiv:1605.06197 [pdf, other]

Stick-Breaking Variational Autoencoders

Authors: Eric Nalisnick, Padhraic Smyth

Abstract: We extend Stochastic Gradient Variational Bayes to perform posterior inference for the weights of Stick-Breaking processes. This development allows us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian nonparametric version of the variational autoencoder that has a latent representation with stochastic dimensionality. We experimentally demonstrate that the SB-VAE, and a semi-supervised variant, learn highly discriminative latent representations that often outperform the Gaussian VAE's.

Submitted 3 April, 2017; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: ICLR 2017, Conference Track
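
The stick-breaking weights referred to in the abstract above follow a standard construction, sketched here as plain Python. This is an illustration only, not the SB-VAE itself: the function name `stick_breaking_weights` is hypothetical, and the break fractions are passed in by the caller because the abstract does not say which distribution the paper draws them from.

```python
def stick_breaking_weights(fractions):
    """Turn break fractions v_k in [0, 1] into stick-breaking weights.

    Uses the standard construction pi_k = v_k * prod_{j<k} (1 - v_j).
    Returns (weights, remaining), where `remaining` is the length of
    stick left unassigned after the last break.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for v in fractions:
        if not 0.0 <= v <= 1.0:
            raise ValueError("break fractions must lie in [0, 1]")
        weights.append(v * remaining)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights, remaining
```

By construction the weights and the leftover stick always sum to one, so a final fraction of 1.0 yields a proper probability vector.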

arXiv:1506.03208 [pdf, other]

A Scale Mixture Perspective of Multiplicative Noise in Neural Networks

Authors: Eric Nalisnick, Anima Anandkumar, Padhraic Smyth

Abstract: Corrupting the input and hidden layers of deep neural networks (DNNs) with multiplicative noise, often drawn from the Bernoulli distribution (or 'dropout'), provides regularization that has significantly contributed to deep learning's success. However, understanding how multiplicative corruptions prevent overfitting has been difficult due to the complexity of a DNN's functional form. In this paper, we show that when a Gaussian prior is placed on a DNN's weights, applying multiplicative noise induces a Gaussian scale mixture, which can be reparameterized to circumvent the problematic likelihood function. Analysis can then proceed by using a type-II maximum likelihood procedure to derive a closed-form expression revealing how regularization evolves as a function of the network's weights. Results show that multiplicative noise forces weights to become either sparse or invariant to rescaling. We find our analysis has implications for model compression as it naturally reveals a weight pruning rule that starkly contrasts with the commonly used signal-to-noise ratio (SNR). While the SNR prunes weights with large variances, seeing them as noisy, our approach recognizes their robustness and retains them. We empirically demonstrate our approach has a strong advantage over the SNR heuristic and is competitive to retraining with soft targets produced from a teacher model.

Submitted 10 June, 2015; originally announced June 2015.

arXiv:1504.00860 [pdf, ps, other]

Bayesian Detection of Changepoints in Finite-State Markov Chains for Multiple Sequences

Authors: Petter Arnesen, Tracy Holsclaw, Padhraic Smyth

Abstract: We consider the analysis of sets of categorical sequences consisting of piecewise homogeneous Markov segments. The sequences are assumed to be governed by a common underlying process with segments occurring in the same order for each sequence. Segments are defined by a set of unobserved changepoints where the positions and number of changepoints can vary from sequence to sequence. We propose a Bayesian framework for analyzing such data, placing priors on the locations of the changepoints and on the transition matrices and using Markov chain Monte Carlo (MCMC) techniques to obtain posterior samples given the data. Experimental results using simulated data illustrate how the methodology can be used for inference of posterior distributions for parameters and changepoints, as well as the ability to handle considerable variability in the locations of the changepoints across different sequences. We also investigate the application of the approach to sequential data from two applications involving monsoonal rainfall patterns and branching patterns in trees.

Submitted 7 April, 2015; v1 submitted 3 April, 2015; originally announced April 2015.

arXiv:1412.6599 [pdf, other]

Hot Swapping for Online Adaptation of Optimization Hyperparameters

Authors: Kevin Bache, Dennis DeCoste, Padhraic Smyth

Abstract: We describe a general framework for online adaptation of optimization hyperparameters by 'hot swapping' their values during learning. We investigate this approach in the context of adaptive learning rate selection using an explore-exploit strategy from the multi-armed bandit literature. Experiments on a benchmark neural network show that the hot swapping approach leads to consistently better solutions compared to well-known alternatives such as AdaDelta and stochastic gradient with exhaustive hyperparameter search.

Submitted 13 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

Comments: Submission to ICLR 2015

MSC Class: 62L20 ACM Class: G.1.6; I.2.6

arXiv:1309.7971

Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (2013)

Authors: Ann Nicholson, Padhraic Smyth

Abstract: This is the Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, which was held in Bellevue, WA, August 11-15, 2013.

Submitted 27 August, 2014; v1 submitted 30 September, 2013; originally announced September 2013.

Report number: UAI2013

arXiv:1309.1670 [pdf, ps, other]

doi 10.1007/JHEP01(2014)029

Scalar masses in general N=2 gauged supergravity theories

Authors: Francesca Catino, Claudio A. Scrucca, Paul Smyth

Abstract: We readdress the question of whether any universal upper bound exists on the square mass m^2 of the lightest scalar around a supersymmetry breaking vacuum in generic N=2 gauged supergravity theories for a given gravitino mass m_3/2 and cosmological constant V. We review the known bounds which apply to theories with restricted matter content from a new perspective. We then extend these results to theories with both hyper and vector multiplets and a gauging involving only one generator, for which we show that such a bound exists for both V>0 and V<0. We finally argue that there is no bound for the same theories with a gauging involving two or more generators. These results imply that in N=2 supergravity theories metastable de Sitter vacua with V<<m^2_3/2 can only arise if at least two isometries are gauged, while those with V>>m^2_3/2 can also arise when a single isometry is gauged.

Submitted 18 January, 2014; v1 submitted 6 September, 2013; originally announced September 2013.

Comments: 19 pages, 1 figure; v2 minor corrections and additions

Journal ref: JHEP 1401 (2014) 029

arXiv:1305.2452 [pdf, ps, other]

Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation

Authors: James Foulds, Levi Boyles, Christopher Dubois, Padhraic Smyth, Max Welling

Abstract: In the internet era there has been an explosion in the amount of digital text information available, leading to difficulties of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state of the art method. We show connections between collapsed variational Bayesian inference and MAP estimation for LDA, and leverage these connections to prove convergence properties of the proposed algorithm. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than the previous method. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.

Submitted 10 May, 2013; originally announced May 2013.

arXiv:1305.1903 [pdf, ps, other]

The rigid limit of N=2 supergravity

Authors: Bobby E. Gunara, Jan Louis, Paul Smyth, Luca Tripodi, Roberto Valandro

Abstract: In this paper we review the rigid limit of N=2 supergravity coupled to vector and hypermultiplets. In particular we show how the respective scalar field spaces reduce to their global counterparts. In the hypermultiplet sector we focus on the relation between the local and rigid c-map.

Submitted 8 May, 2013; originally announced May 2013.

Comments: 12 pages

arXiv:1302.1754 [pdf, ps, other]

doi 10.1007/JHEP04(2013)056

Simple metastable de Sitter vacua in N=2 gauged supergravity

Authors: Francesca Catino, Claudio A. Scrucca, Paul Smyth

Abstract: We construct a simple class of N=2 gauged supergravity theories that admit metastable de Sitter vacua, generalizing the recent work done in the context of rigid supersymmetry. The setup involves one hypermultiplet and one vector multiplet spanning suitably curved quaternionic-Kahler and special-Kahler geometries, with an Abelian gauging based on a single triholomorphic isometry, but neither Fayet-Iliopoulos terms nor non-Abelian gauge symmetries. We construct the most general model of this type and show that in such a situation the possibility of achieving metastable supersymmetry breaking vacua crucially depends on the value of the cosmological constant V relative to the gravitino mass squared m_{3/2}^2 in Planck units. In particular, focusing on de Sitter vacua with positive V, we show that metastability is only possible when V >= 2.17 m_{3/2}^2. We also derive an upper bound on the lightest scalar mass in this kind of model relative to the gravitino mass m_{3/2} as a function of the cosmological constant V, and discuss its physical implications.

Submitted 26 April, 2013; v1 submitted 7 February, 2013; originally announced February 2013.

Comments: 26 pages, 2 figures; v2 minor corrections, some additional comments and one reference added

Journal ref: JHEP 1304 (2013) 056

arXiv:1301.3884 [pdf]

Probabilistic Models for Query Approximation with Large Sparse Binary Datasets

Authors: Dmitry Y. Pavlov, Heikki Mannila, Padhraic Smyth

Abstract: Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing words are just three typical examples. Real-time query selectivity estimation (the problem of estimating the number of rows in the data satisfying a given predicate) is an important practical problem for such databases. We investigate the application of probabilistic models to this problem. In particular, we study a Markov random field (MRF) approach based on frequent sets and maximum entropy, and compare it to the independence model and the Chow-Liu tree model. We find that the MRF model provides substantially more accurate probability estimates than the other methods but is more expensive from a computational and memory viewpoint. To alleviate the computational requirements we show how one can apply bucket elimination and clique tree approaches to take advantage of structure in the models and in the queries. We provide experimental results on two large real-world transaction datasets.

Submitted 16 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Report number: UAI-P-2000-PG-465-472

arXiv:1212.4707 [pdf, ps, other]

doi 10.1007/JHEP03(2013)144

Electrically gauged N=4 supergravities in D=4 with N=2 vacua

Authors: Christoph Horst, Jan Louis, Paul Smyth

Abstract: We study N=2 vacua in spontaneously broken N=4 electrically gauged supergravities in four space-time dimensions. We argue that the classification of all such solutions amounts to solving a system of purely algebraic equations. We then explicitly construct a special class of consistent N=2 solutions and study their properties. In particular we find that the spectrum assembles in N=2 massless or BPS supermultiplets. We show that (modulo U(1) factors) arbitrary unbroken gauge groups can be realized provided that the number of N=4 vector multiplets is large enough. Below the scale of partial supersymmetry breaking we calculate the relevant terms of the low-energy effective action and argue that the special Kahler manifold for vector multiplets is completely determined, up to its dimension, and lies in the unique series of special Kahler product manifolds.

Submitted 15 January, 2013; v1 submitted 19 December, 2012; originally announced December 2012.

Comments: 48 pages; v2: one reference added

Report number: ZMP-HH/12-27

arXiv:1212.2467 [pdf]

Probabilistic models for joint clustering and time-warping of multidimensional curves

Authors: Darya Chudova, Scott Gaffney, Padhraic Smyth

Abstract: In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based on a generative mixture model that allows non-linear time warping of the observed curves relative to the mean curves within the clusters. We also allow for arbitrary discrete-valued translation of the time axis, random real-valued offsets of the measured curves, and additive measurement noise. The resulting model can be viewed as a dynamic Bayesian network with a special transition structure that allows effective inference and learning. The Expectation-Maximization (EM) algorithm can be used to simultaneously recover both the curve models for each cluster, and the most likely time warping, translation, offset, and cluster membership for each curve. We demonstrate how Bayesian estimation methods improve the results for smaller sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, and show that the DBN models provide systematic improvements in predictive power over competing approaches.

Submitted 19 October, 2012; originally announced December 2012.

Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

Report number: UAI-P-2003-PG-134-141

arXiv:1211.7364 [pdf, ps, other]

doi 10.1016/j.physletb.2013.04.031

Metastable spontaneous breaking of N=2 supersymmetry

Authors: Benoit Legeret, Claudio A. Scrucca, Paul Smyth

Abstract: We show that contrary to the common lore it is possible to spontaneously break N=2 supersymmetry even in simple theories without constant Fayet-Iliopoulos terms. We consider the most general N=2 supersymmetric theory with one hypermultiplet and one vector multiplet without Fayet-Iliopoulos terms, and show that metastable supersymmetry breaking vacua can arise if both the hyper-Kahler and the special-Kahler geometries are suitably curved. We then also prove that while all the scalars can be massive, the lightest one is always lighter than the vector boson. Finally, we argue that these results also directly imply that metastable de Sitter vacua can exist in N=2 supergravity theories with Abelian gaugings and no Fayet-Iliopoulos terms, again contrary to common lore, at least if the cosmological constant is sufficiently large.

Submitted 26 April, 2013; v1 submitted 30 November, 2012; originally announced November 2012.

Comments: 16 pages, no figures; v2 improved introduction and conclusions; v3 minor corrections

arXiv:1209.5791 [pdf, other]

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges

Authors: Michael J. Bannister, Christopher DuBois, David Eppstein, Padhraic Smyth

Abstract: We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.

Submitted 25 September, 2012; originally announced September 2012.

arXiv:1209.0912 [pdf, ps, other]

doi 10.1007/JHEP10(2012)124

Metastable de Sitter vacua in N=2 to N=1 truncated supergravity

Authors: Francesca Catino, Claudio A. Scrucca, Paul Smyth

Abstract: We study the possibility of achieving metastable de Sitter vacua in general N=2 to N=1 truncated supergravities without vector multiplets, and compare with the situations arising in N=2 theories with only hypermultiplets and N=1 theories with only chiral multiplets. In N=2 theories based on a quaternionic manifold and a graviphoton gauging, de Sitter vacua are necessarily unstable, as a result of the peculiar properties of the geometry. In N=1 theories based on a Kahler manifold and a superpotential, de Sitter vacua can instead be metastable provided the geometry satisfies some constraint and the superpotential can be freely adjusted. In N=2 to N=1 truncations, the crucial requirement is then that the tachyon of the mother theory be projected out from the daughter theory, so that the original unstable vacuum is projected to a metastable vacuum. We study the circumstances under which this may happen and derive general constraints for metastability on the geometry and the gauging. We then study in full detail the simplest case of quaternionic manifolds of dimension four with at least one isometry, for which there exists a general parametrization, and study two types of truncations defining Kahler submanifolds of dimension two. As an application, we finally discuss the case of the universal hypermultiplet of N=2 superstrings and its truncations to the dilaton chiral multiplet of N=1 superstrings. We argue that de Sitter vacua in such theories are necessarily unstable in weakly coupled situations, while they can in principle be metastable in strongly coupled regimes.

Submitted 5 September, 2012; originally announced September 2012.

Comments: 40 pages, no figures

arXiv:1207.7306 [pdf, ps, other]

Hierarchical Models for Relational Event Sequences

Authors: Christopher DuBois, Carter T. Butts, Daniel McFarland, Padhraic Smyth

Abstract: Interaction within small groups can often be represented as a sequence of events, where each event involves a sender and a recipient. Recent methods for modeling network data in continuous time model the rate at which individuals interact conditioned on the previous history of events as well as actor covariates. We present a hierarchical extension for modeling multiple such sequences, facilitating inferences about event-level dynamics and their variation across sequences. The hierarchical approach allows one to share information across sequences in a principled manner; we illustrate the efficacy of such sharing through a set of prediction experiments. After discussing methods for adequacy checking and model selection for this class of models, the method is illustrated with an analysis of high school classroom dynamics.

Submitted 31 July, 2012; originally announced July 2012.

arXiv:1207.4169 [pdf]

The Author-Topic Model for Authors and Documents

Authors: Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth

Abstract: We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.

Submitted 11 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

Report number: UAI-P-2004-PG-487-494

arXiv:1207.4143 [pdf]

Modeling Waveform Shapes with Random Effects Segmental Hidden Markov Models

Authors: Seyoung Kim, Padhraic Smyth, Stefan Luther

Abstract: In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech recognition) with the addition of random effects to the generative model. The random effects component of the model handles shape variability across different waveforms within a general class of waveforms of similar shape. We show that this probabilistic model provides a unified framework for learning these models from sets of waveform data as well as parsing, classification, and prediction of new waveforms. We derive a computationally efficient EM algorithm to fit the model on multiple waveforms, and introduce a scoring method that evaluates a test waveform based on its shape. Results on two real-world data sets demonstrate that the random effects methodology leads to improved accuracy (compared to alternative approaches) on classification and segmentation of real-world waveforms.

Submitted 11 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

Report number: UAI-P-2004-PG-309-316

arXiv:1207.4142 [pdf]

Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series

Authors: Sergey Kirshner, Padhraic Smyth, Andrew Robertson

Abstract: We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables. Conditional Chow-Liu tree models are introduced, as an extension to standard Chow-Liu trees, for modeling conditional rather than joint densities. We describe learning algorithms for such models and show how they can be used to learn parsimonious representations for the output distributions in hidden Markov models. These models are applied to the important problem of simulating and forecasting daily precipitation occurrence for networks of rain stations. To demonstrate the effectiveness of the models, we compare their performance versus a number of alternatives using historical precipitation data from Southwestern Australia and the Western United States. We illustrate how the structure and parameters of the models can be used to provide an improved meteorological interpretation of such data.

Submitted 11 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

Report number: UAI-P-2004-PG-317-324

arXiv:1206.6845 [pdf]

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Authors: Ian Porteous, Alexander T. Ihler, Padhraic Smyth, Max Welling

Abstract: Nonparametric Bayesian approaches to clustering, information retrieval, language modeling and object recognition have recently shown great promise as a new paradigm for unsupervised data analysis. Most contributions have focused on the Dirichlet process mixture models or extensions thereof for which efficient Gibbs samplers exist. In this paper we explore Gibbs samplers for infinite complexity mixture models in the stick breaking representation. The advantage of this representation is improved modeling flexibility. For instance, one can design the prior distribution over cluster sizes or couple multiple infinite mixture models (e.g. over time) at the level of their parameters (i.e. the dependent Dirichlet process model). However, Gibbs samplers for infinite mixture models (as recently introduced in the statistics literature) seem to mix poorly over cluster labels. Among other issues, this can have the adverse effect that labels for the same cluster in coupled mixture models are mixed up. We introduce additional moves in these samplers to improve mixing over cluster labels and to bring clusters into correspondence. An application to modeling of storm trajectories is used to illustrate these ideas.

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

Report number: UAI-P-2006-PG-385-392

arXiv:1205.2662 [pdf]

On Smoothing and Inference for Topic Models

Authors: Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Abstract: Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.

Submitted 9 May, 2012; originally announced May 2012.

Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

Report number: UAI-P-2009-PG-27-34

Showing 1–50 of 66 results for author: Smyth, P