Search | arXiv e-print repository

A Rate-Distortion View of Uncertainty Quantification

Authors: Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski

Abstract: In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching de… ▽ More In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching deep neural networks with this property. Building on prior information bottleneck approaches, our method learns a codebook that stores a compressed representation of all inputs seen during training. The distance of a new example from this codebook can serve as an uncertainty estimate for the example. The resulting model is simple to train and provides deterministic uncertainty estimates by a single forward pass. Finally, our method achieves better out-of-distribution (OOD) detection and misclassification prediction than prior methods, including expensive ensemble methods, deep kernel Gaussian Processes, and approaches based on the standard information bottleneck. △ Less

Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Journal ref: International Conference on Machine Learning, 2024

arXiv:2405.17672 [pdf, other]

Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise

Authors: Lukasz Sztukiewicz, Jack Henry Good, Artur Dubrawski

Abstract: In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to im… ▽ More In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches, are not effective. We argue that other directions need to be explored to improve the robustness of decision trees to label noise. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2402.03885 [pdf, other]

MOMENT: A Family of Open Time-series Foundation Models

Authors: Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, Artur Dubrawski

Abstract: We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data is challenging due to (1) the absence of a large and cohesive public time series repository, and (2) diverse time series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, e… ▽ More We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data is challenging due to (1) the absence of a large and cohesive public time series repository, and (2) diverse time series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, especially in scenarios with limited resources, time, and supervision, are still in their nascent stages. To address these challenges, we compile a large and diverse collection of public time series, called the Time series Pile, and systematically tackle time series-specific challenges to unlock large-scale multi-dataset pre-training. Finally, we build on recent work to design a benchmark to evaluate time series foundation models on diverse tasks and datasets in limited supervision settings. Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning. Finally, we present several interesting empirical observations about large pre-trained time series models. Pre-trained models (AutonLab/MOMENT-1-large) and Time Series Pile (AutonLab/Timeseries-PILE) are available on Huggingface. △ Less

Submitted 13 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024. This version contains new experimental results and a section on contemporary work

arXiv:2402.00803 [pdf, other]

Signal Quality Auditing for Time-series Data

Authors: Chufan Gao, Nicholas Gisolfi, Artur Dubrawski

Abstract: Signal quality assessment (SQA) is required for monitoring the reliability of data acquisition systems, especially in AI-driven Predictive Maintenance (PMx) application contexts. SQA is vital for addressing "silent failures" of data acquisition hardware and software, which when unnoticed, misinform the users of data, creating the risk for incorrect decisions with unintended or even catastrophic co… ▽ More Signal quality assessment (SQA) is required for monitoring the reliability of data acquisition systems, especially in AI-driven Predictive Maintenance (PMx) application contexts. SQA is vital for addressing "silent failures" of data acquisition hardware and software, which when unnoticed, misinform the users of data, creating the risk for incorrect decisions with unintended or even catastrophic consequences. We have developed an open-source software implementation of signal quality indices (SQIs) for the analysis of time-series data. We codify a range of SQIs, demonstrate them using established benchmark data, and show that they can be effective for signal quality assessment. We also study alternative approaches to denoising time-series data in an attempt to improve the quality of the already degraded signal, and evaluate them empirically on relevant real-world data. To our knowledge, our software toolkit is the first to provide an open source implementation of a broad range of signal quality assessment and improvement techniques validated on publicly available benchmark data for ease of reproducibility. The generality of our framework can be easily extended to assessing reliability of arbitrary time-series measurements in complex systems, especially when morphological patterns of the waveform shapes and signal periodicity are of key interest in downstream analyses. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.01239 [pdf, other]

Motion Informed Needle Segmentation in Ultrasound Images

Authors: Raghavv Goel, Cecilia Morales, Manpreet Singh, Artur Dubrawski, John Galeotti, Howie Choset

Abstract: Segmenting a moving needle in ultrasound images is challenging due to the presence of artifacts, noise, and needle occlusion. This task becomes even more demanding in scenarios where data availability is limited. In this paper, we present a novel approach for needle segmentation for 2D ultrasound that combines classical Kalman Filter (KF) techniques with data-driven learning, incorporating both ne… ▽ More Segmenting a moving needle in ultrasound images is challenging due to the presence of artifacts, noise, and needle occlusion. This task becomes even more demanding in scenarios where data availability is limited. In this paper, we present a novel approach for needle segmentation for 2D ultrasound that combines classical Kalman Filter (KF) techniques with data-driven learning, incorporating both needle features and needle motion. Our method offers three key contributions. First, we propose a compatible framework that seamlessly integrates into commonly used encoder-decoder style architectures. Second, we demonstrate superior performance compared to recent state-of-the-art needle segmentation models using our novel convolutional neural network (CNN) based KF-inspired block, achieving a 15\% reduction in pixel-wise needle tip error and an 8\% reduction in length error. Third, to our knowledge we are the first to implement a learnable filter to incorporate non-linear needle motion for improving needle segmentation. △ Less

Submitted 3 May, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: 7 pages, 4 figures, accepted at ISBI 2024

arXiv:2309.13135 [pdf, other]

Forecasting Response to Treatment with Global Deep Learning and Patient-Specific Pharmacokinetic Priors

Authors: Willa Potosnak, Cristian Challu, Kin G. Olivares, Artur Dubrawski

Abstract: Forecasting healthcare time series is crucial for early detection of adverse outcomes and for patient monitoring. Forecasting, however, can be difficult in practice due to noisy and intermittent data. The challenges are often exacerbated by change points induced via extrinsic factors, such as the administration of medication. To address these challenges, we propose a novel hybrid global-local arch… ▽ More Forecasting healthcare time series is crucial for early detection of adverse outcomes and for patient monitoring. Forecasting, however, can be difficult in practice due to noisy and intermittent data. The challenges are often exacerbated by change points induced via extrinsic factors, such as the administration of medication. To address these challenges, we propose a novel hybrid global-local architecture and a pharmacokinetic encoder that informs deep learning models of patient-specific treatment effects. We showcase the efficacy of our approach in achieving significant accuracy gains for a blood glucose forecasting task using both realistically simulated and real-world data. Our global-local architecture improves over patient-specific models by 9.2-14.6%. Additionally, our pharmacokinetic encoder improves over alternative encoding techniques by 4.4% on simulated data and 2.1% on real-world data. The proposed approach can have multiple beneficial applications in clinical practice, such as issuing early warnings about unexpected treatment responses, or hel** to characterize patient-specific treatment effects in terms of drug absorption and elimination characteristics. △ Less

Submitted 15 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 8 pages

arXiv:2306.09467 [pdf, other]

AQuA: A Benchmarking Tool for Label Quality Assessment

Authors: Mononito Goswami, Vedant Sanil, Arjun Choudhry, Arvind Srinivasan, Chalisa Udompanyawit, Artur Dubrawski

Abstract: Machine learning (ML) models are only as good as the data they are trained on. But recent studies have found datasets widely used to train and evaluate ML models, e.g. ImageNet, to have pervasive labeling errors. Erroneous labels on the train set hurt ML models' ability to generalize, and they impact evaluation and model selection using the test set. Consequently, learning in the presence of label… ▽ More Machine learning (ML) models are only as good as the data they are trained on. But recent studies have found datasets widely used to train and evaluate ML models, e.g. ImageNet, to have pervasive labeling errors. Erroneous labels on the train set hurt ML models' ability to generalize, and they impact evaluation and model selection using the test set. Consequently, learning in the presence of labeling errors is an active area of research, yet this field lacks a comprehensive benchmark to evaluate these methods. Most of these methods are evaluated on a few computer vision datasets with significant variance in the experimental protocols. With such a large pool of methods and inconsistent evaluation, it is also unclear how ML practitioners can choose the right models to assess label quality in their data. To this end, we propose a benchmarking environment AQuA to rigorously evaluate methods that enable machine learning in the presence of label noise. We also introduce a design space to delineate concrete design choices of label error detection models. We hope that our proposed design space and benchmark enable practitioners to choose the right tools to improve their label quality and that our benchmark enables objective and rigorous evaluation of machine learning tools facing mislabeled data. △ Less

Submitted 16 January, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code can be found at www.github.com/autonlab/aqua/

arXiv:2305.07089 [pdf, other]

Hierarchically Coherent Multivariate Mixture Networks

Authors: Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski

Abstract: Large collections of time series data are often organized into hierarchies with different levels of aggregation; examples include product and geographical grou**s. Probabilistic coherent forecasting is tasked to produce forecasts consistent across levels of aggregation. In this study, we propose to augment neural forecasting architectures with a coherent multivariate mixture output. We optimize… ▽ More Large collections of time series data are often organized into hierarchies with different levels of aggregation; examples include product and geographical grou**s. Probabilistic coherent forecasting is tasked to produce forecasts consistent across levels of aggregation. In this study, we propose to augment neural forecasting architectures with a coherent multivariate mixture output. We optimize the networks with a composite likelihood objective, allowing us to capture time series' relationships while maintaining high computational efficiency. Our approach demonstrates 13.2% average accuracy improvements on most datasets compared to state-of-the-art baselines. We conduct ablation studies of the framework components and provide theoretical foundations for them. To assist related work, the code is available at this https://github.com/Nixtla/neuralforecast. △ Less

Submitted 16 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2302.12504 [pdf, other]

Recovering Sparse and Interpretable Subgroups with Heterogeneous Treatment Effects with Censored Time-to-Event Outcomes

Authors: Chirag Nagpal, Vedant Sanil, Artur Dubrawski

Abstract: Studies involving both randomized experiments as well as observational data typically involve time-to-event outcomes such as time-to-failure, death or onset of an adverse condition. Such outcomes are typically subject to censoring due to loss of follow-up and established statistical practice involves comparing treatment efficacy in terms of hazard ratios between the treated and control groups. In… ▽ More Studies involving both randomized experiments as well as observational data typically involve time-to-event outcomes such as time-to-failure, death or onset of an adverse condition. Such outcomes are typically subject to censoring due to loss of follow-up and established statistical practice involves comparing treatment efficacy in terms of hazard ratios between the treated and control groups. In this paper we propose a statistical approach to recovering sparse phenogroups (or subtypes) that demonstrate differential treatment effects as compared to the study population. Our approach involves modelling the data as a mixture while enforcing parameter shrinkage through structured sparsity regularization. We propose a novel inference procedure for the proposed model and demonstrate its efficacy in recovering sparse phenotypes across large landmark real world clinical studies in cardiovascular health. △ Less

Submitted 24 February, 2023; originally announced February 2023.

Comments: Presented as an extended abstract at the Machine Learning for Health Symposium (ML4H) 2022

arXiv:2301.07286 [pdf, other]

Reslicing Ultrasound Images for Data Augmentation and Vessel Reconstruction

Authors: Cecilia Morales, Jason Yao, Tejas Rane, Robert Edman, Howie Choset, Artur Dubrawski

Abstract: Robot-guided catheter insertion has the potential to deliver urgent medical care in situations where medical personnel are unavailable. However, this technique requires accurate and reliable segmentation of anatomical landmarks in the body. For the ultrasound imaging modality, obtaining large amounts of training data for a segmentation model is time-consuming and expensive. This paper introduces R… ▽ More Robot-guided catheter insertion has the potential to deliver urgent medical care in situations where medical personnel are unavailable. However, this technique requires accurate and reliable segmentation of anatomical landmarks in the body. For the ultrasound imaging modality, obtaining large amounts of training data for a segmentation model is time-consuming and expensive. This paper introduces RESUS (RESlicing of UltraSound Images), a weak supervision data augmentation technique for ultrasound images based on slicing reconstructed 3D volumes from tracked 2D images. This technique allows us to generate views which cannot be easily obtained in vivo due to physical constraints of ultrasound imaging, and use these augmented ultrasound images to train a semantic segmentation model. We demonstrate that RESUS achieves statistically significant improvement over training with non-augmented images and highlight qualitative improvements through vessel reconstruction. △ Less

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2207.03517 [pdf, ps, other]

HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python

Authors: Kin G. Olivares, Federico Garza, David Luo, Cristian Challú, Max Mergenthaler, Souhaib Ben Taieb, Shanika L. Wickramasuriya, Artur Dubrawski

Abstract: Large collections of time series data are commonly organized into structures with different levels of aggregation; examples include product and geographical grou**s. It is often important to ensure that the forecasts are coherent so that the predicted values at disaggregate levels add up to the aggregate forecast. The growing interest of the Machine Learning community in hierarchical forecasting… ▽ More Large collections of time series data are commonly organized into structures with different levels of aggregation; examples include product and geographical grou**s. It is often important to ensure that the forecasts are coherent so that the predicted values at disaggregate levels add up to the aggregate forecast. The growing interest of the Machine Learning community in hierarchical forecasting systems indicates that we are in a propitious moment to ensure that scientific endeavors are grounded on sound baselines. For this reason, we put forward the HierarchicalForecast library, which contains preprocessed publicly available datasets, evaluation metrics, and a compiled set of statistical baseline models. Our Python-based reference framework aims to bridge the gap between statistical and econometric modeling, and Machine Learning forecasting research. Code and documentation are available in https://github.com/Nixtla/hierarchicalforecast. △ Less

Submitted 24 January, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.12088 [pdf, other]

Classifying Unstructured Clinical Notes via Automatic Weak Supervision

Authors: Chufan Gao, Mononito Goswami, Jieshi Chen, Artur Dubrawski

Abstract: Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients' diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming bu… ▽ More Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients' diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming but also costly and error-prone. Prior work demonstrated potential utility of Machine Learning (ML) methodology in automating this process, but it has relied on large quantities of manually labeled data to train the models. Additionally, diagnostic coding systems evolve with time, which makes traditional supervised learning strategies unable to generalize beyond local applications. In this work, we introduce a general weakly-supervised text classification framework that learns from class-label descriptions only, without the need to use any human-labeled documents. It leverages the linguistic domain knowledge stored within pre-trained language models and the data programming framework to assign code labels to individual texts. We demonstrate the efficacy and flexibility of our method by comparing it to state-of-the-art weak text classifiers across four real-world text classification datasets, in addition to assigning ICD codes to medical notes in the publicly available MIMIC-III database. △ Less

Submitted 1 August, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

Comments: 18 pages, 3 figures and 6 tables. Accepted at the Machine Learning for Healthcare Conference (MLHC) 2022. Code available at https://github.com/autonlab/KeyClass

arXiv:2206.10462 [pdf, ps, other]

The Digital Twin Landscape at the Crossroads of Predictive Maintenance, Machine Learning and Physics Based Modeling

Authors: Brian Kunzer, Mario Berges, Artur Dubrawski

Abstract: The concept of a digital twin has exploded in popularity over the past decade, yet confusion around its plurality of definitions, its novelty as a new technology, and its practical applicability still exists, all despite numerous reviews, surveys, and press releases. The history of the term digital twin is explored, as well as its initial context in the fields of product life cycle management, ass… ▽ More The concept of a digital twin has exploded in popularity over the past decade, yet confusion around its plurality of definitions, its novelty as a new technology, and its practical applicability still exists, all despite numerous reviews, surveys, and press releases. The history of the term digital twin is explored, as well as its initial context in the fields of product life cycle management, asset maintenance, and equipment fleet management, operations, and planning. A definition for a minimally viable framework to utilize a digital twin is also provided based on seven essential elements. A brief tour through DT applications and industries where DT methods are employed is also outlined. The application of a digital twin framework is highlighted in the field of predictive maintenance, and its extensions utilizing machine learning and physics based modeling. Employing the combination of machine learning and physics based modeling to form hybrid digital twin frameworks, may synergistically alleviate the shortcomings of each method when used in isolation. Key challenges of implementing digital twin models in practice are additionally discussed. As digital twin technology experiences rapid growth and as it matures, its great promise to substantially enhance tools and solutions for intelligent upkeep of complex equipment, are expected to materialize. △ Less

Submitted 23 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: 21 pages, 5 figures

arXiv:2206.09074 [pdf, other]

Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact

Authors: Arnab Dey, Mononito Goswami, Joo Heung Yoon, Gilles Clermont, Michael Pinsky, Marilyn Hravnak, Artur Dubrawski

Abstract: A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact.… ▽ More A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact. Previous studies have utilized supervised ML techniques that require substantial amounts of hand-labeled data. However, manually harvesting such data can be costly, time-consuming, and mundane, and is a key factor limiting the widespread adoption of ML in healthcare (HC). Instead, we explore the use of multiple, individually imperfect heuristics to automatically assign probabilistic labels to unlabeled training data using weak supervision. Our weakly supervised models perform competitively with traditional supervised techniques and require less involvement from domain experts, demonstrating their use as efficient and practical alternatives to supervised learning in HC applications of ML. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted at American Medical Informatics Association (AMIA) Annual Symposium 2022. 10 pages, 4 figures and 2 tables

arXiv:2205.00072 [pdf, other]

Doubting AI Predictions: Influence-Driven Second Opinion Recommendation

Authors: Maria De-Arteaga, Alexandra Chouldechova, Artur Dubrawski

Abstract: Effective human-AI collaboration requires a system design that provides humans with meaningful ways to make sense of and critically evaluate algorithmic recommendations. In this paper, we propose a way to augment human-AI collaboration by building on a common organizational practice: identifying experts who are likely to provide complementary opinions. When machine learning algorithms are trained… ▽ More Effective human-AI collaboration requires a system design that provides humans with meaningful ways to make sense of and critically evaluate algorithmic recommendations. In this paper, we propose a way to augment human-AI collaboration by building on a common organizational practice: identifying experts who are likely to provide complementary opinions. When machine learning algorithms are trained to predict human-generated assessments, experts' rich multitude of perspectives is frequently lost in monolithic algorithmic recommendations. The proposed approach aims to leverage productive disagreement by (1) identifying whether some experts are likely to disagree with an algorithmic assessment and, if so, (2) recommend an expert to request a second opinion from. △ Less

Submitted 29 April, 2022; originally announced May 2022.

Comments: ACM CHI 2022 Workshop on Trust and Reliance in AI-Human Teams (TRAIT)

arXiv:2204.07276 [pdf, other]

auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenoty** with Censored Time-to-Event Data

Authors: Chirag Nagpal, Willa Potosnak, Artur Dubrawski

Abstract: Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present au… ▽ More Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenoty** for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions. △ Less

Submitted 3 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

arXiv:2203.12546 [pdf, other]

Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

Authors: Benedikt Boecking, Vincent Jeanselme, Artur Dubrawski

Abstract: Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete con… ▽ More Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information which only encodes linkage is transformed to informing distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to handling large data. △ Less

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.12023 [pdf, other]

Generative Modeling Helps Weak Supervision (and Vice Versa)

Authors: Benedikt Boecking, Nicholas Roberts, Willie Neiswanger, Stefano Ermon, Frederic Sala, Artur Dubrawski

Abstract: Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving… ▽ More Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well-understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. The proposed approach captures discrete latent variables in the data alongside the weak supervision derived label estimate. Alignment of the two allows for better modeling of sample-dependent accuracies of the weak supervision sources, improving the estimate of unobserved labels. It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels. Additionally, its learned latent variables can be inspected qualitatively. The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples. △ Less

Submitted 11 March, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Published as a conference paper at ICLR 2023

ACM Class: I.2.0; I.4.m

arXiv:2202.11089 [pdf, other]

doi 10.1145/3534678.3539110

Counterfactual Phenoty** with Censored Time-to-Events

Authors: Chirag Nagpal, Mononito Goswami, Keith Dufendach, Artur Dubrawski

Abstract: Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject to censoring. Counterfactual reasoning in such scenarios requires decoupling the effects of confounding physiological characteristics that affect baseline survival rates from the effects of the interventi… ▽ More Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject to censoring. Counterfactual reasoning in such scenarios requires decoupling the effects of confounding physiological characteristics that affect baseline survival rates from the effects of the interventions being assessed. In this paper, we present a latent variable approach to model heterogeneous treatment effects by proposing that an individual can belong to one of latent clusters with distinct response characteristics. We show that this latent structure can mediate the base survival rates and helps determine the effects of an intervention. We demonstrate the ability of our approach to discover actionable phenotypes of individuals based on their treatment response on multiple large randomized clinical trials originally conducted to assess appropriate treatments to reduce cardiovascular risk. △ Less

Submitted 9 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: KDD 2022 Applied Data Science Paper. Note this version includes a correction of the published version in the definition of Restricted Mean Survival Time

arXiv:2201.12886 [pdf, other]

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

Authors: Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler-Canseco, Artur Dubrawski

Abstract: Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpol… ▽ More Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We prove that the hierarchical interpolation technique can efficiently approximate arbitrarily long horizons in the presence of smoothness. Additionally, we conduct extensive large-scale dataset experiments from the long-horizon forecasting literature, demonstrating the advantages of our method over the state-of-the-art methods, where N-HiTS provides an average accuracy improvement of almost 20% over the latest Transformer architectures while reducing the computation time by an order of magnitude (50 times). Our code is available at bit.ly/3VA5DoT △ Less

Submitted 29 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

arXiv:2201.02936 [pdf, other]

Weak Supervision for Affordable Modeling of Electrocardiogram Data

Authors: Mononito Goswami, Benedikt Boecking, Artur Dubrawski

Abstract: Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. ECG studies using Machine Learning to automatically detect abnormal heartbeats so far depend on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. W… ▽ More Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. ECG studies using Machine Learning to automatically detect abnormal heartbeats so far depend on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. We explore the use of multiple weak supervision sources to learn diagnostic models of abnormal heartbeats via human designed heuristics, without using ground truth labels on individual data points. Our work is among the first to define weak supervision sources directly on time series data. Results show that with as few as six intuitive time series heuristics, we are able to infer high quality probabilistic label estimates for over 100,000 heartbeats with little human effort, and use the estimated labels to train competitive classifiers evaluated on held out test data. △ Less

Submitted 9 January, 2022; originally announced January 2022.

Comments: Accepted at American Medical Informatics Association (AMIA) 2021 Annual Symposium. 10 pages and 6 figures

arXiv:2112.01863 [pdf, other]

Discovery of Crime Event Sequences with Constricted Spatio-Temporal Sequential Patterns

Authors: Piotr S. Maciąg, Robert Bembenik, Artur Dubrawski

Abstract: In this article, we introduce a novel type of spatio-temporal sequential patterns called Constricted Spatio-Temporal Sequential (CSTS) patterns and thoroughly analyze their properties. We demonstrate that the set of CSTS patterns is a concise representation of all spatio-temporal sequential patterns that can be discovered in a given dataset. To measure significance of the discovered CSTS patterns… ▽ More In this article, we introduce a novel type of spatio-temporal sequential patterns called Constricted Spatio-Temporal Sequential (CSTS) patterns and thoroughly analyze their properties. We demonstrate that the set of CSTS patterns is a concise representation of all spatio-temporal sequential patterns that can be discovered in a given dataset. To measure significance of the discovered CSTS patterns we adapt the participation index measure. We also provide CSTS-Miner: an algorithm that discovers all participation index strong CSTS patterns in event data. We experimentally evaluate the proposed algorithms using two crime-related datasets: Pittsburgh Police Incident Blotter Dataset and Boston Crime Incident Reports Dataset. In the experiments, the CSTS-Miner algorithm is compared with the other four state-of-the-art algorithms: STS-Miner, CSTPM, STBFM and CST-SPMiner. As the results of experiments suggest, the proposed algorithm discovers much fewer patterns than the other selected algorithms. Finally, we provide the examples of interesting crime-related patterns discovered by the proposed CSTS-Miner algorithm. △ Less

Submitted 3 December, 2021; originally announced December 2021.

Comments: 37 pages

ACM Class: I.5.4

arXiv:2110.13937 [pdf, other]

Provably Robust Model-Centric Explanations for Critical Decision-Making

Authors: Cecilia G. Morales, Nicholas Gisolfi, Robert Edman, James K. Miller, Artur Dubrawski

Abstract: We recommend using a model-centric, Boolean Satisfiability (SAT) formalism to obtain useful explanations of trained model behavior, different and complementary to what can be gleaned from LIME and SHAP, popular data-centric explanation tools in Artificial Intelligence (AI). We compare and contrast these methods, and show that data-centric methods may yield brittle explanations of limited practical… ▽ More We recommend using a model-centric, Boolean Satisfiability (SAT) formalism to obtain useful explanations of trained model behavior, different and complementary to what can be gleaned from LIME and SHAP, popular data-centric explanation tools in Artificial Intelligence (AI). We compare and contrast these methods, and show that data-centric methods may yield brittle explanations of limited practical utility. The model-centric framework, however, can offer actionable insights into risks of using AI models in practice. For critical applications of AI, split-second decision making is best informed by robust explanations that are invariant to properties of data, the capability offered by model-centric frameworks. △ Less

Submitted 26 October, 2021; originally announced October 2021.

Comments: 8 pages, 9 figures

arXiv:2107.02233 [pdf, other]

End-to-End Weak Supervision

Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

Abstract: Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sour… ▽ More Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sources -- making assumptions that rarely hold in practice -- followed by downstream model training. Importantly, the first step of modeling does not consider the performance of the downstream model. To address these caveats we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as in terms of improved robustness to dependencies among weak supervision sources. △ Less

Submitted 30 November, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: Code URL: https://github.com/autonlab/weasel

Journal ref: Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2106.10302 [pdf, other]

Dependency Structure Misspecification in Multi-Source Weak Supervision Models

Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

Abstract: Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into \emph{labeling functions} (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on tes… ▽ More Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into \emph{labeling functions} (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure. △ Less

Submitted 18 June, 2021; originally announced June 2021.

Comments: Oral presentation at the Workshop on Weakly Supervised Learning at ICLR 2021

arXiv:2106.05860 [pdf, other]

DMIDAS: Deep Mixed Data Sampling Regression for Long Multi-Horizon Time Series Forecasting

Authors: Cristian Challu, Kin G. Olivares, Gus Welter, Artur Dubrawski

Abstract: Neural forecasting has shown significant improvements in the accuracy of large-scale systems, yet predicting extremely long horizons remains a challenging task. Two common problems are the volatility of the predictions and their computational complexity; we addressed them by incorporating smoothness regularization and mixed data sampling techniques to a well-performing multi-layer perceptron based… ▽ More Neural forecasting has shown significant improvements in the accuracy of large-scale systems, yet predicting extremely long horizons remains a challenging task. Two common problems are the volatility of the predictions and their computational complexity; we addressed them by incorporating smoothness regularization and mixed data sampling techniques to a well-performing multi-layer perceptron based architecture (NBEATS). We validate our proposed method, DMIDAS, on high-frequency healthcare and electricity price data with long forecasting horizons (~1000 timestamps) where we improve the prediction accuracy by 5% over state-of-the-art models, reducing the number of parameters of NBEATS by nearly 70%. △ Less

Submitted 7 June, 2021; originally announced June 2021.

arXiv:2104.05522 [pdf, other]

doi 10.1016/j.ijforecast.2022.03.001

Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx

Authors: Kin G. Olivares, Cristian Challu, Grzegorz Marcjasz, Rafał Weron, Artur Dubrawski

Abstract: We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its applica… ▽ More We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its application to electricity price forecasting (EPF) tasks across a broad range of years and markets. We observe state-of-the-art performance, significantly improving the forecast accuracy by nearly 20% over the original NBEATS model, and by up to 5% over other well established statistical and machine learning methods specialized for these tasks. Additionally, the proposed neural network has an interpretable configuration that can structurally decompose time series, visualizing the relative impact of trend and seasonal components and revealing the modeled processes' interactions with exogenous factors. To assist related work we made the code available in https://github.com/cchallu/nbeatsx. △ Less

Submitted 4 April, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: 30 pages, 7 figures, 4 tables

Journal ref: International Journal of Forecasting 2022

arXiv:2101.09648 [pdf, other]

Leveraging Expert Consistency to Improve Algorithmic Decision Support

Authors: Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova

Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus… ▽ More Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone. △ Less

Submitted 3 June, 2024; v1 submitted 24 January, 2021; originally announced January 2021.

Comments: Best Paper Runner-Up Award, Workshop on Information Technologies and Systems (WITS), 2021

arXiv:2012.06046 [pdf, other]

Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

Authors: Benedikt Boecking, Willie Neiswanger, Eric Xing, Artur Dubrawski

Abstract: Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth annotations by generating probabilistic labels using multiple noisy heuristics. This process can scale to large datasets and has demonstrated state of the art perf… ▽ More Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth annotations by generating probabilistic labels using multiple noisy heuristics. This process can scale to large datasets and has demonstrated state of the art performance in diverse domains such as healthcare and e-commerce. One practical issue with learning from user-generated heuristics is that their creation requires creativity, foresight, and domain expertise from those who hand-craft them, a process which can be tedious and subjective. We develop the first framework for interactive weak supervision in which a method proposes heuristics and learns from user feedback given on each proposed heuristic. Our experiments demonstrate that only a small number of feedback iterations are needed to train models that achieve highly competitive test set performance without access to ground truth training labels. We conduct user studies, which show that users are able to effectively provide feedback on heuristics and that test set results track the performance of simulated oracles. △ Less

Submitted 25 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

Comments: Accepted as a conference paper at ICLR 2021

arXiv:2007.05166 [pdf, other]

Self-Reflective Variational Autoencoder

Authors: Ifigeneia Apostolopoulou, Elan Rosenfeld, Artur Dubrawski

Abstract: The Variational Autoencoder (VAE) is a powerful framework for learning probabilistic latent variable generative models. However, typical assumptions on the approximate posterior distribution of the encoder and/or the prior, seriously restrict its capacity for inference and generative modeling. Variational inference based on neural autoregressive models respects the conditional dependencies of the… ▽ More The Variational Autoencoder (VAE) is a powerful framework for learning probabilistic latent variable generative models. However, typical assumptions on the approximate posterior distribution of the encoder and/or the prior, seriously restrict its capacity for inference and generative modeling. Variational inference based on neural autoregressive models respects the conditional dependencies of the exact posterior, but this flexibility comes at a cost: such models are expensive to train in high-dimensional regimes and can be slow to produce samples. In this work, we introduce an orthogonal solution, which we call self-reflective inference. By redesigning the hierarchical structure of existing VAE architectures, self-reflection ensures that the stochastic flow preserves the factorization of the exact posterior, sequentially updating the latent codes in a recurrent manner consistent with the generative model. We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior - on binarized MNIST, self-reflective inference achieves state-of-the art performance without resorting to complex, computationally expensive components such as autoregressive layers. Moreover, we design a variational normalizing flow that employs the proposed architecture, yielding predictive benefits compared to its purely generative counterpart. Our proposed modification is quite general and complements the existing literature; self-reflective inference can naturally leverage advances in distribution estimation and generative modeling to improve the capacity of each layer in the hierarchy. △ Less

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2006.08910 [pdf, other]

Preference-based Reinforcement Learning with Finite-Time Guarantees

Authors: Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

Abstract: Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite… ▽ More Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite-time analysis for general PbRL problems. We first show that a unique optimal policy may not exist if preferences over trajectories are deterministic for PbRL. If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states, and solves PbRL using a combination of dueling bandits and policy search. Experiments show the efficacy of our method when it is applied to real-world problems. △ Less

Submitted 23 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Spotlight presentation

arXiv:2005.05239 [pdf, other]

System-Level Predictive Maintenance: Review of Research Literature and Gap Analysis

Authors: Kyle Miller, Artur Dubrawski

Abstract: This paper reviews current literature in the field of predictive maintenance from the system point of view. We differentiate the existing capabilities of condition estimation and failure risk forecasting as currently applied to simple components, from the capabilities needed to solve the same tasks for complex assets. System-level analysis faces more complex latent degradation states, it has to co… ▽ More This paper reviews current literature in the field of predictive maintenance from the system point of view. We differentiate the existing capabilities of condition estimation and failure risk forecasting as currently applied to simple components, from the capabilities needed to solve the same tasks for complex assets. System-level analysis faces more complex latent degradation states, it has to comprehensively account for active maintenance programs at each component level and consider coupling between different maintenance actions, while reflecting increased monetary and safety costs for system failures. As a result, methods that are effective for forecasting risk and informing maintenance decisions regarding individual components do not readily scale to provide reliable sub-system or system level insights. A novel holistic modeling approach is needed to incorporate available structural and physical knowledge and naturally handle the complexities of actively fielded and maintained assets. △ Less

Submitted 11 May, 2020; originally announced May 2020.

Comments: 24 pages, 3 figures

MSC Class: 97R40 ACM Class: I.2.1

arXiv:2003.01176 [pdf, other]

Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks

Authors: Chirag Nagpal, Xinyu Rachel Li, Artur Dubrawski

Abstract: We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazard of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we… ▽ More We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazard of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we demonstrate the benefits of our approach when used to estimate survival risks through extensive experimentation on multiple real world datasets with different levels of censoring. We further demonstrate advantages of our model in the competing risks scenario. To the best of our knowledge, this is the first work involving fully parametric estimation of survival times with competing risks in the presence of censoring. △ Less

Submitted 9 June, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: Also appeared in NeurIPS 2019 Workshop on Machine Learning for Healthcare (ML4H)

Journal ref: IEEE Journal of Biomedical and Health Informatics, 2021

arXiv:1912.07685 [pdf, other]

Pairwise Feedback for Data Programming

Authors: Benedikt Boecking, Artur Dubrawski

Abstract: The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics provides a promising avenue to address this problem. We propose to improve modeling of latent class variables in the programmatic creation of labeled datasets by incor… ▽ More The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics provides a promising avenue to address this problem. We propose to improve modeling of latent class variables in the programmatic creation of labeled datasets by incorporating pairwise feedback into the process. We discuss the ease with which such pairwise feedback can be obtained or generated in many application domains. Our experiments show that even a small number of sources of pairwise feedback can substantially improve the quality of the posterior estimate of the latent class variable. △ Less

Submitted 16 December, 2019; originally announced December 2019.

Comments: Presented at the NeurIPS 2019 workshop on Learning with Rich Experience: Integration of Learning Paradigms

arXiv:1911.05121 [pdf, other]

Detecting Patterns of Physiological Response to Hemodynamic Stress via Unsupervised Deep Learning

Authors: Chufan Gao, Fabian Falck, Mononito Goswami, Anthony Wertz, Michael R. Pinsky, Artur Dubrawski

Abstract: Monitoring physiological responses to hemodynamic stress can help in determining appropriate treatment and ensuring good patient outcomes. Physicians' intuition suggests that the human body has a number of physiological response patterns to hemorrhage which escalate as blood loss continues, however the exact etiology and phenotypes of such responses are not well known or understood only at a coars… ▽ More Monitoring physiological responses to hemodynamic stress can help in determining appropriate treatment and ensuring good patient outcomes. Physicians' intuition suggests that the human body has a number of physiological response patterns to hemorrhage which escalate as blood loss continues, however the exact etiology and phenotypes of such responses are not well known or understood only at a coarse level. Although previous research has shown that machine learning models can perform well in hemorrhage detection and survival prediction, it is unclear whether machine learning could help to identify and characterize the underlying physiological responses in raw vital sign data. We approach this problem by first transforming the high-dimensional vital sign time series into a tractable, lower-dimensional latent space using a dilated, causal convolutional encoder model trained purely unsupervised. Second, we identify informative clusters in the embeddings. By analyzing the clusters of latent embeddings and visualizing them over time, we hypothesize that the clusters correspond to the physiological response patterns that match physicians' intuition. Furthermore, we attempt to evaluate the latent embeddings using a variety of methods, such as predicting the cluster labels using explainable features. △ Less

Submitted 12 November, 2019; originally announced November 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1911.00980 [pdf, other]

Zeroth Order Non-convex optimization with Dueling-Choice Bandits

Authors: Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski

Abstract: We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB… ▽ More We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with theoretical guarantee of $O(\fracΦ{\sqrt{T}})$ on simple regret where $T$ is the number of direct queries and $Φ$ is an improved information gain corresponding to a comparison based constraint set that restricts the search space for the optimum. In contrast, in the direct query only setting, $Φ$ depends on the entire domain. Finally, we present experimental results to show the efficacy of our algorithm. △ Less

Submitted 3 November, 2019; originally announced November 2019.

Comments: 19 pages, 3 figures

arXiv:1910.07567 [pdf, other]

Active Learning for Graph Neural Networks via Node Feature Propagation

Authors: Yuexin Wu, Yichong Xu, Aarti Singh, Yiming Yang, Artur Dubrawski

Abstract: Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning from graphically structured data. However, a large quantity of labeled graphs is difficult to obtain, which significantly limits the true success of GNNs. Although active learning has been widely studied for addressing label-sparse issues with… ▽ More Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning from graphically structured data. However, a large quantity of labeled graphs is difficult to obtain, which significantly limits the true success of GNNs. Although active learning has been widely studied for addressing label-sparse issues with other data types like text, images, etc., how to make it effective over graphs is an open question for research. In this paper, we present an investigation on active learning with GNNs for node classification tasks. Specifically, we propose a new method, which uses node feature propagation followed by K-Medoids clustering of the nodes for instance selection in active learning. With a theoretical bound analysis we justify the design choice of our approach. In our experiments on four benchmark datasets, the proposed method outperforms other representative baseline methods consistently and significantly. △ Less

Submitted 19 November, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: 15 pages, 5 figures

arXiv:1910.06368 [pdf, other]

Thresholding Bandit Problem with Both Duels and Pulls

Authors: Yichong Xu, Xi Chen, Aarti Singh, Artur Dubrawski

Abstract: The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also \emph{duel} two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to… ▽ More The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also \emph{duel} two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to this problem as TBP with Dueling Choices (TBP-DC). This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search. We prove theoretical guarantees for RS, and also give lower bounds to show the optimality of it. Experiments show that RS outperforms previous baseline algorithms that only use pulls or duels. △ Less

Submitted 12 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: 15 pages, 8 figures; The 23rd International Conference on Artificial Intelligence and Statistics

arXiv:1905.05865 [pdf, other]

Nonlinear Semi-Parametric Models for Survival Analysis

Authors: Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj

Abstract: Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the… ▽ More Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the input covariates and demonstrate improved performance. In this paper we argue against such deep parameterizations for survival analysis and experimentally demonstrate that more interpretable semi-parametric models inspired from mixtures of experts perform equally well or in some cases better than such overly parameterized deep models. △ Less

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1811.02525 [pdf, other]

Double Adaptive Stochastic Gradient Optimization

Authors: Kin Gutierrez, ** Li, Cristian Challu, Artur Dubrawski

Abstract: Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods~\textsc{DASGrad}. They leverage the complementary ideas of the adaptive moment algorithms widely used by deep learning commun… ▽ More Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods~\textsc{DASGrad}. They leverage the complementary ideas of the adaptive moment algorithms widely used by deep learning community, and recent advances in adaptive probabilistic algorithms.We analyze the theoretical convergence improvements of our approach in a stochastic convex optimization setting, and provide empirical validation of our findings with convex and non convex objectives. We observe that the benefits of~\textsc{DASGrad} increase with the model complexity and variability of the gradients, and we explore the resulting utility in extensions of distribution-matching multitask learning. △ Less

Submitted 6 November, 2018; originally announced November 2018.

arXiv:1807.06713 [pdf, ps, other]

On the Interaction Effects Between Prediction and Clustering

Authors: Matt Barnes, Artur Dubrawski

Abstract: Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high quality and well structured predictions. This paper argues interaction effects between clustering and prediction (e.g. classification, regression) algorithms can cause subtle adverse behaviors during cross-validation that may not be initially apparent. In particular, we focus on the problem of estimati… ▽ More Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high quality and well structured predictions. This paper argues interaction effects between clustering and prediction (e.g. classification, regression) algorithms can cause subtle adverse behaviors during cross-validation that may not be initially apparent. In particular, we focus on the problem of estimating the out-of-cluster (OOC) prediction loss given an approximate clustering with probabilistic error rate $p_0$. Traditional cross-validation techniques exhibit significant empirical bias in this setting, and the few attempts to estimate and correct for these effects are intractable on larger datasets. Further, no previous work has been able to characterize the conditions under which these empirical effects occur, and if they do, what properties they have. We precisely answer these questions by providing theoretical properties which hold in various settings, and prove that expected out-of-cluster loss behavior rapidly decays with even minor clustering errors. Fortunately, we are able to leverage these same properties to construct hypothesis tests and scalable estimators necessary for correcting the problem. Empirical results on benchmark datasets validate our theoretical results and demonstrate how scaling techniques provide solutions to new classes of problems. △ Less

Submitted 28 December, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Volume 89

arXiv:1807.00905 [pdf, other]

Learning under selective labels in the presence of expert consistency

Authors: Maria De-Arteaga, Artur Dubrawski, Alexandra Chouldechova

Abstract: We explore the problem of learning under selective labels in the context of algorithm-assisted decision making. Selective labels is a pervasive selection bias problem that arises when historical decision making blinds us to the true outcome for certain instances. Examples of this are common in many applications, ranging from predicting recidivism using pre-trial release data to diagnosing patients… ▽ More We explore the problem of learning under selective labels in the context of algorithm-assisted decision making. Selective labels is a pervasive selection bias problem that arises when historical decision making blinds us to the true outcome for certain instances. Examples of this are common in many applications, ranging from predicting recidivism using pre-trial release data to diagnosing patients. In this paper we discuss why selective labels often cannot be effectively tackled by standard methods for adjusting for sample selection bias, even if there are no unobservables. We propose a data augmentation approach that can be used to either leverage expert consistency to mitigate the partial blindness that results from selective labels, or to empirically validate whether learning under such framework may lead to unreliable models prone to systemic discrimination. △ Less

Submitted 4 July, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: Presented at the 2018 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

arXiv:1806.03286 [pdf, other]

Regression with Comparisons: Esca** the Curse of Dimensionality with Ordinal Information

Authors: Yichong Xu, Sivaraman Balakrishnan, Aarti Singh, Artur Dubrawski

Abstract: In supervised learning, we typically leverage a fully labeled dataset to design methods for function estimation or prediction. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. In this paper, we consider a semi-supervised regression settin… ▽ More In supervised learning, we typically leverage a fully labeled dataset to design methods for function estimation or prediction. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. In this paper, we consider a semi-supervised regression setting, where we obtain additional ordinal (or comparison) information for the unlabeled samples. We consider ordinal feedback of varying qualities where we have either a perfect ordering of the samples, a noisy ordering of the samples or noisy pairwise comparisons between the samples. We provide a precise quantification of the usefulness of these types of ordinal feedback in both nonparametric and linear regression, showing that in many cases it is possible to accurately estimate an underlying function with a very small labeled set, effectively \emph{esca** the curse of dimensionality}. We also present lower bounds, that establish fundamental limits for the task and show that our algorithms are optimal in a variety of settings. Finally, we present extensive experiments on new datasets that demonstrate the efficacy and practicality of our algorithms and investigate their robustness to various sources of noise and model misspecification. △ Less

Submitted 6 November, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: 52 pages, 11 figures; Preliminary version in International Conference on Machine Learning 2018

Journal ref: Journal of Machine Learning Research 21 (2020) 1-54

arXiv:1804.10742 [pdf, other]

Novel Prediction Techniques Based on Clusterwise Linear Regression

Authors: Igor Gitman, Jieshi Chen, Eric Lei, Artur Dubrawski

Abstract: In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a rea… ▽ More In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of target variable are unknown. In this paper we propose two novel approaches on how to solve this problem. The first approach, predictive CLR builds a separate classification model to predict test CLR labels. The second approach, constrained CLR utilizes a set of user-specified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression. △ Less

Submitted 28 April, 2018; originally announced April 2018.

arXiv:1709.05602 [pdf, ps, other]

Characterization of Hemodynamic Signal by Learning Multi-View Relationships

Authors: Eric Lei, Kyle Miller, Michael R. Pinsky, Artur Dubrawski

Abstract: Multi-view data are increasingly prevalent in practice. It is often relevant to analyze the relationships between pairs of views by multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, data may easily exhibit nonlinear relations, which CCA cannot reveal. We aim to investigate the usefulness of nonlinear multi-view relations to characterize multi-view data… ▽ More Multi-view data are increasingly prevalent in practice. It is often relevant to analyze the relationships between pairs of views by multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, data may easily exhibit nonlinear relations, which CCA cannot reveal. We aim to investigate the usefulness of nonlinear multi-view relations to characterize multi-view data in an explainable manner. To address this challenge, we propose a method to characterize globally nonlinear multi-view relationships as a mixture of linear relationships. A clustering method, it identifies partitions of observations that exhibit the same relationships and learns those relationships simultaneously. It defines cluster variables by multi-view rather than spatial relationships, unlike almost all other clustering methods. Furthermore, we introduce a supervised classification method that builds on our clustering method by employing multi-view relationships as discriminative factors. The value of these methods resides in their capability to find useful structure in the data that single-view or current multi-view methods may struggle to find. We demonstrate the potential utility of the proposed approach using an application in clinical informatics to detect and characterize slow bleeding in patients whose central venous pressure (CVP) is monitored at the bedside. Presently, CVP is considered an insensitive measure of a subject's intravascular volume status or its change. However, we reason that features of CVP during inspiration and expiration should be informative in early identification of emerging changes of patient status. We empirically show how the proposed method can help discover and analyze multiple-to-multiple correlations, which could be nonlinear or vary throughout the population, by finding explainable structure of operational interest to practitioners. △ Less

Submitted 8 December, 2019; v1 submitted 16 September, 2017; originally announced September 2017.

arXiv:1705.00334 [pdf, other]

Scaling Active Search using Linear Similarity Functions

Authors: Sibi Venkatesan, James K. Miller, Jeff Schneider, Artur Dubrawski

Abstract: Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are… ▽ More Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are given a similarity function between data points. We look at an algorithm introduced by Wang et al. [2013] for Active Search over graphs and propose crucial modifications which allow it to scale significantly. Their approach selects points by minimizing an energy function over the graph induced by the similarity function on the data. Our modifications require the similarity function to be a dot-product between feature vectors of data points, equivalent to having a linear kernel for the adjacency matrix. With this, we are able to scale tremendously: for $n$ data points, the original algorithm runs in $O(n^2)$ time per iteration while ours runs in only $O(nr + r^2)$ given $r$-dimensional features. We also describe a simple alternate approach using a weighted-neighbor predictor which also scales well. In our experiments, we show that our method is competitive with existing semi-supervised approaches. We also briefly discuss conditions under which our algorithm performs well. △ Less

Submitted 21 August, 2017; v1 submitted 30 April, 2017; originally announced May 2017.

Comments: To be published as conference paper at IJCAI 2017, 11 pages, 2 figures

arXiv:1605.08455 [pdf, other]

Suppressing Background Radiation Using Poisson Principal Component Analysis

Authors: P. Tandon, P. Huggins, A. Dubrawski, S. Labov, K. Nelson

Abstract: Performance of nuclear threat detection systems based on gamma-ray spectrometry often strongly depends on the ability to identify the part of measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra using PCA projection residuals to derive a source det… ▽ More Performance of nuclear threat detection systems based on gamma-ray spectrometry often strongly depends on the ability to identify the part of measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra using PCA projection residuals to derive a source detection score. We have shown the method's utility in a threat detection system using mobile spectrometers in urban scenes (Tandon et al 2012). While it is commonly assumed that measured photon counts follow a Poisson process, standard PCA makes a Gaussian assumption about the data distribution, which may be a poor approximation when photon counts are low. This paper studies whether and in what conditions PCA with a Poisson-based loss function (Poisson PCA) can outperform standard Gaussian PCA in modeling background radiation to enable more sensitive and specific nuclear threat detection. △ Less

Submitted 26 May, 2016; originally announced May 2016.

arXiv:1603.02578 [pdf, other]

Batched Lazy Decision Trees

Authors: Mathieu Guillame-Bert, Artur Dubrawski

Abstract: We introduce a batched lazy algorithm for supervised classification using decision trees. It avoids unnecessary visits to irrelevant nodes when it is used to make predictions with either eagerly or lazily trained decision trees. A set of experiments demonstrate that the proposed algorithm can outperform both the conventional and lazy decision tree algorithms in terms of computation time as well as… ▽ More We introduce a batched lazy algorithm for supervised classification using decision trees. It avoids unnecessary visits to irrelevant nodes when it is used to make predictions with either eagerly or lazily trained decision trees. A set of experiments demonstrate that the proposed algorithm can outperform both the conventional and lazy decision tree algorithms in terms of computation time as well as memory consumption, without compromising accuracy. △ Less

Submitted 8 March, 2016; originally announced March 2016.

Comments: 7 pages, 2 figures, 3 tables, 3 algorithms

arXiv:1511.06419 [pdf, other]

Canonical Autocorrelation Analysis

Authors: Maria De-Arteaga, Artur Dubrawski, Peter Huggins

Abstract: We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivar… ▽ More We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivariate correlations within just one set of variables. This can be useful when we look for hidden parsimonious structures in data, each involving only a small subset of all features. In addition, the discovered correlations are highly interpretable as they are formed by pairs of sparse linear combinations of the original features. We show how CAA can be of use as a tool for anomaly detection when the expected structure of correlations is not followed by anomalous data. We illustrate the utility of CAA in two application domains where single-class and unsupervised learning of correlation structures are particularly relevant: breast cancer diagnosis and radiation threat detection. When applied to the Wisconsin Breast Cancer data, single-class CAA is competitive with supervised methods used in literature. On the radiation threat detection task, unsupervised CAA performs significantly better than an unsupervised alternative prevalent in the domain, while providing valuable additional insights for threat analysis. △ Less

Submitted 19 November, 2015; originally announced November 2015.

Comments: 6 pages, 5 figures

arXiv:1509.06659 [pdf, other]

An Entity Resolution approach to isolate instances of Human Trafficking online

Authors: Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski

Abstract: Human trafficking is a challenging law enforcement problem, and a large amount of such activity manifests itself on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is an even more convolved a task. In this paper we propose and entity resolution pipeline using a notion of proxy labels, in order to extract cl… ▽ More Human trafficking is a challenging law enforcement problem, and a large amount of such activity manifests itself on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is an even more convolved a task. In this paper we propose and entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain specific characteristics of our resolved entities. △ Less

Submitted 18 June, 2017; v1 submitted 22 September, 2015; originally announced September 2015.

Showing 1–50 of 51 results for author: Dubrawski, A