Showing 1–50 of 51 results for author: Dubrawski, A

Searching in archive cs.
  1. arXiv:2406.10775  [pdf, other]

    cs.LG cs.AI stat.ML

    A Rate-Distortion View of Uncertainty Quantification

    Authors: Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski

    Abstract: In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching de… ▽ More

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Journal ref: International Conference on Machine Learning, 2024

  2. arXiv:2405.17672  [pdf, other]

    cs.LG cs.AI stat.ML

    Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise

    Authors: Lukasz Sztukiewicz, Jack Henry Good, Artur Dubrawski

    Abstract: In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to im… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2402.03885  [pdf, other]

    cs.LG cs.AI

    MOMENT: A Family of Open Time-series Foundation Models

    Authors: Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, Artur Dubrawski

    Abstract: We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data is challenging due to (1) the absence of a large and cohesive public time series repository, and (2) diverse time series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, e… ▽ More

    Submitted 13 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024. This version contains new experimental results and a section on contemporary work

  4. arXiv:2402.00803  [pdf, other]

    cs.LG eess.SP

    Signal Quality Auditing for Time-series Data

    Authors: Chufan Gao, Nicholas Gisolfi, Artur Dubrawski

    Abstract: Signal quality assessment (SQA) is required for monitoring the reliability of data acquisition systems, especially in AI-driven Predictive Maintenance (PMx) application contexts. SQA is vital for addressing "silent failures" of data acquisition hardware and software, which when unnoticed, misinform the users of data, creating the risk for incorrect decisions with unintended or even catastrophic co… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2312.01239  [pdf, other]

    eess.IV cs.CV cs.LG

    Motion Informed Needle Segmentation in Ultrasound Images

    Authors: Raghavv Goel, Cecilia Morales, Manpreet Singh, Artur Dubrawski, John Galeotti, Howie Choset

    Abstract: Segmenting a moving needle in ultrasound images is challenging due to the presence of artifacts, noise, and needle occlusion. This task becomes even more demanding in scenarios where data availability is limited. In this paper, we present a novel approach for needle segmentation for 2D ultrasound that combines classical Kalman Filter (KF) techniques with data-driven learning, incorporating both ne… ▽ More

    Submitted 3 May, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: 7 pages, 4 figures, accepted at ISBI 2024

  6. arXiv:2309.13135  [pdf, other]

    cs.LG q-bio.QM

    Forecasting Response to Treatment with Global Deep Learning and Patient-Specific Pharmacokinetic Priors

    Authors: Willa Potosnak, Cristian Challu, Kin G. Olivares, Artur Dubrawski

    Abstract: Forecasting healthcare time series is crucial for early detection of adverse outcomes and for patient monitoring. Forecasting, however, can be difficult in practice due to noisy and intermittent data. The challenges are often exacerbated by change points induced via extrinsic factors, such as the administration of medication. To address these challenges, we propose a novel hybrid global-local arch… ▽ More

    Submitted 15 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 8 pages

  7. arXiv:2306.09467  [pdf, other]

    cs.LG

    AQuA: A Benchmarking Tool for Label Quality Assessment

    Authors: Mononito Goswami, Vedant Sanil, Arjun Choudhry, Arvind Srinivasan, Chalisa Udompanyawit, Artur Dubrawski

    Abstract: Machine learning (ML) models are only as good as the data they are trained on. But recent studies have found datasets widely used to train and evaluate ML models, e.g. ImageNet, to have pervasive labeling errors. Erroneous labels on the train set hurt ML models' ability to generalize, and they impact evaluation and model selection using the test set. Consequently, learning in the presence of label… ▽ More

    Submitted 16 January, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code can be found at www.github.com/autonlab/aqua/

  8. arXiv:2305.07089  [pdf, other]

    stat.ML cs.LG stat.ME

    Hierarchically Coherent Multivariate Mixture Networks

    Authors: Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski

    Abstract: Large collections of time series data are often organized into hierarchies with different levels of aggregation; examples include product and geographical groupings. Probabilistic coherent forecasting is tasked to produce forecasts consistent across levels of aggregation. In this study, we propose to augment neural forecasting architectures with a coherent multivariate mixture output. We optimize… ▽ More

    Submitted 16 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  9. arXiv:2302.12504  [pdf, other]

    stat.ME cs.LG stat.ML

    Recovering Sparse and Interpretable Subgroups with Heterogeneous Treatment Effects with Censored Time-to-Event Outcomes

    Authors: Chirag Nagpal, Vedant Sanil, Artur Dubrawski

    Abstract: Studies involving both randomized experiments as well as observational data typically involve time-to-event outcomes such as time-to-failure, death or onset of an adverse condition. Such outcomes are typically subject to censoring due to loss of follow-up and established statistical practice involves comparing treatment efficacy in terms of hazard ratios between the treated and control groups. In… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Presented as an extended abstract at the Machine Learning for Health Symposium (ML4H) 2022

  10. arXiv:2301.07286  [pdf, other]

    eess.IV cs.CV cs.LG cs.RO

    Reslicing Ultrasound Images for Data Augmentation and Vessel Reconstruction

    Authors: Cecilia Morales, Jason Yao, Tejas Rane, Robert Edman, Howie Choset, Artur Dubrawski

    Abstract: Robot-guided catheter insertion has the potential to deliver urgent medical care in situations where medical personnel are unavailable. However, this technique requires accurate and reliable segmentation of anatomical landmarks in the body. For the ultrasound imaging modality, obtaining large amounts of training data for a segmentation model is time-consuming and expensive. This paper introduces R… ▽ More

    Submitted 17 January, 2023; originally announced January 2023.

  11. arXiv:2207.03517  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python

    Authors: Kin G. Olivares, Federico Garza, David Luo, Cristian Challú, Max Mergenthaler, Souhaib Ben Taieb, Shanika L. Wickramasuriya, Artur Dubrawski

    Abstract: Large collections of time series data are commonly organized into structures with different levels of aggregation; examples include product and geographical groupings. It is often important to ensure that the forecasts are coherent so that the predicted values at disaggregate levels add up to the aggregate forecast. The growing interest of the Machine Learning community in hierarchical forecasting… ▽ More

    Submitted 24 January, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

  12. arXiv:2206.12088  [pdf, other]

    cs.CL cs.LG

    Classifying Unstructured Clinical Notes via Automatic Weak Supervision

    Authors: Chufan Gao, Mononito Goswami, Jieshi Chen, Artur Dubrawski

    Abstract: Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients' diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming bu… ▽ More

    Submitted 1 August, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

    Comments: 18 pages, 3 figures and 6 tables. Accepted at the Machine Learning for Healthcare Conference (MLHC) 2022. Code available at https://github.com/autonlab/KeyClass

  13. arXiv:2206.10462  [pdf, ps, other]

    cs.LG

    The Digital Twin Landscape at the Crossroads of Predictive Maintenance, Machine Learning and Physics Based Modeling

    Authors: Brian Kunzer, Mario Berges, Artur Dubrawski

    Abstract: The concept of a digital twin has exploded in popularity over the past decade, yet confusion around its plurality of definitions, its novelty as a new technology, and its practical applicability still exists, all despite numerous reviews, surveys, and press releases. The history of the term digital twin is explored, as well as its initial context in the fields of product life cycle management, ass… ▽ More

    Submitted 23 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: 21 pages, 5 figures

  14. arXiv:2206.09074  [pdf, other]

    cs.LG eess.SP

    Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact

    Authors: Arnab Dey, Mononito Goswami, Joo Heung Yoon, Gilles Clermont, Michael Pinsky, Marilyn Hravnak, Artur Dubrawski

    Abstract: A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact.… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted at American Medical Informatics Association (AMIA) Annual Symposium 2022. 10 pages, 4 figures and 2 tables

  15. arXiv:2205.00072  [pdf, other]

    cs.LG cs.CY cs.HC

    Doubting AI Predictions: Influence-Driven Second Opinion Recommendation

    Authors: Maria De-Arteaga, Alexandra Chouldechova, Artur Dubrawski

    Abstract: Effective human-AI collaboration requires a system design that provides humans with meaningful ways to make sense of and critically evaluate algorithmic recommendations. In this paper, we propose a way to augment human-AI collaboration by building on a common organizational practice: identifying experts who are likely to provide complementary opinions. When machine learning algorithms are trained… ▽ More

    Submitted 29 April, 2022; originally announced May 2022.

    Comments: ACM CHI 2022 Workshop on Trust and Reliance in AI-Human Teams (TRAIT)

  16. arXiv:2204.07276  [pdf, other]

    cs.LG cs.MS stat.ML

    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data

    Authors: Chirag Nagpal, Willa Potosnak, Artur Dubrawski

    Abstract: Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present au… ▽ More

    Submitted 3 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

  17. arXiv:2203.12546  [pdf, other]

    cs.LG cs.AI stat.ML

    Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

    Authors: Benedikt Boecking, Vincent Jeanselme, Artur Dubrawski

    Abstract: Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete con… ▽ More

    Submitted 23 March, 2022; originally announced March 2022.

  18. arXiv:2203.12023  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Generative Modeling Helps Weak Supervision (and Vice Versa)

    Authors: Benedikt Boecking, Nicholas Roberts, Willie Neiswanger, Stefano Ermon, Frederic Sala, Artur Dubrawski

    Abstract: Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving… ▽ More

    Submitted 11 March, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Published as a conference paper at ICLR 2023

    ACM Class: I.2.0; I.4.m

  19. arXiv:2202.11089  [pdf, other]

    cs.LG stat.AP stat.ME stat.ML

    Counterfactual Phenotyping with Censored Time-to-Events

    Authors: Chirag Nagpal, Mononito Goswami, Keith Dufendach, Artur Dubrawski

    Abstract: Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject to censoring. Counterfactual reasoning in such scenarios requires decoupling the effects of confounding physiological characteristics that affect baseline survival rates from the effects of the interventi… ▽ More

    Submitted 9 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: KDD 2022 Applied Data Science Paper. Note this version includes a correction of the published version in the definition of Restricted Mean Survival Time

  20. arXiv:2201.12886  [pdf, other]

    cs.LG cs.AI

    N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

    Authors: Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler-Canseco, Artur Dubrawski

    Abstract: Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpol… ▽ More

    Submitted 29 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

  21. arXiv:2201.02936  [pdf, other]

    eess.SP cs.AI cs.LG

    Weak Supervision for Affordable Modeling of Electrocardiogram Data

    Authors: Mononito Goswami, Benedikt Boecking, Artur Dubrawski

    Abstract: Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. ECG studies using Machine Learning to automatically detect abnormal heartbeats so far depend on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. W… ▽ More

    Submitted 9 January, 2022; originally announced January 2022.

    Comments: Accepted at American Medical Informatics Association (AMIA) 2021 Annual Symposium. 10 pages and 6 figures

  22. arXiv:2112.01863  [pdf, other]

    cs.LG cs.AI cs.DB

    Discovery of Crime Event Sequences with Constricted Spatio-Temporal Sequential Patterns

    Authors: Piotr S. Maciąg, Robert Bembenik, Artur Dubrawski

    Abstract: In this article, we introduce a novel type of spatio-temporal sequential patterns called Constricted Spatio-Temporal Sequential (CSTS) patterns and thoroughly analyze their properties. We demonstrate that the set of CSTS patterns is a concise representation of all spatio-temporal sequential patterns that can be discovered in a given dataset. To measure significance of the discovered CSTS patterns… ▽ More

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: 37 pages

    ACM Class: I.5.4

  23. arXiv:2110.13937  [pdf, other]

    cs.LG cs.AI cs.RO

    Provably Robust Model-Centric Explanations for Critical Decision-Making

    Authors: Cecilia G. Morales, Nicholas Gisolfi, Robert Edman, James K. Miller, Artur Dubrawski

    Abstract: We recommend using a model-centric, Boolean Satisfiability (SAT) formalism to obtain useful explanations of trained model behavior, different and complementary to what can be gleaned from LIME and SHAP, popular data-centric explanation tools in Artificial Intelligence (AI). We compare and contrast these methods, and show that data-centric methods may yield brittle explanations of limited practical… ▽ More

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: 8 pages, 9 figures

  24. arXiv:2107.02233  [pdf, other]

    cs.LG cs.AI stat.ML

    End-to-End Weak Supervision

    Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

    Abstract: Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sour… ▽ More

    Submitted 30 November, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: Code URL: https://github.com/autonlab/weasel

    Journal ref: Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  25. arXiv:2106.10302  [pdf, other]

    cs.LG cs.AI stat.ML

    Dependency Structure Misspecification in Multi-Source Weak Supervision Models

    Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

    Abstract: Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into \emph{labeling functions} (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on tes… ▽ More

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: Oral presentation at the Workshop on Weakly Supervised Learning at ICLR 2021

  26. arXiv:2106.05860  [pdf, other]

    cs.LG stat.ML

    DMIDAS: Deep Mixed Data Sampling Regression for Long Multi-Horizon Time Series Forecasting

    Authors: Cristian Challu, Kin G. Olivares, Gus Welter, Artur Dubrawski

    Abstract: Neural forecasting has shown significant improvements in the accuracy of large-scale systems, yet predicting extremely long horizons remains a challenging task. Two common problems are the volatility of the predictions and their computational complexity; we addressed them by incorporating smoothness regularization and mixed data sampling techniques to a well-performing multi-layer perceptron based… ▽ More

    Submitted 7 June, 2021; originally announced June 2021.

  27. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx

    Authors: Kin G. Olivares, Cristian Challu, Grzegorz Marcjasz, Rafał Weron, Artur Dubrawski

    Abstract: We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its applica… ▽ More

    Submitted 4 April, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: 30 pages, 7 figures, 4 tables

    Journal ref: International Journal of Forecasting 2022

  28. arXiv:2101.09648  [pdf, other]

    cs.LG cs.HC

    Leveraging Expert Consistency to Improve Algorithmic Decision Support

    Authors: Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova

    Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus… ▽ More

    Submitted 3 June, 2024; v1 submitted 24 January, 2021; originally announced January 2021.

    Comments: Best Paper Runner-Up Award, Workshop on Information Technologies and Systems (WITS), 2021

  29. arXiv:2012.06046  [pdf, other]

    cs.LG cs.AI stat.ML

    Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

    Authors: Benedikt Boecking, Willie Neiswanger, Eric Xing, Artur Dubrawski

    Abstract: Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth annotations by generating probabilistic labels using multiple noisy heuristics. This process can scale to large datasets and has demonstrated state of the art perf… ▽ More

    Submitted 25 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Accepted as a conference paper at ICLR 2021

  30. arXiv:2007.05166  [pdf, other]

    cs.LG cs.CV stat.ML

    Self-Reflective Variational Autoencoder

    Authors: Ifigeneia Apostolopoulou, Elan Rosenfeld, Artur Dubrawski

    Abstract: The Variational Autoencoder (VAE) is a powerful framework for learning probabilistic latent variable generative models. However, typical assumptions on the approximate posterior distribution of the encoder and/or the prior, seriously restrict its capacity for inference and generative modeling. Variational inference based on neural autoregressive models respects the conditional dependencies of the… ▽ More

    Submitted 10 July, 2020; originally announced July 2020.

  31. arXiv:2006.08910  [pdf, other]

    cs.LG cs.AI stat.ML

    Preference-based Reinforcement Learning with Finite-Time Guarantees

    Authors: Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

    Abstract: Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite… ▽ More

    Submitted 23 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Spotlight presentation

  32. arXiv:2005.05239  [pdf, other]

    cs.AI eess.SY

    System-Level Predictive Maintenance: Review of Research Literature and Gap Analysis

    Authors: Kyle Miller, Artur Dubrawski

    Abstract: This paper reviews current literature in the field of predictive maintenance from the system point of view. We differentiate the existing capabilities of condition estimation and failure risk forecasting as currently applied to simple components, from the capabilities needed to solve the same tasks for complex assets. System-level analysis faces more complex latent degradation states, it has to co… ▽ More

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: 24 pages, 3 figures

    MSC Class: 97R40 ACM Class: I.2.1

  33. arXiv:2003.01176  [pdf, other]

    cs.LG stat.AP stat.ML

    Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks

    Authors: Chirag Nagpal, Xinyu Rachel Li, Artur Dubrawski

    Abstract: We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazard of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we… ▽ More

    Submitted 9 June, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Also appeared in NeurIPS 2019 Workshop on Machine Learning for Healthcare (ML4H)

    Journal ref: IEEE Journal of Biomedical and Health Informatics, 2021

  34. arXiv:1912.07685  [pdf, other]

    cs.LG stat.ML

    Pairwise Feedback for Data Programming

    Authors: Benedikt Boecking, Artur Dubrawski

    Abstract: The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics provides a promising avenue to address this problem. We propose to improve modeling of latent class variables in the programmatic creation of labeled datasets by incor… ▽ More

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: Presented at the NeurIPS 2019 workshop on Learning with Rich Experience: Integration of Learning Paradigms

  35. arXiv:1911.05121  [pdf, other]

    cs.LG stat.ML

    Detecting Patterns of Physiological Response to Hemodynamic Stress via Unsupervised Deep Learning

    Authors: Chufan Gao, Fabian Falck, Mononito Goswami, Anthony Wertz, Michael R. Pinsky, Artur Dubrawski

    Abstract: Monitoring physiological responses to hemodynamic stress can help in determining appropriate treatment and ensuring good patient outcomes. Physicians' intuition suggests that the human body has a number of physiological response patterns to hemorrhage which escalate as blood loss continues, however the exact etiology and phenotypes of such responses are not well known or understood only at a coars… ▽ More

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  36. arXiv:1911.00980  [pdf, other]

    cs.LG stat.ML

    Zeroth Order Non-convex optimization with Dueling-Choice Bandits

    Authors: Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski

    Abstract: We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB… ▽ More

    Submitted 3 November, 2019; originally announced November 2019.

    Comments: 19 pages, 3 figures

  37. arXiv:1910.07567  [pdf, other]

    cs.LG stat.ML

    Active Learning for Graph Neural Networks via Node Feature Propagation

    Authors: Yuexin Wu, Yichong Xu, Aarti Singh, Yiming Yang, Artur Dubrawski

    Abstract: Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning from graphically structured data. However, a large quantity of labeled graphs is difficult to obtain, which significantly limits the true success of GNNs. Although active learning has been widely studied for addressing label-sparse issues with… ▽ More

    Submitted 19 November, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: 15 pages, 5 figures

  38. arXiv:1910.06368  [pdf, other]

    cs.LG stat.ML

    Thresholding Bandit Problem with Both Duels and Pulls

    Authors: Yichong Xu, Xi Chen, Aarti Singh, Artur Dubrawski

    Abstract: The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also \emph{duel} two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to… ▽ More

    Submitted 12 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: 15 pages, 8 figures; The 23rd International Conference on Artificial Intelligence and Statistics

  39. arXiv:1905.05865  [pdf, other]

    cs.LG stat.ML

    Nonlinear Semi-Parametric Models for Survival Analysis

    Authors: Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj

    Abstract: Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the… ▽ More

    Submitted 14 May, 2019; originally announced May 2019.

  40. arXiv:1811.02525  [pdf, other]

    stat.ML cs.LG

    Double Adaptive Stochastic Gradient Optimization

    Authors: Kin Gutierrez, ** Li, Cristian Challu, Artur Dubrawski

    Abstract: Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods~\textsc{DASGrad}. They leverage the complementary ideas of the adaptive moment algorithms widely used by deep learning commun… ▽ More

    Submitted 6 November, 2018; originally announced November 2018.

  41. arXiv:1807.06713  [pdf, ps, other]

    stat.ML cs.LG

    On the Interaction Effects Between Prediction and Clustering

    Authors: Matt Barnes, Artur Dubrawski

    Abstract: Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high-quality and well-structured predictions. This paper argues that interaction effects between clustering and prediction (e.g. classification, regression) algorithms can cause subtle adverse behaviors during cross-validation that may not be initially apparent. In particular, we focus on the problem of estimati…

    Submitted 28 December, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Volume 89

  42. arXiv:1807.00905  [pdf, other]

    cs.LG stat.ML

    Learning under selective labels in the presence of expert consistency

    Authors: Maria De-Arteaga, Artur Dubrawski, Alexandra Chouldechova

    Abstract: We explore the problem of learning under selective labels in the context of algorithm-assisted decision making. Selective labels are a pervasive selection bias problem that arises when historical decision making blinds us to the true outcome for certain instances. Examples of this are common in many applications, ranging from predicting recidivism using pre-trial release data to diagnosing patients…

    Submitted 4 July, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: Presented at the 2018 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

  43. arXiv:1806.03286  [pdf, other]

    stat.ML cs.LG

    Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information

    Authors: Yichong Xu, Sivaraman Balakrishnan, Aarti Singh, Artur Dubrawski

    Abstract: In supervised learning, we typically leverage a fully labeled dataset to design methods for function estimation or prediction. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. In this paper, we consider a semi-supervised regression settin…

    Submitted 6 November, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    Comments: 52 pages, 11 figures; Preliminary version in International Conference on Machine Learning 2018

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-54

  44. arXiv:1804.10742  [pdf, other]

    cs.LG stat.ML

    Novel Prediction Techniques Based on Clusterwise Linear Regression

    Authors: Igor Gitman, Jieshi Chen, Eric Lei, Artur Dubrawski

    Abstract: In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize the overall mean squared error on the whole data. The main obstacle to using the resulting regression models for prediction on unseen test points is the absence of a rea…

    Submitted 28 April, 2018; originally announced April 2018.

  45. arXiv:1709.05602  [pdf, ps, other]

    stat.ML cs.LG

    Characterization of Hemodynamic Signal by Learning Multi-View Relationships

    Authors: Eric Lei, Kyle Miller, Michael R. Pinsky, Artur Dubrawski

    Abstract: Multi-view data are increasingly prevalent in practice. It is often relevant to analyze the relationships between pairs of views by multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, data may easily exhibit nonlinear relations, which CCA cannot reveal. We aim to investigate the usefulness of nonlinear multi-view relations to characterize multi-view data…

    Submitted 8 December, 2019; v1 submitted 16 September, 2017; originally announced September 2017.

  46. arXiv:1705.00334  [pdf, other]

    stat.ML cs.LG

    Scaling Active Search using Linear Similarity Functions

    Authors: Sibi Venkatesan, James K. Miller, Jeff Schneider, Artur Dubrawski

    Abstract: Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are…

    Submitted 21 August, 2017; v1 submitted 30 April, 2017; originally announced May 2017.

    Comments: To be published as conference paper at IJCAI 2017, 11 pages, 2 figures

  47. arXiv:1605.08455  [pdf, other]

    cs.LG physics.data-an stat.ML

    Suppressing Background Radiation Using Poisson Principal Component Analysis

    Authors: P. Tandon, P. Huggins, A. Dubrawski, S. Labov, K. Nelson

    Abstract: The performance of nuclear threat detection systems based on gamma-ray spectrometry often depends strongly on the ability to identify the part of the measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra using PCA projection residuals to derive a source det…

    Submitted 26 May, 2016; originally announced May 2016.

  48. arXiv:1603.02578  [pdf, other]

    cs.LG

    Batched Lazy Decision Trees

    Authors: Mathieu Guillame-Bert, Artur Dubrawski

    Abstract: We introduce a batched lazy algorithm for supervised classification using decision trees. It avoids unnecessary visits to irrelevant nodes when it is used to make predictions with either eagerly or lazily trained decision trees. A set of experiments demonstrates that the proposed algorithm can outperform both the conventional and lazy decision tree algorithms in terms of computation time as well as…

    Submitted 8 March, 2016; originally announced March 2016.

    Comments: 7 pages, 2 figures, 3 tables, 3 algorithms

  49. arXiv:1511.06419  [pdf, other]

    stat.ML cs.LG

    Canonical Autocorrelation Analysis

    Authors: Maria De-Arteaga, Artur Dubrawski, Peter Huggins

    Abstract: We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivar…

    Submitted 19 November, 2015; originally announced November 2015.

    Comments: 6 pages, 5 figures

  50. arXiv:1509.06659  [pdf, other]

    cs.SI

    An Entity Resolution approach to isolate instances of Human Trafficking online

    Authors: Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski

    Abstract: Human trafficking is a challenging law enforcement problem, and a large amount of such activity manifests itself on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is an even more convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract cl…

    Submitted 18 June, 2017; v1 submitted 22 September, 2015; originally announced September 2015.