Showing 1–39 of 39 results for author: Dubrawski, A

  1. arXiv:2406.10775

    cs.LG cs.AI stat.ML

    A Rate-Distortion View of Uncertainty Quantification

    Authors: Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski

    Abstract: In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching de…

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Journal ref: International Conference on Machine Learning, 2024

  2. arXiv:2405.17672

    cs.LG cs.AI stat.ML

    Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise

    Authors: Lukasz Sztukiewicz, Jack Henry Good, Artur Dubrawski

    Abstract: In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to im…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2305.07089

    stat.ML cs.LG stat.ME

    Hierarchically Coherent Multivariate Mixture Networks

    Authors: Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski

    Abstract: Large collections of time series data are often organized into hierarchies with different levels of aggregation; examples include product and geographical groupings. Probabilistic coherent forecasting is tasked to produce forecasts consistent across levels of aggregation. In this study, we propose to augment neural forecasting architectures with a coherent multivariate mixture output. We optimize…

    Submitted 16 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  4. arXiv:2302.12504

    stat.ME cs.LG stat.ML

    Recovering Sparse and Interpretable Subgroups with Heterogeneous Treatment Effects with Censored Time-to-Event Outcomes

    Authors: Chirag Nagpal, Vedant Sanil, Artur Dubrawski

    Abstract: Studies involving both randomized experiments as well as observational data typically involve time-to-event outcomes such as time-to-failure, death or onset of an adverse condition. Such outcomes are typically subject to censoring due to loss of follow-up and established statistical practice involves comparing treatment efficacy in terms of hazard ratios between the treated and control groups. In…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Presented as an extended abstract at the Machine Learning for Health Symposium (ML4H) 2022

  5. arXiv:2207.03517

    stat.ML cs.AI cs.LG

    HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python

    Authors: Kin G. Olivares, Federico Garza, David Luo, Cristian Challú, Max Mergenthaler, Souhaib Ben Taieb, Shanika L. Wickramasuriya, Artur Dubrawski

    Abstract: Large collections of time series data are commonly organized into structures with different levels of aggregation; examples include product and geographical groupings. It is often important to ensure that the forecasts are coherent so that the predicted values at disaggregate levels add up to the aggregate forecast. The growing interest of the Machine Learning community in hierarchical forecasting…

    Submitted 24 January, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

  6. arXiv:2204.07276

    cs.LG cs.MS stat.ML

    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data

    Authors: Chirag Nagpal, Willa Potosnak, Artur Dubrawski

    Abstract: Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present au…

    Submitted 3 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

  7. arXiv:2203.12546

    cs.LG cs.AI stat.ML

    Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

    Authors: Benedikt Boecking, Vincent Jeanselme, Artur Dubrawski

    Abstract: Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete con…

    Submitted 23 March, 2022; originally announced March 2022.

  8. arXiv:2203.12023

    cs.LG cs.AI cs.CV stat.ML

    Generative Modeling Helps Weak Supervision (and Vice Versa)

    Authors: Benedikt Boecking, Nicholas Roberts, Willie Neiswanger, Stefano Ermon, Frederic Sala, Artur Dubrawski

    Abstract: Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving…

    Submitted 11 March, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Published as a conference paper at ICLR 2023

    ACM Class: I.2.0; I.4.m

  9. arXiv:2202.11089

    cs.LG stat.AP stat.ME stat.ML

    Counterfactual Phenotyping with Censored Time-to-Events

    Authors: Chirag Nagpal, Mononito Goswami, Keith Dufendach, Artur Dubrawski

    Abstract: Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject to censoring. Counterfactual reasoning in such scenarios requires decoupling the effects of confounding physiological characteristics that affect baseline survival rates from the effects of the interventi…

    Submitted 9 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: KDD 2022 Applied Data Science Paper. Note this version includes a correction of the published version in the definition of Restricted Mean Survival Time

  10. arXiv:2107.02233

    cs.LG cs.AI stat.ML

    End-to-End Weak Supervision

    Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

    Abstract: Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sour…

    Submitted 30 November, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: Code URL: https://github.com/autonlab/weasel

    Journal ref: Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  11. arXiv:2106.10302

    cs.LG cs.AI stat.ML

    Dependency Structure Misspecification in Multi-Source Weak Supervision Models

    Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

    Abstract: Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into labeling functions (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on tes…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: Oral presentation at the Workshop on Weakly Supervised Learning at ICLR 2021

  12. arXiv:2106.05860

    cs.LG stat.ML

    DMIDAS: Deep Mixed Data Sampling Regression for Long Multi-Horizon Time Series Forecasting

    Authors: Cristian Challu, Kin G. Olivares, Gus Welter, Artur Dubrawski

    Abstract: Neural forecasting has shown significant improvements in the accuracy of large-scale systems, yet predicting extremely long horizons remains a challenging task. Two common problems are the volatility of the predictions and their computational complexity; we addressed them by incorporating smoothness regularization and mixed data sampling techniques to a well-performing multi-layer perceptron based…

    Submitted 7 June, 2021; originally announced June 2021.

  13. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx

    Authors: Kin G. Olivares, Cristian Challu, Grzegorz Marcjasz, Rafał Weron, Artur Dubrawski

    Abstract: We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its applica…

    Submitted 4 April, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: 30 pages, 7 figures, 4 tables

    Journal ref: International Journal of Forecasting 2022

  14. arXiv:2012.06046

    cs.LG cs.AI stat.ML

    Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

    Authors: Benedikt Boecking, Willie Neiswanger, Eric Xing, Artur Dubrawski

    Abstract: Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth annotations by generating probabilistic labels using multiple noisy heuristics. This process can scale to large datasets and has demonstrated state of the art perf…

    Submitted 25 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Accepted as a conference paper at ICLR 2021

  15. arXiv:2007.05166

    cs.LG cs.CV stat.ML

    Self-Reflective Variational Autoencoder

    Authors: Ifigeneia Apostolopoulou, Elan Rosenfeld, Artur Dubrawski

    Abstract: The Variational Autoencoder (VAE) is a powerful framework for learning probabilistic latent variable generative models. However, typical assumptions on the approximate posterior distribution of the encoder and/or the prior, seriously restrict its capacity for inference and generative modeling. Variational inference based on neural autoregressive models respects the conditional dependencies of the…

    Submitted 10 July, 2020; originally announced July 2020.

  16. arXiv:2006.08910

    cs.LG cs.AI stat.ML

    Preference-based Reinforcement Learning with Finite-Time Guarantees

    Authors: Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

    Abstract: Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite…

    Submitted 23 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Spotlight presentation

  17. arXiv:2003.01176

    cs.LG stat.AP stat.ML

    Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks

    Authors: Chirag Nagpal, Xinyu Rachel Li, Artur Dubrawski

    Abstract: We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazard of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we…

    Submitted 9 June, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Also appeared in NeurIPS 2019 Workshop on Machine Learning for Healthcare (ML4H)

    Journal ref: IEEE Journal of Biomedical and Health Informatics, 2021

  18. arXiv:1912.07685

    cs.LG stat.ML

    Pairwise Feedback for Data Programming

    Authors: Benedikt Boecking, Artur Dubrawski

    Abstract: The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics provides a promising avenue to address this problem. We propose to improve modeling of latent class variables in the programmatic creation of labeled datasets by incor…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: Presented at the NeurIPS 2019 workshop on Learning with Rich Experience: Integration of Learning Paradigms

  19. arXiv:1911.05121

    cs.LG stat.ML

    Detecting Patterns of Physiological Response to Hemodynamic Stress via Unsupervised Deep Learning

    Authors: Chufan Gao, Fabian Falck, Mononito Goswami, Anthony Wertz, Michael R. Pinsky, Artur Dubrawski

    Abstract: Monitoring physiological responses to hemodynamic stress can help in determining appropriate treatment and ensuring good patient outcomes. Physicians' intuition suggests that the human body has a number of physiological response patterns to hemorrhage which escalate as blood loss continues, however the exact etiology and phenotypes of such responses are not well known or understood only at a coars…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  20. arXiv:1911.00980

    cs.LG stat.ML

    Zeroth Order Non-convex optimization with Dueling-Choice Bandits

    Authors: Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski

    Abstract: We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB…

    Submitted 3 November, 2019; originally announced November 2019.

    Comments: 19 pages, 3 figures

  21. arXiv:1910.07567

    cs.LG stat.ML

    Active Learning for Graph Neural Networks via Node Feature Propagation

    Authors: Yuexin Wu, Yichong Xu, Aarti Singh, Yiming Yang, Artur Dubrawski

    Abstract: Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning from graphically structured data. However, a large quantity of labeled graphs is difficult to obtain, which significantly limits the true success of GNNs. Although active learning has been widely studied for addressing label-sparse issues with…

    Submitted 19 November, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: 15 pages, 5 figures

  22. arXiv:1910.06368

    cs.LG stat.ML

    Thresholding Bandit Problem with Both Duels and Pulls

    Authors: Yichong Xu, Xi Chen, Aarti Singh, Artur Dubrawski

    Abstract: The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also duel two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to…

    Submitted 12 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: 15 pages, 8 figures; The 23rd International Conference on Artificial Intelligence and Statistics

  23. arXiv:1905.05865

    cs.LG stat.ML

    Nonlinear Semi-Parametric Models for Survival Analysis

    Authors: Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj

    Abstract: Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the…

    Submitted 14 May, 2019; originally announced May 2019.

  24. arXiv:1811.02525

    stat.ML cs.LG

    Double Adaptive Stochastic Gradient Optimization

    Authors: Kin Gutierrez, ** Li, Cristian Challu, Artur Dubrawski

    Abstract: Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods, DASGrad. They leverage the complementary ideas of the adaptive moment algorithms widely used by deep learning commun…

    Submitted 6 November, 2018; originally announced November 2018.

  25. arXiv:1807.06713

    stat.ML cs.LG

    On the Interaction Effects Between Prediction and Clustering

    Authors: Matt Barnes, Artur Dubrawski

    Abstract: Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high quality and well structured predictions. This paper argues interaction effects between clustering and prediction (e.g. classification, regression) algorithms can cause subtle adverse behaviors during cross-validation that may not be initially apparent. In particular, we focus on the problem of estimati…

    Submitted 28 December, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Volume 89

  26. arXiv:1807.00905

    cs.LG stat.ML

    Learning under selective labels in the presence of expert consistency

    Authors: Maria De-Arteaga, Artur Dubrawski, Alexandra Chouldechova

    Abstract: We explore the problem of learning under selective labels in the context of algorithm-assisted decision making. Selective labels is a pervasive selection bias problem that arises when historical decision making blinds us to the true outcome for certain instances. Examples of this are common in many applications, ranging from predicting recidivism using pre-trial release data to diagnosing patients…

    Submitted 4 July, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: Presented at the 2018 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

  27. arXiv:1806.03286

    stat.ML cs.LG

    Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information

    Authors: Yichong Xu, Sivaraman Balakrishnan, Aarti Singh, Artur Dubrawski

    Abstract: In supervised learning, we typically leverage a fully labeled dataset to design methods for function estimation or prediction. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. In this paper, we consider a semi-supervised regression settin…

    Submitted 6 November, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    Comments: 52 pages, 11 figures; Preliminary version in International Conference on Machine Learning 2018

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-54

  28. arXiv:1804.10742

    cs.LG stat.ML

    Novel Prediction Techniques Based on Clusterwise Linear Regression

    Authors: Igor Gitman, Jieshi Chen, Eric Lei, Artur Dubrawski

    Abstract: In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a rea…

    Submitted 28 April, 2018; originally announced April 2018.

  29. Discovery of Complex Anomalous Patterns of Sexual Violence in El Salvador

    Authors: Maria De-Arteaga, Artur Dubrawski

    Abstract: When sexual violence is a product of organized crime or social imaginary, the links between sexual violence episodes can be understood as a latent structure. With this assumption in place, we can use data science to uncover complex patterns. In this paper we focus on the use of data mining techniques to unveil complex anomalous spatiotemporal patterns of sexual violence. We illustrate their use by…

    Submitted 17 November, 2017; originally announced November 2017.

    Comments: Conference paper at Data for Policy 2016 - Frontiers of Data Science for Government: Ideas, Practices and Projections (Data for Policy)

  30. arXiv:1709.05602

    stat.ML cs.LG

    Characterization of Hemodynamic Signal by Learning Multi-View Relationships

    Authors: Eric Lei, Kyle Miller, Michael R. Pinsky, Artur Dubrawski

    Abstract: Multi-view data are increasingly prevalent in practice. It is often relevant to analyze the relationships between pairs of views by multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, data may easily exhibit nonlinear relations, which CCA cannot reveal. We aim to investigate the usefulness of nonlinear multi-view relations to characterize multi-view data…

    Submitted 8 December, 2019; v1 submitted 16 September, 2017; originally announced September 2017.

  31. arXiv:1705.00334

    stat.ML cs.LG

    Scaling Active Search using Linear Similarity Functions

    Authors: Sibi Venkatesan, James K. Miller, Jeff Schneider, Artur Dubrawski

    Abstract: Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are…

    Submitted 21 August, 2017; v1 submitted 30 April, 2017; originally announced May 2017.

    Comments: To be published as conference paper at IJCAI 2017, 11 pages, 2 figures

  32. arXiv:1704.05820

    stat.ML

    Noise-Tolerant Interactive Learning from Pairwise Comparisons

    Authors: Yichong Xu, Hongyang Zhang, Aarti Singh, Kyle Miller, Artur Dubrawski

    Abstract: We study the problem of interactively learning a binary classifier using noisy labeling and pairwise comparison oracles, where the comparison oracle answers which one in the given two instances is more likely to be positive. Learning from such oracles has multiple applications where obtaining direct labels is harder but pairwise comparisons are easier, and the algorithm can leverage both types of…

    Submitted 19 May, 2017; v1 submitted 19 April, 2017; originally announced April 2017.

    Comments: 28 pages, 1 figure, 3 tables

  33. Evaluation of Coded Aperture Radiation Detectors using a Bayesian Approach

    Authors: K. Miller, P. Huggins, A. Dubrawski, S. Labov, K. Nelson

    Abstract: We investigate the utility of coded aperture (CA) for roadside radiation threat detection applications. With coded aperture, information in the form of photon quantity is traded for directional information. Whether and in what scenarios this trade-off is beneficial is the focus of this study. We quantify the impact of a masking approach by comparing performance with an unmasked approach in terms o…

    Submitted 26 May, 2016; originally announced May 2016.

    Comments: Submission abstract for poster presented at nuclear science symposium (NSS) 2015

  34. arXiv:1605.08455

    cs.LG physics.data-an stat.ML

    Suppressing Background Radiation Using Poisson Principal Component Analysis

    Authors: P. Tandon, P. Huggins, A. Dubrawski, S. Labov, K. Nelson

    Abstract: Performance of nuclear threat detection systems based on gamma-ray spectrometry often strongly depends on the ability to identify the part of measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra using PCA projection residuals to derive a source det…

    Submitted 26 May, 2016; originally announced May 2016.

  35. arXiv:1605.01779

    stat.ML

    Clustering on the Edge: Learning Structure in Graphs

    Authors: Matt Barnes, Artur Dubrawski

    Abstract: With the recent popularity of graphical clustering methods, there has been an increased focus on the information between samples. We show how learning cluster structure using edge features naturally and simultaneously determines the most likely number of clusters and addresses data scale issues. These results are particularly useful in instances where (a) there are a large number of clusters and (…

    Submitted 5 May, 2016; originally announced May 2016.

  36. arXiv:1602.05048

    stat.AP

    Do Public Events Affect Sex Trafficking Activity?

    Authors: Kyle Miller, Emily Kennedy, Artur Dubrawski

    Abstract: For several years the pervasive belief that the Super Bowl is the single biggest day for human trafficking in the United States each year has been perpetuated in popular press despite a lack of evidentiary support. The practice of relying on hearsay and popular belief for decision-making may result in misappropriation of resources in anti-trafficking efforts. We propose a data-driven approach to a…

    Submitted 16 February, 2016; originally announced February 2016.

  37. arXiv:1511.06419

    stat.ML cs.LG

    Canonical Autocorrelation Analysis

    Authors: Maria De-Arteaga, Artur Dubrawski, Peter Huggins

    Abstract: We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivar…

    Submitted 19 November, 2015; originally announced November 2015.

    Comments: 6 pages, 5 figures

  38. arXiv:1511.04402

    stat.ML

    Lass-0: sparse non-convex regression by local search

    Authors: William Herlands, Maria De-Arteaga, Daniel Neill, Artur Dubrawski

    Abstract: We compute approximate solutions to L0 regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results of consistency under orthogonality and appropriate h…

    Submitted 17 February, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: 8 pages, 1 figure. NIPS 2015 Workshop of Optimization (OPT2015)

  39. arXiv:1509.03302

    stat.ML cs.CY cs.DB cs.LG

    Performance Bounds for Pairwise Entity Resolution

    Authors: Matt Barnes, Kyle Miller, Artur Dubrawski

    Abstract: One significant challenge to scaling entity resolution algorithms to massive datasets is understanding how performance changes after moving beyond the realm of small, manually labeled reference datasets. Unlike traditional machine learning tasks, when an entity resolution algorithm performs well on small hold-out datasets, there is no guarantee this performance holds on larger hold-out datasets. W…

    Submitted 10 September, 2015; originally announced September 2015.