Showing 51–57 of 57 results for author: Dubrawski, A

  1. arXiv:1605.01779  [pdf, other]

    stat.ML

    Clustering on the Edge: Learning Structure in Graphs

    Authors: Matt Barnes, Artur Dubrawski

    Abstract: With the recent popularity of graphical clustering methods, there has been an increased focus on the information between samples. We show how learning cluster structure using edge features naturally and simultaneously determines the most likely number of clusters and addresses data scale issues. These results are particularly useful in instances where (a) there are a large number of clusters and (…

    Submitted 5 May, 2016; originally announced May 2016.

  2. arXiv:1603.02578  [pdf, other]

    cs.LG

    Batched Lazy Decision Trees

    Authors: Mathieu Guillame-Bert, Artur Dubrawski

    Abstract: We introduce a batched lazy algorithm for supervised classification using decision trees. It avoids unnecessary visits to irrelevant nodes when it is used to make predictions with either eagerly or lazily trained decision trees. A set of experiments demonstrates that the proposed algorithm can outperform both the conventional and lazy decision tree algorithms in terms of computation time as well as…

    Submitted 8 March, 2016; originally announced March 2016.

    Comments: 7 pages, 2 figures, 3 tables, 3 algorithms

  3. arXiv:1602.05048  [pdf, other]

    stat.AP

    Do Public Events Affect Sex Trafficking Activity?

    Authors: Kyle Miller, Emily Kennedy, Artur Dubrawski

    Abstract: For several years the pervasive belief that the Super Bowl is the single biggest day for human trafficking in the United States each year has been perpetuated in popular press despite a lack of evidentiary support. The practice of relying on hearsay and popular belief for decision-making may result in misappropriation of resources in anti-trafficking efforts. We propose a data-driven approach to a…

    Submitted 16 February, 2016; originally announced February 2016.

  4. arXiv:1511.06419  [pdf, other]

    stat.ML cs.LG

    Canonical Autocorrelation Analysis

    Authors: Maria De-Arteaga, Artur Dubrawski, Peter Huggins

    Abstract: We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivar…

    Submitted 19 November, 2015; originally announced November 2015.

    Comments: 6 pages, 5 figures

  5. arXiv:1511.04402  [pdf, other]

    stat.ML

    Lass-0: sparse non-convex regression by local search

    Authors: William Herlands, Maria De-Arteaga, Daniel Neill, Artur Dubrawski

    Abstract: We compute approximate solutions to L0 regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results of consistency under orthogonality and appropriate h…

    Submitted 17 February, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: 8 pages, 1 figure. NIPS 2015 Workshop of Optimization (OPT2015)

  6. arXiv:1509.06659  [pdf, other]

    cs.SI

    An Entity Resolution approach to isolate instances of Human Trafficking online

    Authors: Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski

    Abstract: Human trafficking is a challenging law enforcement problem, and a large amount of such activity manifests itself on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is an even more convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract cl…

    Submitted 18 June, 2017; v1 submitted 22 September, 2015; originally announced September 2015.

  7. arXiv:1509.03302  [pdf, ps, other]

    stat.ML cs.CY cs.DB cs.LG

    Performance Bounds for Pairwise Entity Resolution

    Authors: Matt Barnes, Kyle Miller, Artur Dubrawski

    Abstract: One significant challenge to scaling entity resolution algorithms to massive datasets is understanding how performance changes after moving beyond the realm of small, manually labeled reference datasets. Unlike traditional machine learning tasks, when an entity resolution algorithm performs well on small hold-out datasets, there is no guarantee this performance holds on larger hold-out datasets. W…

    Submitted 10 September, 2015; originally announced September 2015.