
Showing 151–200 of 237 results for author: Mahoney, M

  1. arXiv:1806.01270  [pdf, other]

    cs.DC cs.DB physics.data-an stat.CO

    Alchemist: An Apache Spark <=> MPI Interface

    Authors: Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Jey Kottalam, Lisa Gerhardt, Prabhat, Michael Ringenburg, Kristyn Maschhoff

    Abstract: The Apache Spark framework for distributed computation is popular in the data analytics community due to its ease of use, but its MapReduce-style programming model can incur significant overheads when performing computations that do not map directly onto this model. One way to mitigate these costs is to off-load computations onto MPI codes. In recent work, we introduced Alchemist, a system for the…

    Submitted 3 June, 2018; originally announced June 2018.

    Comments: Accepted for publication in Concurrency and Computation: Practice and Experience, Special Issue on the Cray User Group 2018. arXiv admin note: text overlap with arXiv:1805.11800

  2. arXiv:1805.11800  [pdf, other]

    cs.DC cs.DB physics.data-an stat.CO

    Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

    Authors: Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Lisa Gerhardt, Prabhat, Jey Kottalam, Michael Ringenburg, Kristyn Maschhoff

    Abstract: Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations---in particular, many linear algebra computations that are the basis for solving common machine learning problems---are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: Accepted for publication in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 2018

  3. arXiv:1803.08021  [pdf, other]

    stat.ML cs.LG stat.CO

    Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap

    Authors: Miles E. Lopes, Shusen Wang, Michael W. Mahoney

    Abstract: Over the course of the past decade, a variety of randomized algorithms have been proposed for computing approximate least-squares (LS) solutions in large-scale settings. A longstanding practical issue is that, for any given input, the user rarely knows the actual error of an approximate solution (relative to the exact solution). Likewise, it is difficult for the user to know precisely how much com…

    Submitted 6 September, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

  4. arXiv:1802.09113  [pdf, other]

    cs.LG cs.DC math.OC

    GPU Accelerated Sub-Sampled Newton's Method

    Authors: Sudhir B. Kylasa, Farbod Roosta-Khorasani, Michael W. Mahoney, Ananth Grama

    Abstract: First order methods, which solely rely on gradient information, are commonly used in diverse machine learning (ML) and data analysis (DA) applications. This is attributed to the simplicity of their implementations, as well as low per-iteration computational/storage costs. However, they suffer from significant disadvantages; most notably, their performance degrades with increasing problem ill-condi…

    Submitted 2 March, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

  5. arXiv:1802.08241  [pdf, other]

    cs.CV cs.LG stat.ML

    Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

    Authors: Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney

    Abstract: Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss…

    Submitted 2 December, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

    Comments: Presented in NeurIPS'18 conference

    Journal ref: NeurIPS 2018

  6. arXiv:1802.06925  [pdf, other]

    math.OC

    Inexact Non-Convex Newton-Type Methods

    Authors: Zhewei Yao, Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

    Abstract: For solving large-scale non-convex problems, we propose inexact variants of trust region and adaptive cubic regularization methods, which, to increase efficiency, incorporate various approximations. In particular, in addition to approximate sub-problem solves, both the Hessian and the gradient are suitably approximated. Using rather mild conditions on such approximations, we show that our proposed…

    Submitted 19 February, 2018; originally announced February 2018.

    Comments: 36 pages, 2 figures

  7. arXiv:1802.06307  [pdf, other]

    stat.ML

    Out-of-sample extension of graph adjacency spectral embedding

    Authors: Keith Levin, Farbod Roosta-Khorasani, Michael W. Mahoney, Carey E. Priebe

    Abstract: Many popular dimensionality reduction procedures have out-of-sample extensions, which allow a practitioner to apply a learned embedding to observations not seen in the initial training sample. In this work, we consider the problem of obtaining an out-of-sample extension for the adjacency spectral embedding, a procedure for embedding the vertices of a graph into Euclidean space. We present two diff…

    Submitted 17 February, 2018; originally announced February 2018.

  8. arXiv:1712.08880  [pdf, ps, other]

    cs.DS stat.ML

    Lectures on Randomized Numerical Linear Algebra

    Authors: Petros Drineas, Michael W. Mahoney

    Abstract: This chapter is based on lectures on Randomized Numerical Linear Algebra from the 2016 Park City Mathematics Institute summer school on The Mathematics of Data.

    Submitted 24 December, 2017; originally announced December 2017.

    Comments: To appear in the edited volume of lectures from the 2016 PCMI summer school

  9. arXiv:1712.06047  [pdf, other]

    cs.DC cs.LG math.OC stat.ML

    Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization

    Authors: Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney

    Abstract: Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is inhibited by the cost of communicating and synchronizing processors in a parallel setting. Iterative ML methods are particularly sensitive to communication cost since they often require com…

    Submitted 16 December, 2017; originally announced December 2017.

    MSC Class: 68W10; 90C25 ACM Class: G.1.6

  10. arXiv:1712.05855  [pdf, other]

    cs.AI

    A Berkeley View of Systems Challenges for AI

    Authors: Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel

    Abstract: With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by…

    Submitted 15 December, 2017; originally announced December 2017.

    Comments: Berkeley Technical Report

    Report number: EECS-2017-159

  11. arXiv:1710.09553  [pdf, other]

    cs.LG stat.ML

    Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

    Authors: Charles H. Martin, Michael W. Mahoney

    Abstract: We describe an approach to understand the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond worst-case theoretical capacity control frameworks that have been popular in machine learning in recent years to revisit old ideas in the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple De…

    Submitted 17 February, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

    Comments: 31 pages; added brief discussion of recent papers that use/extend these ideas

  12. arXiv:1710.06520  [pdf, other]

    cs.SI cs.LG

    LASAGNE: Locality And Structure Aware Graph Node Embedding

    Authors: Evgeniy Faerman, Felix Borutta, Kimon Fountoulakis, Michael W. Mahoney

    Abstract: In this work we propose Lasagne, a methodology to learn locality and structure aware graph node embeddings in an unsupervised way. In particular, we show that the performance of existing random-walk based approaches depends strongly on the structural properties of the graph, e.g., the size of the graph, whether the graph has a flat or upward-sloping Network Community Profile (NCP), whether the gra…

    Submitted 17 October, 2017; originally announced October 2017.

  13. arXiv:1709.03528  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    GIANT: Globally Improved Approximate Newton Method for Distributed Optimization

    Authors: Shusen Wang, Farbod Roosta-Khorasani, Peng Xu, Michael W. Mahoney

    Abstract: For distributed computing environment, we consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, which is sent to the main driver. The main driver, then, averages all the ANT directions received from workers to form a {\it Globally Im…

    Submitted 11 September, 2018; v1 submitted 11 September, 2017; originally announced September 2017.

    Comments: Fixed some typos. Improved writing

  14. arXiv:1708.07827  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

    Authors: Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

    Abstract: While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of hyper-parameters such as learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute…

    Submitted 15 February, 2018; v1 submitted 24 August, 2017; originally announced August 2017.

    Comments: 21 pages, 11 figures. Restructure the paper and add experiments

  15. arXiv:1708.07164  [pdf, ps, other]

    math.OC cs.CC cs.LG stat.ML

    Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information

    Authors: Peng Xu, Fred Roosta, Michael W. Mahoney

    Abstract: We consider variants of trust-region and cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under mild conditions on the inexact Hessian, and using approximate solution of the corresponding sub-problems, we provide iteration complexity to achieve $ε$-approximate second-order optimality which have shown to be tight. Our Hessian approximation cond…

    Submitted 14 May, 2019; v1 submitted 23 August, 2017; originally announced August 2017.

    Comments: 32 pages

    Journal ref: Mathematical Programming 2019

  16. arXiv:1708.01945  [pdf, other]

    stat.ML cs.LG math.NA

    A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication

    Authors: Miles E. Lopes, Shusen Wang, Michael W. Mahoney

    Abstract: In recent years, randomized methods for numerical linear algebra have received growing interest as a general approach to large-scale problems. Typically, the essential ingredient of these methods is some form of randomized dimension reduction, which accelerates computations, but also creates random approximation error. In this way, the dimension reduction step encodes a tradeoff between cost and a…

    Submitted 3 April, 2019; v1 submitted 6 August, 2017; originally announced August 2017.

    Journal ref: Journal of Machine Learning Research, 20(39): 1-40, 2019

  17. arXiv:1706.05826  [pdf, other]

    cs.DS cs.AI cs.IR

    Capacity Releasing Diffusion for Speed and Locality

    Authors: Di Wang, Kimon Fountoulakis, Monika Henzinger, Michael W. Mahoney, Satish Rao

    Abstract: Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass "too aggressively," thereby failing to find the "right" clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both fa…

    Submitted 10 June, 2018; v1 submitted 19 June, 2017; originally announced June 2017.

    Comments: Appeared in ICML 2017. Current version added reference and discussion of work on generalized Cheeger's inequalities

  18. arXiv:1706.02803  [pdf, other]

    cs.LG stat.ML

    Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

    Authors: Shusen Wang, Alex Gittens, Michael W. Mahoney

    Abstract: Kernel $k$-means clustering can correctly identify and extract a far more varied collection of cluster structures than the linear $k$-means clustering algorithm. However, kernel $k$-means clustering is computationally expensive when the non-linear feature map is high-dimensional and there are many input points. Kernel approximation, e.g., the Nyström method, has been applied in previous works to a…

    Submitted 10 February, 2019; v1 submitted 8 June, 2017; originally announced June 2017.

    Journal ref: Journal of Machine Learning Research 20 (2019) 1-49

  19. arXiv:1705.07585  [pdf, other]

    stat.ML

    Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

    Authors: Kristofer E. Bouchard, Alejandro F. Bujan, Farbod Roosta-Khorasani, Shashanka Ubaru, Prabhat, Antoine M. Snijders, Jian-Hua Mao, Edward F. Chang, Michael W. Mahoney, Sharmodeep Bhattacharyya

    Abstract: The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications. Realizing this potential, however, requires novel statistical analysis methods that are both interpretable and predictive. We introduce Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced model selection and estimation. Meth…

    Submitted 2 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: 42 pages; a conference version is in NIPS 2017

  20. Orbit Spaces of Linear Circle Actions

    Authors: Suzanne Craig, Naiche Downey, Lucas Goad, Michael J. Mahoney, Jordan Watts

    Abstract: In this paper, it is shown that non-isomorphic effective linear circle actions yield non-diffeomorphic differential structures on the corresponding orbit spaces.

    Submitted 1 April, 2019; v1 submitted 14 April, 2017; originally announced April 2017.

    Comments: 17 pages. This paper is the result of a Summer 2016 REU (undergraduate research) project at the University of Colorado Boulder. [V2 update: the earlier version contained material not dealt with during the REU, which will be published separately.] To appear in Involve

    Journal ref: Involve 12 (2019) 941-959

  21. arXiv:1703.07520  [pdf, other]

    cs.SI physics.soc-ph

    Social Discrete Choice Models

    Authors: Danqing Zhang, Kimon Fountoulakis, Junyu Cao, Michael Mahoney, Alexei Pozdnoukhov

    Abstract: Human decision making underlies data generating process in multiple application areas, and models explaining and predicting choices made by individuals are in high demand. Discrete choice models are widely studied in economics and computational social sciences. As digital social networking facilitates information flow and spread of influence between individuals, new advances in modeling are needed…

    Submitted 2 November, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

  22. arXiv:1702.04837  [pdf, other]

    stat.ML cs.LG math.NA

    Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

    Authors: Shusen Wang, Alex Gittens, Michael W. Mahoney

    Abstract: We address the statistical and optimization impacts of the classical sketch and Hessian sketch used to approximately solve the Matrix Ridge Regression (MRR) problem. Prior research has quantified the effects of classical sketch on the strictly simpler least squares regression (LSR) problem. We establish that classical sketch has a similar effect upon the optimization properties of MRR as it does o…

    Submitted 5 May, 2018; v1 submitted 15 February, 2017; originally announced February 2017.

    Comments: To appear in Journal of Machine Learning Research, 2018. A short version has appeared in International Conference on Machine Learning (ICML), 2017

    Journal ref: Journal of Machine Learning Research, 19, pp1-50, 2018

  23. arXiv:1612.04003  [pdf, other]

    cs.DC

    Avoiding communication in primal and dual block coordinate descent methods

    Authors: Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney

    Abstract: Primal and dual block coordinate descent methods are iterative methods for solving regularized and unregularized optimization problems. Distributed-memory parallel implementations of these methods have become popular in analyzing large machine learning datasets. However, existing implementations communicate at every iteration which, on modern data center and supercomputing architectures, often dom…

    Submitted 1 May, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    MSC Class: 68W10; 65F10 ACM Class: G.1.0; G.1.3; G.1.6

  24. arXiv:1609.03932  [pdf, other]

    astro-ph.IM astro-ph.CO cs.DS stat.ML

    Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data

    Authors: David Lawlor, Tamás Budavári, Michael W. Mahoney

    Abstract: We apply a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors, to study the diversity of galaxies. This technique permits us to characterize empirically the natural variations in observed spectra data, and we illustrate how this approach can be used in an exploratory manner to highlight both large-scale global as well as small-scale local structure in Sloan Digital…

    Submitted 13 September, 2016; originally announced September 2016.

    Comments: 34 pages. A modified version of this paper has been accepted to The Astrophysical Journal

  25. arXiv:1608.04845  [pdf, ps, other]

    cs.DS stat.ML

    Lecture Notes on Spectral Graph Methods

    Authors: Michael W. Mahoney

    Abstract: These are lecture notes that are based on the lectures from a class I taught on the topic of Spectral Graph Methods at UC Berkeley during the Spring 2015 semester.

    Submitted 16 August, 2016; originally announced August 2016.

    Comments: 257 pages

  26. arXiv:1608.04481  [pdf, ps, other]

    cs.DS stat.ML

    Lecture Notes on Randomized Linear Algebra

    Authors: Michael W. Mahoney

    Abstract: These are lecture notes that are based on the lectures from a class I taught on the topic of Randomized Linear Algebra (RLA) at UC Berkeley during the Fall 2013 semester.

    Submitted 16 August, 2016; originally announced August 2016.

    Comments: 188 pages

  27. arXiv:1607.04940  [pdf, other]

    cs.SI cs.DS

    An optimization approach to locally-biased graph algorithms

    Authors: Kimon Fountoulakis, David Gleich, Michael Mahoney

    Abstract: Locally-biased graph algorithms are algorithms that attempt to find local or small-scale structure in a large data graph. In some cases, this can be accomplished by adding some sort of locality constraint and calling a traditional graph algorithm; but more interesting are locally-biased graph algorithms that compute answers by running a procedure that does not even look at most of the input graph.…

    Submitted 4 December, 2016; v1 submitted 17 July, 2016; originally announced July 2016.

    Comments: 19 pages, 13 figures

  28. arXiv:1607.04378  [pdf, other]

    cs.SD cs.MM

    DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

    Authors: Liping Jing, Bo Liu, Jaeyoung Choi, Adam Janin, Julia Bernd, Michael W. Mahoney, Gerald Friedland

    Abstract: This paper presents a novel two-phase method for audio representation, Discriminative and Compact Audio Representation (DCAR), and evaluates its performance at detecting events in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes…

    Submitted 15 July, 2016; originally announced July 2016.

    Comments: An abbreviated version of this paper will be published in ACM Multimedia 2016

    ACM Class: H.5.1

  29. arXiv:1607.01335  [pdf, other]

    cs.DC

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    Authors: Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney, Prabhat

    Abstract: We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity…

    Submitted 20 September, 2016; v1 submitted 5 July, 2016; originally announced July 2016.

    ACM Class: G.1.3; C.2.4

  30. arXiv:1607.00559  [pdf, ps, other]

    math.OC stat.ML

    Sub-sampled Newton Methods with Non-uniform Sampling

    Authors: Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney

    Abstract: We consider the problem of finding the minimizer of a convex function $F: \mathbb R^d \rightarrow \mathbb R$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. As second-order methods prove to be effective in finding the minimizer to a high-precision, in this work, we propose randomized…

    Submitted 5 July, 2016; v1 submitted 2 July, 2016; originally announced July 2016.

    Comments: minor fix on v1

  31. arXiv:1605.08490  [pdf, other]

    cs.SI cs.DS

    A Simple and Strongly-Local Flow-Based Method for Cut Improvement

    Authors: Nate Veldt, David F. Gleich, Michael W. Mahoney

    Abstract: Many graph-based learning problems can be cast as finding a good set of vertices nearby a seed set, and a powerful methodology for these problems is based on maximum flows. We introduce and analyze a new method for locally-biased graph-based learning called SimpleLocal, which finds good conductance cuts near a set of seed vertices. An important feature of our algorithm is that it is strongly-local…

    Submitted 26 May, 2016; originally announced May 2016.

  32. arXiv:1605.08108  [pdf, other]

    math.OC cs.LG stat.ML

    FLAG n' FLARE: Fast Linearly-Coupled Adaptive Gradient Methods

    Authors: Xiang Cheng, Farbod Roosta-Khorasani, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney

    Abstract: We consider first order gradient methods for effectively optimizing a composite objective in the form of a sum of smooth and, potentially, non-smooth functions. We present accelerated and adaptive gradient methods, called FLAG and FLARE, which can offer the best of both worlds. They can achieve the optimal convergence rate by attaining the optimal first-order oracle complexity for smooth convex op…

    Submitted 11 November, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

  33. arXiv:1604.07515  [pdf, other]

    cs.DC

    Parallel Local Graph Clustering

    Authors: Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis, Michael W. Mahoney

    Abstract: Graph clustering has many important applications in computing, but due to growing sizes of graphs, even traditionally fast clustering methods such as spectral partitioning can be computationally expensive for real-world graphs of interest. Motivated partly by this, so-called local algorithms for graph clustering have received significant interest due to the fact that they can find good clusters in…

    Submitted 8 June, 2019; v1 submitted 26 April, 2016; originally announced April 2016.

    Comments: Fixed typo in Figure 5

  34. arXiv:1602.01886  [pdf, other]

    math.OC

    Variational Perspective on Local Graph Clustering

    Authors: Kimon Fountoulakis, Farbod Roosta-Khorasani, Julian Shun, Xiang Cheng, Michael W. Mahoney

    Abstract: Modern graph clustering applications require the analysis of large graphs and this can be computationally expensive. In this regard, local spectral graph clustering methods aim to identify well-connected clusters around a given "seed set" of reference nodes without accessing the entire graph. The celebrated Approximate Personalized PageRank (APPR) algorithm in the seminal paper by Andersen et al.…

    Submitted 6 December, 2017; v1 submitted 4 February, 2016; originally announced February 2016.

    Comments: The title changed from "Exploiting Optimization for Local Graph Clustering". The abstract and introduction are written in a variational theme. Motivation and background for local graph clustering is provided. We bound the volume of the support of the optimal solution of the l1-regularized PageRank problem. This result is used to bound running time for iterative shrinkage-thresholding method

  35. arXiv:1601.04738  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Sub-Sampled Newton Methods II: Local Convergence Rates

    Authors: Farbod Roosta-Khorasani, Michael W. Mahoney

    Abstract: Many data-fitting applications require the solution of an optimization problem involving a sum of large number of functions of high dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex constraint set $\mathcal{X} \subseteq \mathbb{R}^{p}$ where both $n$ and $p$ are large. In such problems, sub-sampling as a way to reduce $n$ can offer great amount…

    Submitted 25 February, 2016; v1 submitted 18 January, 2016; originally announced January 2016.

  36. arXiv:1601.04737  [pdf, other]

    math.OC cs.LG stat.ML

    Sub-Sampled Newton Methods I: Globally Convergent Algorithms

    Authors: Farbod Roosta-Khorasani, Michael W. Mahoney

    Abstract: Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the computations and/or to implicitly implement a form of statistical regularization. In this paper, we consider second-order iterative optimization algorithms and we provide…

    Submitted 25 February, 2016; v1 submitted 18 January, 2016; originally announced January 2016.

  37. arXiv:1511.06468  [pdf, ps, other]

    cs.DS math.NA

    Faster Parallel Solver for Positive Linear Programs via Dynamically-Bucketed Selective Coordinate Descent

    Authors: Di Wang, Michael Mahoney, Nishanth Mohan, Satish Rao

    Abstract: We provide improved parallel approximation algorithms for the important class of packing and covering linear programs. In particular, we present new parallel $ε$-approximate packing and covering solvers which run in $\tilde{O}(1/ε^2)$ expected time, i.e., in expectation they take $\tilde{O}(1/ε^2)$ iterations and they do $\tilde{O}(N/ε^2)$ total work, where $N$ is the size of the constraint matrix…

    Submitted 19 November, 2015; originally announced November 2015.

  38. arXiv:1510.05185  [pdf, other]

    cs.SI math.PR nlin.AO physics.data-an physics.soc-ph

    A Local Perspective on Community Structure in Multilayer Networks

    Authors: Lucas G. S. Jeub, Michael W. Mahoney, Peter J. Mucha, Mason A. Porter

    Abstract: The analysis of multilayer networks is among the most active areas of network science, and there are now several methods to detect dense "communities" of nodes in multilayer networks. One way to define a community is as a set of nodes that trap a diffusion-like dynamical process (usually a random walk) for a long time. In this view, communities are sets of nodes that create bottlenecks to the spre…

    Submitted 22 May, 2016; v1 submitted 17 October, 2015; originally announced October 2015.

    Comments: 20 pages, 5 figures (some with multiple parts)

  39. arXiv:1509.05111   

    stat.ME stat.ML

    Optimal Subsampling Approaches for Large Sample Linear Regression

    Authors: Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

    Abstract: A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample from the original full sample and uses it as a surrogate for subsequent computation and estimation. In this paper, we study subsampling methods under two scenari…

    Submitted 22 November, 2015; v1 submitted 16 September, 2015; originally announced September 2015.

    Comments: This paper has been withdrawn by the author due to the incompleteness of this draft

  40. arXiv:1508.02439  [pdf, ps, other]

    cs.DS math.NA

    Unified Acceleration Method for Packing and Covering Problems via Diameter Reduction

    Authors: Di Wang, Satish Rao, Michael W. Mahoney

    Abstract: The linear coupling method was introduced recently by Allen-Zhu and Orecchia for solving convex optimization problems with first order methods, and it provides a conceptually simple way to integrate a gradient descent step and mirror descent step in each iteration. The high-level approach of the linear coupling method is very flexible, and it has shown initial promise by providing improved algorit…

    Submitted 6 October, 2015; v1 submitted 10 August, 2015; originally announced August 2015.

    Comments: Fixed typo in packing LP formulation (page 1), and wrong citation in the discussion of earlier works on page 2

  41. arXiv:1505.06659  [pdf, ps, other]

    stat.ML

    Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares -- ICML

    Authors: Garvesh Raskutti, Michael Mahoney

    Abstract: We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an \emph{algorithmic perspective}, when using sketching matrices constructed from random projections and leverage-score sampling, if the number of samples $r$ much smaller than the original sample size $n$, then the worst-case (WC)…

    Submitted 25 May, 2015; originally announced May 2015.

    Comments: 9 pages, Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR

  42. arXiv:1505.00398  [pdf, other

    stat.ML cs.LG math.NA

    Block Basis Factorization for Scalable Kernel Matrix Evaluation

    Authors: Ruoxi Wang, Yingzhou Li, Michael W. Mahoney, Eric Darve

    Abstract: Kernel methods are widespread in machine learning; however, they are limited by the quadratic complexity of the construction, application, and storage of kernel matrices. Low-rank matrix approximation algorithms are widely used to address this problem and reduce the arithmetic and storage cost. However, we observed that for some datasets with wide intra-class variability, the optimal kernel parame…

    Submitted 4 May, 2021; v1 submitted 3 May, 2015; originally announced May 2015.

    Comments: 16 pages, 5 figures

    Journal ref: SIAM Journal on Matrix Analysis and Applications, 2019, Vol. 40, No. 4 : pp. 1497-1526

  43. arXiv:1502.03571  [pdf, ps, other

    math.OC stat.ML

    Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning

    Authors: Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney

    Abstract: In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems---e.g., $\ell_2$ and $\ell_1$ regression problems. We propose a hybrid algorithm named pwSGD…

    Submitted 10 July, 2017; v1 submitted 12 February, 2015; originally announced February 2015.

    Comments: A conference version of this paper appears under the same title in Proceedings of ACM-SIAM Symposium on Discrete Algorithms, Arlington, VA, 2016

  44. arXiv:1502.03032  [pdf, ps, other

    cs.DC cs.DS math.NA stat.ML

    Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

    Authors: Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

    Abstract: In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. Here, we review recent work on developing and implementing randomized matrix algorithms in large-scale parallel and distributed environments. Randomized algorithms for matrix problems have received a great deal of attention…

    Submitted 27 July, 2015; v1 submitted 10 February, 2015; originally announced February 2015.

  45. arXiv:1412.8293  [pdf, ps, other

    stat.ML cs.LG math.NA stat.CO

    Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

    Authors: Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael Mahoney

    Abstract: We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large datasets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead…

    Submitted 9 August, 2015; v1 submitted 29 December, 2014; originally announced December 2014.

    Comments: A short version of this paper was presented at ICML 2014

  46. arXiv:1411.1546  [pdf, other

    cs.DS cs.SI physics.soc-ph stat.AP

    Tree decompositions and social graphs

    Authors: Aaron B. Adcock, Blair D. Sullivan, Michael W. Mahoney

    Abstract: Recent work has established that large informatics graphs such as social and information networks have non-trivial tree-like structure when viewed at moderate size scales. Here, we present results from the first detailed empirical evaluation of the use of tree decomposition (TD) heuristics for structure identification and extraction in social graphs. Although TDs have historically been used in str…

    Submitted 3 May, 2016; v1 submitted 6 November, 2014; originally announced November 2014.

    Comments: v2 has 44 pages, 21 figures, 7 tables, 107 references. To appear in Internet Mathematics

  47. arXiv:1411.0306  [pdf, other

    stat.ML cs.LG stat.CO

    Fast Randomized Kernel Methods With Statistical Guarantees

    Authors: Ahmed El Alaoui, Michael W. Mahoney

    Abstract: One approach to improving the running time of kernel-based machine learning methods is to build a small sketch of the input and use it in lieu of the full kernel matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of \emph{statisti…

    Submitted 8 November, 2015; v1 submitted 2 November, 2014; originally announced November 2014.

    Comments: Improved presentation. Technical details fixed. A conference version of this paper appears in NIPS15 under the modified title "Fast Randomized Kernel Ridge Regression with Statistical Guarantees"

  48. High-order boundary integral equation solution of high frequency wave scattering from obstacles in an unbounded linearly stratified medium

    Authors: Alex. H. Barnett, Bradley J. Nelson, J. Matthew Mahoney

    Abstract: We apply boundary integral equations for the first time to the two-dimensional scattering of time-harmonic waves from a smooth obstacle embedded in a continuously-graded unbounded medium. In the case we solve, the square of the wavenumber (refractive index) varies linearly in one coordinate, i.e. $(\Delta + E + x_2)u(x_1,x_2) = 0$ where $E$ is a constant; this models quantum particles of fixed energy in…

    Submitted 25 September, 2014; originally announced September 2014.

    Comments: 22 pages, 9 figures, submitted to J. Comput. Phys

    MSC Class: 65N38; 65N80; 34M60; 65D20

  49. arXiv:1406.5986  [pdf, other

    stat.ML

    A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

    Authors: Garvesh Raskutti, Michael Mahoney

    Abstract: We consider statistical as well as algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. For a LS problem with input data $(X, Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$, sketching algorithms use a sketching matrix, $S\in\mathbb{R}^{r \times n}$ with $r \ll n$. Then, rather than solving the LS problem using the full data $(X,Y)$, ske…

    Submitted 25 August, 2015; v1 submitted 23 June, 2014; originally announced June 2014.

    Comments: 27 pages, 5 figures

  50. arXiv:1403.3795  [pdf, other

    cs.SI cond-mat.dis-nn math.CO nlin.AO physics.soc-ph

    Think Locally, Act Locally: The Detection of Small, Medium-Sized, and Large Communities in Large Networks

    Authors: Lucas G. S. Jeub, Prakash Balachandran, Mason A. Porter, Peter J. Mucha, Michael W. Mahoney

    Abstract: It is common in the study of networks to investigate meso-scale features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify "communities," which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective t…

    Submitted 8 October, 2014; v1 submitted 15 March, 2014; originally announced March 2014.

    Comments: 32 pages, 19 figures (many with multiple parts); the abstract is abridged because of space limitations in the arXiv's abstract field