Showing 1–50 of 97 results for author: Musco, C

  1. arXiv:2406.18752  [pdf, other]

    cs.LG cs.GT

    Competitive Algorithms for Online Knapsack with Succinct Predictions

    Authors: Mohammadreza Daneshvaramoli, Helia Karisani, Adam Lechowicz, Bo Sun, Cameron Musco, Mohammad Hajiesmaili

    Abstract: In the online knapsack problem, the goal is to pack items arriving online with different values and weights into a capacity-limited knapsack to maximize the total value of the accepted items. We study \textit{learning-augmented} algorithms for this problem, which aim to use machine-learned predictions to move beyond pessimistic worst-case guarantees. Existing learning-augmented algorithms for onli…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 29 pages, 10 figures, Submitted to NeurIPS 2024

    MSC Class: 68Q25; 68T05 ACM Class: F.2.2; I.2.6

  2. arXiv:2406.07521  [pdf, other]

    cs.DS cs.LG

    Faster Spectral Density Estimation and Sparsification in the Nuclear Norm

    Authors: Yujia Jin, Ishani Karmarkar, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh

    Abstract: We consider the problem of estimating the spectral density of the normalized adjacency matrix of an $n$-node undirected graph. We provide a randomized algorithm that, with $O(nε^{-2})$ queries to a degree and neighbor oracle and in $O(nε^{-3})$ time, estimates the spectrum up to $ε$ accuracy in the Wasserstein-1 metric. This improves on previous state-of-the-art methods, including an $O(nε^{-7})$…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2024

  3. arXiv:2405.18680  [pdf, ps, other]

    cs.DS cs.CG cs.DB cs.LG

    Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits

    Authors: Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, Torsten Suel

    Abstract: There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.09312  [pdf, ps, other]

    cs.LG

    Agnostic Active Learning of Single Index Models with Linear Sample Complexity

    Authors: Aarshvi Gajjar, Wai Ming Tai, Xingyu Xu, Chinmay Hegde, Christopher Musco, Yi Li

    Abstract: We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientif…

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  5. arXiv:2405.05865  [pdf, ps, other]

    cs.DS cs.LG math.NA math.OC

    Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning

    Authors: Michał Dereziński, Christopher Musco, Jiaming Yang

    Abstract: We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergen…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2404.13757  [pdf, ps, other]

    cs.DS math.NA

    Sublinear Time Low-Rank Approximation of Toeplitz Matrices

    Authors: Cameron Musco, Kshiteej Sheth

    Abstract: We present a sublinear time algorithm for computing a near optimal low-rank approximation to any positive semidefinite (PSD) Toeplitz matrix $T\in \mathbb{R}^{d\times d}$, given noisy access to its entries. In particular, given entrywise query access to $T+E$ for an arbitrary noise matrix $E\in \mathbb{R}^{d\times d}$, integer rank $k\leq d$, and error parameter $δ>0$, our algorithm runs in time…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Published in SODA 2024. Updated proofs

  7. arXiv:2402.09379  [pdf, other]

    cs.DS math.NA

    Fixed-sparsity matrix approximation from matrix-vector products

    Authors: Noah Amsel, Tyler Chen, Feyza Duman Keles, Diana Halikias, Cameron Musco, Christopher Musco

    Abstract: We study the problem of approximating a matrix $\mathbf{A}$ with a matrix that has a fixed sparsity pattern (e.g., diagonal, banded, etc.), when $\mathbf{A}$ is accessed only by matrix-vector products. We describe a simple randomized algorithm that returns an approximation with the given sparsity pattern with Frobenius-norm error at most $(1+\varepsilon)$ times the best possible error. When each r…

    Submitted 26 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  8. arXiv:2312.11712  [pdf, other]

    cs.CR cs.LG

    A Simple and Practical Method for Reducing the Disparate Impact of Differential Privacy

    Authors: Lucas Rosenblatt, Julia Stoyanovich, Christopher Musco

    Abstract: Differentially private (DP) mechanisms have been deployed in a variety of high-impact social settings (perhaps most notably by the U.S. Census). Since all DP mechanisms involve adding noise to results of statistical queries, they are expected to impact our ability to accurately analyze and learn from data, in effect trading off privacy with utility. Alarmingly, the impact of DP on utility can vary…

    Submitted 18 December, 2023; originally announced December 2023.

  9. arXiv:2312.03691  [pdf, other]

    cs.LG cs.SI

    On the Role of Edge Dependency in Graph Generative Models

    Authors: Sudhanshu Chanpuriya, Cameron Musco, Konstantinos Sotiropoulos, Charalampos Tsourakakis

    Abstract: In this work, we introduce a novel evaluation framework for generative models of graphs, emphasizing the importance of model-generated graph overlap (Chanpuriya et al., 2021) to ensure both accuracy and edge-diversity. We delineate a hierarchy of graph generative models categorized into three levels of complexity: edge independent, node independent, and fully dependent models. This hierarchy encap…

    Submitted 6 December, 2023; originally announced December 2023.

  10. arXiv:2311.14023  [pdf, ps, other]

    math.NA cs.DS

    Algorithm-agnostic low-rank approximation of operator monotone matrix functions

    Authors: David Persson, Raphael A. Meyer, Christopher Musco

    Abstract: Low-rank approximation of a matrix function, $f(A)$, is an important task in computational mathematics. Most methods require direct access to $f(A)$, which is often considerably more expensive than accessing $A$. Persson and Kressner (SIMAX 2023) avoid this issue for symmetric positive semidefinite matrices by proposing funNyström, which first constructs a Nyström approximation to $A$ using subspa…

    Submitted 23 November, 2023; originally announced November 2023.

    MSC Class: 65F15; 65F55; 65F60; 68W25

  11. arXiv:2310.18265  [pdf, other]

    cs.DS cs.LG math.OC stat.ML

    Structured Semidefinite Programming for Recovering Structured Preconditioners

    Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian

    Abstract: We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, com…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Merge of arXiv:1812.06295 and arXiv:2008.01722

  12. arXiv:2310.04966  [pdf, other]

    cs.LG

    Improved Active Learning via Dependent Leverage Score Sampling

    Authors: Atsushi Shimizu, Xiaoou Cheng, Christopher Musco, Jonathan Weare

    Abstract: We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the \emph{pivotal sampling algorithm}, which we test on problems motivated by learning-based methods for parametric PDE…

    Submitted 4 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: To appear at ICLR 2024

  13. arXiv:2309.16157  [pdf, other]

    cs.DB cs.DS

    Sampling Methods for Inner Product Sketching

    Authors: Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang

    Abstract: Recently, Bessa et al. (PODS 2023) showed that sketches based on coordinated weighted sampling theoretically and empirically outperform popular linear sketching methods like Johnson-Lindenstrauss projection and CountSketch for the ubiquitous problem of inner product estimation. We further develop this finding by introducing and analyzing two alternative sampling-based methods. In contrast to the co…

    Submitted 15 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 17 pages, 10 figures

  14. arXiv:2308.06448  [pdf, other]

    cs.LG cs.SI

    Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More

    Authors: Sudhanshu Chanpuriya, Cameron Musco

    Abstract: Algorithms for node clustering typically focus on finding homophilous structure in graphs. That is, they find sets of similar nodes with many edges within, rather than across, the clusters. However, graphs often also exhibit heterophilous structure, as exemplified by (nearly) bipartite and tripartite graphs, where most edges occur across the clusters. Grappling with such structure is typically lef…

    Submitted 11 August, 2023; originally announced August 2023.

  15. arXiv:2308.05907  [pdf, ps, other]

    cs.DS cs.DB

    Simple Analysis of Priority Sampling

    Authors: Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang

    Abstract: We prove a tight upper bound on the variance of the priority sampling method (aka sequential Poisson sampling). Our proof is significantly shorter and simpler than the original proof given by Mario Szegedy at STOC 2006, which resolved a conjecture by Duffield, Lund, and Thorup.

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: 6 pages

  16. arXiv:2307.00474  [pdf, other]

    cs.DS cs.LG

    Moments, Random Walks, and Limits for Spectrum Approximation

    Authors: Yujia Jin, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh

    Abstract: We study lower bounds for the problem of approximating a one dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on $[-1,1]$ that cannot be approximated to accuracy $ε$ in Wasserstein-1 distance even if we know \emph{all} of their moments to multiplicative accuracy $(1\pm2^{-Ω(1/ε)})$; this result matches an upper bound of Kong and Valiant [Anna…

    Submitted 2 July, 2023; originally announced July 2023.

  17. arXiv:2305.18755  [pdf, other]

    cs.LG

    Dimensionality Reduction for General KDE Mode Finding

    Authors: Xinyu Luo, Christopher Musco, Cas Widdershoven

    Abstract: Finding the mode of a high dimensional probability distribution $D$ is a fundamental algorithmic problem in statistics and data analysis. There has been particular interest in efficient methods for solving the problem when $D$ is represented as a mixture model or kernel density estimate, although few algorithmic results with worst-case approximation and runtime guarantees are known. In this work,…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Full version of a paper published at ICML'23

  18. arXiv:2305.14451  [pdf, other]

    cs.LG cs.AI stat.ML

    Kernel Interpolation with Sparse Grids

    Authors: Mohit Yadav, Daniel Sheldon, Cameron Musco

    Abstract: Structured kernel interpolation (SKI) accelerates Gaussian process (GP) inference by interpolating the kernel covariance function using a dense grid of inducing points, whose corresponding kernel matrix is highly structured and thus amenable to fast linear algebra. Unfortunately, SKI scales poorly in the dimension of the input points, since the dense grid size grows exponentially with the dimensio…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at Neural Information Processing Systems (NeurIPS) 2022

  19. arXiv:2305.05826  [pdf, ps, other]

    cs.DS math.NA

    Universal Matrix Sparsifiers and Fast Deterministic Algorithms for Linear Algebra

    Authors: Rajarshi Bhattacharjee, Gregory Dexter, Cameron Musco, Archan Ray, Sushant Sachdeva, David P Woodruff

    Abstract: Let $\mathbf S \in \mathbb R^{n \times n}$ satisfy $\|\mathbf 1-\mathbf S\|_2\leεn$, where $\mathbf 1$ is the all ones matrix and $\|\cdot\|_2$ is the spectral norm. It is well-known that there exists such an $\mathbf S$ with just $O(n/ε^2)$ non-zero entries: we can let $\mathbf S$ be the scaled adjacency matrix of a Ramanujan expander graph. We show that such an $\mathbf S$ yields a $universal$…

    Submitted 12 January, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 41 pages

    ACM Class: F.2.1; G.1.3; G.1.2; G.4; I.1.2

  20. arXiv:2305.02535  [pdf, other]

    cs.DS math.NA

    On the Unreasonable Effectiveness of Single Vector Krylov Methods for Low-Rank Approximation

    Authors: Raphael A. Meyer, Cameron Musco, Christopher Musco

    Abstract: Krylov subspace methods are a ubiquitous tool for computing near-optimal rank $k$ approximations of large matrices. While "large block" Krylov methods with block size at least $k$ give the best known theoretical guarantees, block size one (a single vector) or a small constant is often preferred in practice. Despite their popularity, we lack theoretical bounds on the performance of such "small bloc…

    Submitted 6 November, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: 41 pages, 7 figures. To appear at SODA 2024

    MSC Class: 65F55 (Primary) 65F15 (Secondary) ACM Class: G.1.3; F.2.1

  21. arXiv:2304.02261  [pdf, other]

    cs.DS cs.LG stat.ML

    Optimal Sketching Bounds for Sparse Linear Regression

    Authors: Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff

    Abstract: We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $Θ(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This ex…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: AISTATS 2023

  22. arXiv:2303.06396  [pdf, other]

    cs.LG stat.ML

    No-regret Algorithms for Fair Resource Allocation

    Authors: Abhishek Sinha, Ativ Joshi, Rajarshi Bhattacharjee, Cameron Musco, Mohammad Hajiesmaili

    Abstract: We consider a fair resource allocation problem in the no-regret setting against an unrestricted adversary. The objective is to allocate resources equitably among several agents in an online fashion so that the difference of the aggregate $α$-fair utilities of the agents between an optimal static clairvoyant allocation and that of the online policy grows sub-linearly with time. The problem is chall…

    Submitted 11 March, 2023; originally announced March 2023.

  23. arXiv:2301.05811  [pdf, other]

    cs.DB cs.DS

    Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation

    Authors: Aline Bessa, Majid Daliri, Juliana Freire, Cameron Musco, Christopher Musco, Aécio Santos, Haoxiang Zhang

    Abstract: We present a new approach for computing compact sketches that can be used to approximate the inner product between pairs of high-dimensional vectors. Based on the Weighted MinHash algorithm, our approach admits strong accuracy guarantees that improve on the guarantees of popular linear sketching approaches for inner product estimation, such as CountSketch and Johnson-Lindenstrauss projection. Spec…

    Submitted 5 May, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

    Comments: 23 pages, 6 figures

    Journal ref: In Proceedings of the ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS) 2023

  24. arXiv:2211.11328  [pdf, ps, other]

    cs.DS math.NA

    Toeplitz Low-Rank Approximation with Sublinear Query Complexity

    Authors: Michael Kapralov, Hannah Lawrence, Mikhail Makarov, Cameron Musco, Kshiteej Sheth

    Abstract: We present a sublinear query algorithm for outputting a near-optimal low-rank approximation to any positive semidefinite Toeplitz matrix $T \in \mathbb{R}^{d \times d}$. In particular, for any integer rank $k \leq d$ and $ε,δ> 0$, our algorithm makes $\tilde{O} \left (k^2 \cdot \log(1/δ) \cdot \text{poly}(1/ε) \right )$ queries to the entries of $T$ and outputs a rank…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted in SODA 2023

  25. arXiv:2211.06790  [pdf, other]

    cs.DS

    Near-Linear Sample Complexity for $L_p$ Polynomial Regression

    Authors: Raphael A. Meyer, Cameron Musco, Christopher Musco, David P. Woodruff, Samson Zhou

    Abstract: We study $L_p$ polynomial regression. Given query access to a function $f:[-1,1] \rightarrow \mathbb{R}$, the goal is to find a degree $d$ polynomial $\hat{q}$ such that, for a given parameter $\varepsilon > 0$, $$ \|\hat{q}-f\|_p\le (1+\varepsilon) \cdot \min_{q:\text{deg}(q)\le d}\|q-f\|_p. $$ Here $\|\cdot\|_p$ is the $L_p$ norm, $\|g\|_p = (\int_{-1}^1 |g(t)|^p dt)^{1/p}$. We show that queryin…

    Submitted 12 November, 2022; originally announced November 2022.

    Comments: 68 pages, to be presented at SODA 2023

  26. arXiv:2210.13601  [pdf, other]

    cs.LG

    Active Learning for Single Neuron Models with Lipschitz Non-Linearities

    Authors: Aarshvi Gajjar, Chinmay Hegde, Christopher Musco

    Abstract: We consider the problem of active learning for single neuron models, also sometimes called ``ridge functions'', in the agnostic setting (under adversarial label noise). Such models have been shown to be broadly effective in modeling physical phenomena, and for constructing surrogate data-driven models for partial differential equations. Surprisingly, we show that for a single neuron model with a…

    Submitted 18 July, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Inadvertently submitting an incorrect writeup that does not align with the intended content

  27. arXiv:2210.06594  [pdf, other]

    cs.LG cs.AI cs.DS econ.EM stat.ME

    Sample Constrained Treatment Effect Estimation

    Authors: Raghavendra Addanki, David Arbour, Tung Mai, Cameron Musco, Anup Rao

    Abstract: Treatment effect estimation is a fundamental problem in causal inference. We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals. In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on. This subset…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Conference on Neural Information Processing Systems (NeurIPS) 2022

  28. arXiv:2210.00032  [pdf, other]

    cs.LG cs.SI

    Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs

    Authors: Sudhanshu Chanpuriya, Ryan A. Rossi, Sungchul Kim, Tong Yu, Jane Hoffswell, Nedim Lipka, Shunan Guo, Cameron Musco

    Abstract: Temporal networks model a variety of important phenomena involving timed interactions between entities. Existing methods for machine learning on temporal networks generally exhibit at least one of two limitations. First, time is assumed to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations…

    Submitted 30 September, 2022; originally announced October 2022.

  29. arXiv:2208.03268  [pdf, other]

    cs.DS math.NA

    A Tight Analysis of Hutchinson's Diagonal Estimator

    Authors: Prathamesh Dharangutte, Christopher Musco

    Abstract: Let $\mathbf{A}\in \mathbb{R}^{n\times n}$ be a matrix with diagonal $\text{diag}(\mathbf{A})$ and let $\bar{\mathbf{A}}$ be $\mathbf{A}$ with its diagonal set to all zeros. We show that Hutchinson's estimator run for $m$ iterations returns a diagonal estimate $\tilde{d}\in \mathbb{R}^n$ such that with probability $(1-δ)$,…

    Submitted 6 November, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

    Comments: To appear in SIAM Symposium on Simplicity in Algorithms (SOSA23)

  30. arXiv:2207.02817  [pdf, ps, other]

    cs.DS

    Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries

    Authors: Raghavendra Addanki, Andrew McGregor, Cameron Musco

    Abstract: We study the problem of estimating the number of edges in an $n$-vertex graph, accessed via the Bipartite Independent Set query model introduced by Beame et al. (ITCS '18). In this model, each query returns a Boolean, indicating the existence of at least one edge between two specified sets of nodes. We present a non-adaptive algorithm that returns a $(1\pm ε)$ relative error approximation to the n…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: European Symposium on Algorithms (ESA) 2022

  31. arXiv:2203.07557  [pdf, other]

    cs.DS

    Fast Regression for Structured Inputs

    Authors: Raphael A. Meyer, Cameron Musco, Christopher Musco, David P. Woodruff, Samson Zhou

    Abstract: We study the $\ell_p$ regression problem, which requires finding $\mathbf{x}\in\mathbb R^{d}$ that minimizes $\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$ for a matrix $\mathbf{A}\in\mathbb R^{n \times d}$ and response vector $\mathbf{b}\in\mathbb R^{n}$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $n$ is very large. Howe…

    Submitted 14 March, 2022; originally announced March 2022.

  32. arXiv:2202.04139  [pdf, other]

    cs.LG cs.SI

    Simplified Graph Convolution with Heterophily

    Authors: Sudhanshu Chanpuriya, Cameron Musco

    Abstract: Recent work has shown that a simple, fast method called Simple Graph Convolution (SGC) (Wu et al., 2019), which eschews deep learning, is competitive with deep methods like graph convolutional networks (GCNs) (Kipf & Welling, 2017) in common graph machine learning benchmarks. The use of graph data in SGC implicitly assumes the common but not universal graph characteristic of homophily, wherein nod…

    Submitted 3 June, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

  33. arXiv:2112.09631  [pdf, other]

    cs.LG cs.CL

    Sublinear Time Approximation of Text Similarity Matrices

    Authors: Archan Ray, Nicholas Monath, Andrew McCallum, Cameron Musco

    Abstract: We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $Ω(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this qua…

    Submitted 27 April, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: 25 pages, 10 figures

    MSC Class: F.2.1

  34. Local Edge Dynamics and Opinion Polarization

    Authors: Nikita Bhalla, Adam Lechowicz, Cameron Musco

    Abstract: The proliferation of social media platforms, recommender systems, and their joint societal impacts have prompted significant interest in opinion formation and evolution within social networks. We study how local edge dynamics can drive opinion polarization. In particular, we introduce a variant of the classic Friedkin-Johnsen opinion dynamics, augmented with a simple time-evolving network model. E…

    Submitted 8 December, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

    Comments: Accepted to WSDM 2023. 14 pages, 30 figures

  35. arXiv:2111.04888  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Active Linear Regression for $\ell_p$ Norms and Beyond

    Authors: Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda

    Abstract: We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+ε)$-approxim…

    Submitted 26 September, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Abstract shortened to meet arXiv limits; v2: improved bounds; v3: improved bounds; v4: to appear in FOCS 2022

  36. arXiv:2111.03030  [pdf, other]

    cs.LG cs.SI

    Exact Representation of Sparse Networks with Symmetric Nonnegative Embeddings

    Authors: Sudhanshu Chanpuriya, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Zhao Song, Cameron Musco

    Abstract: Many models for undirected graphs are based on factorizing the graph's adjacency matrix; these models find a vector representation of each node such that the predicted probability of a link between two nodes increases with the similarity (dot product) of their associated vectors. Recent work has shown that these models are unable to capture key structures in real-world graphs, particularly heterop…

    Submitted 30 September, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

  37. arXiv:2111.00048  [pdf, other]

    cs.LG cs.SI

    On the Power of Edge Independent Graph Models

    Authors: Sudhanshu Chanpuriya, Cameron Musco, Konstantinos Sotiropoulos, Charalampos E. Tsourakakis

    Abstract: Why do many modern neural-network-based graph generative models fail to reproduce typical real-world network characteristics, such as high triangle density? In this work we study the limitations of edge independent random graph models, in which each edge is added to the graph independently with some probability. Such models include both the classic Erdős-Rényi and stochastic block models, as well…

    Submitted 29 October, 2021; originally announced November 2021.

  38. arXiv:2110.13752  [pdf, other]

    cs.DS

    Dynamic Trace Estimation

    Authors: Prathamesh Dharangutte, Christopher Musco

    Abstract: We study a dynamic version of the implicit trace estimation problem. Given access to an oracle for computing matrix-vector multiplications with a dynamically changing matrix A, our goal is to maintain an accurate approximation to A's trace using as few multiplications as possible. We present a practical algorithm for solving this problem and prove that, in a natural setting, its complexity is quad…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  39. arXiv:2110.11981  [pdf, other]

    cs.SI

    How to Quantify Polarization in Models of Opinion Dynamics

    Authors: Christopher Musco, Indu Ramesh, Johan Ugander, R. Teal Witter

    Abstract: It is widely believed that society is becoming increasingly polarized around important issues, a dynamic that does not align with common mathematical models of opinion formation in social networks. In particular, measures of polarization based on opinion variance are known to decrease over time in frameworks such as the popular DeGroot model. Complementing recent work that seeks to resolve this ap…

    Submitted 25 October, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

  40. arXiv:2109.07647  [pdf, other]

    cs.DS math.NA

    Sublinear Time Eigenvalue Approximation via Random Sampling

    Authors: Rajarshi Bhattacharjee, Gregory Dexter, Petros Drineas, Cameron Musco, Archan Ray

    Abstract: We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm εn$ using those of a randomly sampled…

    Submitted 21 July, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: 58 pages, 4 figures

    MSC Class: F.2.1; G.1.3; G.1.2; G.4; I.1.2

  41. arXiv:2106.04254  [pdf, other]

    cs.LG cs.DS

    Coresets for Classification -- Simplified and Strengthened

    Authors: Tung Mai, Anup B. Rao, Cameron Musco

    Abstract: We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $(1\pm ε)$ relative error with $\tilde O(d \cdot μ_y(X)^2/ε^2)$ points, where $μ_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced by Munt…

    Submitted 17 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  42. Sublinear Time Spectral Density Estimation

    Authors: Vladimir Braverman, Aditya Krishnan, Christopher Musco

    Abstract: We present a new sublinear time algorithm for approximating the spectral density (eigenvalue distribution) of an $n\times n$ normalized graph adjacency or Laplacian matrix. The algorithm recovers the spectrum up to $ε$ accuracy in the Wasserstein-1 distance in $O(n\cdot \text{poly}(1/ε))$ time given sample access to the graph. This result complements recent work by David Cohen-Steiner, Weihao Kong…

    Submitted 14 April, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to STOC'22

  43. arXiv:2104.03353  [pdf, other]

    cs.DB cs.DS cs.IR

    Correlation Sketches for Approximate Join-Correlation Queries

    Authors: Aécio Santos, Aline Bessa, Fernando Chirigati, Christopher Musco, Juliana Freire

    Abstract: The increasing availability of structured datasets, from Web tables and open-data portals to enterprise data, opens up opportunities to enrich analytics and improve machine learning models through relational data augmentation. In this paper, we introduce a new class of data augmentation queries: join-correlation queries. Given a column $Q$ and a join column $K_Q$ from a query table…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21)

  44. Public Transport Planning: When Transit Network Connectivity Meets Commuting Demand

    Authors: Sheng Wang, Yuan Sun, Christopher Musco, Zhifeng Bao

    Abstract: In this paper, we make a first attempt to incorporate both commuting demand and transit network connectivity in bus route planning (CT-Bus), and formulate it as a constrained optimization problem: planning a new bus route with k edges over an existing transit network without building new bus stops to maximize a linear aggregation of commuting demand and connectivity of the transit network. We prov…

    Submitted 3 April, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: SIGMOD 2021, 14 pages

  45. arXiv:2102.08532  [pdf, other]

    cs.LG cs.SI

    DeepWalking Backwards: From Embeddings Back to Graphs

    Authors: Sudhanshu Chanpuriya, Cameron Musco, Konstantinos Sotiropoulos, Charalampos E. Tsourakakis

    Abstract: Low-dimensional node embeddings play a key role in analyzing graph datasets. However, little work studies exactly what information is encoded by popular embedding methods, and how this information correlates with performance in downstream machine learning tasks. We tackle this question by studying whether embeddings can be inverted to (approximately) recover the graph used to generate them. Focusi…

    Submitted 16 February, 2021; originally announced February 2021.

  46. arXiv:2102.08341  [pdf, other]

    cs.DS cs.LG math.NA

    Faster Kernel Matrix Algebra via Density Estimation

    Authors: Arturs Backurs, Piotr Indyk, Cameron Musco, Tal Wagner

    Abstract: We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points $x_1,\ldots,x_n \in \mathbb{R}^d$. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. We show that the sum of matrix entries can be estimated to $1+ε$ relative error i…

    Submitted 17 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

  47. arXiv:2101.11751  [pdf, other]

    cs.LG cs.AI

    Faster Kernel Interpolation for Gaussian Processes

    Authors: Mohit Yadav, Daniel Sheldon, Cameron Musco

    Abstract: A key challenge in scaling Gaussian Process (GP) regression to massive datasets is that exact inference requires computation with a dense n x n kernel matrix, where n is the number of data points. Significant work focuses on approximating the kernel matrix via interpolation using a smaller set of m inducing points. Structured kernel interpolation (SKI) is among the most scalable methods: by placin…

    Submitted 13 August, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: To appear, Artificial Intelligence and Statistics (AISTATS) 2021

  48. arXiv:2012.13976  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Intervention Efficient Algorithms for Approximate Learning of Causal Graphs

    Authors: Raghavendra Addanki, Andrew McGregor, Cameron Musco

    Abstract: We study the problem of learning the causal relationships between a set of observed variables in the presence of latents, while minimizing the cost of interventions on the observed variables. We assume access to an undirected graph $G$ on the observed variables whose edges represent either all direct causal relationships or, less restrictively, a superset of causal relationships (identified, e.g.,…

    Submitted 27 December, 2020; originally announced December 2020.

    Comments: To appear, International Conference on Algorithmic Learning Theory (ALT) 2021

  49. arXiv:2011.09986  [pdf, other]

    cs.LG cs.DS

    Estimation of Shortest Path Covariance Matrices

    Authors: Raj Kumar Maity, Cameron Musco

    Abstract: We study the sample complexity of estimating the covariance matrix $\mathbf{Σ} \in \mathbb{R}^{d\times d}$ of a distribution $\mathcal D$ over $\mathbb{R}^d$ given independent samples, under the assumption that $\mathbf{Σ}$ is graph-structured. In particular, we focus on shortest path covariance matrices, where the covariance between any two measurements is determined by the shortest path distance in…

    Submitted 19 November, 2020; originally announced November 2020.

  50. arXiv:2010.10218  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-specific Data Subsampling with Influence Functions

    Authors: Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

    Abstract: Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the datasets of interest are increasing in size. As a result, the process of model selection is time-consuming and computationally inefficient. In this work, we dev…

    Submitted 20 October, 2020; originally announced October 2020.