
Showing 1–14 of 14 results for author: Steinerberger, S

Searching in archive stat.

  1. arXiv:2404.11487

    math.NA stat.ML

    Randomly Pivoted Partial Cholesky: Random How?

    Authors: Stefan Steinerberger

    Abstract: We consider the problem of finding good low-rank approximations of symmetric, positive-definite $A \in \mathbb{R}^{n \times n}$. Chen-Epperly-Tropp-Webber showed, among many other things, that the randomly pivoted partial Cholesky algorithm that chooses the $i$-th row with probability proportional to the diagonal entry $A_{ii}$ leads to a universal contraction of the trace norm (the Schatten 1-nor…

    Submitted 17 April, 2024; originally announced April 2024.
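
    A minimal pure-Python sketch of the sampling rule described above. It assumes the standard randomly pivoted partial Cholesky update used by Chen-Epperly-Tropp-Webber: at each step the pivot $s$ is drawn with probability proportional to the current residual diagonal (initially $A_{ii}$), followed by one Cholesky step on that column. The name `rp_cholesky` and its interface are our own; this is an illustration, not the author's code. The returned residual diagonal `d` sums to the trace norm of the positive semidefinite residual $A - FF^T$.

```python
import math
import random


def rp_cholesky(A, k, rng=None):
    """Randomly pivoted partial Cholesky (illustrative sketch).

    A is a symmetric positive semidefinite matrix given as a list of rows.
    Returns (cols, d): cols are the columns of F with A ~ F F^T, and d is the
    diagonal of the residual A - F F^T, so sum(d) is its trace norm.
    """
    rng = rng if rng is not None else random.Random(0)
    n = len(A)
    d = [float(A[j][j]) for j in range(n)]  # residual diagonal, initially diag(A)
    cols = []
    for _ in range(k):
        if sum(d) <= 1e-12:
            break  # residual is numerically zero; nothing left to approximate
        # Pivot s is chosen with probability d_s / trace(residual).
        s = rng.choices(range(n), weights=d)[0]
        # Column s of the current residual A - F F^T.
        g = [A[j][s] - sum(c[j] * c[s] for c in cols) for j in range(n)]
        if g[s] <= 0.0:
            break  # guard against rounding making the pivot non-positive
        pivot = math.sqrt(g[s])
        col = [gj / pivot for gj in g]
        cols.append(col)
        # Update the residual diagonal; clip tiny negative rounding errors.
        d = [max(d[j] - col[j] ** 2, 0.0) for j in range(n)]
    return cols, d


# Usage: a rank-2 PSD matrix is recovered exactly after two pivots.
A = [[1.0, 1.0, 0.0], [1.0, 2.0, 2.0], [0.0, 2.0, 4.0]]
cols, d = rp_cholesky(A, 2)
print(sum(d))  # residual trace, zero up to rounding here
```

    Sampling from the residual diagonal, rather than from the original $A_{ii}$ at every step, means a pivot that has already been fully explained has probability zero of being picked again.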

  2. arXiv:2007.13288

    math.NA cs.LG math.OC stat.ML

    On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

    Authors: Stefan Steinerberger

    Abstract: We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that…

    Submitted 1 September, 2020; v1 submitted 26 July, 2020; originally announced July 2020.
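
    A sketch of plain single-row stochastic gradient descent on $\|Ax-b\|_2^2 = \sum_i (a_i \cdot x - b_i)^2$, assuming uniform row sampling and a constant step size (the truncated abstract specifies neither). The name `sgd_least_squares` and the default step `0.5 / max_i ||a_i||^2` are illustrative choices; the paper's constant $c_A$ is not computed here.

```python
import random


def sgd_least_squares(A, b, steps=2000, lr=None, seed=0):
    """SGD on ||Ax - b||^2 using one uniformly sampled row per step (sketch)."""
    rng = random.Random(seed)
    n = len(A[0])
    if lr is None:
        # Keeps 2 * lr * ||a_i||^2 <= 1, so each step is a (relaxed) projection.
        lr = 0.5 / max(sum(a * a for a in row) for row in A)
    x = [0.0] * n
    for _ in range(steps):
        i = rng.randrange(len(A))
        row = A[i]
        r = sum(row[j] * x[j] for j in range(n)) - b[i]  # residual of equation i
        # Gradient of (a_i . x - b_i)^2 is 2 r a_i; the factor m (number of rows)
        # of an unbiased full-gradient estimate is absorbed into lr.
        x = [x[j] - lr * 2.0 * r * row[j] for j in range(n)]
    return x


# Usage: 2x + y = 3, x + 3y = 5 has the solution (0.8, 1.4).
print(sgd_least_squares([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

    Because the system is consistent, every row residual vanishes at the solution, so constant-step SGD converges to it rather than hovering in a noise ball.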

  3. arXiv:2003.09969

    math.PR cs.DM cs.LG math.SP stat.ML

    Spectral Clustering Revisited: Information Hidden in the Fiedler Vector

    Authors: Adela DePavia, Stefan Steinerberger

    Abstract: We are interested in the clustering problem on graphs: it is known that if there are two underlying clusters, then the signs of the eigenvector corresponding to the second largest eigenvalue of the adjacency matrix can reliably reconstruct the two clusters. We argue that the vertices for which the eigenvector has the largest and the smallest entries, respectively, are unusually strongly connected…

    Submitted 22 March, 2020; originally announced March 2020.
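
    To illustrate the sign-based reconstruction mentioned above, here is a sketch that computes the eigenvector of the second largest adjacency eigenvalue by two-vector subspace iteration and splits the vertices by sign. The helper names and the test graph (two 4-cliques joined by a single edge) are hypothetical; the paper's observation about the extreme entries of the eigenvector is not implemented.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def second_eigenvector(adj, iters=500):
    """Eigenvector of the second largest eigenvalue of a symmetric adjacency matrix.

    Shifting by the maximum row sum makes A + shift*I positive semidefinite, so
    the two largest eigenvalues of A become the two largest in magnitude.
    """
    n = len(adj)
    shift = max(sum(row) for row in adj)

    def apply(v):
        return [sum(adj[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]

    u = _normalize([1.0] * n)                     # converges to the top eigenvector
    v = _normalize([float(i) for i in range(n)])  # generic start for the second one
    for _ in range(iters):
        u = _normalize(apply(u))
        w = apply(v)
        overlap = sum(a * b for a, b in zip(w, u))
        v = _normalize([a - overlap * b for a, b in zip(w, u)])  # Gram-Schmidt against u
    return v


def two_cliques():
    """Hypothetical test graph: cliques {0,1,2,3} and {4,5,6,7} plus the edge 3-4."""
    n = 8
    adj = [[0.0] * n for _ in range(n)]
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1.0
    adj[3][4] = adj[4][3] = 1.0
    return adj


labels = [x > 0 for x in second_eigenvector(two_cliques())]
print(labels)  # one sign per clique (which sign is arbitrary)
```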

  4. arXiv:2002.12317

    cs.LG stat.ML

    The Spectral Underpinning of word2vec

    Authors: Ariel Jaffe, Yuval Kluger, Ofir Lindenbaum, Jonathan Patsenker, Erez Peterfreund, Stefan Steinerberger

    Abstract: word2vec, due to Mikolov et al. (2013), is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an under…

    Submitted 9 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  5. Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations

    Authors: Dmitry Kobak, George Linderman, Stefan Steinerberger, Yuval Kluger, Philipp Berens

    Abstract: T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the "crowding problem" of SNE. Here, we develop an efficient implementation of t-SNE for a $t$-distribution kernel with an arbitrary degree of fre…

    Submitted 4 April, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

    Journal ref: ECML PKDD 2019
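
    A sketch of the heavy-tailed similarity kernel, assuming the parametrization $k(d) = (1 + d^2/\alpha)^{-\alpha}$ with degree-of-freedom parameter $\alpha$ (our reading; check it against the paper). With $\alpha = 1$ this is the Cauchy kernel $1/(1+d^2)$ named in the abstract, smaller $\alpha$ gives heavier tails, and $\alpha \to \infty$ recovers the Gaussian $e^{-d^2}$. The function `t_kernel` is hypothetical.

```python
import math


def t_kernel(d, alpha=1.0):
    """Hypothetical t-distribution similarity kernel (1 + d^2/alpha)^(-alpha)."""
    return (1.0 + d * d / alpha) ** (-alpha)


# Compare tails at distance 3: heavier tails (smaller alpha) keep more similarity.
for alpha in (0.5, 1.0, 100.0):
    print(alpha, t_kernel(3.0, alpha))
print("gaussian", math.exp(-9.0))
```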

  6. arXiv:1806.11096

    stat.ML cs.LG math.FA

    Recovering Trees with Convex Clustering

    Authors: Eric C. Chi, Stefan Steinerberger

    Abstract: Convex clustering refers, for given $\{x_1, \dots, x_n\} \subset \mathbb{R}^p$, to the minimization of $$ u(\gamma) = \underset{u_1, \dots, u_n}{\arg\min} \; \sum_{i=1}^{n} \lVert x_i - u_i \rVert^2 + \gamma \sum_{i,j=1}^{n} w_{ij} \lVert u_i - u_j \rVert, $$ where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that…

    Submitted 28 June, 2018; v1 submitted 28 June, 2018; originally announced June 2018.

    Comments: 26 pages, 7 figures
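
    The convex clustering objective can be evaluated directly for given centroids $u_i$; the sketch below does only that (it does not solve the minimization), and its name is ours. For two points $x_1 = 0$, $x_2 = 2$ on the line with $w_{12} = w_{21} = 1$, setting derivatives to zero gives $u_1 = \gamma$, $u_2 = 2 - \gamma$ for $\gamma < 1$, and the fused solution $u_1 = u_2 = 1$ for $\gamma \geq 1$; the usage lines compare values at $\gamma = 1/2$.

```python
import math


def convex_clustering_objective(X, U, W, gamma):
    """sum_i ||x_i - u_i||^2 + gamma * sum_{i,j} w_ij ||u_i - u_j|| (Euclidean norms)."""
    def norm(v):
        return math.sqrt(sum(t * t for t in v))

    n = len(X)
    fit = sum(norm([a - b for a, b in zip(X[i], U[i])]) ** 2 for i in range(n))
    fusion = sum(W[i][j] * norm([a - b for a, b in zip(U[i], U[j])])
                 for i in range(n) for j in range(n))
    return fit + gamma * fusion


X = [[0.0], [2.0]]
W = [[0.0, 1.0], [1.0, 0.0]]
print(convex_clustering_objective(X, [[0.5], [1.5]], W, 0.5))  # derived minimizer
print(convex_clustering_objective(X, [[1.0], [1.0]], W, 0.5))  # fully fused
print(convex_clustering_objective(X, X, W, 0.5))               # no shrinkage
```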

  7. arXiv:1803.06989

    math.ST cs.LG math.NA stat.ML

    Numerical Integration on Graphs: where to sample and how to weigh

    Authors: George C. Linderman, Stefan Steinerberger

    Abstract: Let $G=(V,E,w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $$ \frac{1}{|V|}\sum_{v \in V} f(v) \sim \sum_{w \in W} a_w f(w) $$ for functions $f:V \rightarrow \mathbb{R}$ that are 'smooth' with respect to the geometry of the graph. The main applications are problems where $f$ is know…

    Submitted 19 March, 2018; originally announced March 2018.
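
    A toy illustration of the quadrature problem, with choices that are ours rather than the paper's: the graph is a cycle on 12 vertices, $W = \{0, 3, 6, 9\}$ with equal weights $a_w = 1/4$, a low-frequency function plays the role of a 'smooth' $f$, and a high-frequency one shows aliasing. The abstract does not say how $W$ and $a_w$ are chosen, so this does not reproduce that method.

```python
import math

N_VERTICES = 12  # hypothetical cycle graph 0 - 1 - ... - 11 - 0


def graph_average(f, vertices):
    """Left-hand side: (1/|V|) * sum_{v in V} f(v)."""
    return sum(f(v) for v in vertices) / len(vertices)


def quadrature(f, weights):
    """Right-hand side: sum_{w in W} a_w f(w); weights maps vertex w -> a_w."""
    return sum(a * f(w) for w, a in weights.items())


def smooth(v):
    return 1.0 + math.cos(2 * math.pi * v / N_VERTICES)  # lowest nontrivial frequency


def rough(v):
    return math.cos(2 * math.pi * 4 * v / N_VERTICES)  # aliases on every 3rd vertex


V = range(N_VERTICES)
W = {w: 0.25 for w in (0, 3, 6, 9)}
print(graph_average(smooth, V), quadrature(smooth, W))  # both 1 (up to rounding)
print(graph_average(rough, V), quadrature(rough, W))    # about 0 vs 1
```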

  8. Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding

    Authors: George C. Linderman, Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, Yuval Kluger

    Abstract: t-distributed Stochastic Neighborhood Embedding (t-SNE) is a method for dimensionality reduction and visualization that has become widely popular in recent years. Efficient implementations of t-SNE are available, but they scale poorly to datasets with hundreds of thousands to millions of high dimensional data-points. We present Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)…

    Submitted 24 December, 2017; originally announced December 2017.

  9. arXiv:1711.04712

    math.CO cs.DM cs.DS math.PR stat.ML

    Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

    Authors: George C. Linderman, Gal Mishne, Yuval Kluger, Stefan Steinerberger

    Abstract: If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log \log n$ points chosen randomly among its $c_{d,2} \log n$-nearest neighbors to ensure a giant component of size…

    Submitted 13 November, 2017; originally announced November 2017.
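
    A brute-force sketch of the randomized construction, assuming sampling without replacement among the $k$ nearest neighbors and undirected edges. The constants $c_{d,1}, c_{d,2}$ are not given in the abstract, so `hypothetical_parameters` uses made-up values `c1=1.0` and `c2=2.0`; all function names are ours.

```python
import math
import random


def random_near_neighbor_graph(points, k, r, seed=0):
    """Join every point to r points sampled uniformly among its k nearest neighbors."""
    rng = random.Random(seed)
    n = len(points)
    adj = [set() for _ in range(n)]
    for i, p in enumerate(points):
        # Brute-force neighbor search (O(n log n) per point); fine for small n.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        candidates = others[:k]
        for j in rng.sample(candidates, min(r, len(candidates))):
            adj[i].add(j)  # store edges as undirected
            adj[j].add(i)
    return adj


def component_sizes(adj):
    """Sizes of connected components, largest first (iterative DFS)."""
    seen, sizes = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def hypothetical_parameters(n, c1=1.0, c2=2.0):
    """(k, r) = (ceil(c2 log n), ceil(c1 log log n)); c1, c2 stand in for c_{d,1}, c_{d,2}. Needs n >= 3."""
    r = max(1, math.ceil(c1 * math.log(math.log(n))))
    k = max(r, math.ceil(c2 * math.log(n)))
    return k, r


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(500)]
k, r = hypothetical_parameters(len(pts))
sizes = component_sizes(random_near_neighbor_graph(pts, k, r))
print(k, r, sizes[0], len(sizes))  # neighborhood size, samples per point, largest component
```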

  10. arXiv:1706.02582

    cs.LG stat.ML

    Clustering with t-SNE, provably

    Authors: George C. Linderman, Stefan Steinerberger

    Abstract: t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove th…

    Submitted 8 June, 2017; originally announced June 2017.

  11. arXiv:1706.01362

    math.SP math-ph math.AP math.FA stat.ML

    The Geometry of Nodal Sets and Outlier Detection

    Authors: Xiuyuan Cheng, Gal Mishne, Stefan Steinerberger

    Abstract: Let $(M,g)$ be a compact manifold and let $-Δφ_k = λ_k φ_k$ be the sequence of Laplacian eigenfunctions. We present a curious new phenomenon which, so far, we only managed to understand in a few highly specialized cases: the family of functions $f_N:M \rightarrow \mathbb{R}_{\geq 0}$ $$ f_N(x) = \sum_{k \leq N}{ \frac{1}{\sqrt{λ_k}} \frac{|φ_k(x)|}{\|φ_k\|_{L^{\infty}(M)}}}$$ seems strangely suite…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: 11 pages, 7 figures
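
    The functions $f_N$ are straightforward to evaluate once eigenpairs are known. As a hypothetical setting where the eigenfunctions are explicit, we take the interval $[0, \pi]$ with Dirichlet boundary conditions (not a closed manifold, so only an illustration of the formula), where $\varphi_k(x) = \sin(kx)$, $\lambda_k = k^2$ and $\|\varphi_k\|_{L^\infty} = 1$, giving $f_N(x) = \sum_{k \le N} |\sin(kx)|/k$. Helper names are ours.

```python
import math


def f_n(x, eigenpairs):
    """f_N(x) = sum_k |phi_k(x)| / (sqrt(lambda_k) * ||phi_k||_inf) over the given eigenpairs."""
    return sum(abs(phi(x)) / (math.sqrt(lam) * sup) for lam, phi, sup in eigenpairs)


def dirichlet_interval_eigenpairs(N):
    """-phi'' = k^2 phi on [0, pi] with phi(0) = phi(pi) = 0: phi_k(x) = sin(kx), sup norm 1."""
    return [(k * k, (lambda x, k=k: math.sin(k * x)), 1.0) for k in range(1, N + 1)]


pairs = dirichlet_interval_eigenpairs(50)
for x in (0.25, 0.5 * math.pi, 1.0):
    print(x, f_n(x, pairs))
```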

  12. arXiv:1702.02670

    stat.ML math.ST

    Stochastic Neighbor Embedding separates well-separated clusters

    Authors: Uri Shaham, Stefan Steinerberger

    Abstract: Stochastic Neighbor Embedding and its variants are widely used dimensionality reduction techniques; despite their popularity, no theoretical results are known. We prove that the optimal SNE embedding of well-separated clusters from high dimensions to any Euclidean space $\mathbb{R}^d$ manages to successfully separate the clusters in a quantitative way. The result also applies to a larger family of methods…

    Submitted 22 February, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

  13. arXiv:1611.03033

    math.SP math-ph math.AP stat.ML

    On the Diffusion Geometry of Graph Laplacians and Applications

    Authors: Xiuyuan Cheng, Manas Rachh, Stefan Steinerberger

    Abstract: We study directed, weighted graphs $G=(V,E)$ and consider the (not necessarily symmetric) averaging operator $$ (\mathcal{L}u)(i) = -\sum_{j \sim i} p_{ij} \left(u(j) - u(i)\right), $$ where $p_{ij}$ are normalized edge weights. Given a vertex $i \in V$, we define the diffusion distance to a set $B \subset V$ as the smallest number of steps $d_{B}(i) \in \mathbb{N}$ required for half of all random walks s…

    Submitted 9 November, 2016; originally announced November 2016.
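
    A sketch of the operator $\mathcal{L}$, assuming the edge weights are normalized per vertex over outgoing edges, $p_{ij} = w_{ij} / \sum_k w_{ik}$ (the abstract only says "normalized"). Under that assumption $(\mathcal{L}u)(i) = u(i) - \sum_{j \sim i} p_{ij} u(j)$, i.e. $\mathcal{L} = I - P$ for the random-walk matrix $P$. The diffusion distance $d_B(i)$ is not implemented because its definition is cut off above. Function names are ours, and every vertex is assumed to have at least one outgoing edge.

```python
def normalize_weights(w):
    """Assumed normalization: p_ij = w_ij / sum_k w_ik over the outgoing edges of i."""
    p = {}
    for i, nbrs in w.items():
        total = sum(nbrs.values())
        p[i] = {j: wij / total for j, wij in nbrs.items()}
    return p


def apply_operator(p, u):
    """(L u)(i) = -sum_{j ~ i} p_ij (u(j) - u(i)) on a directed weighted graph."""
    return {i: -sum(pij * (u[j] - u[i]) for j, pij in nbrs.items())
            for i, nbrs in p.items()}


# Hypothetical directed graph: 0 -> 1 (weight 1), 0 -> 2 (weight 3), 1 -> 2, 2 -> 0.
w = {0: {1: 1.0, 2: 3.0}, 1: {2: 1.0}, 2: {0: 2.0}}
p = normalize_weights(w)
print(apply_operator(p, {0: 0.0, 1: 4.0, 2: 8.0}))
```

    Constant functions lie in the kernel of $\mathcal{L}$ because each row of $p_{ij}$ sums to one.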

  14. arXiv:1607.04566

    stat.ML

    Spectral Echolocation via the Wave Embedding

    Authors: Alexander Cloninger, Stefan Steinerberger

    Abstract: Spectral embedding uses eigenfunctions of the discrete Laplacian on a weighted graph to obtain coordinates for an embedding of an abstract data set into Euclidean space. We propose a new pre-processing step of first using the eigenfunctions to simulate a low-frequency wave moving over the data and using both position as well as change in time of the wave to obtain a refined metric to which classic…

    Submitted 15 July, 2016; originally announced July 2016.