-
$O(k)$-Equivariant Dimensionality Reduction on Stiefel Manifolds
Authors:
Andrew Lee,
Harlin Lee,
Jose A. Perea,
Nikolas Schonsheck,
Madeleine Weinstein
Abstract:
Many real-world datasets live on high-dimensional Stiefel and Grassmannian manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively, and benefit from projection onto lower-dimensional Stiefel (respectively, Grassmannian) manifolds. In this work, we propose an algorithm called Principal Stiefel Coordinates (PSC) to reduce data dimensionality from $ V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an $O(k)$-equivariant manner ($k \leq n \ll N$). We begin by observing that each element $α\in V_n(\mathbb{R}^N)$ defines an isometric embedding of $V_k(\mathbb{R}^n)$ into $V_k(\mathbb{R}^N)$. Next, we optimize for such an embedding map that minimizes data fit error by warm-starting with the output of principal component analysis (PCA) and applying gradient descent. Then, we define a continuous and $O(k)$-equivariant map $π_α$ that acts as a ``closest point operator'' to project the data onto the image of $V_k(\mathbb{R}^n)$ in $V_k(\mathbb{R}^N)$ under the embedding determined by $α$, while minimizing distortion. Because this dimensionality reduction is $O(k)$-equivariant, these results extend to Grassmannian manifolds as well. Lastly, we show that the PCA output globally minimizes projection error in a noiseless setting, but that our algorithm achieves a meaningfully different and improved outcome when the data does not lie exactly on the image of a linearly embedded lower-dimensional Stiefel manifold as above. Multiple numerical experiments using synthetic and real-world data are performed.
Submitted 19 September, 2023;
originally announced September 2023.
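The embedding described in the abstract above can be illustrated with a minimal NumPy sketch. It assumes the embedding is plain matrix multiplication, $y \mapsto \alpha y$, and it uses the polar factor of $\alpha^\top x$ as a stand-in for the projection $π_α$. That choice is an assumption, since the abstract does not give the formula. The helper names `random_stiefel`, `embed`, and `project` are hypothetical.

```python
import numpy as np

def random_stiefel(rows, cols, rng):
    """Sample a rows x cols matrix with orthonormal columns via reduced QR."""
    q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return q

def embed(alpha, y):
    """Map y in V_k(R^n) to alpha @ y in V_k(R^N) (assumed form of the embedding)."""
    return alpha @ y

def project(alpha, x):
    """Assumed stand-in for the projection: polar factor of alpha.T @ x.

    Requires alpha.T @ x to have full column rank.
    """
    u, _, vt = np.linalg.svd(alpha.T @ x, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
N, n, k = 50, 5, 2                      # k <= n << N, as in the abstract
alpha = random_stiefel(N, n, rng)       # alpha in V_n(R^N)
y1 = random_stiefel(n, k, rng)          # two points in V_k(R^n)
y2 = random_stiefel(n, k, rng)
x_noisy = random_stiefel(N, k, rng)     # a point not on the embedded image
g = random_stiefel(k, k, rng)           # an element of O(k)

# Frobenius distances before and after embedding (expected to match).
print(np.linalg.norm(y1 - y2), np.linalg.norm(embed(alpha, y1) - embed(alpha, y2)))
# Equivariance gap: project(x g) versus project(x) g (expected to be near zero).
print(np.linalg.norm(project(alpha, x_noisy @ g) - project(alpha, x_noisy) @ g))
```

Under these assumptions, the embedded points keep orthonormal columns, points already on the image are recovered exactly, and the polar-factor map commutes with right multiplication by orthogonal $k \times k$ matrices.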
-
Leveraging the Power of Place: A Data-Driven Decision Helper to Improve the Location Decisions of Economic Immigrants
Authors:
Jeremy Ferwerda,
Nicholas Adams-Cohen,
Kirk Bansak,
Jennifer Fei,
Duncan Lawrence,
Jeremy M. Weinstein,
Jens Hainmueller
Abstract:
A growing number of countries have established programs to attract immigrants who can contribute to their economy. Research suggests that an immigrant's initial arrival location plays a key role in shaping their economic success. Yet immigrants currently lack access to personalized information that would help them identify optimal destinations. Instead, they often rely on availability heuristics, which can lead to the selection of sub-optimal landing locations, lower earnings, elevated outmigration rates, and concentration in the most well-known locations. To address this issue and counteract the effects of cognitive biases and limited information, we propose a data-driven decision helper that draws on behavioral insights, administrative data, and machine learning methods to inform immigrants' location decisions. The decision helper provides personalized location recommendations that reflect immigrants' preferences as well as data-driven predictions of the locations where they maximize their expected earnings given their profile. We illustrate the potential impact of our approach using backtests conducted with administrative data that links landing data of recent economic immigrants from Canada's Express Entry system with their earnings retrieved from tax records. Simulations across various scenarios suggest that providing location recommendations to incoming economic immigrants can increase their initial earnings and lead to a mild shift away from the most populous landing destinations. Our approach can be implemented within existing institutional structures at minimal cost, and offers governments an opportunity to harness their administrative data to improve outcomes for economic immigrants.
Submitted 27 July, 2020;
originally announced July 2020.
-
Voronoi Cells in Metric Algebraic Geometry of Plane Curves
Authors:
Madeline Brandt,
Madeleine Weinstein
Abstract:
Voronoi cells of varieties encode many features of their metric geometry. We prove that each Voronoi or Delaunay cell of a plane curve appears as the limit of a sequence of cells obtained from point samples of the curve. We use this result to study metric features of plane curves, including the medial axis, curvature, evolute, bottlenecks, and reach. In each case, we provide algebraic equations defining the object and, where possible, give formulas for the degrees of these algebraic varieties. We show how to identify the desired metric feature from Voronoi or Delaunay cells, and therefore how to approximate it by a finite point sample from the variety.
Submitted 18 August, 2023; v1 submitted 26 June, 2019;
originally announced June 2019.
-
Voronoi Cells of Varieties
Authors:
Diego Cifuentes,
Kristian Ranestad,
Bernd Sturmfels,
Madeleine Weinstein
Abstract:
Every real algebraic variety determines a Voronoi decomposition of its ambient Euclidean space. Each Voronoi cell is a convex semialgebraic set in the normal space of the variety at a point. We compute the algebraic boundaries of these Voronoi cells.
Submitted 20 November, 2018;
originally announced November 2018.
-
Analyzing Big Data with Dynamic Quantum Clustering
Authors:
M. Weinstein,
F. Meirer,
A. Hume,
Ph. Sciau,
G. Shaked,
R. Hofstetter,
E. Persi,
A. Mehta,
D. Horn
Abstract:
How does one search for a needle in a multi-dimensional haystack without knowing what a needle is and without knowing if there is one in the haystack? This kind of problem requires a paradigm shift - away from hypothesis driven searches of the data - towards a methodology that lets the data speak for itself. Dynamic Quantum Clustering (DQC) is such a methodology. DQC is a powerful visual method that works with big, high-dimensional data. It exploits variations of the density of the data (in feature space) and unearths subsets of the data that exhibit correlations among all the measured variables. The outcome of a DQC analysis is a movie that shows how and why sets of data-points are eventually classified as members of simple clusters or as members of - what we call - extended structures. This allows DQC to be successfully used in a non-conventional exploratory mode where one searches data for unexpected information without the need to model the data. We show how this works for big, complex, real-world datasets that come from five distinct fields: i.e., x-ray nano-chemistry, condensed matter, biology, seismology and finance. These studies show how DQC excels at uncovering unexpected, small - but meaningful - subsets of the data that contain important information. We also establish an important new result: namely, that big, complex datasets often contain interesting structures that will be missed by many conventional clustering techniques. Experience shows that these structures appear frequently enough that it is crucial to know they can exist, and that when they do, they encode important hidden information. In short, we not only demonstrate that DQC can be flexibly applied to datasets that present significantly different challenges, we also show how a simple analysis can be used to look for the needle in the haystack, determine what it is, and find what this means.
Submitted 17 October, 2013; v1 submitted 10 October, 2013;
originally announced October 2013.
-
Reducing Memory Cost of Exact Diagonalization using Singular Value Decomposition
Authors:
Marvin Weinstein,
Assa Auerbach,
V. Ravi Chandra
Abstract:
We present a modified Lanczos algorithm to diagonalize lattice Hamiltonians with dramatically reduced memory requirements, {\em without restricting to variational ansatzes}. The lattice of size $N$ is partitioned into two subclusters. At each iteration the Lanczos vector is projected into two sets of $n_{\rm svd}$ smaller subcluster vectors using singular value decomposition. For low entanglement entropy $S_{ee}$, (satisfied by short range Hamiltonians), the truncation error is expected to vanish as $\exp(-n_{\rm svd}^{1/S_{ee}})$. Convergence is tested for the Heisenberg model on Kagomé clusters of 24, 30 and 36 sites, with no lattice symmetries exploited, using less than 15GB of dynamical memory. Generalization of the Lanczos-SVD algorithm to multiple partitioning is discussed, and comparisons to other techniques are given.
Submitted 12 September, 2011; v1 submitted 29 April, 2011;
originally announced May 2011.
-
Strange Bedfellows: Quantum Mechanics and Data Mining
Authors:
Marvin Weinstein
Abstract:
Last year, in 2008, I gave a talk titled {\it Quantum Calisthenics}. This year I am going to tell you about how the work I described then has spun off into a most unlikely direction. What I am going to talk about is how one maps the problem of finding clusters in a given data set into a problem in quantum mechanics. I will then use the tricks I described so that quantum evolution lets the clusters come together on their own.
Submitted 2 November, 2009;
originally announced November 2009.
-
Dynamic quantum clustering: a method for visual exploration of structures in data
Authors:
Marvin Weinstein,
David Horn
Abstract:
A given set of data-points in some feature space may be associated with a Schrodinger equation whose potential is determined by the data. This is known to lead to good clustering solutions. Here we extend this approach into a full-fledged dynamical scheme using a time-dependent Schrodinger equation. Moreover, we approximate this Hamiltonian formalism by a truncated calculation within a set of Gaussian wave functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition or feature filtering.
Submitted 18 August, 2009;
originally announced August 2009.