-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Astronomical Spectroscopy with Skipper CCDs: First Results from a Skipper CCD Focal Plane Prototype at SIFS
Authors:
Edgar Marrufo Villalpando,
Alex Drlica-Wagner,
Brandon Roach,
Marco Bonati,
Abhishek Bakshi,
Julia Campa,
Gustavo Cancelo,
Braulio Cancino,
Claudio R. Chavez,
Fernando Chierchie,
Juan Estrada,
Guillermo Fernandez Moroni,
Luciano Fraga,
Manuel E. Gaido,
Stephen E. Holland,
Rachel Hur,
Michelle Jonas,
Peter Moore,
Eduardo Paolini,
Andrés A. Plazas Malagón,
Leandro Stefanazzi,
Javier Tiffenberg,
Ken Treptou,
Sho Uemura,
Neal Wilcer
Abstract:
We present the first on-sky results from an ultra-low-readout-noise Skipper CCD focal plane prototype for the SOAR Integral Field Spectrograph (SIFS). The Skipper CCD focal plane consists of four 6k x 1k, 15 $μ$m pixel, fully-depleted, p-channel devices that have been thinned to ~250 $μ$m, backside processed, and treated with an anti-reflective coating. These Skipper CCDs were configured for astronomical spectroscopy, i.e., single-sample readout noise < 4.3 e- rms/pixel, the ability to achieve multi-sample readout noise $\ll$ 1 e- rms/pixel, full-well capacities ~40,000-65,000 e-, low dark current and charge transfer inefficiency (~2 x 10$^{-4}$ e-/pixel/s and 3.44 x 10$^{-7}$, respectively), and an absolute quantum efficiency of $\gtrsim$ 80% between 450 nm and 980 nm ($\gtrsim$ 90% between 600 nm and 900 nm). We optimized the readout sequence timing to achieve sub-electron noise (~0.5 e- rms/pixel) in a region of 2k x 4k pixels and photon-counting noise (~0.22 e- rms/pixel) in a region of 220 x 4k pixels, each with a readout time of $\lesssim$ 17 min. We observed two quasars (HB89 1159+123 and QSO J1621-0042) at redshift z ~ 3.5, two high-redshift galaxy clusters (CL J1001+0220 and SPT-CL J2040-4451), an emission line galaxy at z = 0.3239, a candidate member star of the Boötes II ultra-faint dwarf galaxy, and five CALSPEC spectrophotometric standard stars (HD074000, HD60753, HD106252, HD101452, HD200654). We present charge-quantized, photon-counting observations of the quasar HB89 1159+123 and show the detector sensitivity increase for faint spectral features. We demonstrate signal-to-noise performance improvements for SIFS observations in the low-background, readout-noise-dominated regime. We outline scientific studies that will leverage the SIFS-Skipper CCD data and new detector architectures that utilize the Skipper floating gate amplifier with faster readout times.
Submitted 15 June, 2024;
originally announced June 2024.
-
Efficient Certificates of Anti-Concentration Beyond Gaussians
Authors:
Ainesh Bakshi,
Pravesh Kothari,
Goutham Rajendran,
Madhur Tulsiani,
Aravindan Vijayaraghavan
Abstract:
A set of high dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset R^d$ in isotropic position is said to be $δ$-anti concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq δ$ is at most $O(δ)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points $X$ corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures, yet remain limited to rotationally invariant distributions.
This work presents a new (and arguably the most natural) formulation for anti-concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions including anti-concentrated bounded product distributions and uniform distributions over $L_p$ balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics, e.g., list-decodable learning and clustering, to such distributions. Our approach constructs a canonical integer program for anti-concentration and analyzes a sum-of-squares relaxation of it, independent of the intended application. We rely on duality and analyze a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions.
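To make the definition of $δ$-anti-concentration concrete, the following minimal sketch measures, for one fixed direction $v$, the fraction of points with $|\langle x_i, v \rangle| \leq δ$. It is only an illustration of the quantity being bounded: the definition quantifies over every direction, and this check is not the sum-of-squares certificate described in the abstract. The function name `fraction_in_slab` and the sample points are hypothetical.

```python
import math

def fraction_in_slab(points, v, delta):
    """Fraction of points x with |<x, v/||v||>| <= delta, for ONE direction v.

    Illustrative only: anti-concentration requires this fraction to be
    O(delta) for every direction, which this function does not certify.
    """
    if not points:
        raise ValueError("points must be non-empty")
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("v must be a non-zero direction")
    unit = [c / norm for c in v]
    inside = 0
    for x in points:
        projection = sum(xi * ui for xi, ui in zip(x, unit))
        if abs(projection) <= delta:
            inside += 1
    return inside / len(points)

# Hypothetical toy data: four points on the coordinate axes in R^2.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(fraction_in_slab(points, (1.0, 0.0), 0.5))
print(fraction_in_slab(points, (1.0, 1.0), 0.5))
```

Along the first axis, half of the toy points fall inside the slab, while along the diagonal none do, which shows why a certificate has to control all directions at once.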
Submitted 23 May, 2024;
originally announced May 2024.
-
Structure learning of Hamiltonians from real-time evolution
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Ewin Tang
Abstract:
We initiate the study of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m λ_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-studied under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $λ_a$, are unknown. But is it possible to learn a local Hamiltonian without prior knowledge of its interaction structure?
We present a new, general approach to Hamiltonian learning that not only solves the challenging structure learning variant, but also resolves other open questions in the area, all while achieving the gold standard of Heisenberg-limited scaling. In particular, our algorithm recovers the Hamiltonian to $\varepsilon$ error with an evolution time scaling with $1/\varepsilon$, and has the following appealing properties: (1) it does not need to know the Hamiltonian terms; (2) it works beyond the short-range setting, extending to any Hamiltonian $H$ where the sum of terms interacting with a qubit has bounded norm; (3) it evolves according to $H$ in constant time $t$ increments, thus achieving constant time resolution. To our knowledge, no prior algorithm with Heisenberg-limited scaling existed with even one of these properties. As an application, we can also learn Hamiltonians exhibiting power-law decay up to accuracy $\varepsilon$ with total evolution time beating the standard limit of $1/\varepsilon^2$.
Submitted 30 April, 2024;
originally announced May 2024.
-
High-Temperature Gibbs States are Unentangled and Efficiently Preparable
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Ewin Tang
Abstract:
We show that thermal states of local Hamiltonians are separable above a constant temperature. Specifically, for a local Hamiltonian $H$ on a graph with degree $\mathfrak{d}$, its Gibbs state at inverse temperature $β$, denoted by $ρ=e^{-βH}/ \textrm{tr}(e^{-βH})$, is a classical distribution over product states for all $β< 1/(c\mathfrak{d})$, where $c$ is a constant. This sudden death of thermal entanglement upends conventional wisdom about the presence of short-range quantum correlations in Gibbs states.
Moreover, we show that we can efficiently sample from the distribution over product states. In particular, for any $β< 1/( c \mathfrak{d}^3)$, we can prepare a state $ε$-close to $ρ$ in trace distance with a depth-one quantum circuit and $\textrm{poly}(n) \log(1/ε)$ classical overhead. A priori the task of preparing a Gibbs state is a natural candidate for achieving super-polynomial quantum speedups, but our results rule out this possibility above a fixed constant temperature.
Submitted 25 March, 2024;
originally announced March 2024.
-
File System Aging
Authors:
Alex Conway,
Ainesh Bakshi,
Arghya Bhattacharya,
Rory Bennett,
Yizheng Jiao,
Eric Knorr,
Yang Zhan,
Michael A. Bender,
William Jannen,
Rob Johnson,
Bradley C. Kuszmaul,
Donald E. Porter,
Jun Yuan,
Martin Farach-Colton
Abstract:
File systems must allocate space for files without knowing what will be added or removed in the future. Over the life of a file system, this may cause suboptimal file placement decisions that eventually lead to slower performance, or aging. Conventional wisdom suggests that file system aging is a solved problem in the common case; heuristics to avoid aging, such as colocating related files and data blocks, are effective until a storage device fills up, at which point space pressure exacerbates fragmentation-based aging. However, this article describes both realistic and synthetic workloads that can cause these heuristics to fail, inducing large performance declines due to aging, even when the storage device is nearly empty.
We argue that these slowdowns are caused by poor layout. We demonstrate a correlation between the read performance of a directory scan and the locality within a file system's access patterns, using a dynamic layout score. We complement these results with microbenchmarks that show that space pressure can cause a substantial amount of inter-file and intra-file fragmentation. However, our results suggest that the effect of free-space fragmentation on read performance is best described as accelerating the file system aging process. The effect on write performance is non-existent in some cases, and, in most cases, an order of magnitude smaller than the read degradation from fragmentation caused by normal usage.
In short, many file systems are exquisitely prone to read aging after a variety of write patterns. We show, however, that aging is not inevitable. BetrFS, a file system based on write-optimized dictionaries, exhibits almost no aging in our experiments. We present a framework for understanding and predicting aging, and identify the key features of BetrFS that avoid aging.
Submitted 16 January, 2024;
originally announced January 2024.
-
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
Authors:
Ainesh Bakshi,
Vincent Cohen-Addad,
Samuel B. Hopkins,
Rajesh Jayaram,
Silvio Lattanzi
Abstract:
Multi-dimensional Scaling (MDS) is a family of methods for embedding an $n$-point metric into low-dimensional Euclidean space. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i , j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \in \mathbb{R}^k$ that minimizes \[\text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right] \]
Kamada-Kawai provides a more relaxed measure of the quality of a low-dimensional metric embedding than the traditional bi-Lipschitz-ness measure studied in theoretical computer science; this is advantageous because, while strong hardness-of-approximation results are known for the latter, Kamada-Kawai admits nontrivial approximation algorithms. Despite its popularity, our theoretical understanding of MDS is limited. Recently, Demaine, Hesterberg, Koehler, Lynch, and Urschel (arXiv:2109.11505) gave the first approximation algorithm with provable guarantees for Kamada-Kawai in the constant-$k$ regime, with cost $\text{OPT} +ε$ in $n^2 2^{\text{poly}(Δ/ε)}$ time, where $Δ$ is the aspect ratio of the input. In this work, we give the first approximation algorithm for MDS with quasi-polynomial dependency on $Δ$: we achieve a solution with cost $\tilde{O}(\log Δ)\text{OPT}^{Ω(1)}+ε$ in time $n^{O(1)}2^{\text{poly}(\log(Δ)/ε)}$.
Our approach is based on a novel analysis of a conditioning-based rounding scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the geometry of low-dimensional Euclidean space, allowing us to avoid an exponential dependence on the aspect ratio. We believe our geometry-aware treatment of the Sherali-Adams Hierarchy is an important step towards developing general-purpose techniques for efficient metric optimization algorithms.
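As a concrete reading of the Kamada-Kawai objective stated above, the sketch below evaluates the cost of a given embedding. It only scores a candidate solution and is not the Sherali-Adams rounding algorithm from the abstract. Two assumptions are made here for illustration: the expectation is taken as a uniform average over unordered pairs $i < j$, and all off-diagonal dissimilarities are strictly positive. The name `kamada_kawai_cost` is hypothetical.

```python
import math
from itertools import combinations

def kamada_kawai_cost(points, d):
    """Average of (1 - ||x_i - x_j|| / d[i][j])^2 over unordered pairs i < j.

    Assumption for this sketch: d[i][j] > 0 whenever i != j, and the
    expectation in the objective is a uniform average over those pairs.
    """
    pairs = list(combinations(range(len(points)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    total = 0.0
    for i, j in pairs:
        ratio = math.dist(points[i], points[j]) / d[i][j]
        total += (1.0 - ratio) ** 2
    return total / len(pairs)

# Hypothetical example: three collinear points whose distances match d exactly.
points = [(0.0,), (1.0,), (2.0,)]
d = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
print(kamada_kawai_cost(points, d))
```

Because each term is a relative error, a pair whose embedded distance is off by a factor of two contributes the same penalty regardless of how large $d_{i,j}$ is.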
Submitted 11 April, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Characterization and Optimization of Skipper CCDs for the SOAR Integral Field Spectrograph
Authors:
Edgar Marrufo Villalpando,
Alex Drlica-Wagner,
Andrés A. Plazas Malagón,
Abhishek Bakshi,
Marco Bonati,
Julia Campa,
Braulio Cancino,
Claudio R. Chavez,
Juan Estrada,
Guillermo Fernandez Moroni,
Luciano Fraga,
Manuel E. Gaido,
Stephen Holland,
Rachel Hur,
Michelle Jonas,
Peter Moore,
Javier Tiffenberg
Abstract:
We present results from the characterization and optimization of six Skipper CCDs for use in a prototype focal plane for the SOAR Integral Field Spectrograph (SIFS). We tested eight Skipper CCDs and selected six for SIFS based on performance results. The Skipper CCDs are 6k $\times$ 1k, 15 $μ$m pixels, thick, fully-depleted, $p$-channel devices that have been thinned to $\sim 250 μ$m, backside processed, and treated with an antireflective coating. We optimize readout time to achieve $<4.3$ e$^-$ rms/pixel in a single non-destructive readout and $0.5$ e$^-$ rms/pixel in $5 \%$ of the detector. We demonstrate single-photon counting with $N_{\rm samp}$ = 400 ($σ_{\rm 0e^-} \sim$ 0.18 e$^-$ rms/pixel) for all 24 amplifiers (four amplifiers per detector). We also perform conventional CCD characterization measurements such as cosmetic defects ($ <0.45 \%$ ``bad'' pixels), dark current ($\sim 2 \times 10^{-4}$ e$^-$/pixel/s), charge transfer inefficiency ($3.44 \times 10^{-7}$ on average), and charge diffusion (PSF $< 7.5 μ$m). We report on characterization and optimization measurements that are only enabled by photon-counting. Such results include voltage optimization to achieve full-well capacities $\sim 40,000-63,000$ e$^-$ while maintaining photon-counting capabilities, clock-induced charge optimization, and non-linearity measurements at low signals (few tens of electrons). Furthermore, we perform measurements of the brighter-fatter effect and absolute quantum efficiency ($\gtrsim\, 80 \%$ between 450 nm and 980 nm; $\gtrsim\,90 \%$ between 600 nm and 900 nm) using Skipper CCDs.
Submitted 1 November, 2023;
originally announced November 2023.
-
Stealthy Terrain-Aware Multi-Agent Active Search
Authors:
Nikhil Angad Bakshi,
Jeff Schneider
Abstract:
Stealthy multi-agent active search is the problem of making efficient sequential data-collection decisions to identify an unknown number of sparsely located targets while adapting to new sensing information and concealing the search agents' location from the targets. This problem is applicable to reconnaissance tasks wherein the safety of the search agents can be compromised as the targets may be adversarial. Prior work usually focuses either on adversarial search, where the risk of revealing the agents' location to the targets is ignored, or on evasion strategies, where efficient search is ignored. We present the Stealthy Terrain-Aware Reconnaissance (STAR) algorithm, a multi-objective parallelized Thompson sampling-based algorithm that relies on a strong topographical prior to reason over changing visibility risk over the course of the search. The STAR algorithm outperforms existing state-of-the-art multi-agent active search methods on both the rate of target recovery and risk minimisation, even when subject to noisy observations, communication failures and an unknown number of targets.
Submitted 16 October, 2023;
originally announced October 2023.
-
Learning quantum Hamiltonians at any temperature in polynomial time
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Ewin Tang
Abstract:
We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $ρ= e^{-βH}/\textrm{tr}(e^{-βH})$ at a known inverse temperature $β>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $ε$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $ε$ from polynomially many copies of the Gibbs state at any constant $β> 0$.
Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.
Submitted 3 October, 2023;
originally announced October 2023.
-
Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Morris Yau
Abstract:
Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.
Submitted 23 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
A Near-Linear Time Algorithm for the Chamfer Distance
Authors:
Ainesh Bakshi,
Piotr Indyk,
Rajesh Jayaram,
Sandeep Silwal,
Erik Waingarten
Abstract:
For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets.
We overcome this bottleneck and present the first $(1+ε)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.
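For reference, the straightforward $O(d n^2)$ brute-force baseline mentioned in the abstract can be sketched as follows. This is the naive method, not the near-linear $(1+ε)$-approximation the paper presents, and it assumes the Euclidean distance for $d_X$. The function name `chamfer_distance` and the sample points are hypothetical.

```python
import math

def chamfer_distance(A, B):
    """Brute-force CH(A, B) = sum over a in A of min over b in B of ||a - b||_2.

    Runs in O(d * |A| * |B|) time; assumes Euclidean distance as d_X.
    """
    if not B:
        raise ValueError("B must be non-empty")
    total = 0.0
    for a in A:
        # Nearest neighbour of a in B by exhaustive scan.
        total += min(math.dist(a, b) for b in B)
    return total

# Hypothetical toy point sets in R^2.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (3.0, 0.0)]
print(chamfer_distance(A, B))  # 1 + sqrt(2)
print(chamfer_distance(B, A))  # 3.0
```

The two calls give different values, which illustrates that the Chamfer distance from $A$ to $B$ is not symmetric in its arguments.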
Submitted 6 July, 2023;
originally announced July 2023.
-
Krylov Methods are (nearly) Optimal for Low-Rank Approximation
Authors:
Ainesh Bakshi,
Shyam Narayanan
Abstract:
We consider the problem of rank-$1$ low-rank approximation (LRA) in the matrix-vector product model under various Schatten norms: $$
\min_{\|u\|_2=1} \|A (I - u u^\top)\|_{\mathcal{S}_p} , $$ where $\|M\|_{\mathcal{S}_p}$ denotes the $\ell_p$ norm of the singular values of $M$. Given $\varepsilon>0$, our goal is to output a unit vector $v$ such that $$
\|A(I - vv^\top)\|_{\mathcal{S}_p} \leq (1+\varepsilon) \min_{\|u\|_2=1}\|A(I - u u^\top)\|_{\mathcal{S}_p}. $$ Our main result shows that Krylov methods (nearly) achieve the information-theoretically optimal number of matrix-vector products for Spectral ($p=\infty$), Frobenius ($p=2$) and Nuclear ($p=1$) LRA.
In particular, for Spectral LRA, we show that any algorithm requires $Ω\left(\log(n)/\varepsilon^{1/2}\right)$ matrix-vector products, exactly matching the upper bound obtained by Krylov methods [MM15, BCW22]. Our lower bound addresses Open Question 1 in [Woo14], providing evidence for the lack of progress on algorithms for Spectral LRA, and resolves Open Question 1.2 in [BCW22]. Next, we show that for any fixed constant $p$, i.e. $1\leq p =O(1)$, there is an upper bound of $O\left(\log(1/\varepsilon)/\varepsilon^{1/3}\right)$ matrix-vector products, implying that the complexity does not grow as a function of input size. This improves the $O\left(\log(n/\varepsilon)/\varepsilon^{1/3}\right)$ bound recently obtained in [BCW22], and matches their $Ω\left(1/\varepsilon^{1/3}\right)$ lower bound, up to a $\log(1/\varepsilon)$ factor.
Submitted 6 April, 2023;
originally announced April 2023.
-
GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search
Authors:
Nikhil Angad Bakshi,
Tejus Gupta,
Ramina Ghods,
Jeff Schneider
Abstract:
Robotic solutions for quick disaster response are essential to ensure minimal loss of life, especially when the search area is too dangerous or too vast for human rescuers. We model this problem as an asynchronous multi-agent active-search task where each robot aims to efficiently seek objects of interest (OOIs) in an unknown environment. This formulation addresses the requirement that search missions should focus on quick recovery of OOIs rather than full coverage of the search region. Previous approaches fail to accurately model sensing uncertainty, account for occlusions due to foliage or terrain, or consider the requirement for heterogeneous search teams and robustness to hardware and communication failures. We present the Generalized Uncertainty-aware Thompson Sampling (GUTS) algorithm, which addresses these issues and is suitable for deployment on heterogeneous multi-robot systems for active search in large unstructured environments. We show through simulation experiments that GUTS consistently outperforms existing methods such as parallelized Thompson Sampling and exhaustive search, recovering all OOIs in 80% of all runs. In contrast, existing approaches recover all OOIs in less than 40% of all runs. We conduct field tests using our multi-robot system in an unstructured environment with a search area of approximately 75,000 sq. m. Our system demonstrates robustness to various failure modes, achieving full recovery of OOIs (where feasible) in every field run, and significantly outperforming our baseline.
Submitted 4 April, 2023;
originally announced April 2023.
-
An Improved Classical Singular Value Transformation for Quantum Machine Learning
Authors:
Ainesh Bakshi,
Ewin Tang
Abstract:
We study quantum speedups in quantum machine learning (QML) by analyzing the quantum singular value transformation (QSVT) framework. QSVT, introduced by [GSLW, STOC'19, arXiv:1806.01838], unifies all major types of quantum speedup; in particular, a wide variety of QML proposals are applications of QSVT on low-rank classical data. We challenge these proposals by providing a classical algorithm that matches the performance of QSVT in this regime up to a small polynomial overhead.
We show that, given a matrix $A \in \mathbb{C}^{m\times n}$, a vector $b \in \mathbb{C}^{n}$, a bounded degree-$d$ polynomial $p$, and linear-time pre-processing, we can output a description of a vector $v$ such that $\|v - p(A) b\| \leq \varepsilon\|b\|$ in $\widetilde{\mathcal{O}}(d^{11} \|A\|_{\mathrm{F}}^4 / (\varepsilon^2 \|A\|^4 ))$ time. This improves upon the best known classical algorithm [CGLLTW, STOC'20, arXiv:1910.06151], which requires $\widetilde{\mathcal{O}}(d^{22} \|A\|_{\mathrm{F}}^6 /(\varepsilon^6 \|A\|^6 ) )$ time, and narrows the gap with QSVT, which, after linear-time pre-processing to load input into a quantum-accessible memory, can estimate the magnitude of an entry $p(A)b$ to $\varepsilon\|b\|$ error in $\widetilde{\mathcal{O}}(d\|A\|_{\mathrm{F}}/(\varepsilon \|A\|))$ time.
Our key insight is to combine the Clenshaw recurrence, an iterative method for computing matrix polynomials, with sketching techniques to simulate QSVT classically. We introduce several new classical techniques in this work, including (a) a non-oblivious matrix sketch for approximately preserving bi-linear forms, (b) a new stability analysis for the Clenshaw recurrence, and (c) a new technique to bound arithmetic progressions of the coefficients appearing in the Chebyshev series expansion of bounded functions, each of which may be of independent interest.
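For orientation, the plain Clenshaw recurrence mentioned above can be sketched as follows. This is only the textbook recurrence applied to a small dense matrix, under the assumption that the polynomial is given by its Chebyshev coefficients; it contains none of the sketching or stability machinery of the paper, and the names `matvec` and `clenshaw_apply` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (illustrative helper, not from the paper)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def clenshaw_apply(coeffs: List[float], a: Matrix, b: Vector) -> Vector:
    """Return p(A) b for p(x) = sum_k coeffs[k] * T_k(x), with T_k the
    Chebyshev polynomials of the first kind.

    Recurrence: u_k = coeffs[k] * b + 2 A u_{k+1} - u_{k+2} for k = d, ..., 1,
    then p(A) b = coeffs[0] * b + A u_1 - u_2.
    Assumes a square matrix A and a non-empty coefficient list.
    """
    n = len(b)
    u_next = [0.0] * n   # holds u_{k+1}
    u_next2 = [0.0] * n  # holds u_{k+2}
    for k in range(len(coeffs) - 1, 0, -1):
        au = matvec(a, u_next)
        u_k = [coeffs[k] * b[i] + 2.0 * au[i] - u_next2[i] for i in range(n)]
        u_next, u_next2 = u_k, u_next
    au = matvec(a, u_next)
    return [coeffs[0] * b[i] + au[i] - u_next2[i] for i in range(n)]
```

Each step costs one matrix-vector product, so a degree-$d$ polynomial needs $d$ products and never forms a power of $A$ explicitly.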
Submitted 3 August, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
A New Approach to Learning Linear Dynamical Systems
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Morris Yau
Abstract:
Linear dynamical systems are the foundational statistical model upon which control theory is built. Both the celebrated Kalman filter and the linear quadratic regulator require knowledge of the system dynamics to provide analytic guarantees. Naturally, learning the dynamics of a linear dynamical system from linear measurements has been intensively studied since Rudolf Kalman's pioneering work in the 1960s. Towards these ends, we provide the first polynomial time algorithm for learning a linear dynamical system from a polynomial length trajectory up to polynomial error in the system parameters under essentially minimal assumptions: observability, controllability, and marginal stability. Our algorithm is built on a method of moments estimator to directly estimate Markov parameters from which the dynamics can be extracted. Furthermore, we provide statistical lower bounds when our observability and controllability assumptions are violated.
Submitted 23 January, 2023;
originally announced January 2023.
-
Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation
Authors:
Ainesh Bakshi,
Piotr Indyk,
Praneeth Kacham,
Sandeep Silwal,
Samson Zhou
Abstract:
Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency -- given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $Ω(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts.
We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.
Submitted 1 December, 2022;
originally announced December 2022.
-
Performance of the CMS High Granularity Calorimeter prototype to charged pion beams of 20$-$300 GeV/c
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
M. Alhusseini,
J. Alison,
J. P. Figueiredo de sa Sousa de Almeida,
P. G. Dias de Almeida,
A. Alpana,
M. Alyari,
I. Andreev,
U. Aras,
P. Aspell,
I. O. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
S. Banerjee,
P. DeBarbaro,
P. Bargassa,
D. Barney,
F. Beaudette
, et al. (435 additional authors not shown)
Abstract:
The upgrade of the CMS experiment for the high luminosity operation of the LHC comprises the replacement of the current endcap calorimeter by a high granularity sampling calorimeter (HGCAL). The electromagnetic section of the HGCAL is based on silicon sensors interspersed between lead and copper (or copper tungsten) absorbers. The hadronic section uses layers of stainless steel as an absorbing medium and silicon sensors as an active medium in the regions of high radiation exposure, and scintillator tiles directly read out by silicon photomultipliers in the remaining regions. As part of the development of the detector and its readout electronic components, a section of a silicon-based HGCAL prototype detector along with a section of the CALICE AHCAL prototype was exposed to muons, electrons and charged pions in beam test experiments at the H2 beamline at the CERN SPS in October 2018. The AHCAL uses the same technology as foreseen for the HGCAL but with much finer longitudinal segmentation. The performance of the calorimeters in terms of energy response and resolution, longitudinal and transverse shower profiles is studied using negatively charged pions, and is compared to GEANT4 predictions. This is the first report summarizing results of hadronic showers measured by the HGCAL prototype using beam test data.
Submitted 27 May, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Design of a Skipper CCD Focal Plane for the SOAR Integral Field Spectrograph
Authors:
Edgar Marrufo Villalpando,
Alex Drlica-Wagner,
Marco Bonati,
Abhishek Bakshi,
Vanessa Bawden de Paula Macanhan,
Braulio Cancino,
Gregory E. Derylo,
Juan Estrada,
Guillermo Fernandez Moroni,
Luciano Fraga,
Stephen Holland,
Michelle J. Jonas,
Agustín Lapi,
Peter Moore,
Andrés A. Plazas Malagón,
Leandro Stefanazzi,
Javier Tiffenberg
Abstract:
We present the development of a Skipper Charge-Coupled Device (CCD) focal plane prototype for the SOAR Telescope Integral Field Spectrograph (SIFS). This mosaic focal plane consists of four 6k $\times$ 1k, 15 $μ$m pixel Skipper CCDs mounted inside a vacuum dewar. We describe the process of packaging the CCDs so that they can be easily tested, transported, and installed in a mosaic focal plane. We characterize the performance of $\sim$650 $μ$m thick, fully-depleted engineering-grade Skipper CCDs in preparation for performing similar characterization tests on science-grade Skipper CCDs which will be thinned to 250 $μ$m and backside processed with an antireflective coating. We achieve a single-sample readout noise of $4.5\,e^{-}$ rms/pix for the best performing amplifiers and sub-electron resolution (photon counting capabilities) with readout noise $σ\sim 0.16\,e^{-}$ rms/pix from 800 measurements of the charge in each pixel. We describe the design and construction of the Skipper CCD focal plane and provide details about the synchronized readout electronics system that will be implemented to simultaneously read 16 amplifiers from the four Skipper CCDs (4 amplifiers per detector). Finally, we outline future plans for laboratory testing, installation, commissioning, and science verification of our Skipper CCD focal plane.
Submitted 7 October, 2022;
originally announced October 2022.
-
Snowmass 2021 CMB-S4 White Paper
Authors:
Kevork Abazajian,
Arwa Abdulghafour,
Graeme E. Addison,
Peter Adshead,
Zeeshan Ahmed,
Marco Ajello,
Daniel Akerib,
Steven W. Allen,
David Alonso,
Marcelo Alvarez,
Mustafa A. Amin,
Mandana Amiri,
Adam Anderson,
Behzad Ansarinejad,
Melanie Archipley,
Kam S. Arnold,
Matt Ashby,
Han Aung,
Carlo Baccigalupi,
Carina Baker,
Abhishek Bakshi,
Debbie Bard,
Denis Barkats,
Darcy Barron,
Peter S. Barry
, et al. (331 additional authors not shown)
Abstract:
This Snowmass 2021 White Paper describes the Cosmic Microwave Background Stage 4 project CMB-S4, which is designed to cross critical thresholds in our understanding of the origin and evolution of the Universe, from the highest energies at the dawn of time through the growth of structure to the present day. We provide an overview of the science case, the technical design, and project plan.
Submitted 15 March, 2022;
originally announced March 2022.
-
Robust Nonparametric Distribution Forecast with Backtest-based Bootstrap and Adaptive Residual Selection
Authors:
Longshaokan Wang,
Lingda Wang,
Mina Georgieva,
Paulo Machado,
Abinaya Ulagappa,
Safwan Ahmed,
Yan Lu,
Arjun Bakshi,
Farhad Ghassemi
Abstract:
Distribution forecast can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities. Accurate distribution forecast is crucial for planning, for example when making production capacity or inventory allocation decisions. We propose a practical and robust distribution forecast framework that relies on backtest-based bootstrap and adaptive residual selection. The proposed approach is robust to the choice of the underlying forecasting model, accounts for uncertainty around the input covariates, and relaxes the assumption of independence between residuals and covariates. It reduces the Absolute Coverage Error by more than 63% compared to the classic bootstrap approaches and by 2% to 32% compared to a variety of state-of-the-art deep learning approaches on in-house product sales data and M4-hourly competition data.
Submitted 16 February, 2022;
originally announced February 2022.
-
Low-Rank Approximation with $1/ε^{1/3}$ Matrix-Vector Products
Authors:
Ainesh Bakshi,
Kenneth L. Clarkson,
David P. Woodruff
Abstract:
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $ε$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+ε)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (Spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{ε})$ matrix-vector products, improving on the naïve $\tilde{O}(k/ε)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses poly$(\log(dk/ε))$ factors.
Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/ε^{1/3})$ matrix-vector products, and works for all $p \geq 1$. For $p = 2$ our bound improves the previous $\tilde{O}(k/ε^{1/2})$ bound to $\tilde{O}(k/ε^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $(1+ ε)$-factor when $p \geq (\log d)/ε$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $Ω(1/ε^{1/3})$ for any fixed constant $p \geq 1$, showing that, surprisingly, $\tilde{Θ}(1/ε^{1/3})$ is the optimal complexity for constant $k$.
To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously, and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$, we appeal to a norm-compression inequality for aligned partitioned operators.
Submitted 16 June, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Response of a CMS HGCAL silicon-pad electromagnetic calorimeter prototype to 20-300 GeV positrons
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
F. Alam Khan,
M. Alhusseini,
J. Alison,
A. Alpana,
G. Altopp,
M. Alyari,
S. An,
S. Anagul,
I. Andreev,
P. Aspell,
I. O. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
S. Bannerjee,
P. Bargassa,
D. Barney,
F. Beaudette
, et al. (364 additional authors not shown)
Abstract:
The Compact Muon Solenoid Collaboration is designing a new high-granularity endcap calorimeter, HGCAL, to be installed later this decade. As part of this development work, a prototype system was built, with an electromagnetic section consisting of 14 double-sided structures, providing 28 sampling layers. Each sampling layer has a hexagonal module, where a multipad large-area silicon sensor is glued between an electronics circuit board and a metal baseplate. The sensor pads of approximately 1 cm$^2$ are wire-bonded to the circuit board and are read out by custom integrated circuits. The prototype was extensively tested with beams at CERN's Super Proton Synchrotron in 2018. Based on the data collected with beams of positrons, with energies ranging from 20 to 300 GeV, measurements of the energy resolution and linearity, the position and angular resolutions, and the shower shapes are presented and compared to a detailed Geant4 simulation.
Submitted 31 March, 2022; v1 submitted 12 November, 2021;
originally announced November 2021.
-
Simulation of Derivatives Post-Trade Services using an Authoritative Data Store and the ISDA Common Domain Model
Authors:
Vikram A. Bakshi,
Aishwarya Nair,
Lee Braine
Abstract:
In this paper, we present a summary of the design and implementation of a simulation of post-trade services for interest rate swaps, from execution to maturity. We use an authoritative data store (ADS) and the International Swaps and Derivatives Association (ISDA) Common Domain Model (CDM) to simulate a potential future architecture. We start by providing a brief overview of the CDM and the lifecycle of an interest rate swap. We then compare our simulated future state architecture with a typical current state architecture. Next, we present the key requirements of the simulated system, several suitable design patterns, and a summary of the implementation. The simulation uses the CDM to address the industry problems of inconsistent processes and inconsistent data, and an authoritative data store to address the industry problem of duplicated data.
Submitted 6 October, 2021;
originally announced October 2021.
-
Learning a Latent Simplex in Input-Sparsity Time
Authors:
Ainesh Bakshi,
Chiranjib Bhattacharyya,
Ravi Kannan,
David P. Woodruff,
Samson Zhou
Abstract:
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can be viewed as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$). A large class of latent variable models, such as adversarial clustering, mixed membership stochastic block models, and topic models can be cast as learning a latent simplex. Bhattacharyya and Kannan (SODA, 2020) give an algorithm for learning such a latent simplex in time roughly $O(k\cdot\textrm{nnz}(A))$, where $\textrm{nnz}(A)$ is the number of non-zeros in $A$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$, which holds in many of these applications. Further, we show this assumption is necessary, as otherwise an algorithm for learning a latent simplex would imply an algorithmic breakthrough for spectral low rank approximation.
At a high level, Bhattacharyya and Kannan provide an adaptive algorithm that makes $k$ matrix-vector product queries to $A$ and each query is a function of all queries preceding it. Since each matrix-vector product requires $\textrm{nnz}(A)$ time, their overall running time appears unavoidable. Instead, we obtain a low-rank approximation to $A$ in input-sparsity time and show that the column space thus obtained has small $\sin Θ$ (angular) distance to the right top-$k$ singular space of $A$. Our algorithm then selects $k$ points in the low-rank subspace with the largest inner product with $k$ carefully chosen random vectors. By working in the low-rank subspace, we avoid reading the entire matrix in each iteration and thus circumvent the $Θ(k\cdot\textrm{nnz}(A))$ running time.
Submitted 17 May, 2021;
originally announced May 2021.
-
Construction and commissioning of CMS CE prototype silicon modules
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
M. Alhusseini,
J. Alison,
G. Altopp,
M. Alyari,
S. An,
S. Anagul,
I. Andreev,
M. Andrews,
P. Aspell,
I. A. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
P. Bargassa,
D. Barney,
E. Becheva,
P. Behera,
A. Belloni
, et al. (307 additional authors not shown)
Abstract:
As part of its HL-LHC upgrade program, the CMS Collaboration is developing a High Granularity Calorimeter (CE) to replace the existing endcap calorimeters. The CE is a sampling calorimeter with unprecedented transverse and longitudinal readout for both electromagnetic (CE-E) and hadronic (CE-H) compartments. The calorimeter will be built with $\sim$30,000 hexagonal silicon modules. Prototype modules have been constructed with 6-inch hexagonal silicon sensors with cell areas of 1.1 cm$^2$, and the SKIROC2-CMS readout ASIC. Beam tests of different sampling configurations were conducted with the prototype modules at DESY and CERN in 2017 and 2018. This paper describes the construction and commissioning of the CE calorimeter prototype, the silicon modules used in the construction, their basic performance, and the methods used for their calibration.
Submitted 10 December, 2020;
originally announced December 2020.
-
The DAQ system of the 12,000 Channel CMS High Granularity Calorimeter Prototype
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
M. Alhusseini,
J. Alison,
G. Altopp,
M. Alyari,
S. An,
S. Anagul,
I. Andreev,
M. Andrews,
P. Aspell,
I. A. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
P. Bargassa,
D. Barney,
E. Becheva,
P. Behera,
A. Belloni
, et al. (307 additional authors not shown)
Abstract:
The CMS experiment at the CERN LHC will be upgraded to accommodate the 5-fold increase in the instantaneous luminosity expected at the High-Luminosity LHC (HL-LHC). Concomitant with this increase will be an increase in the number of interactions in each bunch crossing and a significant increase in the total ionising dose and fluence. One part of this upgrade is the replacement of the current endcap calorimeters with a high granularity sampling calorimeter equipped with silicon sensors, designed to manage the high collision rates. As part of the development of this calorimeter, a series of beam tests have been conducted with different sampling configurations using prototype segmented silicon detectors. In the most recent of these tests, conducted in late 2018 at the CERN SPS, the performance of a prototype calorimeter equipped with $\approx$12,000 channels of silicon sensors was studied with beams of high-energy electrons, pions and muons. This paper describes the custom-built scalable data acquisition system that was built with readily available FPGA mezzanines and low-cost Raspberry Pi computers.
Submitted 8 December, 2020; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Robustly Learning Mixtures of $k$ Arbitrary Gaussians
Authors:
Ainesh Bakshi,
Ilias Diakonikolas,
He Jia,
Daniel M. Kane,
Pravesh K. Kothari,
Santosh S. Vempala
Abstract:
We give a polynomial-time algorithm for the problem of robustly estimating a mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of two Gaussians. Our main tools are an efficient \emph{partial clustering} algorithm that relies on the sum-of-squares method, and a novel \emph{tensor decomposition} algorithm that allows errors in both Frobenius norm and low-rank terms.
Submitted 7 June, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Robust Linear Regression: Optimal Rates in Polynomial Time
Authors:
Ainesh Bakshi,
Adarsh Prasad
Abstract:
We obtain robust and computationally efficient estimators for learning several linear models that achieve statistically optimal convergence rate under minimal distributional assumptions. Concretely, we assume our data is drawn from a $k$-hypercontractive distribution and an $ε$-fraction is adversarially corrupted. We then describe an estimator that converges to the optimal least-squares minimizer for the true distribution at a rate proportional to $ε^{2-2/k}$, when the noise is independent of the covariates. We note that no such estimator was known prior to our work, even with access to unbounded computation. The rate we achieve is information-theoretically optimal and thus we resolve the main open question in Klivans, Kothari and Meka [COLT'18].
Our key insight is to identify an analytic condition that serves as a polynomial relaxation of independence of random variables. In particular, we show that when the moments of the noise and covariates are negatively correlated, we obtain the same rate as independent noise. Further, when the condition is not satisfied, we obtain a rate proportional to $ε^{2-4/k}$, and again match the information-theoretic lower bound. Our central technical contribution is to algorithmically exploit independence of random variables in the "sum-of-squares" framework by formulating it as the aforementioned polynomial inequality.
Submitted 4 December, 2020; v1 submitted 29 June, 2020;
originally announced July 2020.
-
Testing Positive Semi-Definiteness via Random Submatrices
Authors:
Ainesh Bakshi,
Nadiia Chepurko,
Rajesh Jayaram
Abstract:
We study the problem of testing whether a matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ with bounded entries ($\|\mathbf{A}\|_\infty \leq 1$) is positive semi-definite (PSD), or $ε$-far in Euclidean distance from the PSD cone, meaning that $\min_{\mathbf{B} \succeq 0} \|\mathbf{A} - \mathbf{B}\|_F^2 > εn^2$, where $\mathbf{B} \succeq 0$ denotes that $\mathbf{B}$ is PSD. Our main algorithmic contribution is a non-adaptive tester which distinguishes between these cases using only $\tilde{O}(1/ε^4)$ queries to the entries of $\mathbf{A}$. If instead of the Euclidean norm we consider the distance in spectral norm, we obtain the "$\ell_\infty$-gap problem", where $\mathbf{A}$ is either PSD or satisfies $\min_{\mathbf{B}\succeq 0} \|\mathbf{A}- \mathbf{B}\|_2 > εn$. For this related problem, we give a $\tilde{O}(1/ε^2)$ query tester, which we show is optimal up to $\log(1/ε)$ factors. Our testers randomly sample a collection of principal submatrices and check whether these submatrices are PSD. Consequently, our algorithms achieve one-sided error: whenever they output that $\mathbf{A}$ is not PSD, they return a certificate that $\mathbf{A}$ has negative eigenvalues.
We complement our upper bound for PSD testing with Euclidean norm distance by giving a $\tilde{Ω}(1/ε^2)$ lower bound for any non-adaptive algorithm. Our lower bound construction is general, and can be used to derive lower bounds for a number of spectral testing problems. As an example of the applicability of our construction, we obtain a new $\tilde{Ω}(1/ε^4)$ sampling lower bound for testing the Schatten-$1$ norm with an $εn^{1.5}$ gap, extending a result of Balcan, Li, Woodruff, and Zhang [SODA'19]. In addition, it yields new sampling lower bounds for estimating the Ky-Fan Norm, and the cost of the best rank-$k$ approximation.
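The testing strategy described above (sample random principal submatrices and reject only when one of them fails to be PSD) can be illustrated with a minimal sketch. The submatrix size `size`, the number of `trials`, the tolerance, and the function names are hypothetical choices for illustration; the paper's actual sampling schedule and query bounds are not reproduced here.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def is_psd(m: Matrix, tol: float = 1e-9) -> bool:
    """Check positive semi-definiteness of a small symmetric matrix by
    symmetric elimination: every pivot must be non-negative, and a zero
    pivot is allowed only if the rest of its row is zero as well."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    for k in range(n):
        pivot = m[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            if any(abs(m[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            for j in range(k + 1, n):
                m[i][j] -= factor * m[k][j]  # Schur complement update
    return True


def sample_psd_test(a: Matrix, size: int, trials: int,
                    rng: random.Random) -> Optional[List[int]]:
    """Return the index set of a non-PSD principal submatrix if one is
    found (a certificate that `a` is not PSD), otherwise None."""
    n = len(a)
    for _ in range(trials):
        idx = rng.sample(range(n), min(size, n))
        sub = [[a[i][j] for j in idx] for i in idx]
        if not is_psd(sub):
            return idx
    return None
```

Because every principal submatrix of a PSD matrix is itself PSD, a sketch of this shape never rejects a PSD input up to the numerical tolerance, which is the one-sided error property the abstract refers to.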
Submitted 17 September, 2020; v1 submitted 13 May, 2020;
originally announced May 2020.
-
Outlier-Robust Clustering of Non-Spherical Mixtures
Authors:
Ainesh Bakshi,
Pravesh Kothari
Abstract:
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs). Concretely, our algorithm takes as input an $ε$-corrupted sample from a $k$-GMM and, with high probability, in $d^{\text{poly}(k/η)}$ time outputs an approximate clustering that misclassifies at most a $k^{O(k)}(ε+η)$ fraction of the points whenever every pair of mixture components is separated by $1-\exp(-\text{poly}(k/η)^k)$ in total variation (TV) distance. Such a result was not previously known even for $k=2$. TV separation is the statistically weakest possible notion of separation and captures important special cases such as mixed linear regression and subspace clustering.
Our main conceptual contribution is to distill simple analytic properties, namely (certifiable) hypercontractivity and bounded variance of degree 2 polynomials and anti-concentration of linear projections, that are necessary and sufficient for mixture models to be (efficiently) clusterable. As a consequence, our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere. Even the information-theoretic clusterability of separated distributions satisfying these two analytic assumptions was not known prior to our work and is likely to be of independent interest.
Our algorithms build on the recent sequence of works relying on certifiable anti-concentration, first introduced in the works of Karmalkar, Klivans, and Kothari and of Raghavendra and Yau in 2019. Our techniques expand the sum-of-squares toolkit to show robust certifiability of TV-separated Gaussian clusters in data. This involves giving a low-degree sum-of-squares proof of statements that relate parameter (i.e., mean and covariance) distance to total variation distance by relying only on hypercontractivity and anti-concentration.
Submitted 14 December, 2020; v1 submitted 6 May, 2020;
originally announced May 2020.
-
List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time
Authors:
Ainesh Bakshi,
Pravesh K. Kothari
Abstract:
In list-decodable subspace recovery, the input is a collection of $n$ points, $αn$ of which (for some $α\ll 1/2$) are drawn i.i.d. from a distribution $\mathcal{D}$ with an isotropic rank $r$ covariance $Π_*$ (the \emph{inliers}), and the rest are arbitrary, potentially adversarial outliers. The goal is to recover an $O(1/α)$ size list of candidate covariances that contains a $\hatΠ$ close to $Π_*$. Two recent independent works (Raghavendra-Yau, Bakshi-Kothari 2020) gave the first efficient algorithms for this problem. These results, however, obtain an error that grows with the dimension (linearly in [RY] and logarithmically in [BK], at the cost of quasi-polynomial running time) and rely on \emph{certifiable anti-concentration} - a relatively strict condition satisfied essentially only by the Gaussian distribution.
In this work, we improve on these results on all three fronts: \emph{dimension-independent} error via a faster fixed-polynomial running time under less restrictive distributional assumptions. Specifically, we give a $poly(1/α) d^{O(1)}$ time algorithm that outputs a list containing a $\hatΠ$ satisfying $\|\hatΠ -Π_*\|_F \leq O(1/α)$. Our result only needs $\mathcal{D}$ to have \emph{certifiably hypercontractive} degree 2 polynomials. As a result, in addition to Gaussians, our algorithm applies to the uniform distribution on the hypercube and $q$-ary cubes and arbitrary product distributions with subgaussian marginals. Prior work (Raghavendra and Yau, 2020) had identified such distributions as potential hard examples as such distributions do not exhibit strong enough anti-concentration. When $\mathcal{D}$ satisfies certifiable anti-concentration, we obtain a stronger error guarantee of $\|\hatΠ-Π_*\|_F \leq η$ for any arbitrary $η> 0$ in $d^{O(poly(1/α) + \log (1/η))}$ time.
Submitted 7 January, 2021; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Robust and Sample Optimal Algorithms for PSD Low-Rank Approximation
Authors:
Ainesh Bakshi,
Nadiia Chepurko,
David P. Woodruff
Abstract:
Recently, Musco and Woodruff (FOCS, 2017) showed that given an $n \times n$ positive semidefinite (PSD) matrix $A$, it is possible to compute a $(1+ε)$-approximate relative-error low-rank approximation to $A$ by querying $O(nk/ε^{2.5})$ entries of $A$ in time $O(nk/ε^{2.5} +n k^{ω-1}/ε^{2(ω-1)})$. They also showed that any relative-error low-rank approximation algorithm must query $Ω(nk/ε)$ entries of $A$; this gap has since remained open. Our main result is to resolve this question by obtaining an optimal algorithm that queries $O(nk/ε)$ entries of $A$ and outputs a relative-error low-rank approximation in $O(n(k/ε)^{ω-1})$ time. Note that our running time improves on that of Musco and Woodruff, and matches the information-theoretic lower bound if the matrix-multiplication exponent $ω$ is $2$.
We then extend our techniques to negative-type distance matrices. Bakshi and Woodruff (NeurIPS, 2018) showed a bi-criteria, relative-error low-rank approximation which queries $O(nk/ε^{2.5})$ entries and outputs a rank-$(k+4)$ matrix. We show that the bi-criteria guarantee is not necessary and obtain an $O(nk/ε)$ query algorithm, which is optimal. Our algorithm applies to all distance matrices that arise from metrics satisfying negative-type inequalities, including $\ell_1, \ell_2,$ spherical metrics and hypermetrics.
Next, we introduce a new robust low-rank approximation model which captures PSD matrices that have been corrupted with noise. While a sample complexity lower bound precludes sublinear algorithms for arbitrary PSD matrices, we provide the first sublinear time and query algorithms when the corruption on the diagonal entries is bounded. As a special case, we show sample-optimal sublinear time algorithms for low-rank approximation of correlation matrices corrupted by noise.
Submitted 15 June, 2021; v1 submitted 9 December, 2019;
originally announced December 2019.
-
Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams
Authors:
Ainesh Bakshi,
Nadiia Chepurko,
David P. Woodruff
Abstract:
We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, i.e., intervals and disks. Let $α$ be the cardinality of the largest independent set. Our goal is to estimate $α$ in a small amount of space, given that the input is received as a one-pass stream. We also consider a generalization of this problem by assigning weights to each object and estimating $β$, the largest value of a weighted independent set. We initiate the study of this problem in the turnstile streaming model (insertions and deletions) and provide the first algorithms for estimating $α$ and $β$.
For unit-length intervals, we obtain a $(2+ε)$-approximation to $α$ and $β$ in poly$(\frac{\log(n)}ε)$ space. We also show a matching lower bound. Combined with the $3/2$-approximation for insertion-only streams by Cabello and Perez-Lanterno [CP15], our result implies a separation between the insertion-only and turnstile model. For unit-radius disks, we obtain a $\left(\frac{8\sqrt{3}}π\right)$-approximation to $α$ and $β$ in poly$(\log(n), ε^{-1})$ space, which is closely related to the hexagonal circle packing constant.
We provide algorithms for estimating $α$ for arbitrary-length intervals under a bounded intersection assumption and study the parameterized space complexity of estimating $α$ and $β$, where the parameter is the ratio of maximum to minimum interval length.
Submitted 24 March, 2020; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Standard Model Physics at the HL-LHC and HE-LHC
Authors:
P. Azzi,
S. Farry,
P. Nason,
A. Tricoli,
D. Zeppenfeld,
R. Abdul Khalek,
J. Alimena,
N. Andari,
L. Aperio Bella,
A. J. Armbruster,
J. Baglio,
S. Bailey,
E. Bakos,
A. Bakshi,
C. Baldenegro,
F. Balli,
A. Barker,
W. Barter,
J. de Blas,
F. Blekman,
D. Bloch,
A. Bodek,
M. Boonekamp,
E. Boos,
J. D. Bossio Sola
, et al. (201 additional authors not shown)
Abstract:
The successful operation of the Large Hadron Collider (LHC) and the excellent performance of the ATLAS, CMS, LHCb and ALICE detectors in Run-1 and Run-2 with $pp$ collisions at center-of-mass energies of 7, 8 and 13 TeV, as well as the giant leap in precision calculations and modeling of fundamental interactions at hadron colliders, have allowed an extraordinary breadth of physics studies, including precision measurements of a variety of physics processes. The LHC results have so far confirmed the validity of the Standard Model of particle physics up to unprecedented energy scales and with great precision in the sectors of strong and electroweak interactions as well as flavour physics, for instance in top quark physics. The upgrade of the LHC to a High Luminosity phase (HL-LHC) at 14 TeV center-of-mass energy with 3 ab$^{-1}$ of integrated luminosity will probe the Standard Model with even greater precision and will extend the sensitivity to possible anomalies in the Standard Model, thanks to a ten-fold larger data set, upgraded detectors and expected improvements in the theoretical understanding. This document summarises the physics reach of the HL-LHC in the realm of strong and electroweak interactions and top quark physics, and provides a glimpse of the potential of a possible further upgrade of the LHC to a 27 TeV $pp$ collider, the High-Energy LHC (HE-LHC), assumed to accumulate an integrated luminosity of 15 ab$^{-1}$.
Submitted 20 December, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Learning Two Layer Rectified Neural Networks in Polynomial Time
Authors:
Ainesh Bakshi,
Rajesh Jayaram,
David P. Woodruff
Abstract:
Consider the following fundamental learning problem: given input examples $x \in \mathbb{R}^d$ and their vector-valued labels, as defined by an underlying generative neural network, recover the weight matrices of this network. We consider two-layer networks, mapping $\mathbb{R}^d$ to $\mathbb{R}^m$, with $k$ non-linear activation units $f(\cdot)$, where $f(x) = \max \{x , 0\}$ is the ReLU. Such a network is specified by two weight matrices, $\mathbf{U}^* \in \mathbb{R}^{m \times k}, \mathbf{V}^* \in \mathbb{R}^{k \times d}$, such that the label of an example $x \in \mathbb{R}^{d}$ is given by $\mathbf{U}^* f(\mathbf{V}^* x)$, where $f(\cdot)$ is applied coordinate-wise. Given $n$ samples as a matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$ and the (possibly noisy) labels $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X}) + \mathbf{E}$ of the network on these samples, where $\mathbf{E}$ is a noise matrix, our goal is to recover the weight matrices $\mathbf{U}^*$ and $\mathbf{V}^*$.
In this work, we develop algorithms and hardness results under varying assumptions on the input and noise. Although the problem is NP-hard even for $k=2$, by assuming Gaussian marginals over the input $\mathbf{X}$ we are able to develop polynomial time algorithms for the approximate recovery of $\mathbf{U}^*$ and $\mathbf{V}^*$. Perhaps surprisingly, in the noiseless case our algorithms recover $\mathbf{U}^*,\mathbf{V}^*$ exactly, i.e., with no error. To the best of our knowledge, this is the first algorithm to accomplish exact recovery. For the noisy case, we give the first polynomial time algorithm that approximately recovers the weights in the presence of mean-zero noise $\mathbf{E}$. Our algorithms generalize to a larger class of rectified activation functions, $f(x) = 0$ when $x\leq 0$, and $f(x) > 0$ otherwise.
Submitted 5 November, 2018;
originally announced November 2018.
-
Sublinear Time Low-Rank Approximation of Distance Matrices
Authors:
Ainesh Bakshi,
David P. Woodruff
Abstract:
Let $\mathbf{P}=\{ p_1, p_2, \ldots p_n \}$ and $\mathbf{Q} = \{ q_1, q_2 \ldots q_m \}$ be two point sets in an arbitrary metric space. Let $\mathbf{A}$ represent the $m\times n$ pairwise distance matrix with $\mathbf{A}_{i,j} = d(p_i, q_j)$. Such distance matrices are commonly computed in software packages and have applications to learning image manifolds, handwriting recognition, and multi-dimensional unfolding, among other things. In an attempt to reduce their description size, we study low rank approximation of such matrices. Our main result is to show that for any underlying distance metric $d$, it is possible to achieve an additive error low-rank approximation in sublinear time. We note that it is provably impossible to achieve such a guarantee in sublinear time for arbitrary matrices $\mathbf{A}$, and consequently our proof exploits special properties of distance matrices. We develop a recursive algorithm based on additive projection-cost preserving sampling. We then show that in general, relative error approximation in sublinear time is impossible for distance matrices, even if one allows for bicriteria solutions. Additionally, we show that if $\mathbf{P} = \mathbf{Q}$ and $d$ is the squared Euclidean distance, which is not a metric but rather the square of a metric, then a relative error bicriteria solution can be found in sublinear time.
Submitted 18 September, 2018;
originally announced September 2018.
-
Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science
Authors:
Mayank Singh,
Abhishek Niranjan,
Divyansh Gupta,
Nikhil Angad Bakshi,
Animesh Mukherjee,
Pawan Goyal
Abstract:
Our current knowledge of scholarly plagiarism is largely based on the similarity between full-text research articles. In this paper, we propose a novel conceptualization of scholarly plagiarism in the form of reuse of explicit citation sentences in scientific research articles. Note that while full-text plagiarism is an indicator of a gross-level behavior, copying of citation sentences is a more nuanced micro-scale phenomenon observed even for well-known researchers. The current work poses several interesting questions and attempts to answer them by empirically investigating a large bibliographic text dataset from computer science containing millions of lines of citation sentences. In particular, we report evidence of massive copying behavior. We also present several striking real examples throughout the paper to showcase widespread adoption of this undesirable practice. In contrast to the popular perception, we find that copying tendency increases as an author matures. The copying behavior is reported to exist in all fields of computer science; however, the theoretical fields indicate more copying than the applied fields.
Submitted 6 May, 2017;
originally announced May 2017.
-
Robust Communication-Optimal Distributed Clustering Algorithms
Authors:
Pranjal Awasthi,
Ainesh Bakshi,
Maria-Florina Balcan,
Colin White,
David Woodruff
Abstract:
In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on gaining a finer understanding through the lens of beyond worst-case analysis. Our main motivation is the following: for many applications such as clustering proteins by function or clustering communities in a social network, there is some unknown target clustering, and the hope is that running a $k$-median or $k$-means algorithm will produce clusterings which are close to matching the target clustering. Worst-case results can guarantee constant factor approximations to the optimal $k$-median or $k$-means objective value, but not closeness to the target clustering.
Our first result is a distributed algorithm which returns a near-optimal clustering assuming a natural notion of stability, namely, approximation stability [Balcan et al. 2013], even when a constant fraction of the data are outliers. The communication complexity is $\tilde O(sk+z)$ where $s$ is the number of machines, $k$ is the number of clusters, and $z$ is the number of outliers.
Next, we show this amount of communication cannot be improved even in the setting when the input satisfies various non-worst-case assumptions. We give a matching $Ω(sk+z)$ lower bound on the communication required both for approximating the optimal $k$-means or $k$-median cost up to any constant, and for returning a clustering that is close to the target clustering in Hamming distance. These lower bounds hold even when the data satisfies approximation stability or other common notions of stability, and the cluster sizes are balanced. Therefore, $Ω(sk+z)$ is a communication bottleneck, even for real-world instances.
Submitted 6 March, 2019; v1 submitted 2 March, 2017;
originally announced March 2017.
-
Smart Contract Templates: essential requirements and design options
Authors:
Christopher D. Clack,
Vikram A. Bakshi,
Lee Braine
Abstract:
Smart Contract Templates support legally-enforceable smart contracts, using operational parameters to connect legal agreements to standardised code. In this paper, we explore the design landscape of potential formats for storage and transmission of smart legal agreements. We identify essential requirements and describe a number of key design options, from which we envisage future development of standardised formats for defining and manipulating smart legal agreements. This provides a preliminary step towards supporting industry adoption of legally-enforceable smart contracts.
Submitted 15 December, 2016; v1 submitted 14 December, 2016;
originally announced December 2016.
-
Smart Contract Templates: foundations, design landscape and research directions
Authors:
Christopher D. Clack,
Vikram A. Bakshi,
Lee Braine
Abstract:
In this position paper, we consider some foundational topics regarding smart contracts (such as terminology, automation, enforceability, and semantics) and define a smart contract as an automatable and enforceable agreement. We explore a simple semantic framework for smart contracts, covering both operational and non-operational aspects, and describe templates and agreements for legally-enforceable smart contracts, based on legal documents. Building upon the Ricardian Contract, we identify operational parameters in the legal documents and use these to connect legal agreements to standardised code. We also explore the design landscape, including increasing sophistication of parameters, increasing use of common standardised code, and long-term research.
Submitted 15 March, 2017; v1 submitted 2 August, 2016;
originally announced August 2016.
-
Gravitational surface Hamiltonian and entropy quantization
Authors:
Ashish Bakshi,
Bibhas Ranjan Majhi,
Saurav Samanta
Abstract:
The surface Hamiltonian corresponding to the surface part of a gravitational action has an $xp$ structure, where $p$ is the conjugate momentum of $x$. Moreover, it leads to $TS$ on the horizon of a black hole, where $T$ and $S$ are the temperature and entropy of the horizon. Imposing the hermiticity condition, we quantize this Hamiltonian. This leads to an equidistant spectrum of its eigenvalues. Using this, we show that the entropy of the horizon is quantized. This analysis holds for any order of Lanczos-Lovelock gravity. For general relativity, the area spectrum is consistent with Bekenstein's observation. This provides a more robust confirmation of this earlier result, as the calculation is based on the direct quantization of the Hamiltonian in the sense of usual quantum mechanics.
Submitted 28 December, 2016; v1 submitted 28 July, 2016;
originally announced July 2016.
-
Polynomial Time Algorithm for $2$-Stable Clustering Instances
Authors:
Ainesh Bakshi,
Nadiia Chepurko
Abstract:
Clustering with most objective functions is NP-Hard, even to approximate well in the worst case. Recently, there has been work on exploring different notions of stability which lend structure to the problem. The notion of stability, $α$-perturbation resilience, that we study in this paper was originally introduced by Bilu et al.~\cite{Bilu10}. The works of Awasthi et al.~\cite{Awasthi12} and Balcan et al.~\cite{Balcan12} provide polynomial time algorithms for $3$-stable and $(1+\sqrt{2})$-stable instances respectively. This paper provides a polynomial time algorithm for $2$-stable instances, improving on and answering an open question in~\cite{Balcan12}.
Submitted 11 February, 2017; v1 submitted 25 July, 2016;
originally announced July 2016.
-
A Novel Feature Selection and Extraction Technique for Classification
Authors:
Kratarth Goel,
Raunaq Vohra,
Ainesh Bakshi
Abstract:
This paper presents a versatile technique for the purpose of feature selection and extraction - Class Dependent Features (CDFs). We use CDFs to improve the accuracy of classification and at the same time control computational expense by tackling the curse of dimensionality. In order to demonstrate the generality of this technique, it is applied to handwritten digit recognition and text categorization.
Submitted 26 December, 2014;
originally announced December 2014.
-
Comments on the Refractive Index of Tin Sulphide Nano-crystalline Thin Films
Authors:
Amit Jakhar,
Ashu Jamdagni,
Ayushi Bakshi,
Taruna Verma,
Vibhav Shukla,
Priyal Jain,
Nidhi Sinha,
P Arun
Abstract:
The refractive indices of nano-crystalline thin films of Tin (IV) Sulphide (SnS) were investigated here. The experimental data conformed well with the single oscillator model for refractive indices. Based on this, we attribute the increasing trend of refractive index to the improvement in crystal ordering with increasing grain size.
Submitted 16 February, 2013;
originally announced February 2013.