Search | arXiv e-print repository

Technical design report for the CODEX-$β$ demonstrator

Authors: CODEX-b collaboration, Giulio Aielli, Juliette Alimena, James Beacham, Eli Ben Haim, Andras Burucs, Roberto Cardarelli, Matthew Charles, Xabier Cid Vidal, Albert De Roeck, Biplab Dey, Silviu Dobrescu, Ozgur Durmus, Mohamed Elashri, Vladimir Gligorov, Rebeca Gonzalez Suarez, Thomas Gorordo, Zarria Gray, Conor Henderson, Louis Henry, Philip Ilten, Daniel Johnson, Jacob Kautz, Simon Knapen, et al. (28 additional authors not shown)

Abstract: The CODEX-$β$ apparatus is a demonstrator for the proposed future CODEX-b experiment, a long-lived-particle detector foreseen for operation at IP8 during HL-LHC data-taking. The demonstrator project, intended to collect data in 2025, is described, with a particular focus on the design, construction, and installation of the new apparatus.

Submitted 22 May, 2024; originally announced June 2024.

arXiv:2406.00204 [pdf]

Learning from metastable grain boundaries

Authors: Avanish Mishra, Sumit A. Suresh, Saryu J. Fensin, Nithin Mathew, Edward M. Kober

Abstract: Grain boundaries (GBs) govern critical properties of polycrystals. Although significant advancements have been made in characterizing minimum energy GBs, real GBs are seldom found in such states, making it challenging to establish structure-property relationships. This diversity of atomic arrangements in metastable states motivates using data-driven methods to establish these relationships. In this study, we utilize a vast atomistic database (~5000) of minimum energy and metastable states of symmetric tilt copper GBs, combined with physically-motivated local atomic environment (LAE) descriptors (Strain Functional Descriptors, SFDs) to predict GB properties. Our regression models exhibit robust predictive capabilities using only 19 descriptors, generalizing to atomic environments in nanocrystals. A significant highlight of our work is the integration of an unsupervised method with SFDs to elucidate LAEs at GBs and their role in determining properties. Our research underscores the role of a physics-based representation of LAEs and the efficacy of data-driven methods in establishing GB structure-property relationships.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.12403 [pdf, other]

Searching for gravitational wave optical counterparts with the Zwicky Transient Facility: summary of O4a

Authors: Tomás Ahumada, Shreya Anand, Michael W. Coughlin, Vaidehi Gupta, Mansi M. Kasliwal, Viraj R. Karambelkar, Robert D. Stein, Gaurav Waratkar, Vishwajeet Swain, Theophile Jegou du Laz, Akash Anumarlapudi, Igor Andreoni, Mattia Bulla, Gokul P. Srinivasaragavan, Andrew Toivonen, Avery Wold, Eric C. Bellm, S. Bradley Cenko, David L. Kaplan, Jesper Sollerman, Varun Bhalerao, Daniel Perley, Anirudh Salgundi, Aswin Suresh, K-Ryan Hinds, et al. (27 additional authors not shown)

Abstract: During the first half of the fourth observing run (O4a) of the International Gravitational Wave Network (IGWN), the Zwicky Transient Facility (ZTF) conducted a systematic search for kilonova (KN) counterparts to binary neutron star (BNS) and neutron star-black hole (NSBH) merger candidates. Here, we present a comprehensive study of the five high-significance (false-alarm rate, FAR < 1 per year) BNS and NSBH candidates in O4a. Our follow-up campaigns relied on both target-of-opportunity observations (ToO) and re-weighting of the nominal survey schedule to maximize coverage. We describe the toolkit we have been developing, Fritz, an instance of SkyPortal, which was instrumental in coordinating and managing our telescope scheduling, candidate vetting, and follow-up observations through a user-friendly interface. ZTF covered a total of 2841 deg$^2$ within the skymaps of the high-significance GW events, reaching a median depth of g~20.2 mag. We circulated 15 candidates, but found no viable KN counterpart to any of the GW events. Based on the ZTF non-detections of the high-significance events in O4a, we used a Bayesian approach, nimbus, to quantify the posterior probability of KN model parameters that are consistent with our non-detections. Our analysis favors KNe with initial absolute magnitude fainter than -16 mag. The joint posterior probability of a GW170817-like KN associated with all our O4a follow-ups was 64%.
Additionally, we use the survey simulation software simsurvey to determine that our combined filtered efficiency to detect a GW170817-like KN is 36%, when considering the 5 confirmed astrophysical events in O3 (1 BNS and 4 NSBH), along with our O4a follow-ups. Following Kasliwal et al. (2020), we derived joint constraints on the underlying KN luminosity function based on our O3 and O4a follow-ups, determining that no more than 76% of KNe fading at 1 mag/day can peak at a magnitude brighter than -17.5 mag.

Submitted 20 May, 2024; originally announced May 2024.

Comments: submitted

arXiv:2405.04354 [pdf, other]

A transversality theorem for semi-algebraic sets with application to signal recovery from the second moment and cryo-EM

Authors: Tamir Bendory, Nadav Dym, Dan Edidin, Arun Suresh

Abstract: Semi-algebraic priors are ubiquitous in signal processing and machine learning. Prevalent examples include a) linear models where the signal lies in a low-dimensional subspace; b) sparse models where the signal can be represented by only a few coefficients under a suitable basis; and c) a large family of neural network generative models. In this paper, we prove a transversality theorem for semi-algebraic sets in orthogonal or unitary representations of groups: with a suitable dimension bound, a generic translate of any semi-algebraic set is transverse to the orbits of the group action. This, in turn, implies that if a signal lies in a low-dimensional semi-algebraic set, then it can be recovered uniquely from measurements that separate orbits. As an application, we consider the implications of the transversality theorem to the problem of recovering signals that are translated by random group actions from their second moment. As a special case, we discuss cryo-EM: a leading technology for determining the spatial structure of biological molecules, which serves as our prime motivation. In particular, we derive explicit bounds for recovering a molecular structure from the second moment under a semi-algebraic prior and deduce information-theoretic implications. We also obtain information-theoretic bounds for three additional applications: factoring Gram matrices, multi-reference alignment, and phase retrieval. Finally, we deduce bounds for designing permutation invariant separators in machine learning.

Submitted 10 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.11607 [pdf, other]

Private federated discovery of out-of-vocabulary words for Gboard

Authors: Ziteng Sun, Peter Kairouz, Haicheng Sun, Adria Gascon, Ananda Theertha Suresh

Abstract: The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role in improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for Gboard, which builds on recent advances in private federated analytics. The system offers local differential privacy (LDP) guarantees for user contributed words. With anonymous aggregation, the final released result would satisfy central differential privacy guarantees with $\varepsilon = 0.315, δ= 10^{-10}$ for OOV discovery in en-US (English in United States).

Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09221 [pdf, other]

Exploring and Improving Drafts in Blockwise Parallel Decoding

Authors: Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

Abstract: Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as a method to improve inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified and conditionally accepted by the autoregressive model. This paper contributes to the understanding and improvement of block drafts in two ways. First, we analyze the token distributions produced by multiple prediction heads. Second, we leverage this analysis to develop algorithms to improve BPD inference speed by refining the block drafts using n-gram and neural language models. Experiments demonstrate that refined block drafts yield a +5-21% increase in block efficiency (i.e., the number of accepted tokens from the block draft) across diverse datasets.

Submitted 5 June, 2024; v1 submitted 14 April, 2024; originally announced April 2024.
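Block efficiency, as defined in the abstract above, is the number of draft tokens accepted per block. A minimal sketch, assuming a simplified greedy rule in which a draft token is kept only while it matches the autoregressive model's own prediction at that position; the function names and toy token sequences are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

def accepted_length(draft: Sequence[int], verified: Sequence[int]) -> int:
    """Length of the longest prefix of `draft` that agrees with the
    autoregressive model's greedy tokens `verified` (hypothetical inputs)."""
    n = 0
    for d, v in zip(draft, verified):
        if d != v:
            break  # first disagreement ends the accepted prefix
        n += 1
    return n

def block_efficiency(blocks: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average number of accepted draft tokens over (draft, verified) pairs."""
    lengths = [accepted_length(d, v) for d, v in blocks]
    return sum(lengths) / len(lengths)

blocks = [
    ([5, 9, 2, 7], [5, 9, 2, 1]),  # first 3 tokens agree
    ([4, 4, 8, 3], [4, 6, 8, 3]),  # mismatch at the second position
    ([1, 2, 3, 4], [1, 2, 3, 4]),  # whole block accepted
]
print(block_efficiency(blocks))  # (3 + 1 + 4) / 3
```

Under this rule, refining a draft helps only if it fixes tokens early in the block, since everything after the first mismatch is discarded.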

arXiv:2404.04780 [pdf, other]

Radio AGN Activity in Low Redshift Galaxies is Not Directly Related to Star Formation Rates

Authors: Arjun Suresh, Michael R. Blanton

Abstract: We examine the demographics of radio-emitting active galactic nuclei (AGN) in the local universe as a function of host galaxy properties, most notably both stellar mass and star formation rate. Radio AGN activity is theoretically implicated in helping reduce star formation rates of galaxies, and therefore it is natural to investigate the relationship between these two galaxy properties. We use a sample of around 10,000 galaxies from the Mapping Nearby Galaxies at APO (MaNGA) survey, part of the Sloan Digital Sky Survey IV (SDSS-IV), along with the Faint Images of the Radio Sky at Twenty centimeters (FIRST) radio survey and the National Radio Astronomy Observatory (NRAO) Very Large Array (VLA) Sky Survey (NVSS). There are 1,126 galaxies in MaNGA with radio detections. Using star formation rate and stellar mass estimates based on Pipe3D, inferred from the high signal-to-noise ratio measurements from MaNGA, we show that star formation rates are strongly correlated with 20 cm radio emission, as expected. We identify as radio AGN those radio emitters that are much stronger than expected from the star formation rate. Using this sample of AGN, the well-measured stellar velocity dispersions from MaNGA, and the black hole M-sigma relationship, we examine the Eddington ratio distribution and its dependence on stellar mass and star formation rate. We find that the Eddington ratio distribution depends strongly on stellar mass, with more massive galaxies having larger Eddington ratios. As found in previous studies, the AGN fraction increases rapidly with stellar mass.
We do not find any dependence on star formation rate, specific star formation rate, or velocity dispersion when controlling for stellar mass. We conclude that galaxy star formation rates appear to be unrelated to the presence or absence of a radio AGN, which may be useful in constraining theoretical models of AGN feedback.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.01730 [pdf, other]

Asymptotics of Language Model Alignment

Authors: Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami

Abstract: Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar that captures the degree to which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $φ$ that results in a higher expected reward while keeping $φ$ close to $p.$ A popular alignment method is KL-constrained reinforcement learning (RL), which chooses a distribution $φ_Δ$ that maximizes $E_{φ_Δ} r(y)$ subject to a relative entropy constraint $KL(φ_Δ|| p) \leq Δ.$ Another simple alignment method is best-of-$N$, where $N$ samples are drawn from $p$ and one with highest reward is selected. In this paper, we offer a closed-form characterization of the optimal KL-constrained RL solution. We demonstrate that any alignment method that achieves a comparable trade-off between KL divergence and reward must approximate the optimal KL-constrained RL solution in terms of relative entropy. To further analyze the properties of alignment methods, we introduce two simplifying assumptions: we let the language model be memoryless, and the reward model be linear. Although these assumptions may not reflect complex real-world scenarios, they enable a precise characterization of the asymptotic behavior of both the best-of-$N$ alignment, and the KL-constrained RL method, in terms of information-theoretic quantities. We prove that the reward of the optimal KL-constrained RL solution satisfies a large deviation principle, and we fully characterize its rate function.
We also show that the rate of growth of the scaled cumulants of the reward is characterized by a proper Renyi cross entropy. Finally, we show that best-of-$N$ is asymptotically equivalent to KL-constrained RL solution by proving that their expected rewards are asymptotically equal, and concluding that the two distributions must be close in KL divergence.

Submitted 2 April, 2024; originally announced April 2024.
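The KL-constrained problem in the abstract above has a well-known Lagrangian solution: the maximizer is an exponential tilt of the base model, $φ_λ(y) \propto p(y)\exp(λ r(y))$, with $λ \geq 0$ chosen so that $KL(φ_λ || p) = Δ$ when the constraint binds. The sketch below illustrates that standard result on a toy discrete model; it is not the paper's own closed-form characterization, and the distributions, reward values, and helper names are hypothetical. It relies on the fact that $KL(φ_λ || p)$ increases monotonically in $λ$, so bisection finds the right temperature.

```python
import math
from typing import List

def tilt(p: List[float], r: List[float], lam: float) -> List[float]:
    """Exponentially tilted distribution phi(y) proportional to p(y) * exp(lam * r(y))."""
    m = max(r)  # subtract the max reward for numerical stability
    w = [pi * math.exp(lam * (ri - m)) for pi, ri in zip(p, r)]
    z = sum(w)
    return [wi / z for wi in w]

def kl(q: List[float], p: List[float]) -> float:
    """KL(q || p) for discrete distributions, skipping zero-mass outcomes of q."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def kl_constrained_opt(p, r, delta, lam_hi=1e3, iters=200):
    """Bisection on the inverse temperature lam so that KL(phi_lam || p) = delta.
    Assumes delta is below the KL of the greedy (argmax-reward) limit."""
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(tilt(p, r, mid), p) < delta:
            lo = mid  # constraint still slack: tilt harder
        else:
            hi = mid
    return tilt(p, r, lo)

p = [0.5, 0.3, 0.2]  # toy base model over three outputs (hypothetical)
r = [0.0, 1.0, 2.0]  # toy reward (hypothetical)
phi = kl_constrained_opt(p, r, delta=0.1)
print([round(x, 3) for x in phi], round(kl(phi, p), 4))
```

Raising `delta` moves mass toward the highest-reward output, and at the limit the tilt collapses onto the argmax of the reward.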

arXiv:2403.10444 [pdf, other]

Optimal Block-Level Draft Verification for Accelerating Speculative Decoding

Authors: Ziteng Sun, Jae Hun Ro, Ahmad Beirami, Ananda Theertha Suresh

Abstract: Speculative decoding has been shown to be an effective method for lossless acceleration of large language models (LLMs) during inference. In each iteration, the algorithm first uses a smaller model to draft a block of tokens. The tokens are then verified by the large model in parallel and only a subset of tokens will be kept to guarantee that the final output follows the distribution of the large model. In all prior speculative decoding work, the draft verification is performed token-by-token independently. In this work, we propose a better draft verification algorithm that provides additional wall-clock speedup without incurring additional computation cost and draft tokens. We first formulate the draft verification step as a block-level optimal transport problem. The block-level formulation allows us to consider a wider range of draft verification algorithms and obtain a higher number of accepted tokens in expectation in one draft block. We propose a verification algorithm that achieves the optimal accepted length for the block-level transport problem. We empirically evaluate our proposed block-level verification algorithm in a wide range of tasks and datasets, and observe consistent improvements in wall-clock speedup when compared to the token-level verification algorithm. To the best of our knowledge, our work is the first to establish improvement over speculative decoding through a better draft verification algorithm.

Submitted 15 March, 2024; originally announced March 2024.
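For reference, the token-by-token verification that the abstract above contrasts against is the standard speculative-sampling rule: draft $x \sim q$, accept with probability $\min(1, p(x)/q(x))$, and otherwise resample from the normalized residual $\max(0, p - q)$. The sketch below computes the exact output distribution of one such step on a toy vocabulary, showing it recovers the large model's distribution $p$. It illustrates the token-level baseline only, not the block-level optimal-transport verifier proposed in the paper; the distributions are hypothetical.

```python
from typing import List

def token_level_output_dist(p: List[float], q: List[float]) -> List[float]:
    """Exact output distribution of one step of token-level speculative sampling.
    p: large (target) model distribution; q: small (draft) model distribution."""
    # Probability of drafting x and accepting it: q(x) * min(1, p(x)/q(x)) = min(p, q).
    accept = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accept)  # equals the total-variation distance
    resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    z = sum(resid)
    # On rejection, resample from the normalized residual distribution.
    return [a + (reject_mass * res / z if z > 0 else 0.0)
            for a, res in zip(accept, resid)]

p = [0.6, 0.3, 0.1]  # target model (hypothetical)
q = [0.3, 0.3, 0.4]  # draft model (hypothetical)
print(token_level_output_dist(p, q))  # matches p up to floating-point rounding
```

The expected acceptance rate per token here is `sum(min(p, q))`, which is the quantity a block-level verifier can improve on by deciding over the whole draft jointly.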

arXiv:2403.08100 [pdf, other]

Efficient Language Model Architectures for Differentially Private Federated Learning

Authors: Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh

Abstract: Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices without data leaving the devices. SGD is the standard client optimizer for on-device training in cross-device FL, favored for its memory and computational efficiency. However, in centralized training of neural language models, adaptive optimizers are preferred as they offer improved stability and performance. In light of this, we ask if language models can be modified such that they can be efficiently trained with SGD client optimizers and answer this affirmatively. We propose a scale-invariant Coupled Input Forget Gate (SI CIFG) recurrent network by modifying the sigmoid and tanh activations in the recurrent cell and show that this new model converges faster and achieves better utility than the standard CIFG recurrent model in cross-device FL in large scale experiments. We further show that the proposed scale invariant modification also helps in federated learning of larger transformer models. Finally, we demonstrate the scale invariant modification is also compatible with other non-adaptive algorithms. Particularly, our results suggest an improved privacy utility trade-off in federated learning with differential privacy.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.10161 [pdf, other]

Robotic Exploration using Generalized Behavioral Entropy

Authors: Aamodh Suresh, Carlos Nieto-Granda, Sonia Martinez

Abstract: This work presents and evaluates a novel strategy for robotic exploration that leverages human models of uncertainty perception. To do this, we introduce a measure of uncertainty that we term "Behavioral entropy", which builds on Prelec's probability weighting from Behavioral Economics. We show that the new operator is an admissible generalized entropy, analyze its theoretical properties and compare it with other common formulations such as Shannon's and Renyi's. In particular, we discuss how the new formulation is more expressive in the sense of measures of sensitivity and perceptiveness to uncertainty introduced here. Then we use Behavioral entropy to define a new type of utility function that can guide a frontier-based environment exploration process. The approach's benefits are illustrated and compared in a Proof-of-Concept and ROS-unity simulation environment with a Clearpath Warthog robot. We show that the robot equipped with Behavioral entropy explores faster than with Shannon or Renyi entropy.

Submitted 15 February, 2024; originally announced February 2024.
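The abstract above does not state the definition of Behavioral entropy, so this sketch stops at its named ingredient, Prelec's probability weighting $w(p) = \exp(-β(-\ln p)^{α})$, alongside the Shannon and Rényi entropies it is compared with. The parameter values `alpha=0.65`, `beta=1.0` are hypothetical illustrative choices, not values taken from the paper.

```python
import math
from typing import Sequence

def prelec_weight(p: float, alpha: float = 0.65, beta: float = 1.0) -> float:
    """Prelec probability weighting w(p) = exp(-beta * (-ln p)**alpha), for 0 < p <= 1.
    With alpha < 1, small probabilities are overweighted and large ones underweighted.
    The default alpha/beta are hypothetical illustrative values."""
    return math.exp(-beta * (-math.log(p)) ** alpha)

def shannon(ps: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def renyi(ps: Sequence[float], order: float) -> float:
    """Renyi entropy of the given order (order != 1); tends to Shannon as order -> 1."""
    return math.log(sum(p ** order for p in ps if p > 0)) / (1 - order)

for p in (0.01, 0.1, 0.5, 0.9):
    print(p, round(prelec_weight(p), 3))
print(shannon([0.5, 0.5]), renyi([0.5, 0.5], 2))  # both equal ln 2 for a uniform pair
```

With `beta=1`, the weighting has a fixed point at $p = 1/e$, which is where over- and under-weighting swap.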

arXiv:2402.08000 [pdf, other]

An Automated Catalog of Long Period Variables using Infrared Lightcurves from Palomar Gattini-IR

Authors: Aswin Suresh, Viraj Karambelkar, Mansi M. Kasliwal, Michael C. B. Ashley, Kishalay De, Matthew J. Hankins, Anna M. Moore, Jamie Soon, Roberto Soria, Tony Travouillon, Kayton K. Truong

Abstract: Stars in the Asymptotic Giant Branch (AGB) phase, dominated by low to intermediate-mass stars in the late stage of evolution, undergo periodic pulsations, with periods of several hundred days, earning them the name Long Period Variables (LPVs). These stars gradually shed their mass through stellar winds and mass ejections, enveloping themselves in dust. Infrared (IR) surveys can probe these dust-enshrouded phases and uncover populations of LPV stars in the Milky Way. In this paper, we present a catalog of 159,696 Long Period Variables using near-IR lightcurves from the Palomar Gattini-IR (PGIR) survey. PGIR has been surveying the entire accessible northern sky ($δ> -28^{\circ}$) in the J-band at a cadence of 2-3 days since September 2018, and has produced J-band lightcurves for more than 60 million sources. We used a gradient-boosted decision tree classifier trained on a comprehensive feature set extracted from PGIR lightcurves to search for LPVs in this dataset. We developed a parallelized and optimized code to extract features at a rate of ~0.1 seconds per lightcurve. Our model can successfully distinguish LPVs from other stars with a true positive rate and weighted g-mean of 0.95. 73,346 (~46%) of the sources in our catalog are new, previously unknown LPVs.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 15 pages, 15 figures

arXiv:2401.01879 [pdf, other]

Theoretical guarantees on the best-of-n alignment policy

Authors: Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D'Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh

Abstract: A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, and ranked based on a reward function, and the highest ranking one is selected. A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence. We also explore the tightness of this upper bound in different regimes. Finally, we propose a new estimator for the KL divergence and empirically show that it provides a tight approximation through a few examples.

Submitted 3 January, 2024; originally announced January 2024.
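The claim in the abstract above can be checked numerically on a small discrete example. When outcomes are listed in increasing order of reward with no ties, the best-of-$n$ policy picks outcome $y_i$ with probability $F_i^n - F_{i-1}^n$, where $F$ is the base policy's CDF in that order, so its KL divergence from the base policy can be computed exactly and compared with $\log(n) - (n-1)/n$. A minimal sketch; the toy base policy is hypothetical, and the tie-free assumption matters.

```python
import math
from typing import List

def best_of_n_dist(p_sorted: List[float], n: int) -> List[float]:
    """Exact best-of-n distribution for outcomes listed in increasing reward
    order (no ties): P(pick y_i) = F_i**n - F_{i-1}**n."""
    out, cdf_prev = [], 0.0
    for pi in p_sorted:
        cdf = cdf_prev + pi
        out.append(cdf ** n - cdf_prev ** n)
        cdf_prev = cdf
    return out

def kl(q: List[float], p: List[float]) -> float:
    """KL(q || p) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def analytical_formula(n: int) -> float:
    """The commonly quoted expression log(n) - (n - 1)/n."""
    return math.log(n) - (n - 1) / n

p = [0.4, 0.3, 0.2, 0.1]  # toy base policy, sorted by increasing reward
for n in (1, 2, 4, 16):
    exact = kl(best_of_n_dist(p, n), p)
    print(n, round(exact, 4), round(analytical_formula(n), 4))
```

In this example the exact divergence stays strictly below the formula for every $n > 1$, consistent with the formula being an upper bound rather than an identity.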

arXiv:2312.06658 [pdf, other]

Mean estimation in the add-remove model of differential privacy

Authors: Alex Kulesza, Ananda Theertha Suresh, Yuyan Wang

Abstract: Differential privacy is often studied under two different models of neighboring datasets: the add-remove model and the swap model. While the swap model is frequently used in the academic literature to simplify analysis, many practical applications rely on the more conservative add-remove model, where obtaining tight results can be difficult. Here, we study the problem of one-dimensional mean estimation under the add-remove model. We propose a new algorithm and show that it is min-max optimal, achieving the best possible constant in the leading term of the mean squared error for all $ε$, and that this constant is the same as the optimal algorithm under the swap model. These results show that the add-remove and swap models give nearly identical errors for mean estimation, even though the add-remove model cannot treat the size of the dataset as public information. We also demonstrate empirically that our proposed algorithm yields at least a factor of two improvement in mean squared error over algorithms frequently used in practice. One of our main technical contributions is a new hour-glass mechanism, which might be of independent interest in other scenarios.

Submitted 19 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.
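To make concrete why the add-remove model is harder, here is a common practical baseline (not the paper's hour-glass mechanism): because the dataset size is itself private, release a noisy sum and a noisy count, each with budget $ε/2$, and divide. A minimal sketch, assuming every value is clipped to $[0, 1]$ so that adding or removing one record changes each query by at most 1; the data and seed are hypothetical.

```python
import random
from typing import List

def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_mean_add_remove(xs: List[float], eps: float, rng: random.Random) -> float:
    """Baseline eps-DP mean under the add-remove model (assumes values in [0, 1]).
    Sum and count each have sensitivity 1 and get eps/2 (basic composition)."""
    clipped = [min(max(x, 0.0), 1.0) for x in xs]
    noisy_sum = sum(clipped) + laplace(rng, 2.0 / eps)
    noisy_count = len(clipped) + laplace(rng, 2.0 / eps)
    # Guard against a tiny or negative noisy count on very small datasets.
    return noisy_sum / max(noisy_count, 1.0)

rng = random.Random(0)
data = [rng.random() for _ in range(100_000)]
print(sum(data) / len(data), noisy_mean_add_remove(data, eps=1.0, rng=rng))
```

Splitting the budget between two queries is what costs this baseline accuracy relative to the swap model, where the count can be treated as public.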

arXiv:2312.03867 [pdf, other]

doi 10.1109/JSAIT.2024.3397741

Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

Authors: Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

Abstract: Machine learning (ML) models used in prediction and classification tasks may display performance disparities across population groups determined by sensitive attributes (e.g., race, sex, age). We consider the problem of evaluating the performance of a fixed ML model across population groups defined by multiple sensitive attributes (e.g., race and sex and age). Here, the sample complexity for estimating the worst-case performance gap across groups (e.g., the largest difference in error rates) increases exponentially with the number of group-denoting sensitive attributes. To address this issue, we propose an approach to test for performance disparities based on Conditional Value-at-Risk (CVaR). By allowing a small probabilistic slack on the groups over which a model has approximately equal performance, we show that the sample complexity required for discovering performance violations is reduced exponentially, to at most the square root of the number of groups. As a byproduct of our analysis, when the groups are weighted by a specific prior distribution, we show that Rényi entropy of order 2/3 of the prior distribution captures the sample complexity of the proposed CVaR test algorithm. Finally, we also show that there exists a non-i.i.d. data collection strategy that results in a sample complexity independent of the number of groups.

Submitted 25 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted for publication in the IEEE Journal on Selected Areas in Information Theory (JSAIT)
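CVaR, the risk measure named in the abstract above, interpolates between the worst-case group gap ($α \to 0$) and the prior-weighted average gap ($α = 1$). A generic sketch of discrete CVaR over per-group performance gaps, assuming the gaps and prior weights are known exactly; it illustrates the risk measure only, not the paper's test statistic or its sample-complexity analysis, and the group gaps are hypothetical.

```python
from typing import Sequence

def cvar(losses: Sequence[float], weights: Sequence[float], alpha: float) -> float:
    """Conditional Value-at-Risk at level alpha in (0, 1]: the weighted mean of
    the worst alpha-fraction of the distribution (upper tail of the losses)."""
    pairs = sorted(zip(losses, weights), key=lambda t: t[0], reverse=True)
    remaining, total = alpha, 0.0
    for loss, w in pairs:
        take = min(w, remaining)  # take at most the mass still needed
        total += take * loss
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha

# Hypothetical error-rate gaps for 8 groups under a uniform prior.
gaps = [0.01, 0.02, 0.15, 0.03, 0.02, 0.30, 0.01, 0.04]
weights = [1 / 8] * 8
print(cvar(gaps, weights, 0.25))  # mean of the two worst gaps, 0.30 and 0.15
```

Because CVaR averages over a fraction of the groups rather than singling out one, it tolerates a small slack on which groups are flagged, which is the relaxation the abstract describes.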

arXiv:2311.10195 [pdf, other]

doi 10.1038/s41586-023-06673-6

Minutes-duration Optical Flares with Supernova Luminosities

Authors: Anna Y. Q. Ho, Daniel A. Perley, Ping Chen, Steve Schulze, Vik Dhillon, Harsh Kumar, Aswin Suresh, Vishwajeet Swain, Michael Bremer, Stephen J. Smartt, Joseph P. Anderson, G. C. Anupama, Supachai Awiphan, Sudhanshu Barway, Eric C. Bellm, Sagi Ben-Ami, Varun Bhalerao, Thomas de Boer, Thomas G. Brink, Rick Burruss, Poonam Chandra, Ting-Wan Chen, Wen-Ping Chen, Jeff Cooke, Michael W. Coughlin, et al. (52 additional authors not shown)

Abstract: In recent years, certain luminous extragalactic optical transients have been observed to last only a few days. Their short observed duration implies a different powering mechanism from the most common luminous extragalactic transients (supernovae) whose timescale is weeks. Some short-duration transients, most notably AT2018cow, display blue optical colours and bright radio and X-ray emission. Several AT2018cow-like transients have shown hints of a long-lived embedded energy source, such as X-ray variability, prolonged ultraviolet emission, a tentative X-ray quasiperiodic oscillation, and large energies coupled to fast (but subrelativistic) radio-emitting ejecta. Here we report observations of minutes-duration optical flares in the aftermath of an AT2018cow-like transient, AT2022tsd (the "Tasmanian Devil"). The flares occur over a period of months, are highly energetic, and are likely nonthermal, implying that they arise from a near-relativistic outflow or jet. Our observations confirm that in some AT2018cow-like transients the embedded energy source is a compact object, either a magnetar or an accreting black hole.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 79 pages, 3 figures (main text) + 7 figures (extended data) + 2 figures (supplementary information). Published online in Nature on 15 November 2023

arXiv:2311.08833 [pdf, ps, other]

Phase retrieval with semi-algebraic and ReLU neural network priors

Authors: Tamir Bendory, Nadav Dym, Dan Edidin, Arun Suresh

Abstract: The key ingredient in retrieving a signal from its Fourier magnitudes, that is, in solving the phase retrieval problem, is an effective prior on the sought signal. In this paper, we study the phase retrieval problem under the prior that the signal lies in a semi-algebraic set. This is a very general prior, as semi-algebraic sets include linear models, sparse models, and ReLU neural network generative models. The latter is the main motivation of this paper, due to the remarkable success of deep generative models in a variety of imaging tasks, including phase retrieval. We prove that almost all signals in $\mathbb{R}^N$ can be determined from their Fourier magnitudes, up to a sign, if they lie in a (generic) semi-algebraic set of dimension N/2. The same is true for all signals if the semi-algebraic set is of dimension N/4. We also generalize these results to the problem of signal recovery from the second moment in multi-reference alignment models with multiplicity-free representations of compact groups. This general result is then used to derive improved sample complexity bounds for recovering band-limited functions on the sphere from their noisy copies, each acted upon by a random element of SO(3).

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.15141 [pdf, other]

SpecTr: Fast Speculative Decoding via Optimal Transport

Authors: Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

Abstract: Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time, making it slow and even prohibitive in certain tasks. One way to speed up sampling is $\textit{speculative decoding}$: use a small model to sample a $\textit{draft}$ (block or sequence of tokens), and then score all tokens in the draft by the large language model in parallel. A subset of the tokens in the draft is accepted (and the rest rejected) based on a statistical method that guarantees the final output follows the distribution of the large model. In this work, we provide a principled understanding of speculative decoding through the lens of optimal transport (OT) with $\textit{membership cost}$. This framework can be viewed as an extension of the well-known $\textit{maximal-coupling}$ problem. This new formulation enables us to generalize the speculative decoding method to allow for a set of $k$ candidates at the token level, which leads to an improved optimal membership cost. We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$. We then propose a valid draft selection algorithm whose acceptance probability is $(1-1/e)$-optimal multiplicatively. Moreover, it can be computed in time almost linear in the size of the domain of a single token.
Using this new draft selection algorithm, we develop a new autoregressive sampling algorithm called $\textit{SpecTr}$, which provides speedup in decoding while ensuring that there is no quality degradation in the decoded output. We experimentally demonstrate that for state-of-the-art large language models, the proposed approach achieves a wall clock speedup of 2.13X, a further 1.37X speedup over speculative decoding on standard benchmarks.

Submitted 17 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023
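For the SpecTr entry above, the baseline being generalized is single-draft speculative sampling, the $k=1$ maximal-coupling case. The Python sketch below shows that standard acceptance rule, not SpecTr's $k$-candidate transport plan: a draft token drawn from the small model's distribution `q` is kept with probability $\min(1, p/q)$, and on rejection a token is resampled from the normalized residual $\max(0, p - q)$, so the output follows the large model's distribution `p` exactly. The function names and toy distributions are hypothetical, and `q` is assumed to put positive mass on every key.

```python
import random


def accept_prob(p, q, token):
    """Probability of keeping a draft token drawn from q: min(1, p(token) / q(token))."""
    return min(1.0, p[token] / q[token])


def residual(p, q):
    """Distribution to resample from after a rejection: max(0, p - q), normalized."""
    r = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    z = sum(r.values())
    if z == 0.0:  # p == q: rejection never happens, so any distribution will do
        return dict(p)
    return {t: v / z for t, v in r.items()}


def speculative_step(p, q, rng):
    """Draw one token whose marginal distribution is exactly p, using one draft from q."""
    draft = rng.choices(list(q), weights=list(q.values()))[0]
    if rng.random() < accept_prob(p, q, draft):
        return draft
    res = residual(p, q)
    return rng.choices(list(res), weights=list(res.values()))[0]


def output_distribution(p, q):
    """Exact marginal of speculative_step: accepted mass plus rejected mass times residual."""
    accepted = {t: q[t] * accept_prob(p, q, t) for t in q}
    reject_mass = 1.0 - sum(accepted.values())
    res = residual(p, q)
    return {t: accepted.get(t, 0.0) + reject_mass * res[t] for t in p}


p = {"a": 0.6, "b": 0.3, "c": 0.1}  # toy large-model next-token distribution
q = {"a": 0.2, "b": 0.5, "c": 0.3}  # toy draft-model distribution
print(output_distribution(p, q))       # equals p up to floating-point rounding
print(sum(min(p[t], q[t]) for t in p))  # acceptance probability, 1 - TV(p, q)
```

The acceptance probability of this one-draft rule is $\sum_t \min(p_t, q_t)$; per the abstract, SpecTr raises the achievable acceptance by selecting among $k$ drafted candidates.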

arXiv:2309.11381 [pdf, other]

Studying Lobby Influence in the European Parliament

Authors: Aswin Suresh, Lazar Radojevic, Francesco Salvi, Antoine Magron, Victor Kristof, Matthias Grossglauser

Abstract: We present a method based on natural language processing (NLP) for studying the influence of interest groups (lobbies) in the law-making process in the European Parliament (EP). We collect and analyze novel datasets of lobbies' position papers and speeches made by members of the EP (MEPs). By comparing these texts on the basis of semantic similarity and entailment, we are able to discover interpretable links between MEPs and lobbies. In the absence of a ground-truth dataset of such links, we perform an indirect validation by comparing the discovered links with a dataset, which we curate, of retweet links between MEPs and lobbies, and with the publicly disclosed meetings of MEPs. Our best method achieves an AUC score of 0.77 and performs significantly better than several baselines. Moreover, an aggregate analysis of the discovered links, between groups of related lobbies and political groups of MEPs, corresponds to the expectations from the ideology of the groups (e.g., center-left groups are associated with social causes). We believe that this work, which encompasses the methodology, datasets, and results, is a step towards enhancing the transparency of the intricate decision-making processes within democratic institutions.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 11 pages, 5 figures. Under review for presentation at ICWSM 2024
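The lobby-influence entry above reports its validation as an AUC. As a hedged illustration of what that number measures, the Python sketch below computes ROC AUC as the probability that a randomly chosen confirmed link (for example, a retweet-backed MEP-lobby pair) receives a higher similarity score than a randomly chosen non-link, with ties counted as one half. The scores and labels are invented; the paper's similarity and entailment models are not reproduced.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(score of a positive > score of a negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for candidate MEP-lobby pairs; 1 = link confirmed externally.
scores = [0.91, 0.80, 0.75, 0.40, 0.35, 0.20]
labels = [1, 0, 1, 1, 0, 0]
print(roc_auc(scores, labels))  # 7 of the 9 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.77 should be read.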

arXiv:2309.07884 [pdf, ps, other]

Single-soft emissions for amplitudes with two colored particles at three loops

Authors: Franz Herzog, Yao Ma, Bernhard Mistlberger, Adi Suresh

Abstract: We compute the three-loop correction to the universal single-soft emission current for the case of scattering amplitudes with two additional color-charged partons. We present results valid for QCD and $\mathcal{N}=4$ supersymmetric Yang-Mills theory. To achieve our results, we develop a new integrand expansion technique for scattering amplitudes in the presence of soft emissions. Furthermore, we obtain contributions from single final-state parton matrix elements to the Higgs boson and Drell-Yan production cross sections at next-to-next-to-next-to-next-to leading order (N$^4$LO) in perturbative QCD in the threshold limit.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 29 pages, 2 figures

Report number: SLAC-PUB-17742

arXiv:2307.13347 [pdf, other]

Federated Heavy Hitter Recovery under Linear Sketching

Authors: Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh

Abstract: Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible Bloom lookup tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery, whereas that of approximate histogram is $Θ(\sqrt{R})$. We also empirically demonstrate our findings.

Submitted 25 July, 2023; originally announced July 2023.
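For the heavy-hitter entry above, the property that makes an IBLT usable under secure aggregation is linearity: the cell-wise sum of the users' tables equals the table of the combined multiset, so a server can decode frequent items from the aggregate alone. The Python sketch below is a minimal textbook IBLT with count-aware peeling over non-negative integer item IDs. It illustrates the data structure only, not the paper's subsampling algorithm; the class name, table sizes, and SHA-256-based hashing are assumptions.

```python
import hashlib
from typing import Dict, List, Optional

_CHECK_MOD = 2**61 - 1  # range of the per-item checksum


def _hash(key: int, salt: int, mod: int) -> int:
    """Deterministic salted hash of an integer key into [0, mod)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % mod


class IBLT:
    """Toy invertible Bloom lookup table over non-negative integer item IDs."""

    def __init__(self, cells_per_table: int = 32, num_hashes: int = 3):
        self.m, self.k = cells_per_table, num_hashes
        size = self.m * self.k
        self.count = [0] * size      # how many items landed in each cell
        self.key_sum = [0] * size    # sum of their IDs
        self.check_sum = [0] * size  # sum of their checksums

    def _cells(self, key: int) -> List[int]:
        # One cell per hash function, each in its own sub-table, so the cells are distinct.
        return [i * self.m + _hash(key, i, self.m) for i in range(self.k)]

    def _update(self, key: int, mult: int) -> None:
        chk = _hash(key, -1, _CHECK_MOD)
        for c in self._cells(key):
            self.count[c] += mult
            self.key_sum[c] += mult * key
            self.check_sum[c] += mult * chk

    def insert(self, key: int) -> None:
        self._update(key, 1)

    def __add__(self, other: "IBLT") -> "IBLT":
        # Cell-wise sum: the only operation a secure aggregator has to perform.
        out = IBLT(self.m, self.k)
        out.count = [a + b for a, b in zip(self.count, other.count)]
        out.key_sum = [a + b for a, b in zip(self.key_sum, other.key_sum)]
        out.check_sum = [a + b for a, b in zip(self.check_sum, other.check_sum)]
        return out

    def decode(self) -> Optional[Dict[int, int]]:
        """Peel 'pure' cells holding one distinct key; return {key: count}, or None if stuck."""
        t = self + IBLT(self.m, self.k)  # work on a copy
        found: Dict[int, int] = {}
        progress = True
        while progress:
            progress = False
            for c in range(len(t.count)):
                n = t.count[c]
                if n <= 0 or t.key_sum[c] % n:
                    continue
                key = t.key_sum[c] // n
                if t.check_sum[c] == n * _hash(key, -1, _CHECK_MOD) and c in t._cells(key):
                    found[key] = found.get(key, 0) + n
                    t._update(key, -n)
                    progress = True
        return None if any(t.count) else found


# Two users sketch their items locally; the server only ever sees the sum.
user_a, user_b = IBLT(), IBLT()
for item in [7, 7, 42]:
    user_a.insert(item)
for item in [7, 1000]:
    user_b.insert(item)
print((user_a + user_b).decode())
```

Decoding succeeds with high probability only while the number of distinct items is small relative to the table size; that capacity limit is why the size of the sketch, and hence the communication cost, grows with the number of distinct items per round.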

arXiv:2307.11106 [pdf, other]

The importance of feature preprocessing for differentially private linear optimization

Authors: Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

Abstract: Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient descent (DPSGD) and its variants, where at each step gradients are clipped and combined with some noise. Given the increasing usage of DPSGD, we ask the question: is DPSGD alone sufficient to find a good minimizer for every dataset under privacy constraints? Towards answering this question, we show that even for the simple case of linear classification, unlike non-private optimization, (private) feature preprocessing is vital for differentially private optimization. In detail, we first show theoretically that there exists an example where without feature preprocessing, DPSGD incurs an optimality gap proportional to the maximum Euclidean norm of features over all samples. We then propose an algorithm called DPSGD-F, which combines DPSGD with feature preprocessing and prove that for classification tasks, it incurs an optimality gap proportional to the diameter of the features $\max_{x, x' \in D} \|x - x'\|_2$. We finally demonstrate the practicality of our algorithm on image classification benchmarks.

Submitted 19 February, 2024; v1 submitted 19 July, 2023; originally announced July 2023.
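The two quantities contrasted in the DPSGD-F entry above can be made concrete. The Python sketch below (i) implements one generic DPSGD step for logistic regression, with per-example clipping to norm `clip_norm` and Gaussian noise of standard deviation `noise_multiplier * clip_norm` added to the summed gradients, and (ii) compares the maximum feature norm with the feature diameter before and after centering. The centering uses the exact, non-private mean purely for illustration. DPSGD-F's private preprocessing and any privacy accounting are not reproduced, and all names are hypothetical.

```python
import math
import random


def l2(v):
    return math.sqrt(sum(a * a for a in v))


def clip(g, c):
    """Scale g down so that its L2 norm is at most c."""
    n = l2(g)
    return [a * min(1.0, c / n) for a in g] if n > 0 else list(g)


def dpsgd_step(w, batch, lr, clip_norm, noise_multiplier, rng):
    """One DPSGD step on the logistic loss log(1 + exp(-y w.x)), labels y in {-1, +1}."""
    total = [0.0] * len(w)
    for x, y in batch:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coef = -y / (1.0 + math.exp(margin))  # gradient of the loss is coef * x
        g = clip([coef * xi for xi in x], clip_norm)
        total = [t + gi for t, gi in zip(total, g)]
    # Gaussian noise calibrated to the clipping norm, added to the summed gradient.
    noisy = [t + rng.gauss(0.0, noise_multiplier * clip_norm) for t in total]
    return [wi - lr * ni / len(batch) for wi, ni in zip(w, noisy)]


def max_norm(xs):
    return max(l2(x) for x in xs)


def diameter(xs):
    return max(l2([a - b for a, b in zip(x, z)]) for x in xs for z in xs)


def center(xs):
    mean = [sum(col) / len(xs) for col in zip(*xs)]
    return [[a - m for a, m in zip(x, mean)] for x in xs]


# Features far from the origin but tightly clustered: large max norm, small diameter.
X = [[100.0, 100.0], [101.0, 100.0], [100.0, 101.0]]
print(max_norm(X), diameter(X), max_norm(center(X)))  # roughly 142.13, 1.414, 0.745
```

After centering, every feature lies within one diameter of the mean, so the max-norm term that drives the first bound in the abstract collapses to at most the diameter.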

arXiv:2307.08139 [pdf, other]

It's All Relative: Interpretable Models for Scoring Bias in Documents

Authors: Aswin Suresh, Chi-Hsuan Wu, Matthias Grossglauser

Abstract: We propose an interpretable model to score the bias present in web documents, based only on their textual content. Our model incorporates assumptions reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of the same Wikipedia article, where one version is more biased than the other. While prior approaches based on absolute bias classification have struggled to obtain high accuracy for the task, we are able to develop a useful model for scoring bias by learning to perform pairwise comparisons of bias accurately. We show that we can interpret the parameters of the trained model to discover the words most indicative of bias. We also apply our model in three different settings: studying the temporal evolution of bias in Wikipedia articles, comparing news sources based on bias, and scoring bias in law amendments. In each case, we demonstrate that the outputs of the model can be explained and validated, even for the two domains that are outside the training-data domain. We also use the model to compare the general level of bias between domains, where we see that legal texts are the least biased and news media are the most biased, with Wikipedia articles in between. Given its high performance, simplicity, interpretability, and wide applicability, we hope the model will be useful for a large community, including Wikipedia and news editors, political and social scientists, and the general public.

Submitted 16 July, 2023; originally announced July 2023.

Comments: 12 pages
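A Bradley-Terry-style pairwise model of the kind described in the bias-scoring entry above can be sketched in a few lines: each document gets a linear score $s(d) = \sum_w \theta_w \, n_w(d)$ over word counts, and the probability that revision $a$ is more biased than revision $b$ is $\sigma(s(a) - s(b))$. The Python below fits $\theta$ by gradient ascent on the pairwise log-likelihood. The bag-of-words features, learning rate, and toy pairs are assumptions for illustration, not the paper's model or data.

```python
import math
from collections import Counter


def score(theta, doc):
    """Linear bias score: sum of word weights times word counts."""
    return sum(theta.get(tok, 0.0) * n for tok, n in Counter(doc.split()).items())


def fit_pairwise(pairs, epochs=200, lr=0.1):
    """pairs: list of (more_biased, less_biased) revisions of the same text."""
    theta = {}
    for _ in range(epochs):
        for hi, lo in pairs:
            p = 1.0 / (1.0 + math.exp(-(score(theta, hi) - score(theta, lo))))
            diff = Counter(hi.split())
            diff.subtract(Counter(lo.split()))
            # Gradient of log sigma(s(hi) - s(lo)) w.r.t. theta_tok is (1 - p) * count difference.
            for tok, n in diff.items():
                if n:
                    theta[tok] = theta.get(tok, 0.0) + lr * (1.0 - p) * n
    return theta


pairs = [
    ("the senator made a shameful claim", "the senator made a claim"),
    ("a shameful and reckless vote", "a vote"),
]
theta = fit_pairwise(pairs)
print(sorted(theta.items(), key=lambda kv: -kv[1])[:3])  # highest-weight words
```

Because only score differences enter the likelihood, words that occur equally often in both revisions (here "the", "senator", "a") receive no gradient, which is what lets the learned weights be read as indicators of bias.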

arXiv:2307.06835 [pdf, ps, other]

The generic crystallographic phase retrieval problem

Authors: Dan Edidin, Arun Suresh

Abstract: In this paper, we consider the problem of recovering a signal $x \in \mathbb{R}^N$ from its power spectrum, assuming that the signal is sparse with respect to a generic basis for $\mathbb{R}^N$. Our main result is that if the sparsity level is at most $\sim\! N/2$ in this basis, then the generic sparse vector is uniquely determined up to sign from its power spectrum. We also prove that if the sparsity level is $\sim\! N/4$, then every sparse vector is determined up to sign from its power spectrum. Analogous results are also obtained for the power spectrum of a vector in $\mathbb{C}^N$, which extend earlier results of Wang and Xu (arXiv:1310.0873).

Submitted 24 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 20 pages

MSC Class: 42A10; 94A12; 94A15
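The "up to sign" qualifier in the crystallographic phase retrieval entry above reflects ambiguities that no power-spectrum measurement can resolve on its own. The Python sketch below, using a naive $O(N^2)$ DFT, checks that $x$, $-x$, a circular shift of $x$, and its reversal $x[-n \bmod N]$ all have the same power spectrum $|\hat{x}_k|^2$. A sparsity prior cannot remove the sign flip, because $-x$ is exactly as sparse as $x$, whereas a shifted or reversed copy is generally not sparse in a generic basis; this is consistent with uniqueness being stated up to sign only. The test vector is arbitrary.

```python
import cmath


def power_spectrum(x):
    """|DFT(x)_k|^2 for k = 0..N-1, via a naive O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))) ** 2
            for k in range(n)]


def circular_shift(x, s):
    return x[-s:] + x[:-s]


def reversal(x):
    """x[-n mod N]: index 0 stays put, the remaining entries are reversed."""
    return [x[0]] + x[:0:-1]


def same(a, b, tol=1e-9):
    return all(abs(u - v) < tol for u, v in zip(a, b))


x = [0.0, 1.5, -2.0, 0.0, 0.7, 0.0, 3.1, -0.4]
ps = power_spectrum(x)
print(same(ps, power_spectrum([-v for v in x])),
      same(ps, power_spectrum(circular_shift(x, 3))),
      same(ps, power_spectrum(reversal(x))))  # True True True
```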

arXiv:2307.04905 [pdf, other]

FedYolo: Augmenting Federated Learning with Pretrained Transformers

Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

Abstract: The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks, achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs, which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally, highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>$100$\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs.
Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once, and all future updates are accomplished through communication-efficient modules with limited catastrophic forgetting, where each task is assigned to its own module.

Submitted 10 July, 2023; originally announced July 2023.

Comments: 20 pages, 18 figures
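The "load once, then communicate only modules" protocol in the FedYolo entry above can be pictured as a server that averages per-task module parameters, weighted by client example counts, while the frozen backbone never leaves the clients. The Python below is a hypothetical illustration: modules are flat lists of floats keyed by task name, the weighting is FedAvg-style, and the parameter counts passed to `upload_ratio` are made-up round numbers, not figures from the paper.

```python
from collections import defaultdict


def aggregate_modules(client_updates):
    """FedAvg over per-task modules only.

    client_updates: list of (num_examples, {task_name: [module parameters]}).
    Each client uploads only the modules for the tasks it trained; the backbone stays local.
    """
    sums = {}
    weights = defaultdict(float)
    for n, modules in client_updates:
        for task, params in modules.items():
            acc = sums.setdefault(task, [0.0] * len(params))
            for i, v in enumerate(params):
                acc[i] += n * v
            weights[task] += n
    # Each task keeps its own module, so averaging never mixes parameters across tasks.
    return {task: [v / weights[task] for v in acc] for task, acc in sums.items()}


def upload_ratio(backbone_params, module_params):
    """How many times fewer parameters a module upload sends than a full-model upload."""
    return backbone_params / module_params


updates = [
    (100, {"vision": [1.0, 2.0], "text": [0.0]}),
    (300, {"vision": [3.0, 4.0]}),
]
print(aggregate_modules(updates))         # vision averaged over two clients, text over one
print(upload_ratio(86_000_000, 600_000))  # hypothetical backbone and module sizes
```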

arXiv:2306.08428 [pdf, other]

Black Hole Mergers in Holographic Space-time (HST) Models of Inflation

Authors: Anish Suresh, Tom Banks

Abstract: We perform a crude computer simulation to show that no problematic black holes are formed by mergers in the early matter-dominated phase of the HST models of inflation. These are black holes whose decays could have been seen as signals in the CMB. We also conclude that tiny "black hole galaxies" form. Since black hole decay products are mostly massive standard model particles, and perhaps their superpartners, the fate of these proto-galaxies is a complicated dynamical problem.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 8 pages, 3 figures

arXiv:2305.18527 [pdf, other]

doi 10.3847/1538-3881/acccf0

A 4-8 GHz Galactic Center Search for Periodic Technosignatures

Authors: Akshay Suresh, Vishal Gajjar, Pranav Nagarajan, Sofia Z. Sheikh, Andrew P. V. Siemion, Matt Lebofsky, David H. E. MacMahon, Danny C. Price, Steve Croft

Abstract: Radio searches for extraterrestrial intelligence have mainly targeted the discovery of narrowband continuous-wave beacons and artificially dispersed broadband bursts. Periodic pulse trains, in comparison to the above technosignature morphologies, offer an energetically efficient means of interstellar transmission. A rotating beacon at the Galactic Center (GC), in particular, would be highly advantageous for galaxy-wide communications. Here, we present blipss, an open-source, CPU-based software package that uses a fast folding algorithm (FFA) to uncover channel-wide periodic signals in radio dynamic spectra. Running blipss on 4.5 hours of 4-8 GHz data gathered with the Robert C. Byrd Green Bank Telescope, we searched the central 6' of our Galaxy for kHz-wide signals with periods of 11-100 s and duty cycles ($δ$) of 10-50%. Our searches, to our knowledge, constitute the first FFA exploration for periodic alien technosignatures. We report a non-detection of channel-wide periodic signals in our data. Thus, we constrain the abundance of 4-8 GHz extraterrestrial transmitters of kHz-wide periodic pulsed signals to fewer than one in about 600,000 stars at the GC above a 7$σ$ equivalent isotropic radiated power of $\approx 2 \times 10^{18}$ W at $δ\simeq 10\%$. From an astrophysics standpoint, blipss, with its utilization of a per-channel FFA, can enable the discovery of signals with exotic radio frequency sweeps departing from the standard cold plasma dispersion law.

Submitted 2 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 20 pages, 11 figures, published in AJ, in press (http://seti.berkeley.edu/blipss/)

Journal ref: AJ 2023 165 255
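Folding is the core operation behind the FFA search in the blipss entry above: a time series is wrapped at a trial period and averaged by phase, so a periodic pulse train piles up in one phase bin while other samples average down. A true FFA evaluates many trial periods efficiently by sharing partial sums; the Python sketch below instead folds one integer trial period at a time by brute force. It illustrates the idea only, not blipss's implementation or detection statistic; `peak_score` and the noise-free toy series are assumptions.

```python
def fold(series, period):
    """Average samples by phase for an integer trial period (in samples)."""
    if not 0 < period <= len(series):
        raise ValueError("period must be between 1 and len(series)")
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(series):
        sums[i % period] += v
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def peak_score(profile):
    """Crude detection score: profile peak minus profile mean."""
    return max(profile) - sum(profile) / len(profile)


# Noise-free toy pulse train: one bright sample every 10 samples (duty cycle 10%).
series = [1.0 if i % 10 == 3 else 0.0 for i in range(100)]
for p in (9, 10, 11):
    print(p, round(peak_score(fold(series, p)), 3))
```

Only the correct trial period of 10 samples stacks all pulses into a single phase bin; neighboring periods smear them across the profile and score far lower.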

arXiv:2305.13643 [pdf, other]

Hardware Trojans in Power Conversion Circuits

Authors: Jacob Sillman, Ajay Suresh

Abstract: This report investigates the potential impact of a Trojan attack on power conversion circuits, specifically a switching signal attack designed to trigger a locking of the pulse width modulation (PWM) signal that goes to a power field-effect transistor (FET). The first simulation shows that this type of attack can cause severe overvoltage, potentially leading to functional failure. The report proposes a solution using a large bypass capacitor to force signal parity, effectively negating the Trojan circuit. The simulation results demonstrate that the proposed solution can effectively thwart the Trojan attack. However, several caveats must be considered, such as the size of the capacitor, possible current leakage, and the possibility that the solution can be circumvented by an adversary with knowledge of the protection strategy. Overall, the findings suggest that proper protection mechanisms, such as the proposed signal-parity solution, must be considered when designing power conversion circuits to mitigate the risk of Trojan attacks.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 4 pages, 6 figures, will not be submitted to any journals
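Why a locked PWM signal can cause overvoltage, as in the hardware Trojan entry above, can be seen from an ideal buck converter, whose averaged output is $V_\text{out} = D\,V_\text{in}$ for duty cycle $D$. The abstract does not name the converter topology; the buck topology, the 12 V input, and the function names below are assumptions chosen for illustration. In the ideal continuous-conduction case the LC filter's output settles to the average of the switch-node voltage, so a Trojan that holds the gate high forces $D = 1$ and passes the full input voltage to the load.

```python
def switch_node_waveform(v_in, duty, samples_per_period=100, locked_high=False):
    """Ideal switch-node voltage over one PWM period (FET on -> v_in, off -> 0 V)."""
    on_samples = samples_per_period if locked_high else round(duty * samples_per_period)
    return [v_in if i < on_samples else 0.0 for i in range(samples_per_period)]


def averaged_output(waveform):
    """Steady-state output of an ideal LC filter: the average of the switch-node voltage."""
    return sum(waveform) / len(waveform)


print(averaged_output(switch_node_waveform(12.0, 0.25)))                    # 3.0
print(averaged_output(switch_node_waveform(12.0, 0.25, locked_high=True)))  # 12.0
```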

arXiv:2305.06020 [pdf, other]

doi 10.1103/PhysRevB.108.085435

Giant spin Nernst effect in a two-dimensional antiferromagnet due to magnetoelastic coupling-induced gaps and interband transitions between magnon-like bands

Authors: D. -Q. To, C. Y. Ameyaw, A. Suresh, S. Bhatt, M. J. H. Ku, M. B. Jungfleisch, J. Q. Xiao, J. M. O. Zide, B. K. Nikolic, M. F. Doty

Abstract: We analyze theoretically the origin of the spin Nernst and thermal Hall effects in FePS3 as a realization of a two-dimensional antiferromagnet (2D AFM). We find that a strong magnetoelastic coupling, hybridizing magnetic excitations (magnons) and elastic excitations (phonons), combined with time-reversal-symmetry breaking, results in Berry curvature hotspots in the region of anticrossing between the two distinct hybridized bands. Furthermore, large spin Berry curvature emerges due to interband transitions between two magnon-like bands, where a small energy gap is induced by magnetoelastic coupling between such bands that are energetically distant from the anticrossing of hybridized bands. These nonzero Berry curvatures generate topological transverse transport (i.e., the thermal Hall effect) of hybrid excitations, dubbed magnon-polarons, as well as of spin (i.e., the spin Nernst effect) carried by them, in response to an applied longitudinal temperature gradient. We investigate the dependence of the spin Nernst and thermal Hall conductivities on the applied magnetic field and temperature, unveiling a very large spin Nernst conductivity even at zero magnetic field. Our results suggest FePS3 AFM, which is already available in 2D form experimentally, as a promising platform to explore the topological transport of magnon-polaron quasiparticles at THz frequencies.

Submitted 16 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 11 pages, 7 figures; Supplemental Material is available from https://mrsec.udel.edu/publications/

Journal ref: Physical Review B, 2023

arXiv:2305.02096 [pdf, other]

doi 10.1063/5.0159448

Counterdiabatic driving for long-lived singlet state preparation

Authors: Abhinav Suresh, Vishal Varma, Priya Batra, T S Mahesh

Abstract: The quantum adiabatic method, which maintains populations in their instantaneous eigenstates throughout the state evolution, is an established and often preferred choice for state preparation and manipulation. Though it minimizes the driving cost significantly, its slow speed is a severe limitation in noisy intermediate-scale quantum (NISQ) era technologies. Since adiabatic paths are widespread in many physical processes, it is of broader interest to achieve adiabaticity at a much faster rate. Shortcuts-to-adiabaticity techniques, which overcome the slow adiabatic process by driving the system faster through non-adiabatic paths, have seen increased attention recently. The extraordinarily long lifetime of long-lived singlet states (LLS) in nuclear magnetic resonance, established over the past decade, has opened several important applications ranging from spectroscopy to biomedical imaging. Various methods, including adiabatic methods, are already being used to prepare LLS. In this article, we report the use of counterdiabatic driving (CD) to speed up LLS preparation with faster drives. Using NMR experiments, we show that CD can give stronger LLS order in shorter durations than conventional adiabatic driving.

Submitted 28 June, 2023; v1 submitted 3 May, 2023; originally announced May 2023.
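For readers new to the counterdiabatic driving used in the entry above, the textbook two-level case shows how the extra drive is constructed. This is the generic construction, not the specific LLS pulse sequence of the paper. For $H_0(t) = \tfrac{\hbar}{2}\left[\Delta(t)\,\sigma_z + \Omega(t)\,\sigma_x\right]$ with mixing angle $\theta = \arctan(\Omega/\Delta)$, the instantaneous eigenvectors are real, so the Berry-phase term $\langle n | \partial_t n \rangle$ vanishes and the counterdiabatic Hamiltonian reduces to a single $\sigma_y$ term:

```latex
\begin{aligned}
H_{\mathrm{CD}}(t) &= i\hbar \sum_{n} \lvert \partial_t n(t) \rangle \langle n(t) \rvert
  = \frac{\hbar}{2}\,\dot{\theta}(t)\,\sigma_y, \\
\dot{\theta}(t) &= \frac{\dot{\Omega}(t)\,\Delta(t) - \Omega(t)\,\dot{\Delta}(t)}{\Delta(t)^2 + \Omega(t)^2}.
\end{aligned}
```

Driving with $H_0 + H_{\mathrm{CD}}$ keeps the system in the instantaneous eigenstate of $H_0$ regardless of sweep speed, which is what allows a shorter preparation than a purely adiabatic sweep.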

arXiv:2304.09740 [pdf, ps, other]

doi 10.1103/PhysRevA.108.013711

Intensity effects of light coupling to one- or two-atom arrays of infinite extent

Authors: F. Robicheaux, Deepak A. Suresh

Abstract: We theoretically and computationally investigate the behavior of infinite atom arrays when illuminated by nearly resonant light. We use higher order mean field equations to investigate the coherent reflection and transmission and incoherent scattering of photons from a single array and from a pair of arrays as a function of detuning for different values of the Rabi frequency. For the single array case, we show how increasing the light intensity changes the probabilities for these different processes. For example, the incoherent scattering probability initially increases with light intensity before decreasing at higher values. For a pair of parallel arrays at near resonant separation, the effects from increasing light intensity can become apparent with incredibly low intensity light. In addition, we derive the higher order mean field equations for these infinite arrays giving a representation that can be evaluated with a finite number of equations.

Submitted 25 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Journal ref: Phys. Rev. A 108, 013711 (2023)

arXiv:2303.08284 [pdf, other]

Robot Navigation in Risky, Crowded Environments: Understanding Human Preferences

Authors: Aamodh Suresh, Angelique Taylor, Laurel D. Riek, Sonia Martinez

Abstract: Risky and crowded environments (RCE) contain abstract sources of risk and uncertainty, which are perceived differently by humans, leading to a variety of behaviors. Thus, robots deployed in RCEs need to exhibit diverse perception and planning capabilities in order to interpret other human agents' behavior and act accordingly in such environments. To understand this problem domain, we conducted a study to explore human path choices in RCEs, enabling better robotic navigational explainable AI (XAI) designs. We created a novel COVID-19 pandemic grocery shopping scenario with time-risk tradeoffs and acquired users' path preferences. We found that participants exhibit a variety of path preferences: from risky and urgent to safe and relaxed. To model users' decision making, we evaluated three popular risk models: Cumulative Prospect Theory (CPT), Conditional Value at Risk (CVaR), and Expected Risk (ER). We found that CPT captured people's decision making more accurately than CVaR and ER, corroborating theoretical results that CPT is more expressive and inclusive than CVaR and ER. We also found that people's self-assessments of risk and time urgency do not correlate with their path preferences in RCEs. Finally, we conducted a thematic analysis of open-ended questions, providing crucial design insights for robots in RCEs. Thus, through this study, we provide novel and critical insights about human behavior and perception to help design better navigational explainable AI (XAI) in RCEs.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Under review
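Two of the three risk models compared in the robot-navigation entry above are easy to state for a discrete distribution of path costs. Expected risk is the probability-weighted mean cost, and CVaR at level $\alpha$ is the mean cost over the worst $\alpha$ fraction of probability mass. The Python sketch below computes both for two hypothetical grocery-store paths; the costs, probabilities, and $\alpha$ are invented for illustration. CPT, which the study found most predictive, additionally applies a value function and probability weighting and is not reproduced here.

```python
def expected_risk(outcomes):
    """outcomes: list of (cost, probability) pairs with probabilities summing to 1."""
    return sum(c * p for c, p in outcomes)


def cvar(outcomes, alpha):
    """Mean cost over the worst alpha-fraction of probability mass (0 < alpha <= 1)."""
    remaining, tail = alpha, 0.0
    for cost, prob in sorted(outcomes, key=lambda cp: -cp[0]):  # worst costs first
        take = min(prob, remaining)
        tail += take * cost
        remaining -= take
        if remaining <= 0:
            break
    return tail / alpha


# Hypothetical paths: a short aisle that is occasionally crowded vs. a longer, quiet route.
risky_fast = [(2.0, 0.9), (20.0, 0.1)]
safe_slow = [(5.0, 1.0)]
for name, path in [("risky_fast", risky_fast), ("safe_slow", safe_slow)]:
    print(name, expected_risk(path), cvar(path, 0.1))
```

With these numbers an expected-risk planner prefers the risky path (3.8 vs. 5.0), while a CVaR planner at $\alpha = 0.1$ prefers the safe one (20.0 vs. 5.0), the kind of divergence that makes the choice of risk model matter when predicting human path choices.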

arXiv:2303.01262 [pdf, other]

Subset-Based Instance Optimality in Private Estimation

Authors: Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh

Abstract: We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.

Submitted 28 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.
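For contrast with the instance-optimal notion described above, a common worst-case baseline for private mean estimation clips each value to a fixed range and adds Laplace noise calibrated to the clipped mean's sensitivity. This is not the authors' algorithm; it is a minimal sketch assuming real-valued data, a clipping range `[lo, hi]` chosen in advance, and replace-one neighbouring datasets with `n` treated as public. The names `laplace_noise` and `clipped_laplace_mean` are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clipped_laplace_mean(data, lo: float, hi: float, epsilon: float,
                         rng: random.Random) -> float:
    """epsilon-DP mean estimate (hypothetical baseline, not the paper's method).

    Assumes replace-one neighbouring datasets and a public size n, so the
    clipped mean has sensitivity (hi - lo) / n.
    """
    if epsilon <= 0 or hi <= lo or not data:
        raise ValueError("need epsilon > 0, hi > lo and non-empty data")
    n = len(data)
    # Clip every value into [lo, hi] to bound each point's influence.
    clipped = [min(max(x, lo), hi) for x in data]
    sensitivity = (hi - lo) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)
```

Clipping `[1, 2, 3, 100]` to `[0, 10]` pulls the estimate toward 4 rather than the true mean 26.5, however little noise is added, which illustrates how a range fixed without looking at `D` can cost accuracy.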

arXiv:2302.14127 [pdf, other]

doi 10.3847/1538-4357/aca977

X3: a high-mass Young Stellar Object close to the supermassive black hole Sgr A*

Authors: Florian Peißker, Michal Zajacek, Nadeen B. Sabha, Masato Tsuboi, Jihane Moultaka, Lucas Labadie, Andreas Eckart, Vladimir Karas, Lukas Steiniger, Matthias Subroweit, Anjana Suresh, Maria Melamed, Yann Clenet

Abstract: To date, the proposed observation of Young Stellar Objects (YSOs) in the Galactic center (GC) still raises the question of where and how these objects could have formed due to the violent vicinity of Sgr A*. Here, we report the multi-wavelength detection of a highly dynamic YSO close to Sgr A* that might be a member of the IRS13 cluster. We observe the previously known coreless bow-shock source X3 in the near- and mid-infrared (NIR/MIR) with SINFONI (VLT), NACO (VLT), ISAAC (VLT), VISIR (VLT), SHARP (NTT), and NIRCAM2 (KECK). In the radio domain, we use CO continuum and H30$α$ ALMA observations to identify system components at different temperatures and locations concerning the central stellar source. It is suggested that these radio/submm observations in combination with the NIR Br$γ$ line can be associated with a protoplanetary disk of the YSO which is consistent with manifold VISIR observations that reveal complex molecules and elements such as PAH, SIV, NeII and ArIII in a dense and compact region. Based on the photometric multi-wavelength analysis, we infer the mass of $15^{+10}_{-5} M_{\odot}$ for the YSO with a related age of a few $10^4$ yr. Due to this age estimate and the required relaxation time scales for high-mass stars, this finding is an indication of ongoing star formation in the inner parsec. The proper motion and 3d distance imply a relation between X3 and IRS13. We argue that IRS13 may serve as a birthplace for young stars that are ejected due to the evaporation of the cluster.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 36 pages, 28 figures, accepted by ApJ

arXiv:2302.11783 [pdf, other]

A Semantics for Counterfactuals in Quantum Causal Models

Authors: Ardra Kooderi Suresh, Markus Frembs, Eric G. Cavalcanti

Abstract: We introduce a formalism for the evaluation of counterfactual queries in the framework of quantum causal models, by generalising the three-step procedure of abduction, action, and prediction in Pearl's classical formalism of counterfactuals. To this end, we define a suitable extension of Pearl's notion of a "classical structural causal model", which we denote analogously by "quantum structural causal model". We show that every classical (probabilistic) structural causal model can be extended to a quantum structural causal model, and prove that counterfactual queries that can be formulated within a classical structural causal model agree with their corresponding queries in the quantum extension - but the latter is more expressive. Counterfactuals in quantum causal models come in different forms: we distinguish between active and passive counterfactual queries, depending on whether or not an intervention is to be performed in the action step. This is in contrast to the classical case, where counterfactuals are always interpreted in the active sense. As a consequence of this distinction, we observe that quantum causal models break the connection between causal and counterfactual dependence that exists in the classical case: (passive) quantum counterfactuals allow counterfactual dependence without causal dependence.
This illuminates an important distinction between classical and quantum causal models, which underlies the fact that the latter can reproduce quantum correlations that violate Bell inequalities while being faithful to the relativistic causal structure.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 22+4 pages, 11 figures
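The classical three-step procedure that the paper generalises (abduction, action, prediction) can be illustrated on a toy two-variable structural causal model. The model below, with exogenous bits `u_x` and `u_y`, equations `X := u_x` and `Y := X xor u_y`, and a uniform prior, is a hypothetical example chosen for brevity; it covers only Pearl's classical case, not the quantum extension.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical toy SCM: exogenous bits U = (u_x, u_y) with a uniform prior.
PRIOR: Dict[Tuple[int, int], float] = {u: 0.25 for u in product((0, 1), repeat=2)}


def solve(u: Tuple[int, int], do_x: Optional[int] = None) -> Tuple[int, int]:
    """Structural equations X := u_x, Y := X xor u_y; do_x overrides X."""
    u_x, u_y = u
    x = u_x if do_x is None else do_x
    y = x ^ u_y
    return x, y


def counterfactual_prob_y1(evidence: Tuple[int, int], do_x: int) -> float:
    """P(Y = 1 had X been do_x | observed (X, Y) = evidence)."""
    # 1. Abduction: keep only exogenous settings consistent with the evidence.
    posterior = {u: p for u, p in PRIOR.items() if solve(u) == evidence}
    z = sum(posterior.values())
    if z == 0:
        raise ValueError("evidence has zero probability under the model")
    # 2. Action: replace X's equation with X := do_x.
    # 3. Prediction: propagate the posterior through the modified model.
    hits = sum(p for u, p in posterior.items() if solve(u, do_x)[1] == 1)
    return hits / z
```

Having observed `X = 1, Y = 1`, abduction forces `u_y = 0`, so under `do(X = 0)` the counterfactual `Y` is 0 with certainty and `counterfactual_prob_y1((1, 1), do_x=0)` returns `0.0`.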

arXiv:2302.09904 [pdf, other]

WW-FL: Secure and Private Large-Scale Federated Learning

Authors: Felix Marx, Thomas Schneider, Ajith Suresh, Tobias Wehrle, Christian Weinert, Hossein Yalame

Abstract: Federated learning (FL) is an efficient approach for large-scale distributed machine learning that promises data privacy by keeping training data on client devices. However, recent research has uncovered vulnerabilities in FL, impacting both security and privacy through poisoning attacks and the potential disclosure of sensitive information in individual model updates as well as the aggregated global model. This paper explores the inadequacies of existing FL protection measures when applied independently, and the challenges of creating effective compositions. Addressing these issues, we propose WW-FL, an innovative framework that combines secure multi-party computation (MPC) with hierarchical FL to guarantee data and global model privacy. One notable feature of WW-FL is its capability to prevent malicious clients from directly poisoning model parameters, confining them to less destructive data poisoning attacks. We furthermore provide a PyTorch-based FL implementation integrated with Meta's CrypTen MPC framework to systematically measure the performance and robustness of WW-FL. Our extensive evaluation demonstrates that WW-FL is a promising solution for secure and private large-scale federated learning.

Submitted 30 May, 2024; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: WWFL combines private training and inference with secure aggregation and hierarchical FL to provide end-to-end protection and to facilitate large-scale global deployment

arXiv:2302.06869 [pdf, other]

Concentration Bounds for Discrete Distribution Estimation in KL Divergence

Authors: Clément L. Canonne, Ziteng Sun, Ananda Theertha Suresh

Abstract: We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator. We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior result of $k/n$. We also establish a matching lower bound that shows that our bounds are tight up to polylogarithmic factors.

Submitted 12 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: Updated discussion of previous work
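The Laplace estimator studied above is the add-one rule $\hat p(a) = (N_a + 1)/(n + k)$ for $n$ samples over an alphabet of size $k$, where $N_a$ counts occurrences of symbol $a$. The sketch below computes it together with the KL divergence; the helper names are illustrative, and the code says nothing about the concentration bounds themselves.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def laplace_estimator(samples: Sequence[Hashable],
                      alphabet: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Add-one (Laplace) estimate: p_hat(a) = (N_a + 1) / (n + k)."""
    alphabet = list(alphabet)
    counts = Counter(samples)
    n, k = len(samples), len(alphabet)
    return {a: (counts[a] + 1) / (n + k) for a in alphabet}


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """D(p || q) in nats; symbols with p(a) = 0 contribute nothing."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)
```

Because every symbol receives at least $1/(n+k)$ mass, $D(p \,\|\, \hat p)$ is always finite, whereas the unsmoothed empirical estimate can assign zero mass to a symbol with $p(a) > 0$ and make the divergence infinite.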

arXiv:2301.02250 [pdf, other]

doi 10.3847/1538-3881/aca8a2

The Colorado Ultraviolet Transit Experiment (CUTE) Mission Overview

Authors: Kevin France, Brian Fleming, Arika Egan, Jean-Michel Desert, Luca Fossati, Tommi T. Koskinen, Nicholas Nell, Pascal Petit, Aline A. Vidotto, Matthew Beasley, Nicholas DeCicco, Aickara Gopinathan Sreejith, Ambily Suresh, Jared Baumert, P. Wilson Cauley, Carolina Villarreal DAngelo, Keri Hoadley, Robert Kane, Richard Kohnert, Julian Lambert, Stefan Ulrich

Abstract: Atmospheric escape is a fundamental process that affects the structure, composition, and evolution of many planets. The signatures of escape are detectable on close-in, gaseous exoplanets orbiting bright stars, owing to the high levels of extreme-ultraviolet irradiation from their parent stars. The Colorado Ultraviolet Transit Experiment (CUTE) is a CubeSat mission designed to take advantage of the near-ultraviolet stellar brightness distribution to conduct a survey of the extended atmospheres of nearby close-in planets. The CUTE payload is a magnifying NUV (2479–3306 Å) spectrograph fed by a rectangular Cassegrain telescope (206 mm × 84 mm); the spectrogram is recorded on a back-illuminated, UV-enhanced CCD. The science payload is integrated into a 6U Blue Canyon Technologies XB1 bus. CUTE was launched into a polar, low-Earth orbit on 27 September 2021 and has been conducting this transit spectroscopy survey following an on-orbit commissioning period. This paper presents the mission motivation and development path, and demonstrates the potential for small satellites to conduct this type of science by presenting initial on-orbit science observations. The primary science mission is being conducted in 2022–2023, with a publicly available data archive coming online in 2023.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 12 pages, 5 figures, AJ - accepted

arXiv:2301.01307 [pdf, other]

doi 10.3847/1538-3881/aca8a3

The on-orbit performance of the Colorado Ultraviolet Transit Experiment (CUTE) Mission

Authors: Arika Egan, Nicholas Nell, Ambily Suresh, Kevin France, Brian Fleming, A. G. Sreejith, Julian Lambert, Nicholas DeCicco

Abstract: We present the on-orbit performance of the Colorado Ultraviolet Transit Experiment ($CUTE$). $CUTE$ is a 6U CubeSat that launched on September 27th, 2021 and is obtaining near-ultraviolet (NUV, 2480–3306 Å) transit spectroscopy of short-period exoplanets. The instrument comprises a 20 cm $\times$ 8 cm rectangular Cassegrain telescope, an NUV spectrograph with a holographically ruled aberration-correcting diffraction grating, and a passively cooled, back-illuminated NUV-optimized CCD detector. The telescope feeds the spectrograph through an 18$'$ $\times$ 60$''$ slit. The spacecraft bus is a Blue Canyon Technologies XB1, which has demonstrated $\leq$ 6$''$ jitter in 56% of $CUTE$ science exposures. Following spacecraft commissioning, an on-orbit calibration program was executed to characterize the $CUTE$ instrument's on-orbit performance. The results of this calibration indicate that the effective area of $CUTE$ is $\approx$ 19.0 -- 27.5 cm$^{2}$ and that the average intrinsic resolution element is 2.9 Å across the bandpass. This paper describes the measurement of the science instrument performance parameters as well as the thermal and pointing characteristics of the observatory.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2211.15913 [pdf, other]

Branch-Well-Structured Transition Systems and Extensions

Authors: Benedikt Bollig, Alain Finkel, Amrita Suresh

Abstract: We propose a relaxation to the definition of well-structured transition systems (WSTS) while retaining the decidability of boundedness and non-termination. In this class, the well-quasi-ordered (wqo) condition is relaxed such that it is applicable only between states that are reachable one from another. Furthermore, the monotony condition is relaxed in the same way. While this retains the decidability of non-termination and boundedness, it appears that the coverability problem is undecidable. To this end, we define a new notion of monotony, called cover-monotony, which is strictly more general than the usual monotony and still allows us to decide a restricted form of the coverability problem.

Submitted 11 June, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.09727 [pdf, other]

A Survey on Evaluation Metrics for Synthetic Material Micro-Structure Images from Generative Models

Authors: Devesh Shah, Anirudh Suresh, Alemayehu Admasu, Devesh Upadhyay, Kalyanmoy Deb

Abstract: The evaluation of synthetic micro-structure images is an emerging problem as machine learning and materials science research have evolved together. Typical state-of-the-art methods for evaluating synthetic images from generative models have relied on the Fréchet Inception Distance. However, this and other similar methods are limited in the materials domain due to both the unique features that characterize physically accurate micro-structures and limited dataset sizes. In this study, we evaluate a variety of methods on scanning electron microscope (SEM) images of graphene-reinforced polyurethane foams. The primary objective of this paper is to report our findings with regard to the shortcomings of existing methods so as to encourage the machine learning community to consider enhancements in metrics for assessing quality of synthetic images in the material science domain.

Submitted 3 November, 2022; originally announced November 2022.

Comments: Accepted in Neural Information Processing Systems (NeurIPS) 2022 Workshop on AI for Accelerated Materials Design (AI4Mat). Selected as spotlight paper for workshop

ACM Class: I.2.m; J.2

arXiv:2211.08450 [pdf, other]

Geometry Optimization for Long-lived Particle Detectors

Authors: Thomas Gorordo, Simon Knapen, Benjamin Nachman, Dean J. Robinson, Adi Suresh

Abstract: The proposed designs of many auxiliary long-lived particle (LLP) detectors at the LHC call for the instrumentation of a large surface area inside the detector volume, in order to reliably reconstruct tracks and LLP decay vertices. Taking the CODEX-b detector as an example, we provide a proof-of-concept optimization analysis that demonstrates the required instrumented surface area can be substantially reduced for many LLP models, while only marginally affecting the LLP signal efficiency. This optimization permits a significant reduction in cost and installation time, and may also inform the installation order for modular detector elements. We derive a branch-and-bound based optimization algorithm that permits highly computationally efficient determination of optimal detector configurations, subject to any specified LLP vertex and track reconstruction requirements. We outline the features of a newly-developed generalized simulation framework, for the computation of LLP signal efficiencies across a range of LLP models and detector geometries.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 46 pages, 11 figures, 3 tables

arXiv:2211.04367 [pdf, other]

Much Easier Said Than Done: Falsifying the Causal Relevance of Linear Decoding Methods

Authors: Lucas Hayne, Abhijit Suresh, Hunar Jain, Rahul Kumar, R. McKell Carter

Abstract: Linear classifier probes are frequently utilized to better understand how neural networks function. Researchers have approached the problem of determining unit importance in neural networks by probing their learned, internal representations. Linear classifier probes identify highly selective units as the most important for network function. Whether or not a network actually relies on high selectivity units can be tested by removing them from the network using ablation. Surprisingly, when highly selective units are ablated they only produce small performance deficits, and even then only in some cases. In spite of the absence of ablation effects for selective neurons, linear decoding methods can be effectively used to interpret network function, leaving their effectiveness a mystery. To falsify the exclusive role of selectivity in network function and resolve this contradiction, we systematically ablate groups of units in subregions of activation space. Here, we find a weak relationship between neurons identified by probes and those identified by ablation. More specifically, we find that an interaction between selectivity and the average activity of the unit better predicts ablation performance deficits for groups of units in AlexNet, VGG16, MobileNetV2, and ResNet101. Linear decoders are likely somewhat effective because they overlap with those units that are causally important for network function. Interpretability methods could be improved by focusing on causally important units.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 6 pages, 3 figures, to be published in I Can't Believe It's Not Better Workshop at NeurIPS 2022

arXiv:2211.03645 [pdf, other]

doi 10.1103/PhysRevB.107.174421

Quantum-classical approach to spin and charge pumping and the ensuing radiation in THz spintronics: Example of ultrafast-light-driven Weyl antiferromagnet Mn$_3$Sn

Authors: Abhin Suresh, Branislav K. Nikolic

Abstract: The interaction of fs light pulses with magnetic materials has been intensely studied for more than two decades in order to understand ultrafast demagnetization in single magnetic layers or THz emission from their bilayers with nonmagnetic spin-orbit (SO) materials. Here we develop a multiscale quantum-classical formalism -- where conduction electrons are described by a quantum master equation of the Lindblad type; classical dynamics of local magnetization is described by the Landau-Lifshitz-Gilbert (LLG) equation; and incoming light is described by a classical vector potential while outgoing electromagnetic radiation is computed using Jefimenko equations for retarded electric and magnetic fields -- and apply it to a bilayer of antiferromagnetic Weyl semimetal Mn$_3$Sn with noncollinear local magnetization in contact with SO-coupled nonmagnetic material. Our QME+LLG+Jefimenko scheme makes it possible to understand how a fs light pulse directly generates spin and charge pumping and electromagnetic radiation by the latter, including both odd and even high harmonics (of the pulse center frequency) up to order $n \le 7$. The directly pumped spin current then exerts spin torque on the local magnetization, whose dynamics, in turn, pumps additional spin and charge currents radiating in the THz range.
By switching LLG dynamics and SO couplings on and off, we unravel which microscopic mechanism contributes the most to emitted THz radiation -- charge pumping by the local magnetization of Mn$_3$Sn in the presence of its intrinsic SO coupling is far more important than the standardly assumed (for other types of magnetic layers) spin pumping and subsequent spin-to-charge conversion within the neighboring nonmagnetic SO-coupled material.

Submitted 11 April, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: 15 pages, 8 figures; three supplemental movies are available from https://wiki.physics.udel.edu/qttg/Publications

Journal ref: Phys. Rev. B 107, 174421 (2023)

arXiv:2210.07376 [pdf, other]

doi 10.1109/SaTML59370.2024.00031

ScionFL: Efficient and Robust Secure Quantized Aggregation

Authors: Yaniv Ben-Itzhak, Helen Möllering, Benny Pinkas, Thomas Schneider, Ajith Suresh, Oleksandr Tkachenko, Shay Vargaftik, Christian Weinert, Hossein Yalame, Avishay Yanai

Abstract: Secure aggregation is commonly used in federated learning (FL) to alleviate privacy concerns related to the central aggregator seeing all parameter updates in the clear. Unfortunately, most existing secure aggregation schemes ignore two critical orthogonal research directions that aim to (i) significantly reduce client-server communication and (ii) mitigate the impact of malicious clients. However, both of these additional properties are essential to facilitate cross-device FL with thousands or even millions of (mobile) participants. In this paper, we unite both research directions by introducing ScionFL, the first secure aggregation framework for FL that operates efficiently on quantized inputs and simultaneously provides robustness against malicious clients. Our framework leverages (novel) multi-party computation (MPC) techniques and supports multiple linear (1-bit) quantization schemes, including ones that utilize the randomized Hadamard transform and Kashin's representation. Our theoretical results are supported by extensive evaluations. We show that with no overhead for clients and moderate overhead for the server compared to transferring and processing quantized updates in plaintext, we obtain comparable accuracy for standard FL benchmarks. Moreover, we demonstrate the robustness of our framework against state-of-the-art poisoning attacks.

Submitted 17 May, 2024; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Published in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)
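The core idea of secure aggregation, letting the server learn only the sum of client updates, can be sketched with textbook pairwise additive masking over integers modulo $2^{32}$ (for example, quantized update coordinates). This is not ScionFL's MPC protocol: it ignores client dropouts and malicious clients, which ScionFL explicitly addresses, and a single seeded RNG stands in for the pairwise key agreement a real deployment would need. All names are hypothetical.

```python
import random
from typing import List

MOD = 2**32  # ring for integer (e.g. quantized) update coordinates


def pairwise_masks(n_clients: int, dim: int, seed: int = 0) -> List[List[int]]:
    """For each pair i < j, client i adds mask r_ij and client j subtracts it."""
    rng = random.Random(seed)  # stand-in for pairwise key agreement
    masks = [[0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = [rng.randrange(MOD) for _ in range(dim)]
            for d in range(dim):
                masks[i][d] = (masks[i][d] + r[d]) % MOD
                masks[j][d] = (masks[j][d] - r[d]) % MOD
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """Client side: hide the update under its combined pairwise mask."""
    return [(u + m) % MOD for u, m in zip(update, mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: the pairwise masks cancel in the coordinate-wise sum."""
    dim = len(masked_updates[0])
    return [sum(v[d] for v in masked_updates) % MOD for d in range(dim)]
```

Each mask $r_{ij}$ is added by client $i$ and subtracted by client $j$, so all masks cancel in the server's sum while each individual masked vector looks uniformly random to the server.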

arXiv:2210.06634 [pdf, other]

doi 10.1103/PhysRevA.109.022414

Electron-mediated entanglement of two distant macroscopic ferromagnets within a nonequilibrium spintronic device

Authors: A. Suresh, R. D. Soares, P. Mondal, J. P. Santos Pires, J. M. Viana Parente Lopes, Aires Ferreira, A. E. Feiguin, P. Plecháč, B. K. Nikolić

Abstract: Using the nascent concept of quantum spin-transfer torque [A. Zholud et al., Phys. Rev. Lett. 119, 257201 (2017); M. D. Petrović et al., Phys. Rev. X 11, 021062 (2021)], we demonstrate that a current pulse can be harnessed to entangle quantum localized spins of two spatially separated ferromagnets (FMs) which are initially unentangled. The envisaged setup comprises spin-polarizer (FM$_p$) and spin-analyzer (FM$_a$) FM layers separated by a normal-metal (NM) spacer. The injection of a current pulse into the device leads to a time-dependent superposition of many-body states characterized by a high degree of entanglement between the spin degrees of freedom of the two distant FM layers. The non-equilibrium dynamics are due to the transfer of spin angular momentum from itinerant electrons to the localized spins via a quantum spin-torque mechanism that remains active even for collinear but antiparallel arrangements of the FM$_p$ and FM$_a$ magnetizations (a situation in which the conventional spin torque is absent). We quantify the mixed-state entanglement generated between the FM layers by tracking the time-evolution of the full density matrix and analyzing the build-up of the mutual logarithmic negativity over time. The effect of decoherence and dissipation in the FM layers due to coupling to bosonic baths at finite temperature, the use of multi-electron current pulses and the dependence on the number of spins are also considered in an effort to ascertain the robustness of our predictions under realistic conditions.
Finally, we propose a "current-pump/X-ray-probe" scheme, utilizing ultrafast X-ray spectroscopy, that can witness nonequilibrium and transient entanglement of the FM layers by extracting its time-dependent quantum Fisher information.

Submitted 20 December, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 15 pages, 7 figures, new quantum master equations for open quantum spin system employed; supplemental movie available from https://wiki.physics.udel.edu/qttg/Publications

Journal ref: Phys. Rev. A 109, 024413 (2024)

arXiv:2210.03461 [pdf, other]

FastCLIPstyler: Optimisation-free Text-based Image Style Transfer Using Style Representations

Authors: Ananda Padhmanabhan Suresh, Sanjana Jain, Pavit Noinongyao, Ankush Ganguly, Ukrit Watchareeruetai, Aubin Samacoits

Abstract: In recent years, language-driven artistic style transfer has emerged as a new type of style transfer technique, eliminating the need for a reference style image by using natural language descriptions of the style. The first model to achieve this, called CLIPstyler, has demonstrated impressive stylisation results. However, its lengthy optimisation procedure at runtime for each query limits its suitability for many practical applications. In this work, we present FastCLIPstyler, a generalised text-based image style transfer model capable of stylising images in a single forward pass for arbitrary text inputs. Furthermore, we introduce EdgeCLIPstyler, a lightweight model designed for compatibility with resource-constrained devices. Through quantitative and qualitative comparisons with state-of-the-art approaches, we demonstrate that our models achieve superior stylisation quality based on measurable metrics while offering significantly improved runtime efficiency, particularly on edge devices.

Submitted 14 November, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted at the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

arXiv:2208.06135 [pdf, other]

Private Domain Adaptation from a Public Source

Authors: Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

Abstract: A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for which a private sample is available with very few or no labeled data. In regression problems with no privacy constraints on the source or target data, a discrepancy minimization algorithm based on several theoretical guarantees was shown to outperform a number of other adaptation algorithm baselines. Building on that approach, we design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data. The design and analysis of our private algorithms critically hinge upon several key properties we prove for a smooth approximation of the weighted discrepancy, such as its smoothness with respect to the $\ell_1$-norm and the sensitivity of its gradient. Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms. We show that our adaptation algorithms benefit from strong generalization and privacy guarantees and report the results of experiments demonstrating their effectiveness.

Submitted 12 August, 2022; originally announced August 2022.

arXiv:2207.11250 [pdf, other]

Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing

Authors: Sai Mitheran, Anushri Suresh, Nisha J. S., Varun P. Gopi

Abstract: Single-image haze removal is a long-standing hurdle for computer vision applications. Several works have been focused on transferring advances from image classification, detection, and segmentation to the niche of image dehazing, primarily focusing on contrastive learning and knowledge distillation. However, these approaches prove computationally expensive, raising concern regarding their applicability to on-the-edge use-cases. This work introduces a simple, lightweight, and efficient framework for single-image haze removal, exploiting rich "dark-knowledge" information from a lightweight pre-trained super-resolution model via the notion of heterogeneous knowledge distillation. We designed a feature affinity module to maximize the flow of rich feature semantics from the super-resolution teacher to the student dehazing network. In order to evaluate the efficacy of our proposed framework, its performance as a plug-and-play setup to a baseline model is examined. Our experiments are carried out on the RESIDE-Standard dataset to demonstrate the robustness of our framework to the synthetic and real-world domains. The extensive qualitative and quantitative results provided establish the effectiveness of the framework, achieving gains of up to 15% (PSNR) while reducing the model size by $\sim$20 times.

Submitted 13 July, 2022; originally announced July 2022.

Comments: Preprint version. Accepted at Optik
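The abstract does not spell out how the feature affinity module works. One common way to distill between networks whose channel widths differ (as with a super-resolution teacher and a dehazing student) is to match pairwise affinity matrices over spatial positions, since those matrices have the same shape regardless of channel count. The pure-Python sketch below assumes cosine affinities and a mean-squared-error loss; both choices, and the function names, are hypothetical illustrations rather than the paper's stated design.

```python
import math


def affinity(features):
    """Pairwise cosine-similarity matrix between spatial feature vectors.

    features: list of N vectors (one per spatial position), any channel width.
    """
    # Guard against zero vectors: their dot products are 0 anyway.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def affinity_distillation_loss(teacher, student):
    """Mean squared error between teacher and student affinity matrices.

    Teacher and student may have different channel widths but must share
    the same number of spatial positions.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of positions")
    a_t, a_s = affinity(teacher), affinity(student)
    n = len(teacher)
    return sum((a_t[i][j] - a_s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

For example, a 3-channel teacher with orthogonal features at two positions and a 2-channel student with orthogonal features give identical identity affinities and a loss of 0, even though the raw feature vectors cannot be compared directly.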

arXiv:2206.12224 [pdf, other]

MPClan: Protocol Suite for Privacy-Conscious Computations

Authors: Nishat Koti, Shravani Patil, Arpita Patra, Ajith Suresh

Abstract: The growing volume of data being collected, and its analysis to provide better services, is raising concerns about digital privacy. To address these concerns and offer practical solutions, the literature has relied on secure multiparty computation. However, recent research has mostly focused on the small-party honest-majority setting of up to four parties, citing efficiency concerns. In this work, we extend these strategies to support a larger number of participants in the honest-majority setting, with efficiency at center stage. Cast in the preprocessing paradigm, our semi-honest protocol improves the online complexity of the decade-old state-of-the-art protocol of Damgård and Nielsen (CRYPTO'07). In addition to an improved online communication cost, we can shut down almost half of the parties in the online phase, saving up to 50% in the system's operational costs. Our maliciously secure protocol enjoys similar benefits and requires only half of the parties, except for a one-time verification towards the end. To showcase the practicality of the designed protocols, we benchmark popular applications such as deep neural networks, graph neural networks, genome sequence matching, and biometric matching using prototype implementations. Our improved protocols yield up to 60-80% savings in monetary cost over prior work.

Submitted 24 June, 2022; originally announced June 2022.
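Honest-majority protocols in the line of Damgård and Nielsen are built on Shamir secret sharing with a threshold t below n/2. The abstract does not describe MPClan's internals, so the Python sketch below only illustrates that underlying primitive: sharing a value with a random degree-t polynomial over a prime field, reconstructing it from any t+1 shares by Lagrange interpolation at zero, and adding two shared values locally without communication. The field modulus 2**61 - 1 and all function names are illustrative choices, not taken from the paper.

```python
import random

P = 2**61 - 1  # Mersenne prime used as the field modulus (illustrative choice)


def share(secret, n, t, rng):
    """Split secret into n Shamir shares using a random degree-t polynomial."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any t+1 distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


def add_shares(a, b):
    """Each party adds its two shares locally; no communication is needed."""
    return [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(a, b)]
```

With n = 5 and t = 2, any three of the five shares recover the secret, and adding two sharings share-by-share yields a valid sharing of the sum. Multiplication, by contrast, needs interaction, which is the part that preprocessing-based protocols such as those discussed above aim to make cheaper.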
