-
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Authors:
Riccardo Fogliato,
Pratik Patil,
Mathew Monfort,
Pietro Perona
Abstract:
Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time random selection of the data. However, by employing tailored sampling and estimation strategies, one can obtain more precise estimates and reduce annotation costs. In this paper, we propose a statistical framework for model evaluation that includes stratification, sampling, and estimation components. We examine the statistical properties of each component and evaluate their efficiency (precision). One key result of our work is that stratification via k-means clustering based on accurate predictions of model performance yields efficient estimators. Our experiments on computer vision datasets show that this method consistently provides more precise accuracy estimates than the traditional simple random sampling, even with substantial efficiency gains of 10x. We also find that model-assisted estimators, which leverage predictions of model accuracy on the unlabeled portion of the dataset, are generally more efficient than the traditional estimates based solely on the labeled data.
Submitted 11 June, 2024;
originally announced June 2024.
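The stratified estimation idea in this abstract can be illustrated with a minimal sketch. It assumes the strata have already been formed (the abstract mentions k-means on predicted performance, which is not reproduced here) and uses the textbook stratified mean estimator, which may differ from the estimators studied in the paper. The function name and its arguments are hypothetical.

```python
def stratified_accuracy(stratum_sizes, labeled_outcomes):
    """Estimate overall accuracy as a size-weighted average of per-stratum accuracies.

    stratum_sizes:    number of items in each stratum of the full dataset.
    labeled_outcomes: for each stratum, a list of 0/1 correctness outcomes
                      for the items that were sampled and annotated.
    Both arguments are illustrative assumptions, not the paper's interface.
    """
    if len(stratum_sizes) != len(labeled_outcomes):
        raise ValueError("need one list of outcomes per stratum")
    total = sum(stratum_sizes)
    estimate = 0.0
    for size, outcomes in zip(stratum_sizes, labeled_outcomes):
        if not outcomes:
            raise ValueError("every stratum needs at least one labeled item")
        stratum_accuracy = sum(outcomes) / len(outcomes)
        # Weight each stratum by its share of the full dataset.
        estimate += (size / total) * stratum_accuracy
    return estimate


# Two strata of 60 and 40 items, with four labeled items sampled from each.
print(stratified_accuracy([60, 40], [[1, 1, 0, 1], [0, 0, 1, 0]]))
```

When the strata group items with similar expected correctness, the within-stratum variance is small, which is why such an estimator can be more precise than a simple random sample of the same size.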
-
An improved version of Kac's Central Limit Theorem
Authors:
Suprio Bhar,
Ritwik Mukherjee,
Prathmesh Patil
Abstract:
The classical Central Limit Theorem (CLT) states that for a sequence of independent and identically distributed (i.i.d.) random variables with finite mean and variance, the normalized sample mean converges to the standard normal distribution.
In $1946$, Victor Kac proved a Central Limit type theorem for a sequence of random variables that were not independent. The random variables under consideration were obtained from the angle-doubling map. The idea behind Kac's proof was to show that although the random variables under consideration were not independent, they were what he calls \textit{statistically independent} (in modern terminology, this concept is called long range independence). The final conclusion of his paper was that the sample averages of the random variables, suitably normalized, converge to the standard normal distribution.
In the 1970's, Charles Stein revolutionized the field of probability by discovering a new method to obtain the limiting distribution for a sequence of random variables. Among other things, his method gave an alternative proof of the classical Central Limit Theorem.
We obtain an improvement of Victor Kac's result by applying Stein's method. We show that the normalized sample averages converge to the standard normal distribution in the Wasserstein metric, which is stronger than the convergence in distribution.
Submitted 10 May, 2024;
originally announced May 2024.
-
Reimplementation of Learning to Reweight Examples for Robust Deep Learning
Authors:
Parth Patil,
Ben Boardley,
Jack Gardner,
Emily Loiselle,
Deerajkumar Parthipan
Abstract:
Deep neural networks (DNNs) have been used to create models for many complex analysis problems like image recognition and medical diagnosis. DNNs are a popular tool within machine learning due to their ability to model complex patterns and distributions. However, the performance of these networks is highly dependent on the quality of the data used to train the models. Two characteristics of these data sets, noisy labels and training set biases, are known to frequently cause poor generalization performance as a result of overfitting to the training set. This paper aims to address this problem with the approach proposed by Ren et al. (2018), which uses meta-training and online weight approximation. We first implement a toy problem to crudely verify the claims made by Ren et al. (2018) and then apply the approach to a real-world problem: skin cancer detection using an imbalanced image dataset.
Submitted 10 May, 2024;
originally announced May 2024.
-
Optimal Ridge Regularization for Out-of-Distribution Prediction
Authors:
Pratik Patil,
Jin-Hong Du,
Ryan J. Tibshirani
Abstract:
We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally-tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
Submitted 1 April, 2024;
originally announced April 2024.
-
Hadamard Regularization of the Graviton Stress Tensor
Authors:
Anna Negro,
Subodh P. Patil
Abstract:
We present the details for the covariant renormalization of the stress tensor for vacuum tensor perturbations at the level of the effective action, adopting Hadamard regularization techniques to isolate short distance divergences and gauge fixing via the Faddeev-Popov procedure. The subsequently derived renormalized stress tensor can be related to more familiar forms reliant upon an averaging prescription, such as the Isaacson or Misner-Thorne-Wheeler forms. The latter, however, are premised on a prior scale separation (beyond which the averaging is invoked) and therefore unsuited for the purposes of renormalization. This can lead to potentially unphysical conclusions when taken as a starting point for the computation of any observable that needs regularization, such as the energy density associated to a stochastic background. Any averaging prescription, if needed, should only be invoked at the end of the renormalization procedure. The latter necessarily involves the imposition of renormalization conditions via a physical measurement at some fixed scale, which we retrace for primordial gravitational waves sourced from vacuum fluctuations through direct or indirect observation.
Submitted 25 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent
Authors:
Pratik Patil,
Yuchen Wu,
Ryan J. Tibshirani
Abstract:
We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features. In contrast, we show that LOOCV converges uniformly along the GD trajectory to the prediction risk. Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. Furthermore, by leveraging the individual LOOCV errors, we construct consistent estimators for the entire prediction error distribution along the GD trajectory and consistent estimators for a wide class of error functionals. This in particular enables the construction of pathwise prediction intervals based on GD iterates that have asymptotically correct nominal coverage conditional on the training data.
Submitted 26 February, 2024;
originally announced February 2024.
-
Probing the inverse moment of $B_s$-meson distribution amplitude via $B_s \to η_s$ form factors
Authors:
Rusa Mandal,
Praveen S Patil,
Ipsita Ray
Abstract:
We investigate the inverse moment of the $B_s$-meson light-cone distribution amplitude (LCDA), denoted as $λ_{B_s}$ and defined within the heavy quark effective theory, through the calculation of $B_s \to η_s$ form factors. The presence of the $s$-quark inside the $B_s$-meson dictates a notable departure of approximately $20\%$ in the $λ_{B_s}$ value compared to the non-strange case $λ_{B_q}$, as computed within the QCD sum rule approach, albeit with significant uncertainty. First, we compute the decay constant of the $η_s$-meson utilizing two-point sum rules while retaining finite $s$-quark mass contributions. Next, we constrain the parameter $λ_{B_s}$ by calculating $B_s \to η_s$ form factors within the light-cone sum rule approach, using $B_s$-meson LCDAs, and leveraging Lattice QCD estimates at zero momentum transfer from the HPQCD collaboration. Our findings yield $λ_{B_s} = 480 \pm 92$ MeV when expressing the $B_s$-meson LCDAs in the Exponential model, consistent with the previous QCD sum rule estimate yet exhibiting a 1.5-fold improvement in uncertainty. Furthermore, we compare the form factor predictions, based on the extracted $λ_{B_s}$ value, with earlier analyses for other channels such as $B_s \to D_s$ and $B_s \to K$.
Submitted 3 July, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Parallel Approximate Maximum Flows in Near-Linear Work and Polylogarithmic Depth
Authors:
Arpit Agarwal,
Sanjeev Khanna,
Huan Li,
Prathamesh Patil,
Chen Wang,
Nathan White,
Peilin Zhong
Abstract:
We present a parallel algorithm for the $(1-ε)$-approximate maximum flow problem in capacitated, undirected graphs with $n$ vertices and $m$ edges, achieving $O(ε^{-3}\text{polylog} n)$ depth and $O(m ε^{-3} \text{polylog} n)$ work in the PRAM model. Although near-linear time sequential algorithms for this problem have been known for almost a decade, no parallel algorithms that simultaneously achieved polylogarithmic depth and near-linear work were known.
At the heart of our result is a polylogarithmic depth, near-linear work recursive algorithm for computing congestion approximators. Our algorithm involves a recursive step to obtain a low-quality congestion approximator followed by a "boosting" step to improve its quality which prevents a multiplicative blow-up in error. Similar to Peng [SODA'16], our boosting step builds upon the hierarchical decomposition scheme of Räcke, Shah, and Täubig [SODA'14].
A direct implementation of this approach, however, leads only to an algorithm with $n^{o(1)}$ depth and $m^{1+o(1)}$ work. To get around this, we introduce a new hierarchical decomposition scheme, in which we only need to solve maximum flows on subgraphs obtained by contracting vertices, as opposed to vertex-induced subgraphs used in Räcke, Shah, and Täubig [SODA'14]. In particular, we are able to directly extract congestion approximators for the subgraphs from a congestion approximator for the entire graph, thereby avoiding additional recursion on those subgraphs. Along the way, we also develop a parallel flow-decomposition algorithm that is crucial to achieving polylogarithmic depth and may be of independent interest.
Submitted 22 February, 2024;
originally announced February 2024.
-
Autoencoder with Ordered Variance for Nonlinear Model Identification
Authors:
Midhun T. Augustine,
Parag Patil,
Mani Bhushan,
Sharad Bhartiya
Abstract:
This paper presents a novel autoencoder with ordered variance (AEO) in which the loss function is modified with a variance regularization term to enforce order in the latent space. Further, the autoencoder is modified using ResNets, which results in a ResNet AEO (RAEO). The paper also illustrates the effectiveness of AEO and RAEO in extracting nonlinear relationships among input variables in an unsupervised setting.
Submitted 20 February, 2024;
originally announced February 2024.
-
Misalignment, Learning, and Ranking: Harnessing Users Limited Attention
Authors:
Arpit Agarwal,
Rad Niazadeh,
Prathamesh Patil
Abstract:
In digital health and EdTech, recommendation systems face a significant challenge: users often choose impulsively, in ways that conflict with the platform's long-term payoffs. This misalignment makes it difficult to effectively learn to rank items, as it may hinder exploration of items with greater long-term payoffs. Our paper tackles this issue by utilizing users' limited attention spans. We propose a model where a platform presents items with unknown payoffs to the platform in a ranked list to $T$ users over time. Each user selects an item by first considering a prefix window of these ranked items and then picking the highest preferred item in that window (and the platform observes its payoff for this item). We study the design of online bandit algorithms that obtain vanishing regret against hindsight optimal benchmarks.
We first consider adversarial window sizes and stochastic iid payoffs. We design an active-elimination-based algorithm that achieves an optimal instance-dependent regret bound of $O(\log(T))$, by showing matching regret upper and lower bounds. The key idea is using the combinatorial structure of the problem to either obtain a large payoff from each item or to explore by getting a sample from that item. This method systematically narrows down the item choices to enhance learning efficiency and payoff.
Second, we consider adversarial payoffs and stochastic iid window sizes. We start from the full-information problem of finding the permutation that maximizes the expected payoff. By a novel combinatorial argument, we characterize the polytope of admissible item selection probabilities by a permutation and show it has a polynomial-size representation. Using this representation, we show how standard algorithms for adversarial online linear optimization in the space of admissible probabilities can be used to obtain a polynomial-time algorithm with $O(\sqrt{T})$ regret.
Submitted 21 February, 2024;
originally announced February 2024.
-
Maximum Likelihood Quantum Error Mitigation for Algorithms with a Single Correct Output
Authors:
Dror Baron,
Hrushikesh Pramod Patil,
Huiyang Zhou
Abstract:
Quantum error mitigation is an important technique to reduce the impact of noise in quantum computers. With more and more qubits being supported on quantum computers, there are two emerging fundamental challenges. First, the number of shots required for quantum algorithms with large numbers of qubits needs to increase in order to obtain a meaningful distribution or expected value of an observable. Second, although steady progress has been made in improving the fidelity of each qubit, circuits with a large number of qubits are likely to produce erroneous results. This low-shot, high-noise regime calls for highly scalable error mitigation techniques. In this paper, we propose a simple and effective mitigation scheme, qubit-wise majority vote, for quantum algorithms with a single correct output. We show that our scheme produces the maximum likelihood (ML) estimate under certain assumptions, and bound the number of shots required. Our experimental results on real quantum devices confirm that our proposed approach requires fewer shots than existing ones, and can sometimes recover the correct answers even when they are not observed from the measurement results.
Submitted 18 February, 2024;
originally announced February 2024.
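The qubit-wise majority vote named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: shots are given as equal-length bitstrings, ties are broken toward "0", and the function name is hypothetical. The paper's actual procedure and assumptions may differ.

```python
def qubit_wise_majority_vote(shots):
    """Return the bitstring whose i-th bit is the majority value of bit i over all shots.

    shots: list of equal-length strings of '0'/'1', one per measurement.
    Tie-breaking toward '0' is an arbitrary choice made for this sketch.
    """
    if not shots:
        raise ValueError("need at least one shot")
    num_qubits = len(shots[0])
    if any(len(s) != num_qubits for s in shots):
        raise ValueError("all shots must have the same length")
    bits = []
    for i in range(num_qubits):
        ones = sum(1 for s in shots if s[i] == "1")
        # A bit is set to '1' only when strictly more than half the shots agree.
        bits.append("1" if 2 * ones > len(shots) else "0")
    return "".join(bits)


# Each shot has one flipped bit, so "111" is never observed directly.
print(qubit_wise_majority_vote(["110", "011", "101"]))  # prints "111"
```

The example shows how a per-qubit vote can return a bitstring that does not appear among the measured shots, which matches the abstract's remark about recovering answers that were never observed.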
-
An Étude on the Regularization and Renormalization of Divergences in Primordial Observables
Authors:
Anna Negro,
Subodh P. Patil
Abstract:
Many cosmological observables of interest derive from primordial vacuum fluctuations evolved to late times. These observables represent statistical draws from some underlying quantum or statistical field theoretic framework where infinities arise and require regularization. After subtracting divergences, renormalization conditions must be imposed by measurements or observations at some scale, mindful of scheme and background dependence. We review this process on backgrounds that transition from finite duration inflation to radiation domination, and show how in spite of the ubiquity of scaleless integrals, UV divergences can still be meaningfully extracted from quantities that nominally vanish when dimensionally regularized. In this way, one can contextualize calculations with hard cutoffs, distinguishing between UV and IR scales corresponding to the beginning and end of inflation from UV and IR scales corresponding to the unknown completion of the theory and its observables. This distinction has significance as observable quantities cannot depend on the latter although they will certainly depend on the former. One can also explicitly show the scheme independence of the coefficients of UV divergent logarithms. Furthermore, certain IR divergences can be shown to be an artifact of the de Sitter limit and are cured for finite duration inflation. For gravitational wave observables, we stress the need to regularize stress tensors that do not presume a prior scale separation in their construction (as with the standard Isaacson form), deriving an improved stress tensor fit to purpose. We conclude by highlighting the inextricable connection between inferring $N_{\rm eff}$ bounds from vacuum tensor perturbations and the process of background renormalization.
Submitted 3 June, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Magnetic Penrose process in the magnetized Kerr spacetime
Authors:
Chandrachur Chakraborty,
Parth Patil,
G. Akash
Abstract:
The magnetic Penrose process (MPP) could be highly efficient (efficiency can even exceed $100\%$) for extracting energy from a Kerr black hole if it is immersed in a magnetic field of order mG. Considering the exact solution of the magnetized Kerr spacetime, here we derive the exact expression of the efficiency ($η_{\rm MPP}$) for MPP, which is valid for both the Kerr black hole (BH) and the Kerr superspinar (SS), and also from the weak magnetic field to an ultra-strong magnetic field $(B)$ which can even distort the original Kerr geometry. We show that although the value of $η_{\rm MPP}$ increases up to a certain value of the ultra-strong magnetic field ($B_p$), it decreases to zero for $B > B_p$ in the case of Kerr BHs. On the other hand, $η_{\rm MPP}$ shows the opposite behavior in the case of Kerr SSs. One intriguing feature that emerges is that $η_{\rm MPP}$ acquires its maximum value for the Kerr parameter $a_* \approx 0.786$ (unlike $a_*=1$ for the ordinary PP) and decreases for $0.786 < a_* \leq 1$. This indicates that the BH starts to expel the magnetic field for $a_* > 0.786$, and that the field is fully expelled from the extremal Kerr BH due to the gravitational Meissner effect. As a special case of MPP, we also study the ordinary Penrose process (PP) for magnetized Kerr spacetime. We show that MPP for Kerr BHs, Kerr SSs and ordinary PP for Kerr SSs can be superefficient for astrophysical applications to powering engines in high-energy sources like active galactic nuclei and quasars, in weak magnetic fields. Our strong magnetic field result of MPP could be important to the primordial BHs in the early Universe immersed in the primordial magnetic fields, and to the transmuted BHs which are formed by collapsing and/or by merging of the magnetized progenitors. It is almost impossible to extract the energy from a BH (SS) through MPP (PP) in the ultra-strong magnetic fields.
Submitted 4 March, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Authors:
Avi Singh,
John D. Co-Reyes,
Rishabh Agarwal,
Ankesh Anand,
Piyush Patil,
Xavier Garcia,
Peter J. Liu,
James Harrison,
Jaehoon Lee,
Kelvin Xu,
Aaron Parisi,
Abhishek Kumar,
Alex Alemi,
Alex Rizkowsky,
Azade Nova,
Ben Adlam,
Bernd Bohnet,
Gamaleldin Elsayed,
Hanie Sedghi,
Igor Mordatch,
Isabelle Simpson,
Izzeddin Gur,
Jasper Snoek,
Jeffrey Pennington,
Jiri Hron
, et al. (16 additional authors not shown)
Abstract:
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
Submitted 17 April, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Enhancing Virtual Distillation with Circuit Cutting for Quantum Error Mitigation
Authors:
Peiyi Li,
Ji Liu,
Hrushikesh Pramod Patil,
Paul Hovland,
Huiyang Zhou
Abstract:
Virtual distillation is a technique that aims to mitigate errors in noisy quantum computers. It works by preparing multiple copies of a noisy quantum state, bridging them through a circuit, and conducting measurements. As the number of copies increases, this process allows for the estimation of the expectation value with respect to a state that approaches the ideal pure state rapidly. However, virtual distillation faces a challenge in realistic scenarios: preparing multiple copies of a quantum state and bridging them through a circuit in a noisy quantum computer will significantly increase the circuit size and introduce excessive noise, which will degrade the performance of virtual distillation. To overcome this challenge, we propose an error mitigation strategy that uses circuit-cutting technology to cut the entire circuit into fragments. With this approach, the fragments responsible for generating the noisy quantum state can be executed on a noisy quantum device, while the remaining fragments are efficiently simulated on a noiseless classical simulator. By running each fragment circuit separately on quantum and classical devices and recombining their results, we can reduce the noise accumulation and enhance the effectiveness of the virtual distillation technique. Our strategy has good scalability in terms of both runtime and computational resources. We demonstrate our strategy's effectiveness through noisy simulation and experiments on a real quantum device.
Submitted 9 October, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
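As a rough numerical illustration of why more copies help, the quantity usually targeted by virtual distillation with $M$ copies is $\mathrm{Tr}(O\rho^M)/\mathrm{Tr}(\rho^M)$. The sketch below evaluates it by direct matrix algebra on a single qubit. The depolarizing noise model, the noise level `p`, and the function name `distilled_expectation` are assumptions for illustration only; the sketch does not implement the circuit-cutting strategy proposed in the paper.

```python
import numpy as np


def distilled_expectation(rho, observable, copies):
    """Return Tr(O rho^M) / Tr(rho^M) for M = copies, via direct matrix powers."""
    rho_m = np.linalg.matrix_power(rho, copies)
    return float(np.real(np.trace(observable @ rho_m) / np.trace(rho_m)))


# Assumed toy noise model: the ideal state |0><0| mixed with the maximally
# mixed state at depolarizing strength p.
p = 0.2
ideal = np.array([[1.0, 0.0], [0.0, 0.0]])
rho = (1 - p) * ideal + p * np.eye(2) / 2
pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])

# The ideal expectation of Z is 1; the estimate approaches it as M grows.
for m in (1, 2, 3):
    print(m, distilled_expectation(rho, pauli_z, m))
```

On hardware these traces are estimated from measurements on entangled copies rather than computed from a known density matrix, which is where the added circuit size and noise discussed in the abstract come from.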
-
Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning
Authors:
Pratik Patil,
Daniel LeJeune
Abstract:
We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles, enabling efficient and consistent tuning of regularization and sketching parameters. Our results hold for a broad class of asymptotically free sketches under very mild data assumptions. For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally optimized by only tuning sketch size in infinite ensembles. For general subquadratic prediction risk functionals, we extend GCV to construct consistent risk estimators, and thereby obtain distributional convergence of the GCV-corrected predictions in Wasserstein-2 metric. This in particular allows construction of prediction intervals with asymptotically correct coverage conditional on the training data. We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles. We empirically validate our theoretical results using both synthetic and real large-scale datasets with practical sketches including CountSketch and subsampled randomized discrete cosine transforms.
Submitted 19 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Optimization-based frequentist confidence intervals for functionals in constrained inverse problems: Resolving the Burrus conjecture
Authors:
Pau Batlle,
Pratik Patil,
Michael Stanley,
Houman Owhadi,
Mikael Kuusela
Abstract:
We present an optimization-based framework to construct confidence intervals for functionals in constrained inverse problems, ensuring valid one-at-a-time frequentist coverage guarantees. Our approach builds upon the now-called strict bounds intervals, originally pioneered by Burrus (1965) and Rust and Burrus (1972), which offer ways to directly incorporate any side information about the parameters during inference without introducing external biases. This family of methods allows for uncertainty quantification in ill-posed inverse problems without needing to select a regularizing prior. By tying optimization-based intervals to an inversion of a constrained likelihood ratio test, we translate interval coverage guarantees into type I error control and characterize the resulting interval via solutions to optimization problems. Along the way, we refute the Burrus conjecture, which posited that, for possibly rank-deficient linear Gaussian models with positivity constraints, a correction based on the quantile of the chi-squared distribution with one degree of freedom suffices to shorten intervals while maintaining frequentist coverage guarantees. Our framework provides a novel approach to analyzing the conjecture, and we construct a counterexample employing a stochastic dominance argument, which we also use to disprove a general form of the conjecture. We illustrate our framework with several numerical examples and provide directions for extensions beyond the Rust-Burrus method for nonlinear, non-Gaussian settings with general constraints.
Submitted 16 April, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Corrected generalized cross-validation for finite ensembles of penalized estimators
Authors:
Pierre C. Bellec,
Jin-Hong Du,
Takuya Koriyama,
Pratik Patil,
Kai Tan
Abstract:
Generalized cross-validation (GCV) is a widely used method for estimating the squared out-of-sample prediction risk that employs a scalar degrees of freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees of freedom adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires neither sample splitting, model refitting, nor out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble risk decomposition and two intermediate risk estimators for the components in this decomposition. We provide a non-asymptotic analysis of the CGCV and the two intermediate risk estimators for ensembles of convex penalized estimators under Gaussian features and a linear response model. Furthermore, in the special case of ridge regression, we extend the analysis to general feature and response distributions using random matrix theory, which establishes model-free uniform consistency of CGCV.
Submitted 21 April, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
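For orientation, the ordinary (uncorrected) GCV estimate that this abstract starts from can be sketched for a single ridge estimator: the mean squared training error divided by $(1 - \mathrm{df}/n)^2$, where $\mathrm{df}$ is the trace of the ridge smoother matrix. The function name `ridge_gcv`, the scaling of the penalty by `n`, and the synthetic data are assumptions for illustration. The additive correction that defines CGCV for finite ensembles is not reproduced here.

```python
import numpy as np


def ridge_gcv(X, y, lam):
    """Standard GCV for one ridge fit: training error / (1 - df/n)^2.

    Assumed convention: the ridge smoother is S = X (X^T X + n*lam*I)^{-1} X^T.
    """
    n, p = X.shape
    gram = X.T @ X + n * lam * np.eye(p)
    smoother = X @ np.linalg.solve(gram, X.T)   # n x n matrix S
    residual = y - smoother @ y
    train_err = np.mean(residual ** 2)          # squared training error
    df = np.trace(smoother)                     # degrees of freedom
    return train_err / (1.0 - df / n) ** 2      # multiplicative adjustment


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

for lam in (0.01, 0.1, 1.0):
    print(lam, ridge_gcv(X, y, lam))
```

Choosing the penalty that minimizes this quantity is the usual way GCV is used for tuning without a held-out set.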
-
Projections of totally disconnected thin fractals with very thick shadows on ${\mathbb R}^d$
Authors:
Chun-Kit Lai,
Lekha Priya Patil
Abstract:
We study an extreme scenario of the Marstrand projection theorem for which a fractal has the property that its orthogonal projection is the same as the orthogonal projection of its convex hull. We extend results in current literature and establish checkable criteria for self-affine sets to have such a property. Using this, we show that every convex polytope on ${\mathbb R}^d$ contains a totally disconnected compact set, which is a union of self-affine sets, of dimension as close to 1 as possible, as well as a rectifiable 1-set, such that the fractal projects to an interval in every 1-dimensional subspace and its convex hull is the given polytope. Other convex sets and projections onto higher dimensional subspaces will also be discussed.
Submitted 25 September, 2023;
originally announced September 2023.
-
Disentangling the primordial nature of stochastic gravitational wave backgrounds with CMB spectral distortions
Authors:
Bryce Cyr,
Thomas Kite,
Jens Chluba,
J. Colin Hill,
Donghui Jeong,
Sandeep Kumar Acharya,
Boris Bolliet,
Subodh P. Patil
Abstract:
The recent detection of a stochastic gravitational wave background (SGWB) at nanohertz frequencies by pulsar timing arrays (PTAs) has sparked a flurry of interest. Beyond the standard interpretation that the progenitor is a network of supermassive black hole binaries, many exotic models have also been proposed, some of which can potentially offer a better fit to the data. We explore how the various connections between gravitational waves and CMB spectral distortions can be leveraged to help determine whether a SGWB was generated primordially or astrophysically. To this end, we present updated $k$-space window functions which can be used for distortion parameter estimation on enhancements to the primordial scalar power spectrum. These same enhancements can also source gravitational waves (GWs) directly at second order in perturbation theory, so-called scalar-induced GWs (SIGWs), and indirectly through the formation of primordial black holes (PBHs). We perform a mapping of scalar power spectrum constraints into limits on the GW parameter space of SIGWs for $δ$-function features. We highlight that broader features in the scalar spectrum can explain the PTA results while simultaneously producing a spectral distortion (SD) within reach of future experiments. We additionally update PBH constraints from $μ$- and $y$-type spectral distortions. Refined treatments of the distortion window functions widen existing SD constraints, and we find that a future CMB spectrometer could play a pivotal role in unraveling the origin of GWs imprinted at or below CMB anisotropy scales.
Submitted 5 September, 2023;
originally announced September 2023.
-
Quantum Monte Carlo simulations in the restricted Hilbert space of Rydberg atom arrays
Authors:
Pranay Patil
Abstract:
Rydberg atom arrays have emerged as a powerful platform to simulate a number of exotic quantum ground states and phase transitions. To verify these capabilities numerically, we develop a versatile quantum Monte Carlo sampling technique which operates in the reduced Hilbert space generated by enforcing the constraint of a Rydberg blockade. We use the framework of stochastic series expansion and show that in the restricted space, the configuration space of operator strings can be understood as a hard rod gas in $d+1$ dimensions. We use this mapping to develop cluster algorithms which can be visualized as various non-local movements of rods. We study the efficiency of each of our updates individually and collectively. To elucidate the utility of the algorithm, we show that it can efficiently generate the phase diagram of a Rydberg atom array, to temperatures much smaller than all energy scales involved, on a Kagomé link lattice. This is of broad interest as the presence of a $Z_2$ spin liquid has been hypothesized recently.
Submitted 13 September, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Enhancing Low Resource NER Using Assisting Language And Transfer Learning
Authors:
Maithili Sabane,
Aparna Ranade,
Onkar Litake,
Parth Patil,
Raviraj Joshi,
Dipali Kadam
Abstract:
Named Entity Recognition (NER) is a fundamental task in NLP that is used to locate the key information in text and is primarily applied in conversational and search systems. In commercial applications, NER or comparable slot-filling methods have been widely deployed for popular languages. NER is used in applications such as human resources, customer service, search engines, content classification, and academia. In this paper, we focus on identifying named entities for low-resource Indian languages that are closely related, like Hindi and Marathi. We use various adaptations of BERT such as baseBERT, AlBERT, and RoBERTa to train a supervised NER model. We also compare multilingual models with monolingual models and establish a baseline. In this work, we show the assisting capabilities of the Hindi and Marathi languages for the NER task. We show that models trained using multiple languages perform better than those trained on a single language. However, we also observe that blind mixing of all datasets doesn't necessarily provide improvements and data selection methods may be required.
Submitted 10 June, 2023;
originally announced June 2023.
-
Confidence Intervals for Error Rates in 1:1 Matching Tasks: Critical Statistical Analysis and Recommendations
Authors:
Riccardo Fogliato,
Pratik Patil,
Pietro Perona
Abstract:
Matching algorithms are commonly used to predict matches between items in a collection. For example, in 1:1 face verification, a matching algorithm predicts whether two face images depict the same person. Accurately assessing the uncertainty of the error rates of such algorithms can be challenging when data are dependent and error rates are low, two aspects that have often been overlooked in the literature. In this work, we review methods for constructing confidence intervals for error rates in 1:1 matching tasks. We derive and examine the statistical properties of these methods, demonstrating how coverage and interval width vary with sample size, error rates, and degree of data dependence through both analysis and experiments with synthetic and real-world datasets. Based on our findings, we provide recommendations for best practices for constructing confidence intervals for error rates in 1:1 matching tasks.
Submitted 26 April, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Multi-Study R-Learner for Estimating Heterogeneous Treatment Effects Across Studies Using Statistical Machine Learning
Authors:
Cathy Shyr,
Boyu Ren,
Prasad Patil,
Giovanni Parmigiani
Abstract:
Estimating heterogeneous treatment effects (HTEs) is crucial for precision medicine. While multiple studies can improve the generalizability of results, leveraging them for estimation is statistically challenging. Existing approaches often assume identical HTEs across studies, but this may be violated due to various sources of between-study heterogeneity, including differences in study design, study populations, and data collection protocols, among others. To this end, we propose a framework for multi-study HTE estimation that accounts for between-study heterogeneity in the nuisance functions and treatment effects. Our approach, the multi-study R-learner, extends the R-learner to obtain principled statistical estimation with machine learning (ML) in the multi-study setting. It involves a data-adaptive objective function that links study-specific treatment effects with nuisance functions through membership probabilities, which enable information to be borrowed across potentially heterogeneous studies. The multi-study R-learner framework can combine data from randomized controlled trials, observational studies, or a combination of both. It's easy to implement and flexible in its ability to incorporate ML for estimating HTEs, nuisance functions, and membership probabilities. In the series estimation framework, we show that the multi-study R-learner is asymptotically normal and more efficient than the R-learner when there is between-study heterogeneity in the propensity score model under homoscedasticity. We illustrate using cancer data that the proposed method performs favorably compared to existing approaches in the presence of between-study heterogeneity.
Submitted 24 April, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Generalized equivalences between subsampling and ridge regularization
Authors:
Pratik Patil,
Jin-Hong Du
Abstract:
We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels $λ$ and subsample aspect ratios $ψ$, are asymptotically equivalent along specific paths in the $(λ,ψ)$-plane (where $ψ$ is the ratio of the feature dimension to the subsample size). Our results only require bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of $(λ,ψ)$. An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. for general data distributions under proportional asymptotics, assuming a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.
Submitted 17 October, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Topological and conventional nano-photonic waveguides for chiral integrated quantum optics
Authors:
N. J Martin,
M. Jalali Mehrabad,
X. Chen,
R. Dost,
E. Nussbaum,
D. Hallett,
L. Hallacy,
A. Foster,
E. Clarke,
P. K. Patil,
S. Hughes,
M. Hafezi,
A. M Fox,
M. S. Skolnick,
L. R. Wilson
Abstract:
Chirality in integrated quantum photonics has emerged as a promising route towards achieving scalable quantum technologies with quantum nonlinearity effects. Topological photonic waveguides, which utilize helical optical modes, have been proposed as a novel approach to harnessing chiral light-matter interactions on-chip. However, uncertainties remain regarding the nature and strength of the chiral coupling to embedded quantum emitters, hindering the scalability of these systems. In this work, we present a comprehensive investigation of chiral coupling in topological photonic waveguides using a combination of experimental, theoretical, and numerical analyses. We quantitatively characterize the position-dependent nature of the light-matter coupling on several topological photonic waveguides and benchmark their chiral coupling performance against conventional line defect waveguides for chiral quantum optical applications. Our results provide crucial insights into the degree and characteristics of chiral light-matter interactions in topological photonic quantum circuits and pave the way towards the implementation of quantitatively-predicted quantum nonlinear effects on-chip.
Submitted 20 January, 2024; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Protecting Hilbert space fragmentation through quantum Zeno dynamics
Authors:
Pranay Patil,
Ayushi Singhania,
Jad C. Halimeh
Abstract:
Hilbert space fragmentation is an intriguing paradigm of ergodicity breaking in interacting quantum many-body systems with applications to quantum information technology, but it is usually adversely compromised in the presence of perturbations. In this work, we demonstrate the protection of constrained dynamics arising due to a combination of mirror symmetry and Hilbert space fragmentation by employing the concept of quantum Zeno dynamics. We focus on an Ising spin ladder with carefully chosen quantum fluctuations, which in the ideal case guarantee a perfect disentanglement under Hamiltonian dynamics for a large class of initial conditions. This is known to be a consequence of the interplay of Hilbert space fragmentation with a mirror symmetry, and we show numerically the effect of breaking the latter. To evince the power of this perfect disentanglement, we study the effect of generic perturbations around the fine-tuned model, and show that we can protect against the undesirable growth of entanglement entropy by using a local Ising interaction on the rungs of the ladder. This allows us to suppress the entanglement entropy to an arbitrarily small value for an arbitrarily long time by controlling the strength of the rung interaction. Our work demonstrates the experimentally feasible viability of quantum Zeno dynamics in the protection of quantum information against thermalization.
Submitted 26 May, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Worm Blobs as Entangled Living Polymers: From Topological Active Matter to Flexible Soft Robot Collectives
Authors:
Antoine Deblais,
K. R. Prathyusha,
Rosa Sinaasappel,
Harry Tuazon,
Ishant Tiwari,
Vishal P. Patil,
M. Saad Bhamla
Abstract:
Recently, long and slender living worms have garnered significant interest because of their impressive ability to exhibit diverse emergent behaviors in highly entangled physical and topological conditions. These worms can form an active viscoelastic, three-dimensional soft entity known as the 'blob', which can behave like a solid, flow like a liquid, and even respond to external stimuli such as light to locomote or change shape. To understand the behavior of the blob, it is crucial to consider the high degree of conformational entanglement that individual units can achieve because of their high aspect ratio and tunable activity. This topologically active collective necessitates reevaluating established soft matter concepts in polymer physics to advance the development of active polymer-like materials. Our understanding of the complex emergent dynamics of the worm blob promises to catalyze further research into the behavior of entangled active polymers and guide the design of synthetic topological active matter and bioinspired tangling soft robot collectives.
Submitted 29 April, 2023;
originally announced May 2023.
-
Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation
Authors:
Jin-Hong Du,
Pratik Patil,
Arun Kumar Kuchibhotla
Abstract:
We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $λ$ and the limiting subsample aspect ratio $φ_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(λ, φ_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches optimal ridge risk.
Submitted 16 July, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Self-learning mechanical circuits
Authors:
Vishal P. Patil,
Ian Ho,
Manu Prakash
Abstract:
Computation, mechanics and materials merge in biological systems, which can continually self-optimize through internal adaptivity across length scales, from cytoplasm and biofilms to animal herds. Recent interest in such material-based computation uses the principles of energy minimization, inertia and dissipation to solve optimization problems. Although specific computations can be performed using dynamical systems, current implementations of material computation lack the ability to self-learn. In particular, the inverse problem of designing self-learning mechanical systems which can use physical computations to continuously self-optimize remains poorly understood. Here we introduce the concept of self-learning mechanical circuits, capable of taking mechanical inputs from changing environments and constantly updating their internal state in response, thus representing an entirely mechanical information processing unit. Our circuits are composed of a new mechanical construct: an adaptive directed spring (ADS), which changes its stiffness in a directional manner, enabling neural network-like computations. We provide both a theoretical foundation and experimental realization of these elastic learning units and demonstrate their ability to autonomously uncover patterns hidden in environmental inputs. By implementing computations in an embodied physical manner, the system directly interfaces with its environment, thus broadening the scope of its learning behavior. Our results pave the way towards the construction of energy-harvesting, adaptive materials which can autonomously and continuously sense and self-optimize to gain function in different environments.
Submitted 17 April, 2023;
originally announced April 2023.
-
Primordial black holes from single-field inflation: a fine-tuning audit
Authors:
Philippa S. Cole,
Andrew D. Gow,
Christian T. Byrnes,
Subodh P. Patil
Abstract:
All single-field inflationary models invoke varying degrees of tuning in order to account for cosmological observations. Mechanisms that generate primordial black holes (PBHs) from enhancement of primordial power at small scales posit inflationary potentials that transiently break scale invariance and possibly adiabaticity over a range of modes. This requires additional tuning on top of that required to account for observations at scales probed by cosmic microwave background (CMB) anisotropies. In this paper we study the parametric dependence of various single-field models of inflation that enhance power at small scales and quantify the degree to which coefficients in the model construction have to be tuned in order for certain observables to lie within specified ranges. We find significant tuning: changing the parameters of the potentials by between one part in a hundred and one part in $10^8$ (depending on the model) is enough to change the power spectrum peak amplitude by an order one factor. The fine-tuning of the PBH abundance is larger still by 1-2 orders of magnitude. We highlight the challenges imposed by this tuning on any given model construction. Furthermore, polynomial potentials appear to require significant additional fine-tuning to also match the CMB observations.
Submitted 16 August, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Extrapolated cross-validation for randomized ensembles
Authors:
Jin-Hong Du,
Pratik Patil,
Kathryn Roeder,
Arun Kumar Kuchibhotla
Abstract:
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields $δ$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. At the same time, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
Submitted 15 December, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
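The risk extrapolation idea mentioned in the ECV abstract can be illustrated with a minimal sketch. It assumes (the abstract does not state this form) that the squared prediction risk of an average of M base predictors follows R_M = a + b / M, so that estimates at ensemble sizes 1 and 2 fix the whole curve. The function name `extrapolate_risk` and the numeric inputs are hypothetical.

```python
def extrapolate_risk(risk_1: float, risk_2: float, ensemble_size: int) -> float:
    """Extrapolate squared risk to `ensemble_size` from estimates at sizes 1 and 2.

    Assumption of this sketch (not taken from the abstract): R_M = a + b / M.
    """
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be >= 1")
    # R_1 - R_2 = b - b / 2 = b / 2, hence b = 2 * (R_1 - R_2).
    b = 2.0 * (risk_1 - risk_2)
    # R_1 = a + b, hence a = R_1 - b (the limiting risk as M grows).
    a = risk_1 - b
    return a + b / ensemble_size


# Hypothetical out-of-bag style estimates for one and two base predictors.
estimated = extrapolate_risk(risk_1=1.0, risk_2=0.75, ensemble_size=4)
```

Under this assumed form, only small ensembles need to be fitted before choosing a larger ensemble size, which is one plausible reading of why the abstract reports a lower computational cost.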
-
A fresh look at symmetric traffic assignment and algorithm convergence
Authors:
Priyadarshan N. Patil
Abstract:
Extensions of the static traffic assignment problem with link interactions were studied extensively in the past. Much of the network modeling community has since shifted to dynamic traffic assignment incorporating these interactions. We believe there are several reasons to re-examine static assignment with link interactions. First, if link interactions can be captured in a symmetric, monotone manner, equilibrium always exists and is unique, and provably-correct algorithms exist. We show that several of the most efficient algorithms for the separable traffic assignment problem can be readily applied with symmetric interactions. We discuss how the (asymmetric) Daganzo merge model can be approximated by symmetric linear cost functions. Second, we present computational evidence suggesting that convergence to equilibrium is faster when symmetric, monotone link interactions are present. This is true even when interactions are asymmetric, despite the lack of a provable convergence result. Lastly, we present convergence behavior analysis for commonly used network and link metrics. For these reasons, we think static assignment with link interactions deserves additional attention in research and practice.
Submitted 22 February, 2023;
originally announced February 2023.
-
KILDST: Effective Knowledge-Integrated Learning for Dialogue State Tracking using Gazetteer and Speaker Information
Authors:
Hyungtak Choi,
Hyeonmok Ko,
Gurpreet Kaur,
Lohith Ravuru,
Kiranmayi Gandikota,
Manisha Jhawar,
Simma Dharani,
Pranamya Patil
Abstract:
Dialogue State Tracking (DST) is core research in dialogue systems and has received much attention. In addition, it is necessary to define a new problem that can deal with dialogue between users as a step toward the conversational AI that extracts and recommends information from the dialogue between users. So, we introduce a new task - DST from dialogue between users about scheduling an event (DST-USERS). The DST-USERS task is much more challenging since it requires the model to understand and track dialogue states in the dialogue between users and to understand who suggested the schedule and who agreed to the proposed schedule. To facilitate DST-USERS research, we develop dialogue datasets between users that plan a schedule. The annotated slot values which need to be extracted in the dialogue are date, time, and location. Previous approaches, such as Machine Reading Comprehension (MRC) and traditional DST techniques, have not achieved good results in our extensive evaluations. By adopting the knowledge-integrated learning method, we achieve exceptional results. The proposed model architecture combines gazetteer features and speaker information efficiently. Our evaluations of the dialogue datasets between users that plan a schedule show that our model outperforms the baseline model.
Submitted 18 January, 2023;
originally announced January 2023.
-
Anomalous relaxation of density waves in a ring-exchange system
Authors:
Pranay Patil,
Markus Heyl,
Fabien Alet
Abstract:
We analyze, by means of numerical simulations, the slowing down exhibited by the stochastic dynamics of a ring-exchange model on a square lattice. We find that a coarse-grained memory of density-wave initial states is preserved for unexpectedly long times. This behavior is inconsistent with the prediction of a low-frequency continuum theory developed by assuming a mean-field solution. Through a detailed analysis of correlation functions of the dynamically active regions, we exhibit an unconventional transient long-ranged structure formation in a direction which is featureless for the initial condition, and argue that its slow melting plays a crucial role in the slowing-down mechanism. We expect our results to be relevant also for the quantum ring-exchange dynamics of hard-core bosons and, more generally, for dipole-moment-conserving models.
Submitted 15 April, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Evidence of a new shell closed nucleus governing slow quasi-fission
Authors:
A. Pal,
S. Santra,
A. Kundu,
D. Chattopadhyay,
P. C. Rout,
Ramandeep Gandhi,
P. N. Patil,
R. Tripathi,
B. J. Roy,
Y. Sawant,
T. N. Nag,
Abhijit Baishya,
T. Santhosh,
P. K. Rath,
N. Deshmukh
Abstract:
Mass distributions of fission fragments arising from the slow quasi-fission process have been derived for several reactions by comparing the measured distributions with theoretical distributions based on the compound nuclear fission model. The mass distributions corresponding to quasi-fission events for all the systems show the following common features: (1) they are double peaked, with fixed peak centroids and nearly the same width at different incident energies, (2) the yield of quasi-fission events decreases with increasing projectile energy, and (3) the peak corresponding to the lighter fragment is observed at A $\sim$ 96 for all the systems, whereas the position of the heavier-fragment peak increases linearly with the mass of the di-nuclear system. All the above observations are quite similar to the ones observed in the well-known asymmetric fission of actinides, thus providing clear evidence of a shell effect in slow quasi-fission, where the lighter fragment is possibly a nucleus around $^{96}$Zr, a new doubly magic nucleus. This finding has great implications in the study of nuclear reactions, structure and particularly in super-heavy element synthesis where quasi-fission is synonymous.
Submitted 28 November, 2022;
originally announced November 2022.
-
Asymptotics of the Sketched Pseudoinverse
Authors:
Daniel LeJeune,
Pratik Patil,
Hamid Javadi,
Richard G. Baraniuk,
Ryan J. Tibshirani
Abstract:
We take a random matrix theory approach to random sketching and show an asymptotic first-order equivalence of the regularized sketched pseudoinverse of a positive semidefinite matrix to a certain evaluation of the resolvent of the same matrix. We focus on real-valued regularization and extend previous results on an asymptotic equivalence of random matrices to the real setting, providing a precise characterization of the equivalence even under negative regularization, including a precise characterization of the smallest nonzero eigenvalue of the sketched matrix, which may be of independent interest. We then further characterize the second-order equivalence of the sketched pseudoinverse. We also apply our results to the analysis of the sketch-and-project method and to sketched ridge regression. Lastly, we prove that these results generalize to asymptotically free sketching matrices, obtaining the resulting equivalence for orthogonal sketching matrices and comparing our results to several common sketches used in practice.
Submitted 6 October, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Bagging in overparameterized learning: Risk characterization and risk monotonization
Authors:
Pratik Patil,
Jin-Hong Du,
Arun Kumar Kuchibhotla
Abstract:
Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
Submitted 24 October, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Ultrafast reversible self-assembly of living tangled matter
Authors:
Vishal P. Patil,
Harry Tuazon,
Emily Kaufman,
Tuhin Chakrabortty,
David Qin,
Jörn Dunkel,
M. Saad Bhamla
Abstract:
Tangled active filaments are ubiquitous in nature, from chromosomal DNA and cilia carpets to root networks and worm blobs. How activity and elasticity facilitate collective topological transformations in living tangled matter is not well understood. Here, we report an experimental and theoretical study of California blackworms (Lumbriculus variegatus), which slowly form tangles over minutes but can untangle in milliseconds. Combining ultrasound imaging, theoretical analysis and simulations, we develop and validate a mechanistic model that explains how the kinematics of individual active filaments determines their emergent collective topological dynamics. The model reveals that resonantly alternating helical waves enable both tangle formation and ultrafast untangling. By identifying generic dynamical principles of topological self-transformations, our results can provide guidance for designing new classes of topologically tunable active materials.
Submitted 7 October, 2022;
originally announced October 2022.
-
Threat Detection In Self-Driving Vehicles Using Computer Vision
Authors:
Umang Goenka,
Aaryan Jagetia,
Param Patil,
Akshay Singh,
Taresh Sharma,
Poonam Saini
Abstract:
On-road obstacle detection is an important field of research that falls within the scope of intelligent transportation infrastructure systems. Vision-based approaches offer an accurate and cost-effective solution for such systems. In this research paper, we propose a threat detection mechanism for autonomous self-driving cars that uses dashcam videos to detect the presence of any unwanted obstacle on the road that falls within the vehicle's visual range. This information can help the vehicle's program navigate its route safely. There are four major components, namely, YOLO to identify the objects, an advanced lane detection algorithm, a multi regression model to measure the distance of the object from the camera, and the two-second rule for measuring safety and limiting speed. In addition, we have used the Car Crash Dataset (CCD) for calculating the accuracy of the model. The YOLO algorithm gives an accuracy of around 93%. The final accuracy of our proposed Threat Detection Model (TDM) is 82.65%.
Submitted 6 September, 2022;
originally announced September 2022.
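The two-second rule named in the abstract above can be sketched as a simple time-gap check. The abstract does not say how the paper applies the rule, so this is only the common rule of thumb: the time needed to cover the estimated distance to an object at the current speed should be at least two seconds. The function names, units and the zero-speed handling are assumptions of this sketch.

```python
def following_gap_seconds(distance_m: float, speed_mps: float) -> float:
    """Time in seconds needed to cover `distance_m` at `speed_mps`."""
    if speed_mps <= 0:
        # A stationary vehicle never closes the gap (assumption of this sketch).
        return float("inf")
    return distance_m / speed_mps


def is_threat(distance_m: float, speed_mps: float, min_gap_s: float = 2.0) -> bool:
    """Flag an object as a threat when the time gap is below `min_gap_s`."""
    return following_gap_seconds(distance_m, speed_mps) < min_gap_s


def max_safe_speed_mps(distance_m: float, min_gap_s: float = 2.0) -> float:
    """Highest speed that still keeps a `min_gap_s` time gap to the object."""
    return distance_m / min_gap_s
```

For example, at 20 m/s an object 30 m ahead gives a 1.5 s gap and would be flagged, while the speed limit implied by the same distance is 15 m/s.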
-
Classification of Electroencephalograms during Mathematical Calculations Using Deep Learning
Authors:
Umang Goenka,
Param Patil,
Kush Gosalia,
Aaryan Jagetia
Abstract:
Classifying Electroencephalogram (EEG) signals helps in understanding Brain-Computer Interfaces (BCI). EEG signals are vital in studying how the human mind functions. In this paper, we have used an Arithmetic Calculation dataset consisting of Before Calculation Signals (BCS) and During Calculation Signals (DCS). The dataset consisted of 36 participants. In order to understand the functioning of neurons in the brain, we classified BCS vs DCS. For this classification, we extracted various features such as Mutual Information (MI), Phase Locking Value (PLV), and entropy measures, namely permutation entropy, spectral entropy, singular value decomposition entropy, approximate entropy, and sample entropy. The classification of these features was done using RNN-based classifiers such as LSTM, BLSTM, ConvLSTM, and CNN-LSTM. The model achieved an accuracy of 99.72% when entropy was used as a feature and ConvLSTM as a classifier.
Submitted 31 August, 2022;
originally announced September 2022.
-
Observation of large spontaneous emission rate enhancement of quantum dots in a broken-symmetry slow-light waveguide
Authors:
Hamidreza Siampour,
Christopher O'Rourke,
Alistair J. Brash,
Maxim N. Makhonin,
René Dost,
Dominic J. Hallett,
Edmund Clarke,
Pallavi K. Patil,
Maurice S. Skolnick,
A. Mark Fox
Abstract:
Quantum states of light and matter can be manipulated on the nanoscale to provide a technological resource for aiding the implementation of scalable photonic quantum technologies [1-3]. Experimental progress relies on the quality and efficiency of the coupling between photons and internal states of quantum emitters [4-6]. Here we demonstrate a nanophotonic waveguide platform with embedded quantum dots (QDs) that enables both Purcell-enhanced emission and strong chiral coupling. The design uses slow-light effects in a glide-plane photonic crystal waveguide with QD tuning to match the emission frequency to the slow-light region. Simulations were used to map the chirality and Purcell enhancement depending on the position of a dipole emitter relative to the air holes. The highest Purcell factors and chirality occur in separate regions, but there is still a significant area where high values of both can be obtained. Based on this, we first demonstrate a record-high radiative decay rate of 17 ns$^{-1}$ (60 ps lifetime), corresponding to a 20-fold Purcell enhancement. This was achieved by electric-field tuning of the QD to the slow-light region and quasi-resonant phonon-sideband excitation. We then demonstrate a 5-fold Purcell enhancement for a dot with a high degree of chiral coupling to waveguide modes, substantially surpassing all previous measurements. Together these demonstrate the excellent prospects for using QDs in scalable implementations of on-chip spin-photonics relying on chiral quantum optics.
Submitted 12 August, 2022;
originally announced August 2022.
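The decay rate and lifetime quoted in the abstract above are related by lifetime = 1 / rate, which the following sketch checks numerically. The second function assumes the usual definition of the Purcell factor as the ratio of the enhanced to the reference decay rate. The reference lifetime it returns is an inference from the quoted numbers, not a value reported in the abstract, and the function names are hypothetical.

```python
def lifetime_ps(decay_rate_per_ns: float) -> float:
    """Convert a decay rate in 1/ns into a lifetime in picoseconds."""
    return 1000.0 / decay_rate_per_ns


def reference_lifetime_ps(enhanced_lifetime_ps: float, purcell_factor: float) -> float:
    """Lifetime without enhancement, assuming Purcell factor = rate ratio."""
    return enhanced_lifetime_ps * purcell_factor


enhanced = lifetime_ps(17.0)                       # about 58.8 ps, quoted as 60 ps
reference = reference_lifetime_ps(enhanced, 20.0)  # about 1.18 ns (inferred)
```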
-
Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling
Authors:
Cathy Shyr,
Pragya Sur,
Giovanni Parmigiani,
Prasad Patil
Abstract:
Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relationships across studies and compare two multi-study learning strategies: 1) merging all the studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and ensembling the resulting predictions. In the regression setting, we provide theoretical guidelines based on an analytical transition point to determine whether it is more beneficial to merge or to ensemble for boosting with linear learners. In addition, we characterize a bias-variance decomposition of estimation error for boosting with component-wise linear learners. We verify the theoretical transition point result in simulation and illustrate how it can guide the decision on merging vs. ensembling in an application to breast cancer gene expression data.
Submitted 12 July, 2022; v1 submitted 10 July, 2022;
originally announced July 2022.
-
Galaxy number-count dipole and superhorizon fluctuations
Authors:
Guillem Domènech,
Roya Mohayaee,
Subodh P. Patil,
Subir Sarkar
Abstract:
In view of the growing tension between the dipole anisotropy of number counts of cosmologically distant sources and of the cosmic microwave background (CMB), we investigate the number count dipole induced by primordial perturbations with wavelength comparable to or exceeding the Hubble radius today. First, we find that neither adiabatic nor isocurvature superhorizon modes can generate an intrinsic number count dipole. However, a superhorizon isocurvature mode does induce a relative velocity between the CMB and the (dark) matter rest frames and thereby affects the CMB dipole. We revisit the possibility that it has an intrinsic component due to such a mode, thus enabling consistency with the galaxy number count dipole if the latter is actually kinematic in origin. Although this scenario is not particularly natural, there are possible links with other anomalies and it predicts a concomitant galaxy number count quadrupole which may be measurable in future surveys. We also investigate the number count dipole induced by modes smaller than the Hubble radius, finding that subject to CMB constraints this is too small to reconcile the dipole tension.
Submitted 4 July, 2022;
originally announced July 2022.
-
Computationally-Efficient Decomposition Heuristic for the Static Traffic Assignment Problem
Authors:
Venktesh Pandey,
Priyadarshan N. Patil
Abstract:
Applications such as megaregional planning require efficient methods for solving traffic assignment problems (TAPs) on large-scale networks. We propose a decomposition heuristic that generates approximate TAP solutions by partitioning the complete network into subnetworks which are solved in parallel and use an iterative-refinement algorithm for improving the network partitions. A novel network transformation and three-stage algorithm are also proposed to solve a constrained shortest path problem as a subproblem of the heuristic. Experiments on various networks show that the heuristic can generate 15.1-67.8% computational savings in finding solutions with initial relative gap of 0.02. The performance benefits of the proposed heuristic when warmstarting standard TAP algorithms are demonstrated with an average computational savings of 10-35% over a TAP solver without warmstarting.
Submitted 24 June, 2022;
originally announced June 2022.
-
Sublinear Algorithms for Hierarchical Clustering
Authors:
Arpit Agarwal,
Sanjeev Khanna,
Huan Li,
Prathamesh Patil
Abstract:
Hierarchical clustering over graphs is a fundamental task in data mining and machine learning with applications in domains such as phylogenetics, social network analysis, and information retrieval. Specifically, we consider the recently popularized objective function for hierarchical clustering due to Dasgupta. Previous algorithms for (approximately) minimizing this objective function require linear time/space complexity. In many applications the underlying graph can be massive in size making it computationally challenging to process the graph even using a linear time/space algorithm. As a result, there is a strong interest in designing algorithms that can perform global computation using only sublinear resources. The focus of this work is to study hierarchical clustering for massive graphs under three well-studied models of sublinear computation which focus on space, time, and communication, respectively, as the primary resources to optimize: (1) (dynamic) streaming model where edges are presented as a stream, (2) query model where the graph is queried using neighbor and degree queries, (3) MPC model where the graph edges are partitioned over several machines connected via a communication channel.
We design sublinear algorithms for hierarchical clustering in all three models above. At the heart of our algorithmic results is a view of the objective in terms of cuts in the graph, which allows us to use a relaxed notion of cut sparsifiers to do hierarchical clustering while introducing only a small distortion in the objective function. Our main algorithmic contributions are then to show how cut sparsifiers of the desired form can be efficiently constructed in the query model and the MPC model. We complement our algorithmic results by establishing nearly matching lower bounds that rule out the possibility of designing better algorithms in each of these models.
Submitted 15 June, 2022;
originally announced June 2022.
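For readers unfamiliar with the objective referred to in the abstract above, here is a minimal sketch of Dasgupta's cost for a binary hierarchy: each edge contributes its weight times the number of leaves under the lowest common ancestor of its endpoints. The nested-tuple tree encoding and the name `dasgupta_cost` are choices made for this illustration. It is a direct evaluation of the objective, not one of the sublinear algorithms from the paper.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf label (int) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def dasgupta_cost(tree: Tree, weights: Dict[Tuple[int, int], float]) -> float:
    """Sum over edges (u, v) of w(u, v) * (number of leaves under lca(u, v))."""

    def visit(node: Tree) -> Tuple[Set[int], float]:
        if not isinstance(node, tuple):
            return {node}, 0.0
        left_leaves, left_cost = visit(node[0])
        right_leaves, right_cost = visit(node[1])
        n_leaves = len(left_leaves) + len(right_leaves)
        # Edges split by this node have it as the lowest common ancestor.
        cross_weight = sum(
            w
            for (u, v), w in weights.items()
            if (u in left_leaves and v in right_leaves)
            or (u in right_leaves and v in left_leaves)
        )
        cost = left_cost + right_cost + n_leaves * cross_weight
        return left_leaves | right_leaves, cost

    return visit(tree)[1]


# Path graph 0 - 1 - 2 with unit weights (hypothetical example input).
path_edges = {(0, 1): 1.0, (1, 2): 1.0}
good = dasgupta_cost(((0, 1), 2), path_edges)
bad = dasgupta_cost(((0, 2), 1), path_edges)
```

On this small path graph, merging the adjacent vertices 0 and 1 first gives a lower cost than merging the non-adjacent vertices 0 and 2 first, which matches the intuition that heavy edges should be cut low in the tree.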
-
Observations and Simulations of Radio Emission and Magnetic Fields in Minkowski's Object
Authors:
C. Nolting,
M. Lacy,
S. Croft,
P. C. Fragile,
S. T. Linden,
K. Nyland,
P. Patil
Abstract:
We combine new data from the Karl G. Jansky Very Large Array with previous radio observations to create a more complete picture of the ongoing interactions between the radio jet from galaxy NGC 541 and the star-forming system known as Minkowski's Object (MO). We then compare those observations with synthetic radio data generated from a new set of magnetohydrodynamic simulations of a jet-cloud interaction specifically tailored to the parameters of MO. The combination of radio intensity, polarization, and spectral index measurements all convincingly support the interaction scenario and provide additional constraints on the local dynamical state of the intracluster medium and the time since the jet-cloud interaction first began. In particular, we show that only a simulation with a bent radio jet can reproduce the observations.
Submitted 23 August, 2022; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.