Search | arXiv e-print repository

Fast Rates for Bandit PAC Multiclass Classification

Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

Abstract: We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,δ)$-PAC version of the problem, with sample complexity of $O\big( (\operatorname{poly}(K) + 1 / \varepsilon^2) \log (|H| / δ) \big)$ for any finite hypothesis class $H$. In terms of the leading dependence on $\varepsilon$, this improves upon existing bounds for the problem, that are of the form $O(K/\varepsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $\log |H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011) who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is $Θ(K)$. We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only $O(1)$ as $\varepsilon \to 0$. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over $H$.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.15576 [pdf, other]

Online Changepoint Detection via Dynamic Mode Decomposition

Authors: Victor K. Khamesi, Niall M. Adams, Dean A. Bodenham, Edward A. K. Cohen

Abstract: Detecting changes in data streams is a vital task in many applications. There is increasing interest in changepoint detection in the online setting, to enable real-time monitoring and support prompt responses and informed decision-making. Many approaches assume stationary sequences before encountering an abrupt change in the mean or variance. Notably less attention has focused on the challenging case where the monitored sequences exhibit trend, periodicity and seasonality. Dynamic mode decomposition is a data-driven dimensionality reduction technique that extracts the essential components of a dynamical system. We propose a changepoint detection method that leverages this technique to sequentially model the dynamics of a moving window of data and produce a low-rank reconstruction. A change is identified when there is a significant difference between this reconstruction and the observed data, and we provide theoretical justification for this approach. Extensive simulations demonstrate that our approach has superior detection performance compared to other methods for detecting small changes in mean, variance, periodicity, and second-order structure, among others, in data that exhibits seasonality. Results on real-world datasets also show excellent performance compared to contemporary approaches.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures, and supplementary (57 pages total)

arXiv:2405.13346 [pdf, other]

Convergence of the Deep Galerkin Method for Mean Field Control Problems

Authors: William Hofgard, Jingruo Sun, Asaf Cohen

Abstract: We establish the convergence of the deep Galerkin method (DGM), a deep learning-based scheme for solving high-dimensional nonlinear PDEs, for Hamilton-Jacobi-Bellman (HJB) equations that arise from the study of mean field control problems (MFCPs). Based on a recent characterization of the value function of the MFCP as the unique viscosity solution of an HJB equation on the simplex, we establish both an existence and convergence result for the DGM. First, we show that the loss functional of the DGM can be made arbitrarily small given that the value function of the MFCP possesses sufficient regularity. Then, we show that if the loss functional of the DGM converges to zero, the corresponding neural network approximators must converge uniformly to the true value function on the simplex. We also provide numerical experiments demonstrating the DGM's ability to generalize to high-dimensional HJB equations.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 27 pages, 6 figures

MSC Class: 91A07; 35Q89; 68T07; 49L12; 49N10; 35A35; 60J27

arXiv:2405.12703 [pdf, other]

Constructions of bounded solutions of $div\, {\mathbf u}=f$ in critical spaces

Authors: Albert Cohen, Ronald DeVore, Eitan Tadmor

Abstract: We construct uniformly bounded solutions of the equation $div\, {\mathbf u}=f$ for arbitrary data $f$ in the critical spaces $L^d(Ω)$, where $Ω$ is a domain of ${\mathbb R}^d$. This question was addressed by Bourgain & Brezis, [On the equation ${\rm div}\, Y=f$ and application to control of phases, JAMS 16(2) (2003) 393-426], who proved that although the problem has a uniformly bounded solution, it is critical in the sense that there exists no linear solution operator for general $L^d$-data. We first discuss the validity of this existence result under weaker conditions than $f\in L^d(Ω)$, and then focus our work on constructive processes for such uniformly bounded solutions. In the $d=2$ case, we present a direct one-step explicit construction, which generalizes for $d>2$ to a $(d-1)$-step construction based on induction. An explicit construction is proposed for compactly supported data in $L^{2,\infty}(Ω)$ in the $d=2$ case. We also present constructive approaches based on optimization of a certain loss functional adapted to the problem. This approach provides a two-step construction in the $d=2$ case. This optimization is used as the building block of a hierarchical multistep process introduced in [E. Tadmor, Hierarchical construction of bounded solutions in critical regularity spaces, CPAM 69(6) (2016) 1087-1109] that converges to a solution in more general situations.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.11244 [pdf, ps, other]

Strided Difference Bound Matrices

Authors: Arjun Pitchanathan, Albert Cohen, Oleksandr Zinenko, Tobias Grosser

Abstract: A wide range of symbolic analysis and optimization problems can be formalized using polyhedra. Sub-classes of polyhedra, also known as sub-polyhedral domains, are sought for their lower space and time complexity. We introduce the Strided Difference Bound Matrix (SDBM) domain, which represents a sweet spot in the context of optimizing compilers. Its expressiveness and efficient algorithms are particularly well suited to the construction of machine learning compilers. We present decision algorithms, abstract domain operators and computational complexity proofs for SDBM. We also conduct an empirical study with the MLIR compiler framework to validate the domain's practical applicability. We characterize a sub-class of SDBMs that frequently occurs in practice, and demonstrate even faster algorithms on this sub-class.

Submitted 18 May, 2024; originally announced May 2024.

Comments: Preprint and extended from the CAV 2024 conference version

arXiv:2405.11109 [pdf, other]

Watermarking Language Models for Many Adaptive Users

Authors: Aloni Cohen, Alexander Hoover, Gabe Schoenbach

Abstract: We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting: when a user queries a language model more than once, as even benign users do. And with just a single exception (Christ and Gunn, 2024), prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust). Importantly, our scheme provides both zero-bit and multi-user assurances at the same time. It detects shorter snippets just as well as the original scheme, and traces longer excerpts to individuals. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks. Ours is the first generic reduction between watermarking schemes for language models. A challenge for such reductions is the lack of a unified abstraction for robustness -- that marked text is detectable even after edits. We introduce a new unifying abstraction called AEB-robustness. AEB-robustness provides that the watermark is detectable whenever the edited text "approximates enough blocks" of model-generated output.

Submitted 28 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 39 pages

arXiv:2405.10027 [pdf, ps, other]

The Real Price of Bandit Information in Multiclass Classification

Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetildeΘ\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right) }$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.

Submitted 19 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.06773 [pdf, ps, other]

A Monotone Circuit Construction for Individually-Secure Multi-Secret Sharing

Authors: Cailyn Bass, Alejandro Cohen, Rafael G. L. D'Oliveira, Muriel Médard

Abstract: In this work, we introduce a new technique for taking a single-secret sharing scheme with a general access structure and transforming it into an individually secure multi-secret sharing scheme where every secret has the same general access structure. To increase the information rate, we consider Individual Security which guarantees zero mutual information with each secret individually, for any unauthorized subsets. Our approach involves identifying which shares of the single-secret sharing scheme can be replaced by linear combinations of messages. When $m-1$ shares are replaced, our scheme obtains an information rate of $m/|S|$, where $S$ is the set of shares. This provides an improvement over the information rate of $1/|S|$ in the original single-secret sharing scheme.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.05107 [pdf, other]

Leveraging AES Padding: dBs for Nothing and FEC for Free in IoT Systems

Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin D. Kim, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Ken R. Duffy, Muriel Médard

Abstract: The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. Addressing the security concerns, the Advanced Encryption Standard (AES) is commonly employed for secure encryption in IoT systems. Our study explores an innovative use of AES, by repurposing AES padding bits for error correction and thus introducing a dual-functional method that seamlessly integrates error-correcting capabilities into the standard encryption process. The integration of the state-of-the-art Guessing Random Additive Noise Decoder (GRAND) in the receiver's architecture facilitates the joint decoding and decryption process. This strategic approach not only preserves the existing structure of the transmitter but also significantly enhances communication reliability in noisy environments, achieving a notable over 3 dB gain in Block Error Rate (BLER). Remarkably, this enhanced performance comes with a minimal power overhead at the receiver - less than 15% compared to the traditional decryption-only process, underscoring the efficiency of our hardware design for IoT applications. This paper discusses a comprehensive analysis of our approach, particularly in energy efficiency and system performance, presenting a novel and practical solution for reliable IoT communications.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.01495 [pdf, other]

Error Correction Capabilities of Non-Linear Cryptographic Hash Functions

Authors: Alejandro Cohen, Rafael G. L. D'Oliveira

Abstract: Linear hashes are known to possess error-correcting capabilities. However, in most applications, non-linear hashes with pseudorandom outputs are utilized instead. It has also been established that classical non-systematic random codes, both linear and non-linear, are capacity achieving in the asymptotic regime. Thus, it is reasonable to expect that non-linear hashes might also exhibit good error-correcting capabilities. In this paper, we show this to be the case. Our proof is based on techniques from multiple access channels. As a consequence, we show that Systematic Random Non-Linear Codes (S-RNLC) are capacity achieving in the asymptotic regime. We validate our results by comparing the performance of the Secure Hash Algorithm (SHA) with that of Systematic Random Linear Codes (SRLC) and S-RNLC, demonstrating that SHA performs equally.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.17686 [pdf, other]

On the Benefits of Coding for Network Slicing

Authors: Homa Esfahanizadeh, Vipindev Adat Vasudevan, Benjamin D. Kim, Shruti Siva, Jennifer Kim, Alejandro Cohen, Muriel Médard

Abstract: Network slicing has emerged as an integral concept in 5G, aiming to partition the physical network infrastructure into isolated slices, customized for specific applications. We theoretically formulate the key performance metrics of an application, in terms of goodput and delivery delay, at a cost of network resources in terms of bandwidth. We explore an un-coded communication protocol that uses feedback-based repetitions, and a coded protocol, implementing random linear network coding and using coding-aware acknowledgments. We find that coding reduces the resource demands of a slice to meet the requirements for an application, thereby serving more applications efficiently. Coded slices thus free up resources for other slices, be they coded or not. Based on these results, we propose a hybrid approach, wherein coding is introduced selectively in certain network slices. This approach not only facilitates a smoother transition from un-coded systems to coded systems but also reduces costs across all slices. Theoretical findings in this paper are validated and expanded upon through real-time simulations of the network.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.11695 [pdf, other]

Asymptotic Nash Equilibria of Finite-State Ergodic Markovian Mean Field Games

Authors: Asaf Cohen, Ethan Zell

Abstract: Mean field games (MFGs) model equilibria in games with a continuum of weakly interacting players as limiting systems of symmetric $n$-player games. We consider the finite-state, infinite-horizon problem with ergodic cost. Assuming Markovian strategies, we first prove that any solution to the MFG system gives rise to a $(C/\sqrt{n})$-Nash equilibrium in the $n$-player game. We follow this result by proving the same is true for the strategy profile derived from the master equation. We conclude the main theoretical portion of the paper by establishing a large deviation principle for empirical measures associated with the asymptotic Nash equilibria. Then, we contrast the asymptotic Nash equilibria using an example. We solve the MFG system directly and numerically solve the ergodic master equation by adapting the deep Galerkin method of Sirignano and Spiliopoulos. We use these results to derive the strategies of the asymptotic Nash equilibria and compare them. Finally, we derive an explicit form for the rate functions in dimension two.

Submitted 17 April, 2024; originally announced April 2024.

MSC Class: 49; 60; 90

arXiv:2404.09090 [pdf, other]

DEX Specs: A Mean Field Approach to DeFi Currency Exchanges

Authors: Erhan Bayraktar, Asaf Cohen, April Nellis

Abstract: We investigate the behavior of liquidity providers (LPs) by modeling a decentralized cryptocurrency exchange (DEX) based on Uniswap v3. LPs with heterogeneous characteristics choose optimal liquidity positions subject to uncertainty regarding the size of exogenous incoming transactions and the prices of assets in the wider market. They engage in a game among themselves, and the resulting liquidity distribution determines the exchange rate dynamics and potential arbitrage opportunities of the pool. We calibrate the distribution of LP characteristics based on Uniswap data and the equilibrium strategy resulting from this mean-field game produces pool exchange rate dynamics and liquidity evolution consistent with observed pool behavior. We subsequently introduce Maximal Extractable Value (MEV) bots who perform Just-In-Time (JIT) liquidity attacks, and develop a Stackelberg game between LPs and bots. This addition results in more accurate simulated pool exchange rate dynamics and stronger predictive power regarding the evolution of the pool liquidity distribution.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.07945 [pdf, ps, other]

Existence of Optimal Stationary Singular Controls and Mean Field Game Equilibria

Authors: Asaf Cohen, Chuhao Sun

Abstract: In this paper, we examine the stationary relaxed singular control problem within a multi-dimensional framework for a single agent, as well as its Mean Field Game (MFG) equivalent. We demonstrate that optimal relaxed controls exist for both maximization and minimization cases. These relaxed controls are defined by random measures across the state and control spaces, with the state process described as a solution to the associated martingale problem. By leveraging findings from [Kurtz-Stockbridge 2001], we establish the equivalence between the martingale problem and the stationary forward equation. This allows us to reformulate the relaxed control problem into a linear programming problem within the measure space. We prove the sequential compactness of these measures, thereby confirming the feasibility of achieving an optimal solution. Subsequently, our focus shifts to Mean Field Games. Drawing on insights from the single-agent problem and employing the Kakutani--Glicksberg--Fan fixed point theorem, we derive the existence of a mean field game equilibrium.

Submitted 1 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.19628 [pdf, other]

Anyonic statistics and slow quasiparticle dynamics in a graphene fractional quantum Hall interferometer

Authors: Noah L. Samuelson, Liam A. Cohen, Will Wang, Simon Blanch, Takashi Taniguchi, Kenji Watanabe, Michael P. Zaletel, Andrea F. Young

Abstract: Anyons are two dimensional particles with fractional exchange statistics that emerge as elementary excitations of fractional quantum Hall phases. Experimentally, anyonic statistics manifest directly in the edge-state Fabry-Pérot interferometer geometry, where the presence of $N_{qp}$ localized anyons in the interferometer bulk contributes a phase $N_{qp} θ_a$ to the observed interference pattern, where $θ_a$ is twice the statistical exchange phase. Here, we report a measurement of $θ_a$ in a monolayer graphene Fabry-Pérot interferometer at $ν$ = 1/3. We find a preponderance of phase slips with magnitudes $Δθ\approx 2 π/ 3$, confirming the result of past experiments in GaAs quantum wells and consistent with expectations for the tunneling of Abelian anyons into the interferometer bulk. In contrast to prior work, however, single anyon tunneling events manifest as instantaneous and irreversible phase slips, indicative of quasiparticle equilibration times exceeding 20 minutes in some cases. We use the discrepancy between the quasiparticle equilibration rate and our measurement speed to vary the interferometer area and $N_{qp}$ independently, allowing us to precisely determine the interferometer phase and monitor the entry and exit of individual anyons to the interferometer loop in the time domain. Besides providing a replication of previous interferometric measurements sensitive to $θ_a$ in GaAs, our results bring anyon dynamics into the experimental regime and suggest that the average `topological charge' of a mesoscopic quantum Hall device can be held constant over hour long timescales.

Submitted 28 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: Main Text: 9 pages, 4 figures. Supplementary Info: 20 pages, 15 figures

arXiv:2403.18375 [pdf, other]

Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates

Authors: Natalie Lang, Alejandro Cohen, Nir Shlezinger

Abstract: Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings; which all affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms mitigating the device heterogeneity gap in FL.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17011 [pdf, other]

SUDO: a framework for evaluating clinical artificial intelligence systems without ground-truth annotations

Authors: Dani Kiyasseh, Aaron Cohen, Chengsheng Jiang, Nicholas Altieri

Abstract: A clinical artificial intelligence (AI) system is often validated on a held-out set of data which it has not been exposed to before (e.g., data from a different hospital with a distinct electronic health record system). This evaluation process is meant to mimic the deployment of an AI system on data in the wild; those which are currently unseen by the system yet are expected to be encountered in a clinical setting. However, when data in the wild differ from the held-out set of data, a phenomenon referred to as distribution shift, and lack ground-truth annotations, it becomes unclear the extent to which AI-based findings can be trusted on data in the wild. Here, we introduce SUDO, a framework for evaluating AI systems without ground-truth annotations. SUDO assigns temporary labels to data points in the wild and directly uses them to train distinct models, with the highest performing model indicative of the most likely label. Through experiments with AI systems developed for dermatology images, histopathology patches, and clinical reports, we show that SUDO can be a reliable proxy for model performance and thus identify unreliable predictions. We also demonstrate that SUDO informs the selection of models and allows for the previously out-of-reach assessment of algorithmic bias for data in the wild without ground-truth annotations. The ability to triage unreliable predictions for further inspection and assess the algorithmic bias of AI systems can improve the integrity of research findings and contribute to the deployment of ethical AI systems in medicine.

Submitted 2 January, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04975 [pdf, other]

Deep Backward and Galerkin Methods for the Finite State Master Equation

Authors: Asaf Cohen, Mathieu Laurière, Ethan Zell

Abstract: This paper proposes and analyzes two neural network methods to solve the master equation for finite-state mean field games (MFGs). Solving MFGs provides approximate Nash equilibria for stochastic, differential games with finite but large populations of agents. The master equation is a partial differential equation (PDE) whose solution characterizes MFG equilibria for any possible initial distribution. The first method we propose relies on backward induction in a time component while the second method directly tackles the PDE without discretizing time. For both approaches, we prove two types of results: there exist neural networks that make the algorithms' loss functions arbitrarily small, and conversely, if the losses are small, then the neural networks are good approximations of the master equation's solution. We conclude the paper with numerical experiments on benchmark problems from the literature up to dimension 15, and a comparison with solutions computed by a classical method for fixed initial distributions.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03230 [pdf, other]

Large language models surpass human experts in predicting neuroscience results

Authors: Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata , et al. (14 additional authors not shown)

Abstract: Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

Submitted 21 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.03212 [pdf, other]

Performance of a modular ton-scale pixel-readout liquid argon time projection chamber

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade , et al. (1340 additional authors not shown)

Abstract: The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 47 pages, 41 figures

Report number: FERMILAB-PUB-24-0073-LBNF

arXiv:2403.02992 [pdf, other]

Adaptive Integrate-and-Fire Time Encoding Machine with Quantization

Authors: Aseel Omar, Alejandro Cohen

Abstract: An integrate-and-fire time-encoding machine (IF-TEM) is an effective asynchronous sampler that translates amplitude information into non-uniform time sequences. In this work, we propose a novel Adaptive IF-TEM (AIF-TEM) approach. This design dynamically adjusts the TEM's sensitivity to changes in the input signal's amplitude and frequency in real-time. We provide a comprehensive analysis of AIF-TEM's oversampling and distortion properties. By the adaptive adjustments, AIF-TEM as we show can achieve significant performance improvements in terms of sampling rate-distortion in a practical finite regime. We demonstrate empirically that in the scenarios tested AIF-TEM outperforms classical IF-TEM and traditional Nyquist (i.e., periodic) sampling methods for band-limited signals. In terms of Mean Square Error (MSE), the reduction reaches at least 12dB (fixing the oversampling rate). Additionally, we investigate the quantization process for AIF-TEM and analyze the quantization MSE bound. Empirical results show that classic quantization for AIF-TEM improves performance by at least 14 dB compared to IF-TEM. We introduce a dynamic quantization technique for AIF-TEM, which further improves performance compared to classic quantization. Empirically, this reduction reaches at least 10 dB compared to classic quantization for AIF-TEM.

Submitted 3 July, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17294 [pdf, ps, other]

Advancing Continuous Distribution Generation: An Exponentiated Odds Ratio Generator Approach

Authors: Xinyu Chen, Yuanqi Xie, Achraf Cohen, Shusen Pu

Abstract: This paper presents a new methodology for generating continuous statistical distributions, integrating the exponentiated odds ratio within the framework of survival analysis. This new method enhances the flexibility and adaptability of distribution models to effectively address the complexities inherent in contemporary datasets. The core of this advancement is illustrated by introducing a particular subfamily, the "Type-2 Gumbel Weibull-G Family of Distributions." We provide a comprehensive analysis of the mathematical properties of these distributions, encompassing statistical properties such as density functions, moments, hazard rate and quantile functions, Rényi entropy, order statistics, and the concept of stochastic ordering. To establish the robustness of our approach, we apply five distinct methods for parameter estimation. The practical applicability of the Type-2 Gumbel Weibull-G distributions is further supported through the analysis of three real-world datasets. These empirical applications illustrate the exceptional statistical precision of our distributions compared to existing models, thereby reinforcing their significant value in both theoretical and practical statistical applications.

Submitted 27 February, 2024; originally announced February 2024.

MSC Class: 62E99; 60E05

arXiv:2402.11119 [pdf, ps, other]

Private PAC Learning May be Harder than Online Learning

Authors: Mark Bun, Aloni Cohen, Rathin Desai

Abstract: We continue the study of the computational complexity of differentially private PAC learning and how it is situated within the foundations of machine learning. A recent line of work uncovered a qualitative equivalence between the private PAC model and Littlestone's mistake-bounded model of online learning, in particular, showing that any concept class of Littlestone dimension $d$ can be privately PAC learned using $\mathrm{poly}(d)$ samples. This raises the natural question of whether there might be a generic conversion from online learners to private PAC learners that also preserves computational efficiency. We give a negative answer to this question under reasonable cryptographic assumptions (roughly, those from which it is possible to build indistinguishability obfuscation for all circuits). We exhibit a concept class that admits an online learner running in polynomial time with a polynomial mistake bound, but for which there is no computationally-efficient differentially private PAC learner. Our construction and analysis strengthens and generalizes that of Bun and Zhandry (TCC 2016-A), who established such a separation between private and non-private PAC learner.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10018 [pdf, other]

Multi-Stage Algorithm for Group Testing with Prior Statistics

Authors: Ayelet C. Portnoy, Alejandro Cohen

Abstract: In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented in trellis, e.g., finite state machines and Markov processes. We introduce a variation of List Viterbi Algorithm (LVA) to enable accurate recovery using much fewer tests than objectives, which efficiently gains from the correlated prior statistics structure. Our numerical results demonstrate that the proposed Multi-Stage GT (MSGT) algorithm can obtain the optimal Maximum A Posteriori (MAP) performance with feasible complexity in practical regimes, such as with COVID-19 and sparse signal recovery applications, and reduce in the scenarios tested the number of pooled tests by at least $25\%$ compared to existing classical low complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed MSGT algorithm that guarantees its efficiency.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08407 [pdf, other]

Coding-Based Hybrid Post-Quantum Cryptosystem for Non-Uniform Information

Authors: Saar Tarnopolsky, Alejandro Cohen

Abstract: We introduce for non-uniform messages a novel hybrid universal network coding cryptosystem (NU-HUNCC) in the finite blocklength regime that provides Post-Quantum (PQ) security at high communication rates. Recently, hybrid cryptosystems offered PQ security by premixing the data using secure coding schemes and encrypting only a small portion of it, assuming the data is uniformly distributed. An assumption that is often challenging to enforce. Standard fixed-length lossless source coding and compression schemes guarantee a uniform output in normalized divergence. Yet, this is not sufficient to guarantee security. We consider an efficient almost uniform compression scheme in non-normalized variational distance for the proposed hybrid cryptosystem, that by utilizing uniform sub-linear shared seed, guarantees PQ security. Specifically, for the proposed PQ cryptosystem, first, we provide an end-to-end coding scheme, NU-HUNCC, for non-uniform messages. Second, we show that NU-HUNCC is information-theoretic individually secured (IS) against an eavesdropper with access to any subset of the links. Third, we introduce a modified security definition, individually semantically secure under a chosen ciphertext attack (ISS-CCA1), and show that against an all-observing eavesdropper, NU-HUNCC satisfies its conditions. Finally, we provide an analysis that shows the high communication rate of NU-HUNCC and the negligibility of the shared seed size.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07229 [pdf, other]

Successive Refinement in Large-Scale Computation: Advancing Model Inference Applications

Authors: Homa Esfahanizadeh, Alejandro Cohen, Shlomo Shamai, Muriel Medard

Abstract: Modern computationally-intensive applications often operate under time constraints, necessitating acceleration methods and distribution of computational workloads across multiple entities. However, the outcome is either achieved within the desired timeline or not, and in the latter case, valuable resources are wasted. In this paper, we introduce solutions for layered-resolution computation. These solutions allow lower-resolution results to be obtained at an earlier stage than the final result. This innovation notably enhances the deadline-based systems, as if a computational job is terminated due to time constraints, an approximate version of the final result can still be generated. Moreover, in certain operational regimes, a high-resolution result might be unnecessary, because the low-resolution result may already deviate significantly from the decision threshold, for example in AI-based decision-making systems. Therefore, operators can decide whether higher resolution is needed or not based on intermediate results, enabling computations with adaptive resolution. We present our framework for two critical and computationally demanding jobs: distributed matrix multiplication (linear) and model inference in machine learning (nonlinear). Our theoretical and empirical results demonstrate that the execution delay for the first resolution is significantly shorter than that for the final resolution, while maintaining overall complexity comparable to the conventional one-shot approach.
Our experiments further illustrate how the layering feature increases the likelihood of meeting deadlines and enables adaptability and transparency in massive, large-scale computations.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 13 pages, partially appeared in proceedings of IEEE Cloudnet 2022, submitted and under review for IEEE Transactions on Signal Processing

arXiv:2402.01568 [pdf, other]

Doping Liquid Argon with Xenon in ProtoDUNE Single-Phase: Effects on Scintillation Light

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar Es-sghir, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos , et al. (1300 additional authors not shown)

Abstract: Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUNE-SP) at CERN, featuring 770 t of total liquid argon mass with 410 t of fiducial mass. The goal of the run was to measure the light and charge response of the detector to the addition of xenon, up to a concentration of 18.8 ppm. The main purpose was to test the possibility for reduction of non-uniformities in light collection, caused by deployment of photon detectors only within the anode planes. Light collection was analysed as a function of the xenon concentration, by using the pre-existing photon detection system (PDS) of ProtoDUNE-SP and an additional smaller set-up installed specifically for this run. In this paper we first summarize our current understanding of the argon-xenon energy transfer process and the impact of the presence of nitrogen in argon with and without xenon dopant. We then describe the key elements of ProtoDUNE-SP and the injection method deployed. Two dedicated photon detectors were able to collect the light produced by xenon and the total light. The ratio of these components was measured to be about 0.65 as 18.8 ppm of xenon were injected.
We performed studies of the collection efficiency as a function of the distance between tracks and light detectors, demonstrating enhanced uniformity of response for the anode-mounted PDS. We also show that xenon doping can substantially recover light losses due to contamination of the liquid argon by nitrogen.

Submitted 9 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 35 pages, 20 figures

Report number: CERN-EP-2024-024; FERMILAB-PUB-23-0819-LBNF

arXiv:2402.00946 [pdf, other]

High order recovery of geometric interfaces from cell-average data

Authors: Albert Cohen, Olga Mula, Agustín Somacal

Abstract: We consider the problem of recovering characteristic functions $u:=χ_Ω$ from cell-average data on a coarse grid, and where $Ω$ is a compact set of $\mathbb{R}^d$. This task arises in very different contexts such as image processing, inverse problems, and the accurate treatment of interfaces in finite volume schemes. While linear recovery methods are known to perform poorly, nonlinear strategies based on local reconstructions of the jump interface $Γ:=\partialΩ$ by geometrically simpler interfaces may offer significant improvements. We study two main families of local reconstruction schemes, the first one based on nonlinear least-squares fitting, the second one based on the explicit computation of a polynomial-shaped curve fitting the data, which yields simpler numerical computations and high order geometric fitting. For each of them, we derive a general theoretical framework which allows us to control the recovery error by the error of best approximation up to a fixed multiplicative constant. Numerical tests in 2d illustrate the expected approximation order of these strategies. Several extensions are discussed, in particular the treatment of piecewise smooth interfaces with corners.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.15912 [pdf, other]

An Efficient, High-Rate Scheme for Private Information Retrieval over the Gaussian MAC

Authors: Or Elimelech, Asaf Cohen

Abstract: This paper revisited the problem of Private Information Retrieval (PIR), where there are $N$ replicated non-communicating databases containing the same $M$ messages and a user who wishes to retrieve one of the messages without revealing the wanted message's index to the databases. However, we assume a block-fading additive white Gaussian noise multiple access channel (AWGN MAC) linking the user and the databases. Previous work \cite{shmuel2021private} presented a joint channel-PIR scheme, utilizing the Compute and Forward protocol, showing the potential of a joint channel-PIR scheme over a separated one. This paper proposes an improved joint scheme tailored for the PIR problem with $N$ databases over a block-fading AWGN. Unlike the C\&F protocol, our scheme offers reduced computational complexity while improving the scaling laws governing the achievable rate. Specifically, the achievable rate scales with the number of databases $N$ and the power $P$ similarly to the channel capacity without the privacy constraint and outperforms the C\&F-based approach. Furthermore, the analysis demonstrates that the improved rate exhibits only a finite gap from the unconstrained channel capacity -- one bit per second per Hz as $N$ increases.

Submitted 13 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15910 [pdf, ps, other]

Correction to "Private Information Retrieval Over Gaussian MAC"

Authors: Or Elimelech, Ori Shmuel, Asaf Cohen

Abstract: In the above article \cite{shmuel2021private}, the authors introduced a PIR scheme for the Additive White Gaussian Noise (AWGN) Multiple Access Channel (MAC), both with and without fading. The authors utilized the additive nature of the channel and leveraged the linear properties and structure of lattice codes to retrieve the desired message without the servers acquiring any knowledge on the retrieved message's index. Theorems 3 and 4 in \cite{shmuel2021private} contain an error arising from the incorrect usage of the modulo operator. Moreover, the proofs assume a one-to-one mapping function, $φ(\cdot)$, between a message $W_j\in\mathbb{F}_p^L$ and the elements of $\mathcal{C}$, mistakenly suggesting that the user possesses all the required information in advance. However, this is not the case. Herein, we present the corrected versions of these theorems.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.10433 [pdf, other]

Spontaneous localization at a potential saddle point from edge state reconstruction in a quantum Hall point contact

Authors: Liam A. Cohen, Noah L. Samuelson, Taige Wang, Kai Klocke, Cian C. Reeves, Takashi Taniguchi, Kenji Watanabe, Sagar Vijay, Michael P. Zaletel, Andrea F. Young

Abstract: Quantum point contacts (QPCs) are an essential component in mesoscopic devices. Here, we study the transmission of quantum Hall edge modes through a gate-defined QPC in monolayer graphene. We observe resonant tunneling peaks and a nonlinear conductance pattern characteristic of Coulomb-blockaded localized states. The in-plane electric polarizability reveals the states are localized at a classically-unstable electrostatic saddle point. We explain this unexpected finding within a self-consistent Thomas-Fermi model, finding that localization of a zero-dimensional state at the saddle point is favored whenever the applied confinement potential is sufficiently soft compared to the Coulomb energy. Our results provide a direct demonstration of Coulomb-driven reconstruction at the boundary of a quantum Hall system.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:2204.10296

arXiv:2401.07978 [pdf, other]

Noncollinear electric dipoles in a polar, chiral phase of CsSnBr$_3$ perovskite

Authors: Douglas H. Fabini, Kedar Honasoge, Adi Cohen, Sebastian Bette, Kyle M. McCall, Constantinos C. Stoumpos, Steffen Klenner, Mirjam Zipkat, Le Phuong Hoang, Jürgen Nuss, Reinhard K. Kremer, Mercouri G. Kanatzidis, Omer Yaffe, Stefan Kaiser, Bettina V. Lotsch

Abstract: Polar and chiral crystal symmetries confer a variety of potentially useful functionalities upon solids by coupling otherwise noninteracting mechanical, electronic, optical, and magnetic degrees of freedom. We describe two unstudied phases of the 3D perovskite, CsSnBr$_3$, which emerge below 85 K due to the formation of Sn(II) lone pairs and their interaction with extant octahedral tilts. Phase II (77 K<$T$<85 K, space group $P2_1/m$) exhibits ferroaxial order driven by a noncollinear pattern of lone pair-driven distortions within the plane normal to the unique octahedral tilt axis, preserving the inversion symmetry observed at higher temperatures. Phase I ($T$<77 K, space group $P2_1$) additionally exhibits ferroelectric order due to distortions along the unique tilt axis, breaking both inversion and mirror symmetries. This polar and chiral phase exhibits second harmonic generation from the bulk and a large, intrinsic polarization$-$electrostriction coefficient along the polar axis ($Q_{22}\approx$1.1 m$^4$ C$^{-2}$), resulting in acute negative thermal expansion ($α_V=-9\times10^{-5}$ K$^{-1}$) through the onset of spontaneous polarization. The unprecedented structures of phases I and II were predicted by recursively following harmonic phonon instabilities to generate a tree of candidate structures and subsequently corroborated by synchrotron X-ray powder diffraction and polarized Raman and $^{81}$Br nuclear quadrupole resonance spectroscopies.
Relativistic electronic structure scenarios compatible with reported photoluminescence measurements are discussed. Together, the polar symmetry, small bandgap, large spin-orbit splitting of Sn 5$p$ orbitals, and predicted strain sensitivity of the symmetry-breaking distortions suggest bulk samples and epitaxial films of CsSnBr$_3$ or its neighboring solid solutions as strong candidates for bulk Rashba effects.

Submitted 25 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.06149 [pdf, other]

Image Classifier Based Generative Method for Planar Antenna Design

Authors: Yang Zhong, Weiping Dou, Andrew Cohen, Dia'a Bisharat, Yuandong Tian, Jiang Zhu, Qing Huo Liu

Abstract: To make antenna design on printed circuit boards (PCBs) accessible to more engineers, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be produced with no prior experience required. Random sampling statistics related to the quality of dimensions are used in selecting among dimension candidates. A novel image-based classifier using a convolutional neural network (CNN) is introduced to further determine the positions of these fixed-dimension components. Two examples from wearable products have been chosen to examine the entire workflow. Their final designs are realistic and their performance metrics are not inferior to the ones designed by experienced engineers.

Submitted 16 December, 2023; originally announced January 2024.

Comments: 13 pages, 18 figures

arXiv:2401.02501 [pdf, other]

The cell signaling structure function

Authors: Layton Aho, Mark Winter, Marc DeCarlo, Agne Frismantiene, Yannick Blum, Paolo Armando Gagliardi, Olivier Pertz, Andrew R. Cohen

Abstract: Live cell microscopy captures 5-D $(x,y,z,channel,time)$ movies that display patterns of cellular motion and signaling dynamics. We present here an approach to finding spatiotemporal patterns of cell signaling dynamics in 5-D live cell microscopy movies unique in requiring no a priori knowledge of expected pattern dynamics, and no training data. The proposed cell signaling structure function (SSF) is a Kolmogorov structure function that optimally measures cell signaling state as nuclear intensity w.r.t. surrounding cytoplasm, a significant improvement compared to the current state-of-the-art cytonuclear ratio. SSF kymographs store at each spatiotemporal cell centroid the SSF value, or a functional output such as velocity. Patterns of similarity are identified via the metric normalized compression distance (NCD). The NCD is a reproducing kernel for a Hilbert space that represents the input SSF kymographs as points in a low dimensional embedding that optimally captures the pattern similarity identified by the NCD throughout the space. The only parameter is the expected cell radii ($μm$). A new formulation of the cluster structure function optimally estimates how meaningful an embedding from the RKHS representation is. Results are presented quantifying the impact of ERK and AKT signaling between different oncogenic mutations, and by the relation between ERK signaling and cellular velocity patterns for movies of 2-D monolayers of human breast epithelial (MCF10A) cells, 3-D MCF10A spheroids under optogenetic manipulation of ERK, and human induced pluripotent stem cells.
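As a rough illustration of the normalized compression distance named in the abstract above, the sketch below uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. Using zlib as the compressor, the function names, and the toy byte strings are all assumptions made for illustration; the paper's SSF kymograph pipeline and its actual compressor are not reproduced here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy inputs (hypothetical): a highly repetitive pattern and a dissimilar one.
a = b"abcd" * 200
c = bytes((i * 97 + 31) % 251 for i in range(800))

print("NCD(a, a) =", round(ncd(a, a), 3))  # similar inputs: small value
print("NCD(a, c) =", round(ncd(a, c), 3))  # dissimilar inputs: larger value
```

Values near 0 suggest the two inputs share most of their structure, while values near 1 suggest little shared structure. Real compressors only approximate the ideal measure, so the result depends on the compressor chosen.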

Submitted 11 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.16179 [pdf, other]

European Football Player Valuation: Integrating Financial Models and Network Theory

Authors: Albert Cohen, Jimmy Risk

Abstract: This paper presents a new framework for player valuation in European football by fusing principles from financial mathematics and network theory. The valuation model leverages a "passing matrix" to encapsulate player interactions on the field, utilizing centrality measures to quantify individual influence. Unlike traditional approaches, this model is both metric-driven and cohort-free, providing a dynamic and individualized framework for ascertaining a player's fair market value. The methodology is empirically validated through a case study in European football, employing real-world match and financial data. The paper advances the disciplines of sports analytics and financial mathematics by offering a cross-disciplinary mechanism for player valuation, and also links together two well-known econometric methods in marginal revenue product and expected present valuation.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 15 pages, 4 figures, 2 tables

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.03130 [pdf, other]

The DUNE Far Detector Vertical Drift Technology, Technical Design Report

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos, et al. (1304 additional authors not shown)

Abstract: DUNE is an international experiment dedicated to addressing some of the questions at the forefront of particle physics and astrophysics, including the mystifying preponderance of matter over antimatter in the early universe. The dual-site experiment will employ an intense neutrino beam focused on a near and a far detector as it aims to determine the neutrino mass hierarchy and to make high-precision measurements of the PMNS matrix parameters, including the CP-violating phase. It will also stand ready to observe supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model. The DUNE far detector implements liquid argon time-projection chamber (LArTPC) technology, and combines the many tens-of-kiloton fiducial mass necessary for rare event searches with the sub-centimeter spatial resolution required to image those events with high precision. The addition of a photon detection system enhances physics capabilities for all DUNE physics drivers and opens prospects for further physics explorations. Given its size, the far detector will be implemented as a set of modules, with LArTPC designs that differ from one another as newer technologies arise. In the vertical drift LArTPC design, a horizontal cathode bisects the detector, creating two stacked drift volumes in which ionization charges drift towards anodes at either the top or bottom. The anodes are composed of perforated PCB layers with conductive strips, enabling reconstruction in 3D. Light-trap-style photon detection modules are placed both on the cryostat's side walls and on the central cathode where they are optically powered. This Technical Design Report describes in detail the technical implementations of each subsystem of this LArTPC that, together with the other far detector modules and the near detector, will enable DUNE to achieve its physics goals.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 425 pages; 281 figures. Central editing team: A. Heavey, S. Kettell, A. Marchionni, S. Palestini, S. Rajogopalan, R. J. Wilson

Report number: Fermilab Report no: TM-2813-LBNF

arXiv:2311.16977 [pdf, ps, other]

Bidirectional Reactive Programming for Machine Learning

Authors: Dumitru Potop Butucaru, Albert Cohen, Gordon Plotkin, Hugo Pompougnac

Abstract: Reactive languages are dedicated to the programming of systems which interact continuously and concurrently with their environment. Values take the form of unbounded streams modeling the (discrete) passing of time or the sequence of concurrent interactions. While conventional reactivity models recurrences forward in time, we introduce a symmetric reactive construct enabling backward recurrences. Constraints on the latter make the implementation practical. Machine Learning (ML) systems provide numerous motivations for all of this: we demonstrate that reverse-mode automatic differentiation, backpropagation, batch normalization, bidirectional recurrent neural networks, training and reinforcement learning algorithms, are all naturally captured as bidirectional reactive programs.

Submitted 28 November, 2023; originally announced November 2023.

ACM Class: D.3; D.3.1; I.2; I.2.5

arXiv:2311.13877 [pdf, other]

Locally Optimal Descent for Dynamic Stepsize Scheduling

Authors: Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

Abstract: We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method within the context of smooth non-convex stochastic optimization, matching state-of-the-art bounds while only assuming knowledge of the smoothness parameter. We then present a practical implementation of our algorithm and conduct systematic experiments across diverse datasets and optimization algorithms, comparing our scheme with existing state-of-the-art learning-rate schedulers. Our findings indicate that our method needs minimal tuning when compared to existing approaches, removing the need for auxiliary manual schedules and warm-up phases and achieving comparable performance with drastically reduced parameter tuning.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.12239 [pdf, other]

Quantum-inspired nonlinear Galerkin ansatz for high-dimensional HJB equations

Authors: Chuhao Sun, Asaf Cohen, James Stokes, Shravan Veerapaneni

Abstract: Neural networks are increasingly recognized as a powerful numerical solution technique for partial differential equations (PDEs) arising in diverse scientific computing domains, including quantum many-body physics. In the context of time-dependent PDEs, the dominant paradigm involves casting the approximate solution in terms of stochastic minimization of an objective function given by the norm of the PDE residual, viewed as a function of the neural network parameters. Recently, advancements have been made in the direction of an alternative approach which shares aspects of nonlinearly parametrized Galerkin methods and variational quantum Monte Carlo, especially for high-dimensional, time-dependent PDEs that extend beyond the usual scope of quantum physics. This paper is inspired by the potential of solving Hamilton-Jacobi-Bellman (HJB) PDEs using Neural Galerkin methods and commences the exploration of nonlinearly parametrized trial functions for which the evolution equations are analytically tractable. As a precursor to the Neural Galerkin scheme, we present trial functions with evolution equations that admit closed-form solutions, focusing on time-dependent HJB equations relevant to finance.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.10800 [pdf, other]

doi 10.1145/3640537.3641580

The Next 700 ML-Enabled Compiler Optimizations

Authors: S. VenkataKeerthy, Siddharth Jain, Umesh Kalvakuntla, Pranav Sai Gorantla, Rajiv Shailesh Chitale, Eugene Brevdo, Albert Cohen, Mircea Trofin, Ramakrishna Upadrasta

Abstract: There is a growing interest in enhancing compiler optimizations with ML models, yet interactions between compilers and ML frameworks remain challenging. Some optimizations require tightly coupled models and compiler internals, raising issues with modularity, performance and framework independence. Practical deployment and transparency for the end-user are also important concerns. We propose ML-Compiler-Bridge to enable ML model development within a traditional Python framework while making end-to-end integration with an optimizing compiler possible and efficient. We evaluate it on both research and production use cases, for training and inference, over several optimization problems, multiple compilers and their versions, and gym infrastructures.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.09443 [pdf, other]

Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset

Authors: Brooklyn Sheppard, Anna Richter, Allison Cohen, Elizabeth Allyn Smith, Tamara Kneese, Carolyne Pelletier, Ioana Baldini, Yue Dong

Abstract: Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multi-disciplinary experts and annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, and provide baselines using common NLP algorithms in the context of misogyny detection and mitigation. We hope this work will promote AI for social good in NLP for bias detection, explanation, and removal.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 figures

arXiv:2311.07961 [pdf, other]

The ART of LLM Refinement: Ask, Refine, and Trust

Authors: Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz

Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often struggle to accurately identify errors when reasoning is involved. To address this, we propose a reasoning with refinement objective called ART: Ask, Refine, and Trust, which asks necessary questions to decide when an LLM should refine its output, and either affirm or withhold trust in its refinement by ranking the refinement and the initial prediction. On two multistep reasoning tasks of mathematical word problems (GSM8K) and question answering (StrategyQA), ART achieves a performance gain of +5 points over self-refinement baselines, while using a much smaller model as the decision maker. We also demonstrate the benefit of using smaller models to make refinement decisions as a cost-effective alternative to fine-tuning a larger model.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.14282 [pdf, other]

NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval

Authors: Uri Katz, Matan Vetzler, Amir DN Cohen, Yoav Goldberg

Abstract: Recognizing entities in texts is a central need in many information-seeking scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the most successful examples of a widely adopted NLP task and corresponding NLP technology. Recent advances in large language models (LLMs) appear to provide effective solutions (also) for NER tasks that were traditionally handled with dedicated models, often matching or surpassing the abilities of the dedicated models. Should NER be considered a solved problem? We argue to the contrary: the capabilities provided by LLMs are not the end of NER research, but rather an exciting beginning. They allow taking NER to the next level, tackling increasingly more useful, and increasingly more challenging, variants. We present three variants of the NER task, together with a dataset to support them. The first is a move towards more fine-grained -- and intersectional -- entity types. The second is a move towards zero-shot recognition and extraction of these fine-grained types based on entity-type labels. The third, and most challenging, is the move from the recognition setup to a novel retrieval setup, where the query is a zero-shot entity type, and the expected result is all the sentences from a large, pre-indexed corpus that contain entities of these types, and their corresponding spans. We show that all of these are far from being solved. We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types, to facilitate research towards all of these three goals.

Submitted 22 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023

arXiv:2310.13112 [pdf]

Asteroid 2023 NT1: A Cautionary Tale

Authors: Brin K. Bailey, Alexander N. Cohen, Dharv Patel, Philip Lubin, Mark Boslough, Darrel Robertson, Sasha Egan, Jeeya Khetia, Teagan Costa, Elizabeth Silber, Irina Sagert, Oleg Korobkin, Glenn Sjoden

Abstract: We investigate a variety of short warning time, terminal mitigation scenarios via fragmentation for a hypothetical impact of asteroid 2023 NT1, a Near-Earth Object (NEO) that was discovered on July 15, 2023, two days after its closest approach to Earth on July 13. The asteroid passed by Earth within ~0.25 lunar distances with a closest approach distance of ~10$^{5}$ km and speed of 11.27 km/s. Its size remains largely uncertain, with an estimated diameter range of 26 - 58 m and probable diameter estimate (weighted by the NEO size frequency distribution) of 34 m (JPL Sentry, September 12, 2023). The asteroid approached Earth from the direction of the Sun, as did both the Chelyabinsk asteroid in 2013 and comet NEOWISE in 2021. As a result, 2023 NT1 remained undetected until after its closest approach. If it had been on a collision course, it would have had an impact energy of ~1.5 Mt (assuming a spherical asteroid with the probable diameter estimate of 34 m, 2.6 g/cm$^{3}$ uniform density, and impact speed of 15.59 km/s). 2023 NT1 represents a threat that could have caused significant local damage (~3x Chelyabinsk airburst energy). We utilize the PI ("Pulverize It") method for planetary defense to model potential mitigation scenarios of an object like 2023 NT1 through simulations of hypervelocity asteroid disruption and atmospheric ground effects for the case of a terminal defense mode. Simulations suggest that PI is an effective multimodal approach for planetary defense that can operate in extremely short interdiction modes (with intercepts as short as hours prior to impact), in addition to long interdiction time scales with months to years of warning. Our simulations support the proposition that threats like 2023 NT1 can be effectively mitigated with intercepts of one day (or less) prior to impact, yielding minimal to no ground damage, using modest resources and existing technologies.
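The ~1.5 Mt impact energy quoted in the abstract above can be checked with a short calculation. The sketch below takes the abstract's stated assumptions (spherical body, 34 m diameter, 2.6 g/cm$^{3}$ uniform density, 15.59 km/s impact speed) and computes the kinetic energy. The function name and the conversion of 4.184e15 J per megaton of TNT are choices made here for illustration and do not come from the paper.

```python
import math

# Conventional TNT equivalence: 1 megaton = 4.184e15 joules (assumed here).
TNT_MEGATON_J = 4.184e15


def impact_energy_mt(diameter_m: float, density_kg_m3: float, speed_m_s: float) -> float:
    """Kinetic energy, in megatons of TNT, of a uniform sphere at the given speed."""
    radius = diameter_m / 2.0
    volume = (4.0 / 3.0) * math.pi * radius ** 3  # sphere volume in m^3
    mass = density_kg_m3 * volume                 # mass in kg
    energy_j = 0.5 * mass * speed_m_s ** 2        # kinetic energy in joules
    return energy_j / TNT_MEGATON_J


# Values from the abstract, converted to SI units:
# 34 m diameter, 2.6 g/cm^3 = 2600 kg/m^3, 15.59 km/s = 15590 m/s.
energy = impact_energy_mt(34.0, 2600.0, 15590.0)
print(f"{energy:.2f} Mt")  # prints "1.55 Mt"
```

Under these assumptions the mass is about 5.4e7 kg and the energy about 6.5e15 J, in line with the ~1.5 Mt figure stated in the abstract.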

Submitted 7 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.08796 [pdf, ps, other]

End-to-end Story Plot Generator

Authors: Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian

Abstract: Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatic generation of story plots, which includes story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, blocking fast specialization and personalization of the plot generator. In this paper, we propose three models, $\texttt{OpenPlot}$, $\texttt{E2EPlot}$ and $\texttt{RLPlot}$, to address these challenges. $\texttt{OpenPlot}$ replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt designs, which leads to inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, $\texttt{E2EPlot}$, by supervised fine-tuning (SFT) using approximately 13000 story plots generated by $\texttt{OpenPlot}$. $\texttt{E2EPlot}$ generates story plots of comparable quality to $\texttt{OpenPlot}$, and is > 10$\times$ faster (1k tokens in only 30 seconds on average). Finally, we obtain $\texttt{RLPlot}$ that is further fine-tuned with RLHF on several different reward models for different aspects of story quality, which yields 60.0$\%$ winning rate against $\texttt{E2EPlot}$ along the aspect of suspense and surprise.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 17 pages

arXiv:2310.03408 [pdf, other]

Disentangling the Effects of Structure and Lone-Pair Electrons in the Lattice Dynamics of Halide Perovskites

Authors: Sebastián Caicedo-Dávila, Adi Cohen, Silvia G. Motti, Masahiko Isobe, Kyle M. McCall, Manuel Grumet, Maksym V. Kovalenko, Omer Yaffe, Laura M. Herz, Douglas H. Fabini, David A. Egger

Abstract: Metal halide perovskites have shown great performance as solar energy materials, but their outstanding optoelectronic properties are paired with unusually strong anharmonic effects. It has been proposed that this intriguing combination of properties derives from the "lone pair" 6$s^2$ electron configuration of the Pb$^{2+}$ cations, and associated weak pseudo-Jahn-Teller effect, but the precise impact of this chemical feature remains unclear. Here we show that in fact an $ns^2$ electron configuration is not a prerequisite for the strong anharmonicity and low-energy lattice dynamics encountered in this class of materials. We combine X-ray diffraction, infrared and Raman spectroscopies, and first-principles molecular dynamics calculations to directly contrast the lattice dynamics of CsSrBr$_3$ with those of CsPbBr$_3$, two compounds which bear close structural similarity but with the former lacking the propensity to form lone pairs on the 5$s^0$ octahedral cation. We exploit low-frequency diffusive Raman scattering, nominally symmetry-forbidden in the cubic phase, as a fingerprint to detect anharmonicity and reveal that low-frequency tilting occurs irrespective of octahedral cation electron configuration. This work highlights the key role of structure in perovskite lattice dynamics, providing important design rules for the emerging class of soft perovskite semiconductors for optoelectronic and light-harvesting devices.

Submitted 29 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.03304 [pdf, other]

Learning Personalized Alignment for Evaluating Open-ended Text Generation

Authors: Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian

Abstract: With rapid progress made in language qualities such as fluency and consistency via large language models (LLMs), there has been increasing interest in assessing alignment with diverse human preferences. Traditional metrics heavily rely on lexical similarity with human-written references and have been observed to suffer from a poor correlation with human evaluation. Furthermore, they ignore the diverse preferences of humans, a key aspect in evaluating open-ended tasks like story generation. Inspired by these challenges, we introduce an interpretable open-ended evaluation framework PerSE to assess the alignment with a specific human preference. It is tuned to deduce the specific preference from a given personal profile and evaluate the alignment between the generation and the personal preference. PerSE also explains its assessment by a detailed comment or several fine-grained scores. This enhances its interpretability, making it more suitable to tailor a personalized generation. Our 13B LLaMA-2-based PerSE shows a 15.8% increase in Kendall correlation and a 13.7% rise in accuracy on zero-shot reviewers compared to GPT-4. It also outperforms GPT-4 by 46.01% in the Kendall correlation on new domains, indicating its transferability.

Submitted 19 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 19 pages

arXiv:2309.15028 [pdf, other]

Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding

Authors: Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz

Abstract: Inference-time search algorithms such as Monte-Carlo Tree Search (MCTS) may seem unnecessary when generating natural language text based on state-of-the-art reinforcement learning such as Proximal Policy Optimization (PPO). In this paper, we demonstrate that it is possible to get extra mileage out of PPO by integrating MCTS on top. The key idea is not to throw out the value network, a byproduct of PPO training for evaluating partial output sequences, when decoding text out of the policy network. More concretely, we present a novel value-guided decoding algorithm called PPO-MCTS, which can integrate the value network from PPO to work closely with the policy network during inference-time generation. Compared to prior approaches based on MCTS for controlled text generation, the key strength of our approach is to reduce the fundamental mismatch of the scoring mechanisms of the partial outputs between training and test. Evaluation on four text generation tasks demonstrates that PPO-MCTS greatly improves the preferability of generated text compared to the standard practice of using only the PPO policy. Our results demonstrate the promise of search algorithms even on top of the aligned language models from PPO, and the under-explored benefit of the value network.

Submitted 2 April, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Showing 1–50 of 522 results for author: Cohen, A