Search | arXiv e-print repository

Generalized Chern-Pontryagin models

Authors: J. R. Nascimento, A. Yu. Petrov, P. J. Porfírio, Ramires N. da Silva

Abstract: We formulate a new class of modified gravity models, that is, generalized four-dimensional Chern-Pontryagin models, whose action is characterized by an arbitrary function of the Ricci scalar $R$ and the Chern-Pontryagin topological term ${}^*RR$, i.e., $f(R, {}^*RR)$. Within this framework, we derive the gravitational field equations and solve them for a particular model, $f(R, {}^*RR)=R+β({}^*RR)^2$, considering two ansatzes: the slowly rotating metric and first-order perturbations of Gödel-type metrics. For the former, we find a first-order correction to the frame-dragging effect boosted by the parameter $L$, which characterizes the departures from general relativity results. For the latter, Gödel-type metrics hold unperturbed. We conclude this paper by displaying that generalized four-dimensional Chern-Pontryagin models admit a scalar-tensor representation, whose explicit form presents two scalar fields: $Φ$, a dynamical degree of freedom, while the second, $\vartheta$, a non-dynamical degree of freedom. In particular, the scalar field $\vartheta$ emerges coupled with the Chern-Pontryagin topological term ${}^*RR$, i.e., $\vartheta {}^*RR$, which is nothing more than Chern-Simons term.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 17 pages

arXiv:2406.12987 [pdf, ps, other]

Derivative four-fermion model, effective action and bumblebee generation

Authors: R. Araujo, T. Mariz, J. R. Nascimento, A. Yu. Petrov

Abstract: In this paper, we study the one-loop effective potential of a derivative four-fermion model. As a result, an exact bumblebee-like potential is radiatively generated. Afterwards, we generalize our study for a finite temperature case and explicitly demonstrate the possibility of phase transitions allowing for the restoration of the Lorentz symmetry. We also investigate the low-energy effective action, from which we obtain the usual kinetic term and the corresponding bumblebee potential.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.12015 [pdf, other]

Effects of non-commutative geometry on black hole properties

Authors: A. A. Araújo Filho, J. R. Nascimento, A. Yu. Petrov, P. J. Porfírio, Ali Övgün

Abstract: In this study, we investigate the signatures of a non-commutative black hole solution. Initially, we calculate the thermodynamic properties of the system, including entropy, heat capacity, and Hawking radiation. For the latter quantity, we employ two distinct methods: surface gravity and the topological approach. Additionally, we examine the emission rate and remnant mass within this context. Remarkably, the lifetime of the black hole, after reaching its final state due to the evaporation process, is expressed analytically up to a grey-body factor. We estimate the lifetime for specific initial and final mass configurations. Also, we analyze the tensorial quasinormal modes using the 6th-order WKB method. Finally, we study the deflection angle, i.e., gravitational lensing, in both the weak and strong deflection limits.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 26 pages, 13 figures and 4 tables

arXiv:2406.10288 [pdf, other]

Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models

Authors: Francisco Eiras, Aleksandar Petrov, Phillip H. S. Torr, M. Pawan Kumar, Adel Bibi

Abstract: Fine-tuning large language models on small, high-quality datasets can enhance their performance on specific downstream tasks. Recent research shows that fine-tuning on benign, instruction-following data can inadvertently undo the safety alignment process and increase a model's propensity to comply with harmful queries. Although critical, understanding and mitigating safety risks in well-defined tasks remains distinct from the instruction-following context due to structural differences in the data. Our work addresses the gap in our understanding of these risks across diverse types of data in closed models - where providers control how user data is utilized in the fine-tuning process. We demonstrate how malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors, while maintaining an appearance of innocuity and reasonable downstream task performance. To address this issue, we propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data, showing this is more effective than existing baselines at re-establishing safety alignment while maintaining similar task performance.

Submitted 1 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08309 [pdf, other]

Dynamical Lorentz Symmetry Breaking in a Scale-free Theory of Gravity

Authors: A. C. Lehum, J. R. Nascimento, A. Yu. Petrov, P. J. Porfírio

Abstract: This paper explores the renormalization of scale-free quadratic gravity coupled to the bumblebee field and its potential for dynamically breaking Lorentz symmetry. We conduct one-loop renormalization of the model and calculate the associated renormalization group functions. Additionally, we compute the one-loop effective potential for the bumblebee field, revealing that it acquires a non-trivial vacuum expectation value induced by radiative corrections - a phenomenon known as the Coleman-Weinberg mechanism. This spontaneous breaking of scale invariance arises from the non-vanishing vacuum expectation value of the bumblebee field, implicating Lorentz symmetry violation. Consequently, the non-minimal coupling between the bumblebee and gravitational fields results in a spontaneous generation of the Einstein-Hilbert term due to radiative corrections, thereby linking the Planck scale to Lorentz violation phenomena.

Submitted 17 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages, references added

arXiv:2406.01442 [pdf, other]

Rotational and Near-IR Spectra of PbF: Characterization of the Coupled $X_1\,^2Π_{1/2}$ and $X_2\,^2Π_{3/2}$ States

Authors: Sean Jackson, Luke Kim, Andreas Biekert, Alex Nguyen, Richard J Mawhorter, Trevor J. Sears, Leonid V. Skripnikov, Vera V. Baturo, Alexander N. Petrov, Jens-Uwe Grabow

Abstract: Observations of the rotational spectrum of lead monofluoride, PbF, have been extended up to transitions in the $v = 7$ level for $^{208}$PbF in the lowest $X_1\,^2Π_{1/2}$ state of the radical and $v = 5$ for the $^{207}$Pb and $^{206}$Pb isotopologues. The data also include a few measurements for $^{204}$PbF in $v = 0$. These new measurements have been combined with existing near-IR measurements of the $X_2 - X_1$ fine-structure transition and a simultaneous multi-isotope fit of the data to an effective isotope-independent ro-vibronic Hamiltonian has been carried out. The resulting parameters fully characterize the vibrational, rotational and hyperfine structure of the combined $X_1 \, / \, X_2$ state of the radical. A pair of opposite parity levels with total angular momentum quantum number, $F=1/2$, in the lowest rotational level, $J=1/2$ of PbF are close in energy and their spacing decreases with vibrational excitation. The experimental results show the spacing decreases to less than 20 MHz at $v=7$ and 8. The experimental work is complemented by new ab initio calculations which support the results and allow predictions outside the experimental data range. The calculated radiative lifetimes of the relevant vibrationally excited states are of the order of 50 ms.
This work was motivated by interest in using PbF as a vehicle for future probes of the standard model of physics such as placing limits on the electron's electric dipole moment (eEDM), molecular charge-parity non-conservation and Born-Oppenheimer breakdown effects for example.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 36 pages, 4 figures, 5 page Appendix

arXiv:2406.01424 [pdf, other]

Universal In-Context Approximation By Prompting Fully Recurrent Models

Authors: Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H. S. Torr, Adel Bibi

Abstract: Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20855 [pdf, ps, other]

Modified Euler-Heisenberg effective action and Proper-Time Method in Lorentz-Violating Scalar QED

Authors: L. C. T. Brito, J. C. C. Felipe, A. C. Lehum, J. R. Nascimento, A. Yu. Petrov

Abstract: Quantum photon effects in vacuum provide an interesting setting to test quantum electrodynamics, serving as a source for predictions about physics beyond the Standard Model. In this paper, we investigate these effects by calculating the one-loop Euler-Heisenberg-like effective action within a Lorentz-violating scalar quantum electrodynamics framework. In both CPT-even and CPT-odd scenarios, we obtain the exact result in all orders of the stress tensor $F_{μν}$ and evaluate explicitly the lower orders of this effective action. We identify the quantum effects coming from Lorentz violation in an explicitly gauge invariant way. Nonlinear Lorentz-violating contributions that may affect photon-photon scattering are explicitly evaluated.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 9 pages

arXiv:2405.11369 [pdf, ps, other]

Weak solutions for a singular beam equation

Authors: Olena Atlasiuk, Arnaud Heibig, Adrien Petrov

Abstract: This paper deals with a dynamic Gao beam of infinite length subjected to a moving concentrated Dirac mass. Under appropriate regularity assumptions on the initial data, the problem possesses a weak solution which is obtained as the limit of a sequence of solutions of regularized problems.

Submitted 18 May, 2024; originally announced May 2024.

MSC Class: 35B45; 35Q74; 74H20; 74K10; 74M15

arXiv:2405.08597 [pdf, other]

Risks and Opportunities of Open-Source Generative AI

Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster

Abstract: Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.

Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: Extension of arXiv:2404.17047

arXiv:2405.05653 [pdf, other]

On supposed oscillations of differential cross sections in pp-scattering at sqrt{s} = 13 TeV

Authors: Vladimir A. Petrov, Nikolai P. Tkachenko

Abstract: The question of possible existence of oscillations in the region of the diffraction peak in pp-scattering is considered in detail at sqrt{s}=13 TeV. It is shown that within the framework of the available experimental data published by the TOTEM and ALFA/ATLAS collaborations, raising the question of searching for such a subtle effect looks premature.

Submitted 19 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 9 pages, 5 figures

arXiv:2404.17047 [pdf, other]

Near to Mid-term Risks and Opportunities of Open-Source Generative AI

Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H. S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster

Abstract: In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.

Submitted 24 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted to ICML'24 as a position paper

arXiv:2404.09642 [pdf, other]

Indirect constraints on third generation baryon number violation

Authors: Martin Beneke, Gael Finauri, Alexey A. Petrov

Abstract: The non-observation of baryon number violation suggests that the scale of baryon-number violating interactions at zero temperature is comparable to the GUT scale. However, the pertinent measurements involve hadrons made of the first-generation quarks, such as protons and neutrons. One may therefore entertain the idea that new flavour physics breaks baryon number at a much lower scale, but only in the coupling to a third generation quark, leading to observable baryon-number violating $b$-hadron decay rates. In this paper we show that indirect constraints on the new physics scale $Λ_{\rm BNV}$ from the existing bounds on the proton lifetime do not allow for this possibility. For this purpose we consider the three dominant proton decay channels $p \to \ell^+ ν_\ell \barν$, $p \to π^+ \barν$ and $p \to π^0 \ell^+$ mediated by a virtual bottom quark.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 29 pages, 7 figures, 2 tables

Report number: TUM-HEP-1504/24, USC-TH-2024-01

arXiv:2404.06799 [pdf]

Chemical Interface Damping by Electrochemical Gold Oxidation

Authors: Maurice Pfeiffer, Xinyan Wu, Fatemeh Ebrahimi, Nadiia Mameka, Manfred Eich, Alexander Petrov

Abstract: Chemical interface damping is a change in the effective collision frequency of conduction band electrons in metal originating from a chemical change of the metal interface. In this work, we present in-situ ellipsometric measurements that reveal the chemical interface damping effect from electrochemical oxidation of single crystal and polycrystalline gold films. We observe an increase in collision frequency of up to 21 meV for single-crystalline gold. To compare to results obtained with thiols and metal-oxides on gold nanoparticles, we normalize the collision frequency by the electron mean free path to the surface of the structure. We show that electrochemical gold oxidation provides a stronger effect on collision frequency than these coatings. Similar ellipsometric experiments have previously been conducted to investigate the optical properties of gold oxide, but without taking chemical interface damping into account. The change in reflection from oxidation of gold was solely attributed to the oxide coating. We also show that the chemical interface damping effect saturates at a larger effective oxide thickness, which is attributed to the stabilization of the gold-oxide interface.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.04176 [pdf, other]

Gravitational lensing by a Lorentz-violating black hole

Authors: A. A. Araújo Filho, J. R. Nascimento, A. Yu. Petrov, P. J. Porfírio

Abstract: In this work, we study the gravitational lensing by a Lorentz-violating (LV) black hole inspired by the recent contribution [1]. Explicitly, we concentrate on a specific application: we perform the computation of gravitational lensing effects under the strong field limit. In particular, we analytically derive the deflection angle so that the lens equation can also be addressed. This methodological approach yields physically measurable outcomes, including the determination of relativistic image positions and their corresponding magnifications. As an application of this methodology, we consider the gravitational lensing by Sagittarius A${}^*$ and obtain the corresponding observables expressed as functions of the LV parameter.

Submitted 10 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: 22 pages, minor corrections, references added

arXiv:2404.03435 [pdf]

Size dependent photoemission study by electrochemical coarsening of nanoporous gold

Authors: Fatemeh Ebrahimi, Xinyan Wu, Maurice Pfeiffer, Hagen Renner, Nadiia Mameka, Manfred Eich, Alexander Petrov

Abstract: The generation and utilization of hot charge carriers in plasmonic materials have emerged as a topic of significant importance, with profound implications across multiple disciplines, including optoelectronics, photovoltaics, photocatalysis, and sensing. In this study, we investigate the hot electron transfer from nanoporous gold (npAu) in dependence of the structure size, utilizing both the nanoscale feature size and the interconnected nature of this material. We employ photoelectron injection from nanoporous gold into the electrolyte under UV illumination as a test electron transfer process. Nanoporous gold thin films with sub-10 nm initial ligament diameter are stepwise coarsened by potential cycles in a photoelectrochemical setup, thereby allowing us to precisely probe the influence of ligament diameter on the photocurrent response. The resulting ligament diameter variations are confirmed by scanning electron microscopy (SEM) analysis. As the ligament diameter increased from 8 to 16 nm, there was a corresponding decrease in quantum efficiency proportional to the inverse ligament diameter squared. Such dependency is expected for electrons excited by surface collisions. For the small ligament diameter of 10 nm we estimate an emission efficiency of excited 6sp electrons as 3.14%, reaching 23% for the surface excited electrons.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 30 pages, 13 figures

arXiv:2403.20222 [pdf, other]

doi 10.1007/978-3-031-56063-7_10

Shallow Cross-Encoders for Low-Latency Retrieval

Authors: Aleksandr V. Petrov, Sean MacAvaney, Craig Macdonald

Abstract: Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window. However, keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings since they can estimate the relevance of more documents in the same time budget. We further show that shallow transformers may benefit from the generalized Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with TREC Deep Learning passage ranking query sets demonstrate significant improvements in shallow and full-scale models in low-latency scenarios. For example, when the latency limit is 25ms per query, MonoBERT-Large (a cross-encoder based on a full-scale BERT model) is only able to achieve NDCG@10 of 0.431 on TREC DL 2019, while TinyBERT-gBCE (a cross-encoder based on TinyBERT trained with gBCE) reaches NDCG@10 of 0.652, a +51% gain over MonoBERT-Large.
We also show that shallow Cross-Encoders are effective even when used without a GPU (e.g., with CPU inference, NDCG@10 decreases only by 3% compared to GPU inference with 50ms latency), which makes Cross-Encoders practical to run even without specialized hardware acceleration.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted by ECIR2024

arXiv:2403.04875 [pdf, other]

Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning

Authors: Aleksandr Petrov, Craig Macdonald

Abstract: Adaptations of Transformer models, such as BERT4Rec and SASRec, achieve state-of-the-art performance in the sequential recommendation task according to accuracy-based metrics, such as NDCG. These models treat items as tokens and then utilise a score-and-rank approach (Top-K strategy), where the model first computes item scores and then ranks them according to this score. While this approach works well for accuracy-based metrics, it is hard to use it for optimising more complex beyond-accuracy metrics such as diversity. Recently, the GPTRec model, which uses a different Next-K strategy, has been proposed as an alternative to the Top-K models. In contrast with traditional Top-K recommendations, Next-K generates recommendations item-by-item and, therefore, can account for complex item-to-item interdependencies important for the beyond-accuracy measures. However, the original GPTRec paper focused only on accuracy in experiments and needed to address how to optimise the model for complex beyond-accuracy metrics. Indeed, training GPTRec for beyond-accuracy goals is challenging because the interaction training data available for training recommender systems typically needs to be aligned with beyond-accuracy recommendation goals. To solve the misalignment problem, we train GPTRec using a 2-stage approach: in the first stage, we use a teacher-student approach to train GPTRec, mimicking the behaviour of traditional Top-K models; in the second stage, we use Reinforcement Learning to align the model for beyond-accuracy goals.
In particular, we experiment with increasing recommendation diversity and reducing popularity bias. Our experiments on two datasets show that in 3 out of 4 cases, GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted by The 2nd Workshop on Recommendation with Generative Models, in conjunction with The Web Conference 2024

arXiv:2402.18760 [pdf, other]

Simple Calibration of Block Copolymer Melt Models

Authors: Artem Petrov, He** Huang, Alfredo Alexander-Katz

Abstract: According to the universality hypothesis, the phase behavior of different block copolymer melt models having fixed composition depends solely on two parameters: the invariant chain length $\bar{N}$ and the effective interaction parameter $χN$. If models behave universally, they can be compared to each other and can predict experiment quantitatively. Here, we present a simple way to achieve this universality for coarse-grained models. Our method relies on the properties of the monomer interaction potential energy $z$ distribution. In particular, models having near-symmetric $z$-distributions exhibit universal phase behavior using the standard linear definition of the Flory-Huggins parameter $χ \propto α$, where $α = ε_{AB}-(ε_{AA}+ε_{BB})/2$, and $ε_{xy}$ is the interaction energy between monomers of type $x$ and $y$. Previously, universality had been achieved using a nonlinear $χ(α)$ function which is difficult to obtain and interpret physically. The main parameter controlling the symmetry of the $z$-distribution is the monomer density $ρ$. Above a certain $ρ$, models have symmetric $z$-distributions, and their order-disorder transition points follow the universal curve predicted by Fredrickson-Helfand theory in the experimentally relevant $\bar{N} > 10^2$ range. On the other hand, low-$ρ$ models exhibit skewed $z$-distributions, and the simple $χ \propto α$ formula is no longer universally applicable to them. Our results can be used for correct block copolymer model building leading to a simple and direct comparison of simulations to experiments, which will facilitate the screening of new block copolymer morphologies and support materials design.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17605 [pdf, other]

One-loop corrections in Maxwell-metric-affine bumblebee gravity

Authors: A. C. Lehum, J. R. Nascimento, A. Yu. Petrov, P. J. Porfirio

Abstract: In this paper, we consider the coupling of the metric-affine bumblebee gravity to the Abelian gauge field and obtain the effective model corresponding to the weak gravity limit of this theory. The effective bumblebee theory displays new unconventional couplings between the bumblebee field and its field strength, and the $U(1)$ gauge field along with its respective field strength, as a result of the non-metricity effects. Thus, being a new gauge-bumblebee theory, it represents an example of vector-vector couplings which are very rarely considered, if not entirely overlooked, in the Abelian case. For this theory we calculate the lower perturbative corrections. We close the paper with discussions of other possible vector-vector couplings.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 12 pages

arXiv:2402.14753 [pdf, other]

Prompting a Pretrained Transformer Can Be a Universal Approximator

Authors: Aleksandar Petrov, Philip H. S. Torr, Adel Bibi

Abstract: Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, the question is whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, the attention mechanism is uniquely suited for universal approximation, with prefix-tuning a single attention head being sufficient to approximate any continuous function. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13014 [pdf, other]

doi 10.1088/1475-7516/2024/07/004

An exact stationary axisymmetric vacuum solution within a metric-affine bumblebee gravity

Authors: A. A. Araújo Filho, J. R. Nascimento, A. Yu. Petrov, P. J. Porfírio

Abstract: Within the framework of spontaneous Lorentz symmetry breaking, we consider a metric-affine generalization of the gravitational sector of the Standard-Model Extension (SME), including the Lorentz-violating (LV) coefficients $u$ and $s^{μν}$. In this model, we derive the modified Einstein field equations in order to obtain a new axisymmetric vacuum spinning solution for a particular bumblebee profile. Such a solution has the remarkable property of incorporating the effects of Lorentz symmetry breaking (LSB) through the LV dimensionless parameter $X=ξb^2$; when the LSB is turned off, $X=0$, we recover the well-established Kerr solution, as expected. Afterwards, we calculate the geodesics, the radial acceleration and thermodynamic quantities for this new metric. We also estimate an upper bound for $X$ by using astrophysical data on the advance of Mercury's perihelion.

Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 36 pages, version accepted to JCAP

Journal ref: JCAP07(2024)004

arXiv:2402.12368 [pdf, other]

A synthetic data approach for domain generalization of NLI models

Authors: Mohammad Javad Hosseini, Andrey Petrov, Alex Fabrikant, Annie Louis

Abstract: Natural Language Inference (NLI) remains an important benchmark task for LLMs. NLI datasets are a springboard for transfer learning to other semantic tasks, and NLI models are standard tools for identifying the faithfulness of model-generated text. There are several large scale NLI datasets today, and models have improved greatly by hill-climbing on these collections. Yet their realistic performance on out-of-distribution/domain data is less well-understood. We explore the opportunity for synthetic high-quality datasets to adapt NLI models for zero-shot use in downstream applications across new and unseen text domains. We demonstrate a new approach for generating NLI data in diverse domains and lengths, so far not covered by existing training sets. The resulting examples have meaningful premises, the hypotheses are formed in creative ways rather than simple edits to a few premise tokens, and the labels have high accuracy. We show that models trained on this data ($685$K synthetic examples) have the best generalization to completely new downstream test settings. On the TRUE benchmark, a T5-small model trained with our data improves around $7\%$ on average compared to training on the best alternative dataset. The improvements are more pronounced for smaller models, while still meaningful on a T5 XXL model. We also demonstrate gains on test sets when in-domain training data is augmented with our domain-general synthetic data.

Submitted 28 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.06271 [pdf, other]

doi 10.1016/j.asr.2024.01.021

PeV proton acceleration in Gamma-ray Binaries

Authors: A. M. Bykov, A. E. Petrov, G. A. Ponomaryov, K. P. Levenfish, M. Falanga

Abstract: The current generation of ground-based gamma-ray telescopes has observed dozens of sources of photons above 100 TeV. Supernova remnants, pulsar wind nebulae, young stellar clusters and superbubbles are considered as possible sites of PeV-regime particles producing the radiation. Another possible source of PeV particles could be gamma-ray binary systems. In these systems, a strong relativistic outflow from a compact object (neutron star or black hole) collides with the dense wind from a massive companion early-type star. Gamma-ray binaries are observed from radio to high energy gamma-rays as luminous non-thermal sources. Apart from acceleration of very high energy leptons producing most of the non-thermal radiation, these systems may also efficiently accelerate protons. We present here the results of numerical simulation of the PeV-regime proton acceleration in gamma-ray binaries. The simulation is based on relativistic MHD modeling of local flows of magnetized plasma in the region of interaction of two colliding winds. We then inject 0.1 PeV protons into the system and directly follow their trajectories to demonstrate that they are accelerated to energies above PeV. High magnetization of the wind of the young massive star, providing a Gauss-range field in the wind interaction region, is of paramount importance for the acceleration of protons above PeV. The maximum energies of protons accelerated by colliding winds in gamma-ray binaries can significantly exceed the energy of the pulsar's potential drop, which limits from above the energy of particles accelerated by an isolated pulsar.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 13 pages, 4 figures, Adv. Space Res., in print

arXiv:2312.08279 [pdf, other]

doi 10.1088/1475-7516/2024/05/125

Energy spectra of elemental groups of cosmic rays with the KASCADE experiment data and machine learning

Authors: M. Yu. Kuznetsov, N. A. Petrov, I. A. Plokhikh, V. V. Sotnikov

Abstract: We report the reconstruction of the mass component spectra of cosmic rays (protons, helium, carbon, silicon and iron) and their mean mass composition, at energies from 1.4 to 100 PeV. The results are derived from the archival data of the extensive air shower experiment KASCADE. We use a novel machine learning technique developed specifically for this reconstruction, and post-LHC hadronic interaction models: QGSJet-II.04, EPOS-LHC and Sibyll 2.3c. We have found an excess of the proton component and a deficit of intermediate and heavy nuclei components compared to the original KASCADE results. The spectra of protons and helium show a knee-like behavior at ~ 4.4 PeV and ~ 11 PeV, with significances 5.2$σ$ and 3.9$σ$, respectively. The spectrum of the iron component has a hint (2.4$σ$) of a hardening at ~ 4.5 PeV, which can be interpreted as a counterpart of a hardening in the proton spectrum at 166 TeV, recently reported by the GRAPES-3 experiment. The systematic uncertainties of our analysis were found to be smaller than those of the original KASCADE, as well as those of IceTop and TALE experiments, over the most part of the energy range studied. We also estimated separately the uncertainty related to the difference between the three mentioned hadronic interaction models. We also compute a mean logarithm mass of cosmic ray flux as a function of energy. It is in agreement with the results of IceTop, TALE and LHAASO within the uncertainties.

Submitted 2 May, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: accepted to JCAP; significant content additions compared to the previous version

Journal ref: JCAP 05 (2024) 125

arXiv:2312.06490 [pdf, other]

Automated Planning Techniques for Elementary Proofs in Abstract Algebra

Authors: Alice Petrov, Christian Muise

Abstract: This paper explores the application of automated planning to automated theorem proving, which is a branch of automated reasoning concerned with the development of algorithms and computer programs to construct mathematical proofs. In particular, we investigate the use of planning to construct elementary proofs in abstract algebra, which provides a rigorous and axiomatic framework for studying algebraic structures such as groups, rings, fields, and modules. We implement basic implications, equalities, and rules in both deterministic and non-deterministic domains to model commutative rings and deduce elementary results about them. The success of this initial implementation suggests that the well-established techniques seen in automated planning are applicable to the relatively newer field of automated theorem proving. Likewise, automated theorem proving provides a new, challenging domain for automated planning.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Automated Planning Techniques for Elementary Proofs in Abstract Algebra. Petrov, A. & Muise, C. In Scheduling and Planning Applications woRKshop. 2023

arXiv:2312.06165 [pdf, other]

doi 10.1145/3616855.3635821

RecJPQ: Training Large-Catalogue Sequential Recommenders

Authors: Aleksandr V. Petrov, Craig Macdonald

Abstract: Sequential Recommendation is a popular recommendation task that uses the order of user-item interaction to model evolving users' interests and sequential patterns in their behaviour. Current state-of-the-art Transformer-based models for sequential recommendation, such as BERT4Rec and SASRec, generate sequence embeddings and compute scores for catalogue items, but the increasing catalogue size makes training these models costly. The Joint Product Quantisation (JPQ) method, originally proposed for passage retrieval, markedly reduces the size of the retrieval index with minimal effect on model effectiveness, by replacing passage embeddings with a limited number of shared sub-embeddings. This paper introduces RecJPQ, a novel adaptation of JPQ for sequential recommendations, which takes the place of the item embeddings tensor and replaces item embeddings with a concatenation of a limited number of shared sub-embeddings and, therefore, limits the number of learnable model parameters. The main idea of RecJPQ is to split items into sub-item entities before training the main recommendation model, which is inspired by splitting words into tokens and training tokenisers in language models. We apply RecJPQ to SASRec, BERT4Rec, and GRU4Rec models on three large-scale sequential datasets. Our results showed that RecJPQ could notably reduce the model size (e.g., 48% reduction for the Gowalla dataset with no effectiveness degradation). RecJPQ can also improve model performance through a regularisation effect (e.g., +0.96% NDCG@10 improvement on the Booking.com dataset). Overall, RecJPQ allows the training of state-of-the-art transformer recommenders in industrial applications, where datasets with millions of items are common.

Submitted 18 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted by ACM WSDM 2024

arXiv:2312.05097 [pdf, other]

Selective damping of plasmons in coupled two-dimensional systems by Coulomb drag

Authors: Ilya Safonov, Aleksandr S. Petrov, Dmitry Svintsov

Abstract: The Coulomb drag is a many-body effect observed in proximized low-dimensional systems. It appears as emergence of voltage in one of them upon passage of bias current in another. The magnitude of drag voltage can be strongly affected by exchange of plasmonic excitations between the layers; however, the reverse effect of Coulomb drag on properties of plasmons has not been studied. Here, we study the plasmon spectra and damping in parallel two-dimensional systems in the presence of Coulomb drag. We find that Coulomb drag leads to selective damping of one of the two fundamental plasma modes of a coupled bilayer. For identical electron doping of both layers, the drag suppresses the acoustic plasma mode; while for symmetric electron-hole doping of the coupled pair, the drag suppresses the optical plasma mode. The selective damping can be observed both for propagating modes in extended bilayers and for localized plasmons in bilayers confined by source and drain contacts. The discussed effect may provide access to the strength of Coulomb interaction in 2d electron systems from various optical and microwave scattering experiments.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.09644 [pdf, ps, other]

Why the Bethe-West-Yennie Formula for Coulomb-Nuclear Interference Is Inconsistent

Authors: Vladimir A. Petrov

Abstract: We give a new and simple proof of the inconsistency of the Bethe-West-Yennie parametrization for Coulomb-nuclear interference.

Submitted 16 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: note added in proof added

arXiv:2311.06893 [pdf, other]

doi 10.1088/1748-0221/19/01/P01025

Methods of machine learning for the analysis of cosmic rays mass composition with the KASCADE experiment data

Authors: M. Yu. Kuznetsov, N. A. Petrov, I. A. Plokhikh, V. V. Sotnikov

Abstract: We study the problem of reconstruction of high-energy cosmic rays mass composition from the experimental data of extensive air showers. We develop several machine learning methods for the reconstruction of energy spectra of separate primary nuclei at energies 1-100 PeV, using the public data and Monte-Carlo simulations of the KASCADE experiment from the KCDC platform. We estimate the uncertainties of our methods, including the unfolding procedure, and show that the overall accuracy exceeds that of the method used in the original studies of the KASCADE experiment.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 33 pages

Journal ref: JINST 19 (2024) P01025

arXiv:2311.03056 [pdf]

LitSumm: Large language models for literature summarisation of non-coding RNAs

Authors: Andrew Green, Carlos Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I. Petrov, Alex Bateman, Blake Sweeney

Abstract: Motivation: Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritise their efforts. Results: In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for non-coding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We also applied the most commonly used automated evaluation approaches, finding that they do not correlate with human assessment. Finally, we apply our tool to a selection of over 4,600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided careful prompting and automated checking are applied. Availability: Code used to produce these summaries can be found here: https://github.com/RNAcentral/litscan-summarization and the dataset of contexts and summaries can be found here: https://huggingface.co/datasets/RNAcentral/litsumm-v1. Summaries are also displayed on the RNA report pages in RNAcentral (https://rnacentral.org/)

Submitted 19 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.19698 [pdf, other]

When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations

Authors: Aleksandar Petrov, Philip H. S. Torr, Adel Bibi

Abstract: Context-based fine-tuning methods, including prompting, in-context learning, soft prompting (also known as prompt tuning), and prefix-tuning, have gained popularity due to their ability to often match the performance of full fine-tuning with a fraction of the parameters. Despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are potentially less expressive than full fine-tuning, even with the same number of learnable parameters. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. This suggests that while techniques like prompting, in-context learning, soft prompting, and prefix-tuning can effectively elicit skills present in the pretrained model, they may not be able to learn novel tasks that require new attention patterns.

Submitted 9 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted at ICLR 2024

arXiv:2310.15715 [pdf, other]

doi 10.1016/j.physletb.2024.138519

Two-loop corrections to the Carroll-Field-Jackiw term in a CPT-odd Lorentz-violating scalar QED

Authors: A. C. Lehum, J. R. Nascimento, A. Yu. Petrov

Abstract: In this study, we systematically calculate one-loop corrections to the Lorentz-violating vertices within the framework of CPT-odd Quantum Electrodynamics, encompassing scalar and photon fields in arbitrary gauge. Additionally, we ascertain the finite two-loop corrections to the Carroll-Field-Jackiw term. Furthermore, we analyze the UV divergent component of the two-loop Lorentz-violating correction in the self-energy of the scalar field.

Submitted 4 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 11 pages, version accepted to PLB

Journal ref: Phys. Lett. B850, 138519 (2024)

arXiv:2310.15322 [pdf, ps, other]

doi 10.1140/epjc/s10052-024-12445-x

The equivalence principle for a plane gravitational wave in torsion based and non-metricity based teleparallel equivalents of general relativity

Authors: E. Emtsova, A. N. Petrov, A. V. Toporensky

Abstract: We study the energy-momentum characteristics of the plane ``+''-polarised gravitational wave solution of general relativity in the Teleparallel Equivalent of General Relativity (TEGR) and the Symmetric Teleparallel Equivalent of General Relativity (STEGR) using the previously constructed Noether currents. The current components describe the energy-momentum measured locally by an observer if the displacement vector $ξ$ is equal to the observer's 4-velocity. To determine the non-dynamical connection in these theories we use the unified ``turning off'' gravity principle. For a constructive analysis of the values of Noether currents and superpotentials in TEGR and STEGR, we use the concept of ``gauges''. Changing the gauge can affect the Noether current values. We study under what conditions the Noether current for the freely falling observer is zero. When these conditions are established, the zero result can be interpreted as a correspondence to the equivalence principle, and it is a novelty for gravitational waves in TEGR and STEGR. We highlight two important cases with positive and zero energy, which reproduce the results of previous works with a different approach to determine gravitational energy-momentum in TEGR, and give their interpretation.

Submitted 13 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Report number: https://link.springer.com/content/pdf/10.1140/epjc/s10052-024-12445-x.pdf

Journal ref: The European Physical Journal C, Volume 84, article number 215, (2024)

arXiv:2310.14269 [pdf, other]

doi 10.1103/PhysRevA.108.053103

Progress toward the $\mathcal{P}$, $\mathcal{T}$-odd Faraday effect: Light absorption by atoms briefly interacting with a laser beam

Authors: Dmitry V. Chubukov, Ivan A. Aleksandrov, Leonid V. Skripnikov, Alexander N. Petrov

Abstract: We investigate the process of photon absorption by atoms or molecules shortly interacting with a laser beam in the dipole approximation. Assuming that the interaction time $τ$ is much smaller than the lifetime of the corresponding excited state, we examine the absorption probability as a function of $τ$. Besides, we incorporate Doppler broadening due to nonzero temperature of the atoms (molecules). It is demonstrated that in the case of a zero detuning and without Doppler broadening, the absorption probability is quadratic in $τ$. Once Doppler broadening is taken into account or the laser beam is off from the resonant frequency, the absorption probability becomes linear in $τ$. Our findings are expected to be important for experimental studies in optical cells or cavities where atoms or molecules traverse continuous laser beams. The experimental prospects of searching for the electric dipole moment (EDM) of the electron are discussed in detail.

Submitted 22 October, 2023; originally announced October 2023.

Journal ref: Phys. Rev. A 108, 053103 (2023)

arXiv:2310.05681 [pdf, other]

doi 10.1103/PhysRevA.109.012819

Electronic matrix elements for parity doubling in YbOH molecule

Authors: Alexander Petrov

Abstract: The YbOH molecule is one of the most sensitive systems for the electron electric dipole moment ($e$EDM) searches. The $e$EDM-induced energy shift is proportional to the polarization ($P$) of the molecule. In Ref. [A. Petrov and A. Zakharova, Phys. Rev. A 105, L050801 (2022)] it was shown that the value of l-doubling and spin-rotation splitting directly influences the maximum value of $P$. Recently, in Ref. [Jadbabaie, Y. Takahashi, N. H. Pilgram, C. J. Conn, Y. Zeng, C. Zhang, and N. R. Hutzler, New Journal of Physics 25, 073014 (2023)] the corresponding energy levels were determined experimentally. We introduced electronic matrix elements in the Hund's case $c$ coupling scheme to reproduce the experimental energy levels and calculated $P$ as a function of the external electric field.

Submitted 9 October, 2023; originally announced October 2023.

Journal ref: PRA 109, 012819 (2024)

arXiv:2309.16573 [pdf, other]

Language Models as a Service: Overview of a New Paradigm and its Challenges

Authors: Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G. Cohn, Nigel Shadbolt, Michael Wooldridge

Abstract: Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or software programming interfaces. This is the Language-Models-as-a-Service (LMaaS) paradigm. In contrast with scenarios where full model access is available, as in the case of open-source models, such closed-off language models present specific challenges for evaluating, benchmarking, and testing them. This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, replicability, reliability, and trustworthiness of LMaaS. We systematically examine the issues that arise from a lack of information about language models for each of these four aspects. We conduct a detailed analysis of existing solutions and put forth a number of considered recommendations, and highlight the directions for future advancements. On the other hand, it serves as a comprehensive resource for existing knowledge on current, major LMaaS, offering a synthesized overview of the licences and capabilities their interfaces offer.

Submitted 30 November, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.05933 [pdf, other]

Workshop on a future muon program at FNAL

Authors: S. Corrodi, Y. Oksuzian, A. Edmonds, J. Miller, H. N. Tran, R. Bonventre, D. N. Brown, F. Meot, V. Singh, Y. Kolomensky, S. Tripathy, L. Borrel, M. Bub, B. Echenard, D. G. Hitlin, H. Jafree, S. Middleton, R. Plestid, F. C. Porter, R. Y. Zhu, L. Bottura, E. Pinsard, A. M. Teixeira, C. Carelli, D. Ambrose, et al. (68 additional authors not shown)

Abstract: The Snowmass report on rare processes and precision measurements recommended Mu2e-II and a next generation muon facility at Fermilab (Advanced Muon Facility) as priorities for the frontier. The Workshop on a future muon program at FNAL was held in March 2023 to discuss design studies for Mu2e-II, organizing efforts for the next generation muon facility, and identify synergies with other efforts (e.g., muon collider). Topics included high-power targetry, status of R&D for Mu2e-II, development of compressor rings, FFA and concepts for muon experiments (conversion, decays, muonium and other opportunities) at AMF. This document summarizes the workshop discussions with a focus on future R&D tasks needed to realize these concepts.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 68 pages, 36 figures

Report number: FERMILAB-CONF-23-464-PPD, CALT-TH-2023-036

arXiv:2309.01680 [pdf, other]

Magnetic quadrupole moment of $^{175}$Lu and parity-violating polarization degree of levels in $^{175}$LuOH$^+$

Authors: Igor Kurchavov, Daniel Maison, Leonid Skripnikov, Matt Grau, Alexander Petrov

Abstract: The calculation of the parity-violating polarizations in the external electric field, which are associated with the electron electric dipole moment ($e$EDM) and magnetic quadrupole moment (MQM) of the $^{175}$Lu nucleus, as well as the determination of the rovibrational structure for the $^{175}$LuOH$^+$ cation, is performed. Beyond the bending of the molecule, the slight effect of the stretching of the distance between Lu and OH is taken into account. This study is required for the preparation of the experiment and for the extraction of the $e$EDM and MQM values of $^{175}$Lu from future measurements.

Submitted 4 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.02112

arXiv:2308.16308 [pdf, other]

doi 10.1016/j.physletb.2023.138141

Non-Abelian Carroll-Field-Jackiw term term in a Rarita-Schwinger model

Authors: M. Gomes, J. G. Lima, T. Mariz, J. R. Nascimento, A. Yu. Petrov

Abstract: In this paper, we demonstrate the possibility of generating a non-Abelian Carroll-Field-Jackiw (CFJ) term in the theory of a non-Abelian gauge field coupled to a spin-3/2 field in the presence of the constant axial vector field. Applying two regularization schemes, we prove that this term is finite and ambiguous, particularly vanishing within the 't Hooft-Veltman scheme.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 11 pages

Journal ref: Phys. Lett. B845, 138141 (2023)

arXiv:2308.12832 [pdf, other]

doi 10.1103/PhysRevA.108.062804

Accurate numerical evaluation of systematics in the experiment for electron electric dipole moment measurement in HfF$^+$

Authors: Alexander N. Petrov

Abstract: The hyperfine structure of the ground rotational level of the metastable $^3Δ_1$ electronic state of the $^{180}$HfF$^+$ ion is calculated in the presence of variable external electric and magnetic fields. The calculations are required for the analysis of systematic effects in the experiment searching for the electron electric dipole moment ($e$EDM). Different perturbations in molecular spectra important for $e$EDM spectroscopy are taken into account.

Submitted 24 August, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.02856

Journal ref: Physical Review A 108, 062804 (2023)

arXiv:2308.09503 [pdf, other]

New brane-like solutions in modified four-dimensional Einstein-Gauss-Bonnet gravity

Authors: D. Bazeia, R. Menezes, A. Yu. Petrov, P. J. Porfírio

Abstract: We investigate solutions of a new $4D$ Einstein-Gauss-Bonnet gravity. We first describe the bulk vacuum solution, then we add a massive probe scalar field, and we follow considering a self-interacting scalar field which acts as a source to support thick brane solutions in the four-dimensional Einstein-Gauss-Bonnet scenario with a single extra dimension of infinite extent. We illustrate our results with some distinct brane-like configurations engendering controllable thickness.

Submitted 21 November, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 28 pages, 6 figures. New references have been included

arXiv:2308.07192 [pdf, other]

doi 10.1145/3604915.3608783

gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling

Authors: Aleksandr Petrov, Craig Macdonald

Abstract: A large catalogue size is one of the central challenges in training recommendation models: a large number of items makes them memory and computationally inefficient to compute scores for all items during training, forcing these models to deploy negative sampling. However, negative sampling increases the proportion of positive interactions in the training data, and therefore models trained with negative sampling tend to overestimate the probabilities of positive interactions, a phenomenon we call overconfidence. While the absolute values of the predicted scores or probabilities are not important for the ranking of retrieved recommendations, overconfident models may fail to estimate nuanced differences in the top-ranked items, resulting in degraded performance. In this paper, we show that overconfidence explains why the popular SASRec model underperforms when compared to BERT4Rec. This is contrary to the BERT4Rec authors' explanation that the difference in performance is due to the bi-directional attention mechanism. To mitigate overconfidence, we propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence. We further propose the gSASRec model, an improvement over SASRec that deploys an increased number of negatives and the gBCE loss. We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem. As a result, gSASRec can outperform BERT4Rec (e.g. +9.47% NDCG on the MovieLens-1M dataset), while requiring less training time (e.g. -73% training time on MovieLens-1M). Moreover, in contrast to BERT4Rec, gSASRec is suitable for large datasets that contain more than 1 million items.

Submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted at ACM RecSys 2023

arXiv:2308.02876 [pdf, other]

doi 10.1140/epjc/s10052-023-12374-1

On perturbative aspects of a nonminimal Lorentz-violating QED with CPT-even dimension-5 terms

Authors: T. Mariz, M. Melo, J. R. Nascimento, A. Yu. Petrov

Abstract: In this paper, we explicitly calculate the lower CPT-even one-loop quantum corrections in nonminimal Lorentz-violating spinor QED with all possible CPT-even dimension-5 operators. Within our calculations, we restrict ourselves to the cases when these parameters are completely expressed in terms of one constant vector.

Submitted 22 December, 2023; v1 submitted 5 August, 2023; originally announced August 2023.

Comments: 12 pages, version accepted to EPJ C

Journal ref: Eur. Phys. J. C 84, 50 (2024)

arXiv:2306.13069 [pdf, other]

doi 10.1016/j.nuclphysb.2023.116374

Braneworlds in bumblebee gravity

Authors: M. A. Marques, R. Menezes, A. Yu. Petrov, P. J. Porfírio

Abstract: We investigate thick-brane solutions within the five-dimensional bumblebee gravity in the presence of a real scalar field. Specifically, we implement the Lorentz symmetry breaking scenario within this context and obtain brane-like structures. Since the contribution of the bumblebee field is expected to be weak, we solve the field equations in the small-parameter regime. In this situation, we develop a first-order framework to describe the brane. The results show that the function which drives the bumblebee field may engender a lumplike structure whose shape depends on the parameters. On one hand, the effect of the non-minimal coupling, controlled by $ξ$, between the bumblebee field and gravity shifts the field from the vacuum expectation value. On the other hand, the aether parameter, $β$, is responsible for modifying the solution inside the brane.

Submitted 11 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 6 pages, double column, 2 figures, accepted to Nuclear Physics B

Journal ref: Nucl. Phys. B996, 116374 (2023)

arXiv:2306.12740 [pdf, other]

On the possibility of observing tetraquarks in the K+ beam

Authors: A. S. Gerasimov, A. K. Likhoded, V. A. Petrov, V. D. Samoylenko

Abstract: Various models of tetraquark generation in the reaction $K^{+} p \rightarrow T (us; \bar{s}\bar{s})X$ are considered. The predictions for corresponding inclusive spectra were evaluated at the energy 32 and 250 GeV.

Submitted 22 June, 2023; originally announced June 2023.

Comments: 22 pages, 12 figures, 4 tables, 22 refs

Report number: IHEP 2023-5

arXiv:2306.11114 [pdf, other]

Generative Sequential Recommendation with GPTRec

Authors: Aleksandr V. Petrov, Craig Macdonald

Abstract: Sequential recommendation is an important recommendation task that aims to predict the next item in a sequence. Recently, adaptations of language models, particularly Transformer-based models such as SASRec and BERT4Rec, have achieved state-of-the-art results in sequential recommendation. In these models, item ids replace tokens in the original language models. However, this approach has limitations. First, the vocabulary of item ids may be many times larger than in language models. Second, the classical Top-K recommendation approach used by these models may not be optimal for complex recommendation objectives, including auxiliary objectives such as diversity, coverage or coherence. Recent progress in generative language models inspires us to revisit generative approaches to address these challenges. This paper presents the GPTRec sequential recommendation model, which is based on the GPT-2 architecture. GPTRec can address large vocabulary issues by splitting item ids into sub-id tokens using a novel SVD Tokenisation algorithm based on quantised item embeddings from an SVD decomposition of the user-item interaction matrix. The paper also presents a novel Next-K recommendation strategy, which generates recommendations item-by-item, considering already recommended items. The Next-K strategy can be used for producing complex interdependent recommendation lists. We experiment with GPTRec on the MovieLens-1M dataset and show that using sub-item tokenisation GPTRec can match the quality of SASRec while reducing the embedding table by 40%. We also show that the recommendations generated by GPTRec on MovieLens-1M using the Next-K recommendation strategy match the quality of SASRec in terms of NDCG@10, meaning that the model can serve as a strong starting point for future research.

Submitted 19 June, 2023; originally announced June 2023.

Comments: Accepted at Gen-IR@SIGIR2023 workshop

arXiv:2306.08488 [pdf, other]

doi 10.1140/epjp/s13360-024-04891-z

Two-loop renormalization of the CPT-even Lorentz-violating Scalar QED

Authors: L. C. T. Brito, J. C. C. Felipe, A. C. Lehum, A. Yu. Petrov

Abstract: Investigating quantum effects arising from high loops in perturbation theory is crucial for the physical applications of any quantum field theory. This paper presents a comprehensive analysis of the two-loop renormalization of CPT-even Lorentz-violating scalar electrodynamics at the first order in the background vectors. We provide results for the self-energies of the photon and scalar field, as well as for the three-point function associated with the scalar-scalar-photon vertex, ensuring a thorough examination of the quantum effects. The calculations satisfy the Ward identities, demonstrating their consistency. Computational tools were employed to carry out the calculations, and we provide additional details in the Supplemental Material for interested readers. Our contribution presents, for the first time, a two-loop calculation within the framework of the Lorentz-violating Standard Model Extension.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 14 pages, submission contains supplemental material in the extra file

Journal ref: Eur. Phys. J. Plus 139, 90 (2024)

arXiv:2306.00777 [pdf, other]

Object pop-up: Can we infer 3D objects and their poses from human interactions alone?

Authors: Ilya A. Petrov, Riccardo Marin, Julian Chibane, Gerard Pons-Moll

Abstract: The intimate entanglement between object affordances and human poses is of large interest, among others, for behavioural sciences, cognitive psychology, and Computer Vision communities. In recent years, the latter has developed several object-centric approaches: starting from items, learning pipelines synthesizing human poses and dynamics in a realistic way, satisfying both geometrical and functional expectations. However, the inverse perspective is significantly less explored: Can we infer 3D objects and their poses from human interactions alone? Our investigation follows this direction, showing that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality (e.g., looking through a binocular) without involving a tangible counterpart. We validate our method qualitatively and quantitatively, with synthetic data and sequences acquired for the task, showing applicability for XR/VR. The code is available at https://github.com/ptrvilya/object-popup.

Submitted 27 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted at CVPR'23

arXiv:2305.15425 [pdf]

Language Model Tokenizers Introduce Unfairness Between Languages

Authors: Aleksandar Petrov, Emanuele La Malfa, Philip H. S. Torr, Adel Bibi

Abstract: Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support. Character-level and byte-level models also exhibit over 4 times the difference in the encoding length for some language pairs. This induces unfair treatment for some language communities in regard to the cost of accessing commercial language services, the processing time and latency, as well as the amount of content that can be provided as context to the models. Therefore, we make the case that we should train future language models using multilingually fair subword tokenizers.

Submitted 20 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Published at NeurIPS 2023, Project webpage: https://aleksandarpetrov.github.io/tokenization-fairness, Code: https://github.com/AleksandarPetrov/tokenization-fairness

Showing 1–50 of 994 results for author: Petrov, A