Search | arXiv e-print repository

Transformers are Multi-State RNNs

Authors: Matanel Oren, Michael Hassid, Nir Yarden, Yossi Adi, Roy Schwartz

Abstract: Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models - recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs - an RNN variant with unlimited hidden state size. We further show that transformers can be converted into $\textit{bounded}$ multi-state RNNs by fixing the size of their hidden state, effectively compressing their key-value cache. We introduce a novel, training-free compression policy - $\textbf{T}$oken $\textbf{O}$mission $\textbf{V}$ia $\textbf{A}$ttention (TOVA). Our experiments with four long range tasks and several LLMs show that TOVA outperforms several baseline compression policies. Particularly, our results are nearly on par with the full model, using in some cases only $\frac{1}{8}$ of the original cache size, which translates to 4.8X higher throughput. Our results shed light on the connection between transformers and RNNs, and help mitigate one of LLMs' most painful computational bottlenecks - the size of their key-value cache. We publicly release our code at https://github.com/schwartz-lab-NLP/TOVA

Submitted 18 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: preprint
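The abstract's framing (a decoder's key-value cache as the multi-state of an RNN, and TOVA as a policy that keeps that state at a fixed size) can be illustrated with a small sketch. This is a simplified, hypothetical illustration, not the released code at the URL above: the names `BoundedKVCache` and `step`, the single attention head, and the exact eviction rule (drop the cached token that receives the lowest attention weight from the current query) are our assumptions about how an attention-based omission policy can work. A real decoder has many layers and heads, which this sketch collapses into one.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class BoundedKVCache:
    """Hypothetical single-head KV cache viewed as a bounded multi-state RNN.

    The list of cached (key, value) pairs plays the role of the RNN's
    multi-state; `capacity` is the fixed state size (an assumption here).
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.keys: List[Vector] = []
        self.values: List[Vector] = []
        self.positions: List[int] = []  # token index of each cached entry

    def step(self, position: int, query: Vector, key: Vector, value: Vector) -> Vector:
        """Process one decoding step and return the attention output."""
        # State update: the new token's key/value join the multi-state.
        self.keys.append(key)
        self.values.append(value)
        self.positions.append(position)

        # Scaled dot-product attention of the current query over every state.
        scale = 1.0 / math.sqrt(len(query))
        weights = softmax([dot(query, k) * scale for k in self.keys])
        output = [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(len(value))
        ]

        # Omission: if the state is over budget, drop the entry that received
        # the lowest attention weight from the current query.
        if len(self.keys) > self.capacity:
            drop = min(range(len(weights)), key=weights.__getitem__)
            del self.keys[drop]
            del self.values[drop]
            del self.positions[drop]
        return output
```

For example, with `capacity=2` and one-dimensional vectors, feeding keys `[0.0]`, `[5.0]`, `[3.0]` under the query `[1.0]` evicts the first token (the lowest score) on the third step, leaving `positions == [1, 2]`. The sketch computes each step's output before evicting, so the state briefly holds `capacity + 1` entries within a step but never more than `capacity` between steps.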

arXiv:2104.02310 [pdf, ps, other]

SERRANT: a syntactic classifier for English Grammatical Error Types

Authors: Leshem Choshen, Matanel Oren, Dmitry Nikolaev, Omri Abend

Abstract: SERRANT is a system and code for automatic classification of English grammatical errors that combines SErCl and ERRANT. SERRANT uses ERRANT's annotations when they are informative and those provided by SErCl otherwise.

Submitted 7 April, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: Code library in: https://github.com/matanel-oren/serrant

arXiv:1905.04691 [pdf, other]

Sensor Defense In-Software (SDI): Practical Software Based Detection of Spoofing Attacks on Position Sensor

Authors: Kevin Sam Tharayil, Benyamin Farshteindiker, Shaked Eyal, Nir Hasidim, Roy Hershkovitz, Shani Houri, Ilia Yoffe, Michal Oren, Yossi Oren

Abstract: Position sensors, such as the gyroscope, the magnetometer and the accelerometer, are found in a staggering variety of devices, from smartphones and UAVs to autonomous robots. Several works have shown how adversaries can mount spoofing attacks to remotely corrupt or even completely control the outputs of these sensors. With more and more critical applications relying on sensor readings to make important decisions, defending sensors from these attacks is of prime importance. In this work we present practical software based defenses against attacks on two common types of position sensors, specifically the gyroscope and the magnetometer. We first characterize the sensitivity of these sensors to acoustic and magnetic adversaries. Next, we present two software-only defenses: a machine learning based single sensor defense, and a sensor fusion defense which makes use of the mathematical relationship between the two sensors. We performed a detailed theoretical analysis of our defenses, and implemented them on a variety of smartphones, as well as on a resource-constrained IoT sensor node. Our defenses do not require any hardware or OS-level modifications, making it possible to use them with existing hardware. Moreover, they provide a high detection accuracy, a short detection time and a reasonable power consumption.

Submitted 12 May, 2019; originally announced May 2019.

ACM Class: B.8.1; K.6.5

arXiv:physics/0311067 [pdf, ps, other]

doi 10.1063/1.1638736

W3 theory: robust computational thermochemistry in the kJ/mol accuracy range

Authors: A. Daniel Boese, Mikhal Oren, Onur Atasoylu, Jan M. L. Martin, Mihaly Kallay, Juergen Gauss

Abstract: We are proposing a new computational thermochemistry protocol denoted W3 theory, as a successor to W1 and W2 theory proposed earlier [Martin and De Oliveira, J. Chem. Phys. 111, 1843 (1999)]. The new method is both more accurate overall (error statistics for total atomization energies approximately cut in half) and more robust (particularly towards systems exhibiting significant nondynamical correlation) than W2 theory. The cardinal improvement rests in an approximate account for post-CCSD(T) correlation effects. Iterative $T_3$ (connected triple excitations) effects exhibit a basis set convergence behavior similar to the $T_3$ contribution overall. They almost universally decrease molecular binding energies. Their inclusion in isolation yields less accurate results than CCSD(T) nearly across the board: it is only when $T_4$ (connected quadruple excitations) effects are included that superior performance is achieved. $T_4$ effects systematically increase molecular binding energies. Their basis set convergence is quite rapid, and even CCSDTQ/cc-pVDZ scaled by an empirical factor of 1.2532 will yield a quite passable quadruples contribution. The effect of still higher-order excitations was gauged for a subset of molecules (notably the eight-valence electron systems): $T_5$ (connected quintuple excitations) contributions reach 0.3 kcal/mol for the pathologically multireference $X\,^1\Sigma^+_g$ state of C$_2$ but are quite small for other systems. A variety of avenues for achieving accuracy beyond that of W3 theory were explored, to no significant avail.
W3 thus appears to represent a good compromise between accuracy and computational cost for those seeking a robust method for computational thermochemistry in the kJ/mol accuracy range on small systems.

Submitted 14 November, 2003; originally announced November 2003.

Comments: J. Chem. Phys., in press (306406JCP)

Journal ref: Journal of Chemical Physics 120, 4129-4141 (2004)
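The abstract's shortcut for the quadruples term can be written as a worked formula. This is our reading, not an equation quoted from the paper: we assume the "quadruples contribution" is the energy difference between CCSDTQ and CCSDT, both computed in the cc-pVDZ basis, and that the empirical factor 1.2532 multiplies that difference.

```latex
\Delta E_{T_4} \approx 1.2532\,\bigl[E_{\mathrm{CCSDTQ}} - E_{\mathrm{CCSDT}}\bigr]_{\text{cc-pVDZ}}
```

As a purely hypothetical illustration of the arithmetic, an unscaled cc-pVDZ difference of 1.00 kcal/mol would become an estimated quadruples contribution of about 1.25 kcal/mol.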

arXiv:physics/0301056 [pdf, ps, other]

doi 10.1080/0026897031000094498

Alkali and Alkaline Earth Metal Compounds: Core-Valence Basis Sets and Importance of Subvalence Correlation

Authors: Mark A. Iron, Mikhal Oren, Jan M. L. Martin

Abstract: Core-valence basis sets for the alkali and alkaline earth metals Li, Be, Na, Mg, K, and Ca are proposed. The basis sets are validated by calculating spectroscopic constants of a variety of diatomic molecules involving these elements. Neglect of $(3s,3p)$ correlation in K and Ca compounds will lead to erratic results at best, and chemically nonsensical ones if chalcogens or halogens are present. The addition of low-exponent $p$ functions to the K and Ca basis sets is essential for smooth convergence of molecular properties. Inclusion of inner-shell correlation is important for accurate spectroscopic constants and binding energies of all the compounds. In basis set extrapolation/convergence calculations, the explicit inclusion of alkali and alkaline earth metal subvalence correlation at all steps is essential for K and Ca, strongly recommended for Na, and optional for Li and Mg, while in Be compounds, an additive treatment in a separate `core correlation' step is probably sufficient. Consideration of $(1s)$ inner-shell correlation energy in first-row elements requires inclusion of $(2s,2p)$ `deep core' correlation energy in K and Ca for consistency. The latter requires special CCV$n$Z `deep core correlation' basis sets. For compounds involving Ca bound to electronegative elements, additional $d$ functions in the basis set are strongly recommended. For optimal basis set convergence in such cases, we suggest the sequence CV(D+3d)Z, CV(T+2d)Z, CV(Q+$d$)Z, and CV5Z on calcium.

Submitted 22 January, 2003; originally announced January 2003.

Comments: Molecular Physics, in press (W. G. Richards issue); supplementary material (basis sets in G98 and MOLPRO formats) available at http://theochem.weizmann.ac.il/web/papers/group12.html

Journal ref: Molecular Physics 101, 1345-1361 (2003)

Showing 1–5 of 5 results for author: Oren, M