-
Transformers are Multi-State RNNs
Authors:
Matanel Oren,
Michael Hassid,
Nir Yarden,
Yossi Adi,
Roy Schwartz
Abstract:
Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models - recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs - an RNN variant with unlimited hidden state size. We further show that transformers can be converted into $\textit{bounded}$ multi-state RNNs by fixing the size of their hidden state, effectively compressing their key-value cache. We introduce a novel, training-free compression policy - $\textbf{T}$oken $\textbf{O}$mission $\textbf{V}$ia $\textbf{A}$ttention (TOVA). Our experiments with four long range tasks and several LLMs show that TOVA outperforms several baseline compression policies. Particularly, our results are nearly on par with the full model, using in some cases only $\frac{1}{8}$ of the original cache size, which translates to 4.8X higher throughput. Our results shed light on the connection between transformers and RNNs, and help mitigate one of LLMs' most painful computational bottlenecks - the size of their key-value cache. We publicly release our code at https://github.com/schwartz-lab-NLP/TOVA
Submitted 18 June, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
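The abstract above describes bounding the key-value cache but does not spell out the eviction rule. The following is a minimal single-head sketch, assuming (from the name "Token Omission Via Attention") that whenever the cache exceeds its budget, the entry receiving the lowest attention weight from the current query is dropped. The class `BoundedKVCache`, its `step` method, and all parameter names are hypothetical and are not taken from the released TOVA code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class BoundedKVCache:
    """Single-head key-value cache capped at max_size entries (hypothetical helper)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.keys = []
        self.values = []
        self.positions = []  # original token index of each cached entry

    def step(self, position, query, key, value):
        # Append the new token's key and value, as an unbounded decoder cache would.
        self.keys.append(key)
        self.values.append(value)
        self.positions.append(position)

        # Scaled dot-product attention of the current query over the whole cache.
        scale = math.sqrt(len(query))
        weights = softmax([dot(query, k) / scale for k in self.keys])
        output = [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(len(value))
        ]

        # Assumed policy: once over budget, evict the entry with the lowest weight.
        if len(self.keys) > self.max_size:
            drop = min(range(len(weights)), key=weights.__getitem__)
            for buf in (self.keys, self.values, self.positions):
                del buf[drop]
        return output
```

Because eviction happens after the attention output is computed, each step still attends over at most `max_size + 1` entries, and the state carried to the next step has a fixed maximum size, which is what makes the model behave like a bounded multi-state RNN in this sketch.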
-
SERRANT: a syntactic classifier for English Grammatical Error Types
Authors:
Leshem Choshen,
Matanel Oren,
Dmitry Nikolaev,
Omri Abend
Abstract:
SERRANT is a system and code for automatic classification of English grammatical errors that combines SErCl and ERRANT. SERRANT uses ERRANT's annotations when they are informative and those provided by SErCl otherwise.
Submitted 7 April, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Sensor Defense In-Software (SDI): Practical Software Based Detection of Spoofing Attacks on Position Sensor
Authors:
Kevin Sam Tharayil,
Benyamin Farshteindiker,
Shaked Eyal,
Nir Hasidim,
Roy Hershkovitz,
Shani Houri,
Ilia Yoffe,
Michal Oren,
Yossi Oren
Abstract:
Position sensors, such as the gyroscope, the magnetometer and the accelerometer, are found in a staggering variety of devices, from smartphones and UAVs to autonomous robots. Several works have shown how adversaries can mount spoofing attacks to remotely corrupt or even completely control the outputs of these sensors. With more and more critical applications relying on sensor readings to make important decisions, defending sensors from these attacks is of prime importance.
In this work we present practical software based defenses against attacks on two common types of position sensors, specifically the gyroscope and the magnetometer. We first characterize the sensitivity of these sensors to acoustic and magnetic adversaries. Next, we present two software-only defenses: a machine learning based single sensor defense, and a sensor fusion defense which makes use of the mathematical relationship between the two sensors. We performed a detailed theoretical analysis of our defenses, and implemented them on a variety of smartphones, as well as on a resource-constrained IoT sensor node. Our defenses do not require any hardware or OS-level modifications, making it possible to use them with existing hardware. Moreover, they provide a high detection accuracy, a short detection time and a reasonable power consumption.
Submitted 12 May, 2019;
originally announced May 2019.
-
W3 theory: robust computational thermochemistry in the kJ/mol accuracy range
Authors:
A. Daniel Boese,
Mikhal Oren,
Onur Atasoylu,
Jan M. L. Martin,
Mihaly Kallay,
Juergen Gauss
Abstract:
We are proposing a new computational thermochemistry protocol denoted W3 theory, as a successor to W1 and W2 theory proposed earlier [Martin and De Oliveira, J. Chem. Phys. 111, 1843 (1999)]. The new method is both more accurate overall (error statistics for total atomization energies approximately cut in half) and more robust (particularly towards systems exhibiting significant nondynamical correlation) than W2 theory. The cardinal improvement rests in an approximate account for post-CCSD(T) correlation effects. Iterative $T_3$ (connected triple excitations) effects exhibit a basis set convergence behavior similar to the $T_3$ contribution overall. They almost universally decrease molecular binding energies. Their inclusion in isolation yields less accurate results than CCSD(T) nearly across the board: it is only when $T_4$ (connected quadruple excitations) effects are included that superior performance is achieved. $T_4$ effects systematically increase molecular binding energies. Their basis set convergence is quite rapid, and even CCSDTQ/cc-pVDZ scaled by an empirical factor of 1.2532 will yield a quite passable quadruples contribution. The effect of still higher-order excitations was gauged for a subset of molecules (notably the eight-valence electron systems): $T_5$ (connected quintuple excitations) contributions reach 0.3 kcal/mol for the pathologically multireference $X\,{}^1\Sigma_g^+$ state of C$_2$ but are quite small for other systems. A variety of avenues for achieving accuracy beyond that of W3 theory were explored, to no significant avail. W3 thus appears to represent a good compromise between accuracy and computational cost for those seeking a robust method for computational thermochemistry in the kJ/mol accuracy range on small systems.
Submitted 14 November, 2003;
originally announced November 2003.
-
Alkali and Alkaline Earth Metal Compounds: Core-Valence Basis Sets and Importance of Subvalence Correlation
Authors:
Mark A. Iron,
Mikhal Oren,
Jan M. L. Martin
Abstract:
Core-valence basis sets for the alkali and alkaline earth metals Li, Be, Na, Mg, K, and Ca are proposed. The basis sets are validated by calculating spectroscopic constants of a variety of diatomic molecules involving these elements. Neglect of $(3s,3p)$ correlation in K and Ca compounds will lead to erratic results at best, and chemically nonsensical ones if chalcogens or halogens are present. The addition of low-exponent $p$ functions to the K and Ca basis sets is essential for smooth convergence of molecular properties. Inclusion of inner-shell correlation is important for accurate spectroscopic constants and binding energies of all the compounds. In basis set extrapolation/convergence calculations, the explicit inclusion of alkali and alkaline earth metal subvalence correlation at all steps is essential for K and Ca, strongly recommended for Na, and optional for Li and Mg, while in Be compounds, an additive treatment in a separate 'core correlation' step is probably sufficient. Consideration of $(1s)$ inner-shell correlation energy in first-row elements requires inclusion of $(2s,2p)$ 'deep core' correlation energy in K and Ca for consistency. The latter requires special CCV$n$Z 'deep core correlation' basis sets. For compounds involving Ca bound to electronegative elements, additional $d$ functions in the basis set are strongly recommended. For optimal basis set convergence in such cases, we suggest the sequence CV(D+3$d$)Z, CV(T+2$d$)Z, CV(Q+$d$)Z, and CV5Z on calcium.
Submitted 22 January, 2003;
originally announced January 2003.