-
Jamba: A Hybrid Transformer-Mamba Language Model
Authors:
Opher Lieber,
Barak Lenz,
Hofit Bata,
Gal Cohen,
Jhonathan Osin,
Itay Dalmedigos,
Erez Safahi,
Shaked Meirom,
Yonatan Belinkov,
Shai Shalev-Shwartz,
Omri Abend,
Raz Alon,
Tomer Asida,
Amir Bergman,
Roman Glozman,
Michael Gokhman,
Avashalom Manevich,
Nir Ratner,
Noam Rozen,
Erez Shwartz,
Mor Zusman,
Yoav Shoham
Abstract:
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
Submitted 3 July, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Studies of high-transverse momentum jet substructure and top quarks produced in 1.96 TeV proton-antiproton collisions
Authors:
T. Aaltonen,
R. Alon,
S. Amerio,
D. Amidei,
A. Anastassov,
A. Annovi,
J. Antos,
G. Apollinari,
J. A. Appel,
T. Arisawa,
A. Artikov,
J. Asaadi,
W. Ashmanskas,
B. Auerbach,
A. Aurisano,
F. Azfar,
W. Badgett,
T. Bae,
A. Barbaro-Galtieri,
V. E. Barnes,
B. A. Barnett,
P. Barria,
P. Bartos,
M. Bauce,
F. Bedeschi
, et al. (381 additional authors not shown)
Abstract:
Results of a study of the substructure of the highest transverse momentum (pT) jets observed by the CDF collaboration are presented. Events containing at least one jet with pT > 400 GeV/c in a sample corresponding to an integrated luminosity of 5.95 inverse fb, collected in 1.96 TeV proton-antiproton collisions at the Fermilab Tevatron collider, are selected. A study of the jet mass, angularity, and planar-flow distributions is presented, and the measurements are compared with predictions of perturbative quantum chromodynamics. A search for boosted top-quark production is also described, leading to a 95% confidence level upper limit of 38 fb on the production cross section of top quarks with pT > 400 GeV/c.
Submitted 13 July, 2014;
originally announced July 2014.
-
Structure of Fat Jets at the Tevatron and Beyond
Authors:
Leandro G. Almeida,
Raz Alon,
Michael Spannowsky
Abstract:
Boosted resonances are a highly probable and exciting scenario in any process probing the electroweak scale. Such objects, when decaying into jets, can easily blend with the cornucopia of jets from hard, relatively light QCD states. We review jet observables and algorithms that can contribute to the identification of highly boosted heavy jets and the possible searches that can make use of such substructure information. We also review previous studies by CDF on boosted jets and its measurements of specific jet shapes.
Submitted 29 November, 2011; v1 submitted 17 October, 2011;
originally announced October 2011.
-
A data-driven method of pile-up correction for the substructure of massive jets
Authors:
Raz Alon,
Ehud Duchovni,
Gilad Perez,
Aliaksandr P. Pranko,
Pekka K. Sinervo
Abstract:
We describe a method to measure and subtract the incoherent component of energy flow arising from multiple interactions from jet shape/substructure observables of ultra-massive jets. The amount subtracted is a function of the jet shape variable of interest and not a universal property. Such a correction is expected to significantly reduce any bias in the corresponding distributions generated by the presence of multiple interactions, and to improve measurement resolution. Since in our method the correction is obtained from the data, it is not subject to uncertainties coming from the use of theoretical calculations and/or Monte Carlo event generators. We derive our correction method for the jet mass, angularity and planar flow. We find these corrections to be in good agreement with data on massive jets observed by the CDF collaboration. Finally, we comment on the linkage with the concept of jet area and jet mass area.
Submitted 20 March, 2011; v1 submitted 15 January, 2011;
originally announced January 2011.
-
Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics
Authors:
The ATLAS Collaboration,
G. Aad,
E. Abat,
B. Abbott,
J. Abdallah,
A. A. Abdelalim,
A. Abdesselam,
O. Abdinov,
B. Abi,
M. Abolins,
H. Abramowicz,
B. S. Acharya,
D. L. Adams,
T. N. Addy,
C. Adorisio,
P. Adragna,
T. Adye,
J. A. Aguilar-Saavedra,
M. Aharrouche,
S. P. Ahlen,
F. Ahles,
A. Ahmad,
H. Ahmed,
G. Aielli,
T. Akdogan
, et al. (2587 additional authors not shown)
Abstract:
A detailed study is presented of the expected performance of the ATLAS detector. The reconstruction of tracks, leptons, photons, missing energy and jets is investigated, together with the performance of b-tagging and the trigger. The physics potential for a variety of interesting physics processes, within the Standard Model and beyond, is examined. The study comprises a series of notes based on simulations of the detector and physics processes, with particular emphasis given to the data expected from the first years of operation of the LHC at CERN.
Submitted 14 August, 2009; v1 submitted 28 December, 2008;
originally announced January 2009.
-
Time resolution of a Thick Gas Electron Multiplier (THGEM) - based detector
Authors:
Raz Alon,
Marco Cortesi,
Amos Breskin,
Rachel Chechik
Abstract:
The time resolution of a double-stage Thick-GEM (THGEM) detector was measured with UV-photons and relativistic electrons. The photon detector, with semitransparent- or reflective-photocathode yielded time resolution of about 8-10ns RMS for single photoelectrons and 0.5-1ns RMS for few-hundred photoelectrons per photon-pulse. Time resolution of about 10ns RMS was recorded for relativistic electrons from a 106Ru source.
Submitted 25 September, 2008;
originally announced September 2008.
-
A concise review on THGEM detectors
Authors:
A. Breskin,
R. Alon,
M. Cortesi,
R. Chechik,
J. Miyamoto,
V. Dangendorf,
J. Maia,
J. M. F. Dos Santos
Abstract:
We briefly review the concept and properties of the Thick GEM (THGEM); it is a robust, high-gain gaseous electron multiplier, manufactured economically by standard printed-circuit drilling and etching technology. Its operation and structure resemble that of GEMs but with 5 to 20-fold expanded dimensions. The millimeter-scale hole-size results in good electron transport and in large avalanche-multiplication factors, e.g. reaching 10^7 in double-THGEM cascaded single-photoelectron detectors. The multiplier's material, parameters and shape can be application-tailored; it can operate practically in any counting gas, including noble gases, over a pressure range spanning from 1 mbar to several bars; its operation at cryogenic (LAr) conditions was recently demonstrated. The high gain, sub-millimeter spatial resolution, high counting-rate capability, good timing properties and the possibility of industrial production capability of large-area robust detectors, pave ways towards a broad spectrum of potential applications; some are discussed here in brief.
Submitted 13 July, 2008;
originally announced July 2008.
-
Operation of a Thick Gas Electron Multiplier (THGEM) in Ar, Xe and Ar-Xe
Authors:
R. Alon,
J. Miyamoto,
M. Cortesi,
A. Breskin,
R. Chechik,
I. Carne,
J. M. Maia,
J. M. F. dos Santos,
M. Gai,
D. McKinsey,
V. Dangendorf
Abstract:
We present the results of our recent studies of a Thick Gaseous Electron Multiplier (THGEM)-based detector, operated in Ar, Xe and Ar:Xe (95:5) at various gas pressures. Avalanche-multiplication properties and energy resolution were investigated with soft x-rays for different detector configurations and parameters. Gains above 10E4 were reached in a double-THGEM detector, at atmospheric pressure, in all gases, in almost all the tested conditions; in Ar:Xe (95:5) similar gains were reached at pressures up to 2 bar. The energy resolution dependence on the gas, pressure, hole geometry and electric fields was studied in detail, yielding in some configurations values below 20% FWHM with 5.9 keV x-rays.
Submitted 26 December, 2007; v1 submitted 4 December, 2007;
originally announced December 2007.
-
Investigations of a THGEM-based imaging detector
Authors:
M. Cortesi,
R. Alon,
R. Chechik,
A. Breskin,
D. Vartsky,
V. Dangendorf
Abstract:
We present the results of our recent studies on a Thick Gas Electron Multiplier (THGEM)-based imaging detector prototype. It consists of two 100x100 mm^2 THGEM electrodes in cascade, coupled to a resistive anode. The event location is recorded with a 2D double-sided readout electrode equipped with discrete delay-lines and dedicated electronics. The THGEM electrodes, produced by standard printed-circuit board and mechanical drilling techniques, are 0.4 mm thick with 0.5 mm diameter holes spaced by 1 mm. Localization resolutions of about 0.7 mm (FWHM) were measured with soft x-rays, in a detector operated with atmospheric-pressure Ar/CH4; good linearity and homogeneity were achieved. We describe the imaging-detector layout, the resistive-anode 2D readout system and the imaging properties. The THGEM has numerous potential applications that require large-area imaging detectors, with high-rate capability, single-electron sensitivity and moderate (sub-mm) localization resolution.
Submitted 26 December, 2007; v1 submitted 22 July, 2007;
originally announced July 2007.
-
Toward Application of a Thick Gas Electron Multiplier (THGEM) Readout for a Dark Matter Detector
Authors:
M. Gai,
D. N. McKinsey,
K. Ni,
D. A. R. Rubin,
T. Wongjirad,
R. Alon,
A. Breskin,
M. Cortesi,
J. Miyamoto
Abstract:
The Yale-Weizmann collaboration aims to develop a low-radioactivity (low-background) cryogenic noble liquid detector for Dark-Matter (DM) search in measurements to be performed deep underground as for example carried out by the XENON collaboration. A major issue is the background induced by natural radioactivity of present-detector components including the Photo Multiplier Tubes (PMT) made from glass with large U-Th content. We propose to use advanced Thick Gaseous Electron Multipliers (THGEM) recently developed at the Weizmann Institute of Science (WIS). These "hole-multipliers" will measure in a two-phase (liquid/gas) Xe detector electrons extracted into the gas phase from both ionization in the liquid as well as scintillation-induced photoelectrons from a CsI photocathode immersed in LXe. We report on initial tests (in gas) of THGEM made out of Cirlex (Kapton) which is well known to have low Ra-Th content instead of the usual G10 material with high Ra-Th content.
Submitted 7 June, 2007;
originally announced June 2007.
-
L-selectin mediated leukocyte tethering in shear flow is controlled by multiple contacts and cytoskeletal anchorage facilitating fast rebinding events
Authors:
Ulrich S. Schwarz,
Ronen Alon
Abstract:
L-selectin mediated tethers result in leukocyte rolling only above a threshold in shear. Here we present biophysical modeling based on recently published data from flow chamber experiments (Dwir et al., J. Cell Biol. 163: 649-659, 2003) which supports the interpretation that L-selectin mediated tethers below the shear threshold correspond to single L-selectin carbohydrate bonds dissociating on the time scale of milliseconds, whereas L-selectin mediated tethers above the shear threshold are stabilized by multiple bonds and fast rebinding of broken bonds, resulting in tether lifetimes on the timescale of $10^{-1}$ seconds. Our calculations for cluster dissociation suggest that the single molecule rebinding rate is of the order of $10^4$ Hz. A similar estimate results if increased tether dissociation for tail-truncated L-selectin mutants above the shear threshold is modeled as diffusive escape of single receptors from the rebinding region due to increased mobility. Using computer simulations, we show that our model yields first order dissociation kinetics and exponential dependence of tether dissociation rates on shear stress. Our results suggest that multiple contacts, cytoskeletal anchorage of L-selectin and local rebinding of ligand play important roles in L-selectin tether stabilization and progression of tethers into persistent rolling on endothelial surfaces.
Submitted 5 April, 2005;
originally announced April 2005.