Showing 1–19 of 19 results for author: Mohtashami, A

  1. arXiv:2404.00456  [pdf, other]

    cs.LG

    QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

    Authors: Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman

    Abstract: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to th…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 19 pages, 6 figures

  2. arXiv:2402.02622  [pdf, other]

    cs.CL cs.LG

    DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

    Authors: Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi

    Abstract: The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding. We propose DenseFormer, a simple modification to the standard architecture that improves the perplexity of the model without increasing its size -- adding a few thousand parameters for large-scale models in the 100B param…

    Submitted 21 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  3. arXiv:2312.11441  [pdf, other]

    cs.LG cs.CL

    Social Learning: Towards Collaborative Learning with Large Language Models

    Authors: Amirkeivan Mohtashami, Florian Hartmann, Sian Gooding, Lukas Zilka, Matt Sharifi, Blaise Aguera y Arcas

    Abstract: We introduce the framework of "social learning" in the context of large language models (LLMs), whereby models share knowledge with each other in a privacy-aware manner using natural language. We present and evaluate two approaches for knowledge transfer between LLMs. In the first scenario, we allow the model to generate abstract prompts aiming to teach the task. In our second approach, models tra…

    Submitted 8 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2311.16079  [pdf, other]

    cs.CL cs.AI cs.LG

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Authors: Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

    Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by rele…

    Submitted 27 November, 2023; originally announced November 2023.

  5. arXiv:2310.10845  [pdf, other]

    cs.CL cs.LG

    CoTFormer: More Tokens With Attention Make Up For Less Depth

    Authors: Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

    Abstract: The race to continually develop ever larger and deeper foundational models is underway. However, techniques like the Chain-of-Thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance. In this work, we establish an approximate parallel between using chain-of-thought and employing a deeper transformer. Building on this insight, we introduce CoTFormer, a transf…

    Submitted 16 October, 2023; originally announced October 2023.

  6. arXiv:2305.16300  [pdf, other]

    cs.CL cs.LG

    Landmark Attention: Random-Access Infinite Context Length for Transformers

    Authors: Amirkeivan Mohtashami, Martin Jaggi

    Abstract: While Transformers have shown remarkable success in natural language processing, their attention mechanism's large memory requirements have limited their ability to handle longer contexts. Prior approaches, such as recurrent memory or retrieval-based augmentation, have either compromised the random-access flexibility of attention (i.e., the capability to select any token in the entire context) or…

    Submitted 19 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Published as a conference paper at NeurIPS 2023 - 37th Conference on Neural Information Processing Systems

  7. arXiv:2302.03491  [pdf, ps, other]

    cs.CL cs.LG

    Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

    Authors: Amirkeivan Mohtashami, Mauro Verzetti, Paul K. Rubenstein

    Abstract: Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data which can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve upon such learned metrics without requiring human annotat…

    Submitted 7 February, 2023; originally announced February 2023.

  8. arXiv:2205.15142  [pdf, other]

    cs.LG math.OC

    Special Properties of Gradient Descent with Large Learning Rates

    Authors: Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich

    Abstract: When training neural networks, it has been widely observed that a large step size is essential in stochastic gradient descent (SGD) for obtaining superior models. However, the effect of large step sizes on the success of SGD is not well understood theoretically. Several previous works have attributed this success to the stochastic noise present in SGD. However, we show through a novel set of exper…

    Submitted 16 February, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: A short version of this work appeared in the ICML 2022 Workshop on Continuous Time Methods for Machine Learning under the title "The Gap Between Continuous and Discrete Gradient Descent"

  9. arXiv:2202.01838  [pdf, other]

    cs.LG

    Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

    Authors: Amirkeivan Mohtashami, Sebastian Stich, Martin Jaggi

    Abstract: While SGD, which samples from the data with replacement, is widely studied in theory, a variant called Random Reshuffling (RR) is more common in practice. RR iterates through random permutations of the dataset and has been shown to converge faster than SGD. When the order is chosen deterministically, a variant called incremental gradient descent (IG), the existing convergence bounds show improvemen…

    Submitted 3 February, 2022; originally announced February 2022.

  10. arXiv:2106.08895  [pdf, other]

    cs.LG

    Masked Training of Neural Networks with Partial Gradients

    Authors: Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

    Abstract: State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD updates to a subset of parameters for increased efficiency (such as meProp) or a combination of both (such as Dropout). However, the convergence of these methods…

    Submitted 22 March, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  11. arXiv:2103.02351  [pdf, other]

    cs.LG cs.DC stat.ML

    Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

    Authors: Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

    Abstract: It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness. In particular, it has been observed that the speedup saturates beyond a certain batch size and/or when the delays grow too large. We identify a data-dependent parameter that explains the…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  12. arXiv:2008.01009  [pdf, other]

    cs.DC cs.DS

    The Splay-List: A Distribution-Adaptive Concurrent Skip-List

    Authors: Vitaly Aksenov, Dan Alistarh, Alexandra Drozdova, Amirkeivan Mohtashami

    Abstract: The design and implementation of efficient concurrent data structures have seen significant attention. However, most of this work has focused on concurrent data structures providing good worst-case guarantees. In real workloads, objects are often accessed at different rates, since access distributions may be non-uniform. Efficient distribution-adaptive data structures are known in the seque…

    Submitted 3 August, 2020; originally announced August 2020.

  13. arXiv:2007.01662  [pdf, other]

    physics.app-ph physics.ins-det

    Scattering contrast in GHz frequency ultrasound subsurface atomic force microscopy for detection of deeply buried features

    Authors: Maarten H. van Es, Benoit A. J. Quesson, Abbas Mohtashami, Daniele Piras, Kodai Hatakeyama, Laurent Fillinger, Paul L. M. J. van Neer

    Abstract: While Atomic Force Microscopy is mostly used to investigate surface properties, people have almost since its invention sought to apply its high resolution capability to image also structures buried within samples. One of the earliest techniques for this was based on using ultrasound excitations to visualize local differences in effective tip-sample stiffness caused by the presence of buried struct…

    Submitted 3 July, 2020; originally announced July 2020.

    Comments: 18 pages, 5 figures

  14. Angle-resolved polarimetry measurements of antenna-mediated fluorescence

    Authors: Abbas Mohtashami, Clara I. Osorio, A. Femius Koenderink

    Abstract: Optical phase-array antennas can be used to control not only the angular distribution but also the polarization of fluorescence from quantum emitters. The emission pattern of the resulting system is determined by the properties of the antenna, the properties of the emitters and the strength of the antenna-emitter coupling. Here we show that Fourier polarimetry can be used to characterize these thr…

    Submitted 30 May, 2015; originally announced June 2015.

    Comments: 7 pages, 5 figures

    Journal ref: Phys. Rev. Applied 4, 054014 (2015)

  15. arXiv:1212.5172  [pdf, ps, other]

    physics.optics cond-mat.mes-hall physics.atom-ph quant-ph

    Suitability of nanodiamond NV centers for spontaneous emission control experiments

    Authors: Abbas Mohtashami, A. Femius Koenderink

    Abstract: NV centers in diamond are generally recognized as highly promising as indefinitely stable highly efficient single-photon sources. We report an experimental quantification of the brightness, radiative decay rate, nonradiative decay rate and quantum efficiency of single NV centers in diamond nanocrystals. Our experiments show that the commonly observed large spread in fluorescence decay rates of NV…

    Submitted 20 December, 2012; originally announced December 2012.

    Comments: 27 pages, 6 figures

  16. arXiv:1212.5081  [pdf, ps, other]

    physics.optics

    Nanomechanical method to gauge emission quantum yield applied to NV-centers in nanodiamond

    Authors: Martin Frimmer, Abbas Mohtashami, A. Femius Koenderink

    Abstract: We present a technique to nanomechanically vary the distance between a fluorescent source and a mirror, thereby varying the local density of optical states at the source position. Our method can therefore serve to measure the quantum efficiency of fluorophores. Application of our technique to NV defects in diamond nanocrystals shows that their quantum yield can significantly differ from unity. Rel…

    Submitted 20 December, 2012; originally announced December 2012.

    Comments: 11 pages, 3 figures

  17. arXiv:1007.3032  [pdf, other]

    cond-mat.mes-hall

    Non-resonant feeding of photonic crystal nanocavity modes by quantum dots

    Authors: A. Laucht, N. Hauke, A. Neumann, T. Günthner, F. Hofbauer, A. Mohtashami, K. Müller, G. Böhm, M. Bichler, M. -C. Amann, M. Kaniber, J. J. Finley

    Abstract: We experimentally probe the non-resonant feeding of photons into the optical mode of a two dimensional photonic crystal nanocavity from the discrete emission from a quantum dot. For a strongly coupled system of a single exciton and the cavity mode, we track the detuning-dependent photoluminescence intensity of the polariton peaks at different lattice temperatures. At low temperatures we observe a…

    Submitted 18 July, 2010; originally announced July 2010.

    Journal ref: J. Appl. Phys. 109, 102404 (2011)

  18. Temporal Monitoring of Non-resonant Feeding of Semiconductor Nanocavity Modes by Quantum Dot Multiexciton Transitions

    Authors: A. Laucht, M. Kaniber, A. Mohtashami, N. Hauke, M. Bichler, J. J. Finley

    Abstract: We experimentally investigate the non-resonant feeding of photons into the optical mode of a zero dimensional nanocavity by quantum dot multiexciton transitions. Power dependent photoluminescence measurements reveal a super-linear power dependence of the mode emission, indicating that the emission stems from multiexcitons. By monitoring the temporal evolution of the photoluminescence spectrum, we…

    Submitted 15 March, 2010; originally announced March 2010.

    Journal ref: Physical Review B 81, 241302(R) (2010)

  19. arXiv:0910.3749  [pdf, other]

    cond-mat.mes-hall cond-mat.other

    Phonon-assisted transitions from quantum dot excitons to cavity photons

    Authors: Ulrich Hohenester, Arne Laucht, Michael Kaniber, Norman Hauke, Abbas Mohtashami, Marek Seliger, Jonathan J. Finley

    Abstract: For a single semiconductor quantum dot embedded in a microcavity, we theoretically and experimentally investigate phonon-assisted transitions between excitons and the cavity mode. Within the framework of the independent boson model we find that such transitions can be very efficient, even for relatively large exciton-cavity detunings of several millielectron volts. Furthermore, we predict a stro…

    Submitted 20 October, 2009; originally announced October 2009.

    Journal ref: Phys. Rev. B 80, 201311(R) (2009)