Scaling Vision Transformers to 22 Billion Parameters

Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2205.10337 [pdf, other]

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

Authors: Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

Abstract: We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications which require extensive human expertise. The approach involves two components: (I) a base model (feed-forward) which is trained to directly predict raw vision outputs, guided by a learned discrete code and (II) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near state-of-the-art results. Our experimental results suggest that UViM is a promising candidate for a unified modeling approach in computer vision.

Submitted 14 October, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: 22 pages. Accepted at NeurIPS 2022

arXiv:2111.02767 [pdf, other]

RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

Authors: Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev

Abstract: We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM) including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL or Imitation Learning. RLDS enables not only reproducibility of existing research and easy generation of new datasets, but also accelerates novel research. By providing a standard and lossless format of datasets it enables to quickly test new algorithms on a wider range of tasks. The RLDS ecosystem makes it easy to share datasets without any loss of information and to be agnostic to the underlying original format when applying various data processing pipelines to large collections of datasets. Besides, RLDS provides tools for collecting data generated by either synthetic agents or humans, as well as for inspecting and manipulating the collected data. Ultimately, integration with TFDS facilitates the sharing of RL datasets with the research community.

Submitted 4 November, 2021; originally announced November 2021.

Comments: https://github.com/google-research/rlds

arXiv:1712.06139 [pdf, other]

TensorFlow-Serving: Flexible, High-Performance ML Serving

Authors: Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, Jordan Soyke

Abstract: We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to integrate with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS^2.

Submitted 27 December, 2017; v1 submitted 17 December, 2017; originally announced December 2017.

Comments: Presented at NIPS 2017 Workshop on ML Systems (http://learningsys.org/nips17/acceptedpapers.html)

arXiv:1606.07792 [pdf, other]

Wide & Deep Learning for Recommender Systems

Authors: Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah

Abstract: Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations are effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.

Submitted 24 June, 2016; originally announced June 2016.

arXiv:0810.4171 [pdf, ps, other]

Capacity of Steganographic Channels

Authors: Jeremiah J. Harmsen, William A. Pearlman

Abstract: This work investigates a central problem in steganography, that is: How much data can safely be hidden without being detected? To answer this question, a formal definition of steganographic capacity is presented. Once this has been defined, a general formula for the capacity is developed. The formula is applicable to a very broad spectrum of channels due to the use of an information-spectrum approach. This approach allows for the analysis of arbitrary steganalyzers as well as non-stationary, non-ergodic encoder and attack channels. After the general formula is presented, various simplifications are applied to gain insight into example hiding and detection methodologies. Finally, the context and applications of the work are summarized in a general discussion.

Submitted 22 October, 2008; originally announced October 2008.

ACM Class: H.1.1

arXiv:nucl-ex/0605025 [pdf, ps, other]

doi 10.1103/PhysRevC.74.015206

Experimental determination of the complete spin structure for anti-proton + proton -> anti-Λ + Λ at anti-proton beam momentum of 1.637 GeV/c

Authors: The PS185 collaboration, K. D. Paschke, B. Quinn, A. Berdoz, G. B. Franklin, P. Khaustov, C. A. Meyer, C. Bradtke, R. Gehring, S. Goertz, J. Harmsen, A. Meier, W. Meyer, E. Radtke, G. Reicherz, H. Dutz, M. Pluckthun, B. Schoch, H. Dennert, W. Eyrich, J. Hauffe, A. Metzger, M. Moosburger, F. Stinzing, St. Wirth, et al. (24 additional authors not shown)

Abstract: The reaction anti-proton + proton -> anti-Λ + Λ -> anti-proton + π^+ + proton + π^- has been measured with high statistics at anti-proton beam momentum of 1.637 GeV/c. The use of a transversely-polarized frozen-spin target combined with the self-analyzing property of Λ/anti-Λ decay allows access to unprecedented information on the spin structure of the interaction. The most general spin-scattering matrix can be written in terms of eleven real parameters for each bin of scattering angle, each of these parameters is determined with reasonable precision. From these results all conceivable spin-correlations are determined with inherent self-consistency. Good agreement is found with the few previously existing measurements of spin observables in anti-proton + proton -> anti-Λ + Λ near this energy. Existing theoretical models do not give good predictions for those spin-observables that had not been previously measured.

Submitted 19 May, 2006; originally announced May 2006.

Comments: To be published in Phys. Rev. C. Tables of results (i.e. Ref. 24) are available at http://www-meg.phys.cmu.edu/~bquinn/ps185_pub/results.tab ; 24 pages, 16 figures

Journal ref: Phys.Rev.C74:015206,2006

arXiv:nucl-ex/0206005 [pdf, ps, other]

doi 10.1103/PhysRevLett.89.212302

Measurement of Spin Transfer Observables in Antiproton-Proton -> Antilambda-Lambda at 1.637 GeV/c

Authors: B. Bassalleck, A. Berdoz, C. Bradtke, R. Bröders, B. Bunker, H. Dennert, H. Dutz, S. Eilerts, W. Eyrich, D. Fields, H. Fischer, G. Franklin, J. Franz, R. Gehring, R. Geyer, S. Goertz, J. Harmsen, J. Hauffe, F. H. Heinsius, D. Hertzog, T. Johansson, T. Jones, P. Khaustov, K. Kilian, P. Kingsberry, et al. (23 additional authors not shown)

Abstract: Spin transfer observables for the strangeness-production reaction Antiproton-Proton -> Antilambda-Lambda have been measured by the PS185 collaboration using a transversely-polarized frozen-spin target with an antiproton beam momentum of 1.637 GeV/c at the Low Energy Antiproton Ring at CERN. This measurement investigates observables for which current models of the reaction near threshold make significantly differing predictions. Those models are in good agreement with existing measurements performed with unpolarized particles in the initial state. Theoretical attention has focused on the fact that these models produce conflicting predictions for the spin-transfer observables D_{nn} and K_{nn}, which are measurable only with polarized target or beam. Results presented here for D_{nn} and K_{nn} are found to be in disagreement with predictions from existing models. These results also underscore the importance of singlet-state production at backward angles, while current models predict complete or near-complete triplet-state dominance.

Submitted 10 June, 2002; originally announced June 2002.

Comments: 5 pages, 3 figures

Journal ref: Phys.Rev.Lett.89:212302,2002

arXiv:hep-ph/0011299 [pdf, ps, other]

Electron Scattering with Polarized Targets at TESLA

Authors: The TESLA-N Study Group: M. Anselmino, E. C. Aschenauer, S. Belostotski, W. Bialowons, J. Bluemlein, V. Braun, R. Brinkmann, M. Dueren, F. Ellinghaus, K. Goeke, St. Goertz, A. Gute, J. Harmsen, D. v. Harrach, R. Jakob, E. M. Kabuss, R. Kaiser, V. Korotkov, P. Kroll, E. Leader, B. Lehmann-Dronke, L. Mankiewicz, A. Meier, et al. (18 additional authors not shown)

Abstract: Measurements of polarized electron-nucleon scattering can be realized at the TESLA linear collider facility with projected luminosities that are about two orders of magnitude higher than those expected of other experiments at comparable energies. Longitudinally polarized electrons, accelerated as a small fraction of the total current in the e+ arm of TESLA, can be directed onto a solid state target that may be either longitudinally or transversely polarized. A large variety of polarized parton distribution and fragmentation functions can be determined with unprecedented accuracy, many of them for the first time. A main goal of the experiment is the precise measurement of the x- and Q^2-dependence of the experimentally totally unknown quark transversity distributions that will complete the information on the nucleon's quark spin structure as relevant for high energy processes. Comparing their Q^2-evolution to that of the corresponding helicity distributions constitutes an important precision test of the predictive power of QCD in the spin sector. Measuring transversity distributions and tensor charges allows access to the hitherto unmeasured chirally odd operators in QCD which are of great importance to understand the role of chiral symmetry. The possibilities of using unpolarized targets and of experiments with a real photon beam turn TESLA-N into a versatile next-generation facility at the intersection of particle and nuclear physics.

Submitted 24 November, 2000; originally announced November 2000.

Comments: 24 pages, 10 figures

Report number: DESY 00-160, TPR-00-20

Showing 1–9 of 9 results for author: Harmsen, J