Showing 1–18 of 18 results for author: Jitsev, J

Searching in archive cs.
  1. arXiv:2406.19146

    cs.LG cs.CL

    Resolving Discrepancies in Compute-Optimal Scaling of Language Models

    Authors: Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, Yair Carmon

    Abstract: Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last layer computational cost, warmup durati…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.11794

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  3. arXiv:2406.02061

    cs.LG cs.AI cs.CL

    Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

    Authors: Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev

    Abstract: Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across vari…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: v1.1

  4. arXiv:2403.08540

    cs.CL cs.LG

    Language models scale reliably with over-training and on downstream tasks

    Authors: Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Luca Soldaini, Alexandros G. Dimakis, Gabriel Ilharco, Pang Wei Koh, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

    Abstract: Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contr…

    Submitted 14 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  5. arXiv:2308.01390

    cs.CV cs.AI cs.LG

    OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

    Authors: Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

    Abstract: We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of corresponding Flamingo performance. This technical report describes our models, training data, hyperpar…

    Submitted 7 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  6. arXiv:2304.14108

    cs.CV cs.CL cs.LG

    DataComp: In search of the next generation of multimodal datasets

    Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, et al. (9 additional authors not shown)

    Abstract: Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Commo…

    Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  7. arXiv:2304.07169

    cs.CV cs.LG

    A Comparative Study on Generative Models for High Resolution Solar Observation Imaging

    Authors: Mehdi Cherti, Alexander Czernik, Stefan Kesselheim, Frederic Effenberger, Jenia Jitsev

    Abstract: Solar activity is one of the main drivers of variability in our solar system and the key source of space weather phenomena that affect Earth and near Earth space. The extensive record of high resolution extreme ultraviolet (EUV) observations from the Solar Dynamics Observatory (SDO) offers an unprecedented, very large dataset of solar images. In this work, we make use of this comprehensive dataset…

    Submitted 14 April, 2023; originally announced April 2023.

  8. arXiv:2212.07143

    cs.LG cs.AI cs.CV

    Reproducible scaling laws for contrastive language-image learning

    Authors: Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev

    Abstract: Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale experiments are becoming increasingly expensive. However, previous work on scaling laws has primarily used private data & models or focused on…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Preprint. Under review

  9. arXiv:2210.16110

    physics.flu-dyn cs.LG

    Towards prediction of turbulent flows at high Reynolds numbers using high performance computing data and deep learning

    Authors: Mathis Bode, Michael Gauding, Jens Henrik Göbbert, Baohao Liao, Jenia Jitsev, Heinz Pitsch

    Abstract: In this paper, deep learning (DL) methods are evaluated in the context of turbulent flows. Various generative adversarial networks (GANs) are discussed with respect to their suitability for understanding and modeling turbulence. Wasserstein GANs (WGANs) are then chosen to generate small-scale turbulence. Highly resolved direct numerical simulation (DNS) turbulent data is used for training the WGAN…

    Submitted 28 October, 2022; originally announced October 2022.

    Journal ref: LNCS 11203, pp. 614-623, 2018

  10. arXiv:2210.08402

    cs.CV cs.AI cs.LG

    LAION-5B: An open large-scale dataset for training next generation image-text models

    Authors: Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev

    Abstract: Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classi…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Track on Datasets and Benchmarks. OpenReview: https://openreview.net/forum?id=M3Y74vmsMcY

  11. arXiv:2111.02114

    cs.CV cs.CL cs.LG

    LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

    Authors: Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki

    Abstract: Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) gained a recent surge, showing remarkable capability to perform zero- or few-shot learning and transfer even in absence of per-sample labels on target image data. Despite this trend, to date there has been no publicly available datasets of sufficient scale for training such models from scratc…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: Short version. Accepted at Data Centric AI NeurIPS Workshop 2021

  12. arXiv:2108.11976

    cs.DC cs.LG

    JUWELS Booster -- A Supercomputer for Large-Scale AI Research

    Authors: Stefan Kesselheim, Andreas Herten, Kai Krajsek, Jan Ebert, Jenia Jitsev, Mehdi Cherti, Michael Langguth, Bing Gong, Scarlet Stadtler, Amirpasha Mozaffari, Gabriele Cavallaro, Rocco Sedona, Alexander Schug, Alexandre Strube, Roshni Kamath, Martin G. Schultz, Morris Riedel, Thomas Lippert

    Abstract: In this article, we present JUWELS Booster, a recently commissioned high-performance computing system at the Jülich Supercomputing Center. With its system architecture, most importantly its large number of powerful Graphics Processing Units (GPUs) and its fast interconnect via InfiniBand, it is an ideal machine for large-scale Artificial Intelligence (AI) research and applications. We detail its s…

    Submitted 30 June, 2021; originally announced August 2021.

    Comments: 12 pages, 5 figures. Accepted at ISC 2021, Workshop Deep Learning on Supercomputers. This is a duplicate submission as my previous submission is on hold for several weeks now and my attempts to contact the moderators failed

    Report number: 1234567Dummy

  13. Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images

    Authors: Mehdi Cherti, Jenia Jitsev

    Abstract: Increasing model, data and compute budget scale in the pre-training has been shown to strongly improve model generalization and transfer learning in vast line of work done in language modeling and natural image recognition. However, most studies on the positive effect of larger scale were done in scope of in-domain setting, with source and target data being in close proximity. To study effect of l…

    Submitted 18 December, 2022; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: Short version published in MedNeurIPS 2021. Long version published in IJCNN 2022. Code: https://github.com/SLAMPAI/large-scale-pretraining-transfer

    Journal ref: 2022 International Joint Conference on Neural Networks (IJCNN), 2022, pp. 1-9

  14. arXiv:2103.14886

    cs.LG cs.AI cs.NE nlin.AO

    Generalization over different cellular automata rules learned by a deep feed-forward neural network

    Authors: Marcel Aach, Jens Henrik Goebbert, Jenia Jitsev

    Abstract: To test generalization ability of a class of deep neural networks, we randomly generate a large number of different rule sets for 2-D cellular automata (CA), based on John Conway's Game of Life. Using these rules, we compute several trajectories for each CA instance. A deep convolutional encoder-decoder network with short and long range skip connections is trained on various generated CA trajector…

    Submitted 18 November, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: Accepted at 23rd International Conference on Artificial Intelligence (July 2021, Las Vegas, USA) To appear in: Springer Transactions on Computational Science & Computational Intelligence

  15. arXiv:2005.00568

    stat.ML cs.LG hep-ex hep-ph

    Adversarial domain adaptation to reduce sample bias of a high energy physics classifier

    Authors: Jose M. Clavijo, Paul Glaysher, Judith M. Katzy, Jenia Jitsev

    Abstract: We apply adversarial domain adaptation in unsupervised setting to reduce sample bias in a supervised high energy physics events classifier training. We make use of a neural network containing event and domain classifier with a gradient reversal layer to simultaneously enable signal versus background events classification on the one hand, while on the other hand minimising the difference in respons…

    Submitted 19 August, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: 17 pages, 8 figures, to be submitted to MLST

    Report number: DESY 20-073

  16. arXiv:2004.00567

    cs.LG cs.AI stat.ML

    Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning

    Authors: Marco Pleines, Jenia Jitsev, Mike Preuss, Frank Zimmer

    Abstract: The Obstacle Tower Challenge is the task to master a procedurally generated chain of levels that subsequently get harder to complete. Whereas the most top performing entries of last year's competition used human demonstrations or reward shaping to learn how to cope with the challenge, we present an approach that performed competitively (placed 7th) but starts completely from scratch by means of De…

    Submitted 20 July, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

    Comments: 8 pages, 9 figures, 2 tables, under review

  17. arXiv:1911.11380

    cs.LG cs.GR physics.comp-ph physics.flu-dyn stat.ML

    Using Physics-Informed Super-Resolution Generative Adversarial Networks for Subgrid Modeling in Turbulent Reactive Flows

    Authors: Mathis Bode, Michael Gauding, Zeyu Lian, Dominik Denker, Marco Davidovic, Konstantin Kleinheinz, Jenia Jitsev, Heinz Pitsch

    Abstract: Turbulence is still one of the main challenges for accurately predicting reactive flows. Therefore, the development of new turbulence closures which can be applied to combustion problems is essential. Data-driven modeling has become very popular in many fields over the last years as large, often extensively labeled, datasets became available and training of large neural networks became possible on…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: Submitted to Combustion Symposium 2020

  18. arXiv:0905.2125

    q-bio.NC cs.LG nlin.AO

    Experience-driven formation of parts-based representations in a model of layered visual memory

    Authors: Jenia Jitsev, Christoph von der Malsburg

    Abstract: Growing neuropsychological and neurophysiological evidence suggests that the visual cortex uses parts-based representations to encode, store and retrieve relevant objects. In such a scheme, objects are represented as a set of spatially distributed local features, or parts, arranged in stereotypical fashion. To encode the local appearance and to represent the relations between the constituent par…

    Submitted 22 March, 2010; v1 submitted 13 May, 2009; originally announced May 2009.

    Comments: 34 pages, 12 Figures, 1 Table, published in Frontiers in Computational Neuroscience (Special Issue on Complex Systems Science and Brain Dynamics), http://www.frontiersin.org/neuroscience/computationalneuroscience/paper/10.3389/neuro.10/015.2009/

    Journal ref: Front. Comput. Neurosci. 3:15 (2009)