
Showing 1–28 of 28 results for author: Riquelme, C

  1. arXiv:2404.01226  [pdf, other]

    cs.CL

    Stable Code Technical Report

    Authors: Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

    Abstract: We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks. Additionally, we introduce an instruction variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for performing quest…

    Submitted 1 April, 2024; originally announced April 2024.

  2. arXiv:2402.17834  [pdf, other]

    cs.CL stat.ML

    Stable LM 2 1.6B Technical Report

    Authors: Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme

    Abstract: We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including z…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 23 pages, 6 figures

  3. arXiv:2401.15969  [pdf, other]

    cs.CV cs.AI cs.LG

    Routers in Vision Mixture of Experts: An Empirical Study

    Authors: Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver

    Abstract: Mixture-of-Experts (MoE) models are a promising way to scale up model capacity without significantly increasing computational cost. A key component of MoEs is the router, which decides which subset of parameters (experts) process which feature embeddings (tokens). In this paper, we present a comprehensive study of routers in MoEs for computer vision tasks. We introduce a unified MoE formulation th…

    Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  4. arXiv:2309.08520  [pdf, other]

    cs.LG

    Scaling Laws for Sparsely-Connected Foundation Models

    Authors: Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci

    Abstract: We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains. In this setting, we identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data, which we validate empirically across model and data scales…

    Submitted 15 September, 2023; originally announced September 2023.

  5. arXiv:2308.00951  [pdf, other]

    cs.LG cs.AI cs.CV

    From Sparse to Soft Mixtures of Experts

    Authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby

    Abstract: Sparse mixture of expert architectures (MoEs) scale model capacity without significant increases in training or inference costs. Despite their success, MoEs suffer from a number of issues: training instability, token dropping, inability to scale the number of experts, or ineffective finetuning. In this work, we propose Soft MoE, a fully-differentiable sparse Transformer that addresses these challe…

    Submitted 27 May, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: Published as a conference paper at ICLR 2024

  6. arXiv:2302.05442  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al…

    Submitted 10 February, 2023; originally announced February 2023.

  7. arXiv:2210.10253  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    On the Adversarial Robustness of Mixture of Experts

    Authors: Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli

    Abstract: Adversarial robustness is a key desirable property of neural networks. It has been empirically shown to be affected by their sizes, with larger networks being typically more robust. Recently, Bubeck and Sellke proved a lower bound on the Lipschitz constant of functions that fit the training data in terms of their number of parameters. This raises an interesting open question, do -- and can -- func…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  8. arXiv:2209.06794  [pdf, other]

    cs.CV cs.CL

    PaLI: A Jointly-Scaled Multilingual Language-Image Model

    Authors: Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, et al. (4 additional authors not shown)

    Abstract: Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaL…

    Submitted 5 June, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 (Notable-top-5%)

  9. arXiv:2206.02770  [pdf, other]

    cs.CV

    Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

    Authors: Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby

    Abstract: Large sparsely-activated models have obtained excellent performance in multiple domains. However, such models are typically trained on a single modality at a time. We present the Language-Image MoE, LIMoE, a sparse mixture of experts model capable of multimodal learning. LIMoE accepts both images and text simultaneously, while being trained using a contrastive loss. MoEs are a natural fit for a mu…

    Submitted 6 June, 2022; originally announced June 2022.

  10. arXiv:2202.12015  [pdf, other]

    cs.CV cs.LG

    Learning to Merge Tokens in Vision Transformers

    Authors: Cedric Renggli, André Susano Pinto, Neil Houlsby, Basil Mustafa, Joan Puigcerver, Carlos Riquelme

    Abstract: Transformers are widely applied to solve natural language understanding and computer vision tasks. While scaling up these architectures leads to improved performance, it often comes at the expense of much higher computational costs. In order for large-scale models to remain practical in real-world systems, there is a need for reducing their computational overhead. In this work, we present the Patc…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: 11 pages, 9 figures

  11. arXiv:2106.05974  [pdf, other]

    cs.CV cs.LG stat.ML

    Scaling Vision with Sparse Mixture of Experts

    Authors: Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby

    Abstract: Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent scalability in Natural Language Processing. In Computer Vision, however, almost all performant networks are "dense", that is, every input is processed by every parameter. We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer, that is scalable and competitive with the largest dense networks. When app…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 44 pages, 38 figures

  12. Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance

    Authors: Manuel Carranza-García, Pedro Lara-Benítez, Jorge García-Gutiérrez, José C. Riquelme

    Abstract: Object detection has been one of the most active topics in computer vision for the past years. Recent works have mainly focused on pushing the state-of-the-art in the general-purpose COCO benchmark. However, the use of such detection frameworks in specific applications such as autonomous driving is yet an area to be addressed. This study presents an enhanced 2D object detector based on Faster R-CN…

    Submitted 8 April, 2021; originally announced April 2021.

    Journal ref: Neurocomputing, 2021

  13. An Experimental Review on Deep Learning Architectures for Time Series Forecasting

    Authors: Pedro Lara-Benítez, Manuel Carranza-García, José C. Riquelme

    Abstract: In recent years, deep learning techniques have outperformed traditional models in many machine learning tasks. Deep neural networks have successfully been applied to address time series forecasting problems, which is a very important topic in data mining. They have proved to be an effective solution given their capacity to automatically learn the temporal dependencies present in time series. Howev…

    Submitted 8 April, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Journal ref: International Journal of Neural Systems, Vol. 31, No. 3 (2021) 2130001

  14. arXiv:2010.06866  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Ensembles for Low-Data Transfer Learning

    Authors: Basil Mustafa, Carlos Riquelme, Joan Puigcerver, André Susano Pinto, Daniel Keysers, Neil Houlsby

    Abstract: In the low-data regime, it is difficult to train good supervised models from scratch. Instead practitioners turn to pre-trained models, leveraging transfer learning. Ensembling is an empirically and theoretically appealing way to construct powerful predictive models, but the predominant approach of training multiple deep networks with different random initialisations collides with the need for tra…

    Submitted 19 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

  15. arXiv:2010.06402  [pdf, other]

    cs.LG cs.CV

    Which Model to Transfer? Finding the Needle in the Growing Haystack

    Authors: Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic

    Abstract: Transfer learning has been recently popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks where it provides a remarkably solid baseline. The emergence of rich model repositories, such as TensorFlow Hub, enables the practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these r…

    Submitted 25 March, 2022; v1 submitted 13 October, 2020; originally announced October 2020.

  16. arXiv:2009.13239  [pdf, other]

    cs.LG cs.CV stat.ML

    Scalable Transfer Learning with Expert Models

    Authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Cedric Renggli, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

    Abstract: Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploit…

    Submitted 28 September, 2020; originally announced September 2020.

  17. Coronavirus Optimization Algorithm: A bioinspired metaheuristic based on the COVID-19 propagation model

    Authors: F. Martínez-Álvarez, G. Asencio-Cortés, J. F. Torres, D. Gutiérrez-Avilés, L. Melgar-García, R. Pérez-Chacón, C. Rubio-Escudero, J. C. Riquelme, A. Troncoso

    Abstract: A novel bioinspired metaheuristic is proposed in this work, simulating how the coronavirus spreads and infects healthy people. From an initial individual (the patient zero), the coronavirus infects new patients at known rates, creating new populations of infected people. Every individual can either die or infect and, afterwards, be sent to the recovered population. Relevant terms such as re-infect…

    Submitted 16 April, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

    Comments: 30 pages, 4 figures

  18. arXiv:2003.02544  [pdf, ps, other]

    cs.LG stat.ML

    On the performance of deep learning models for time series classification in streaming

    Authors: Pedro Lara-Benítez, Manuel Carranza-García, Francisco Martínez-Álvarez, José C. Riquelme

    Abstract: Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state-of-the-art for many machine learning tasks, their performance in real-time data streaming scenarios is a research area that has not yet been fully addressed. Nevertheless, there have been recent efforts to adapt complex deep l…

    Submitted 3 April, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: Paper submitted to the 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020)

  19. arXiv:2001.08049  [pdf, other]

    stat.ML cs.LG

    On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation

    Authors: Nicolas Brosse, Carlos Riquelme, Alice Martin, Sylvain Gelly, Éric Moulines

    Abstract: Uncertainty quantification for deep learning is a challenging open problem. Bayesian statistics offer a mathematically grounded framework to reason about uncertainties; however, approximate posteriors for modern neural networks still require prohibitive computational costs. We propose a family of algorithms which split the classification task into two stages: representation learning and uncertaint…

    Submitted 22 January, 2020; originally announced January 2020.

  20. arXiv:1910.04867  [pdf, other]

    cs.CV cs.LG stat.ML

    A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

    Authors: Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

    Abstract: Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, r…

    Submitted 21 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

  21. arXiv:1907.11180  [pdf, other]

    cs.LG stat.ML

    Google Research Football: A Novel Reinforcement Learning Environment

    Authors: Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

    Abstract: Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator…

    Submitted 14 April, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

  22. arXiv:1906.07987  [pdf, other]

    cs.LG cs.AI stat.ML

    Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

    Authors: Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

    Abstract: We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this pa…

    Submitted 19 June, 2019; originally announced June 2019.

  23. arXiv:1905.11112  [pdf, other]

    stat.ML cs.IT cs.LG

    Practical and Consistent Estimation of f-Divergences

    Authors: Paul K. Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin

    Abstract: The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We consider the case of stronger structural assumptions that are commonly satisfied in modern machine learning, including representation learning and genera…

    Submitted 24 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Accepted to the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

    Journal ref: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  24. arXiv:1802.09127  [pdf, other]

    stat.ML cs.LG

    Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

    Authors: Carlos Riquelme, George Tucker, Jasper Snoek

    Abstract: Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posteri…

    Submitted 25 February, 2018; originally announced February 2018.

    Comments: Sixth International Conference on Learning Representations, ICLR 2018

  25. arXiv:1703.00579  [pdf, other]

    stat.ML cs.LG

    Active Learning for Accurate Estimation of Linear Models

    Authors: Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric

    Abstract: We explore the sequential decision making problem where the goal is to estimate uniformly well a number of linear models, given a shared budget of random contexts independently sampled from a known distribution. The decision maker must query one of the linear models for each incoming context, and receives an observation corrupted by noise levels that are unknown, and depend on the model instance.…

    Submitted 29 July, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

    Comments: 37 pages, 8 figures, International Conference on Machine Learning, ICML 2017

  26. arXiv:1703.00535  [pdf, other]

    stat.ML cs.LG

    Human Interaction with Recommendation Systems

    Authors: Sven Schmit, Carlos Riquelme

    Abstract: Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. This work aims to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this fee…

    Submitted 28 March, 2018; v1 submitted 1 March, 2017; originally announced March 2017.

    Comments: Accepted to AISTATS 2018

  27. arXiv:1602.02845  [pdf, other]

    stat.ML cs.LG

    Online Active Linear Regression via Thresholding

    Authors: Carlos Riquelme, Ramesh Johari, Baosen Zhang

    Abstract: We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selection of most informative observations; we characterize its performance and fundamental lower b…

    Submitted 21 December, 2016; v1 submitted 8 February, 2016; originally announced February 2016.

    Comments: Published in AAAI 2017

  28. Learning multifractal structure in large networks

    Authors: Austin R. Benson, Carlos Riquelme, Sven Schmit

    Abstract: Generating random graphs to model networks has a rich history. In this paper, we analyze and improve upon the multifractal network generator (MFNG) introduced by Palla et al. We provide a new result on the probability of subgraphs existing in graphs generated with MFNG. From this result it follows that we can quickly compute moments of an important set of graph properties, such as the expected num…

    Submitted 26 February, 2014; originally announced February 2014.

    ACM Class: H.4.0; E.1

    Journal ref: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), 2014