Showing 1–16 of 16 results for author: Bansal, Y

Searching in archive cs.
  1. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2312.06585

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  4. arXiv:2307.12941

    cs.LG cs.AI cs.CV

    On Privileged and Convergent Bases in Neural Network Representations

    Authors: Davis Brown, Nikhil Vyas, Yamini Bansal

    Abstract: In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that arbitrary rotations of neural representations cannot be inverted (unlike linear networks), indicating that they do not exhibit complete rotational i…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: In the Workshop on High-dimensional Learning Dynamics at ICML 2023

  5. Winter Wheat Crop Yield Prediction on Multiple Heterogeneous Datasets using Machine Learning

    Authors: Yogesh Bansal, Dr. David Lillis, Prof. Mohand Tahar Kechadi

    Abstract: Winter wheat is one of the most important crops in the United Kingdom, and crop yield prediction is essential for the nation's food security. Several studies have employed machine learning (ML) techniques to predict crop yield on a county or farm-based level. The main objective of this study is to predict winter wheat crop yield using ML models on multiple heterogeneous datasets, i.e., soil and we…

    Submitted 20 June, 2023; originally announced June 2023.

    Journal ref: International Conference on Computational Science and Computational Intelligence (CSCI 2022)

  6. arXiv:2306.11942

    cs.LG

    A Deep Learning Model for Heterogeneous Dataset Analysis -- Application to Winter Wheat Crop Yield Prediction

    Authors: Yogesh Bansal, David Lillis, Mohand Tahar Kechadi

    Abstract: Western countries rely heavily on wheat, and yield prediction is crucial. Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction. Existing literature reported that they perform better than traditional Machine Learning (ML) models. However, the existing LSTM cannot handle heterogeneous datasets (a combination of data which…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: This version has been removed by arXiv administrators because the submitter did not have the authority to grant the license at the time of submission

  7. arXiv:2302.01398

    cs.CL

    The unreasonable effectiveness of few-shot learning for machine translation

    Authors: Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat

    Abstract: We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general…

    Submitted 2 February, 2023; originally announced February 2023.

  8. arXiv:2206.10012

    cs.LG cs.AI

    Limitations of the NTK for Understanding Generalization in Deep Learning

    Authors: Nikhil Vyas, Yamini Bansal, Preetum Nakkiran

    Abstract: The "Neural Tangent Kernel" (NTK) (Jacot et al 2018), and its empirical variants have been proposed as a proxy to capture certain behaviors of real neural networks. In this work, we study NTKs through the lens of scaling laws, and demonstrate that they fall short of explaining important aspects of neural network generalization. In particular, we demonstrate realistic settings where finite-width…

    Submitted 20 June, 2022; originally announced June 2022.

  9. arXiv:2202.01994

    cs.LG cs.CL

    Data Scaling Laws in NMT: The Effect of Noise and Architecture

    Authors: Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat

    Abstract: In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT). First, we establish that the test loss of encoder-decoder transformer models scales as a power law in the number of training samples, with a dependence on the model size. Then, we systematically vary aspects of the training setup to understand…

    Submitted 4 February, 2022; originally announced February 2022.

  10. arXiv:2106.07682

    cs.LG stat.ML

    Revisiting Model Stitching to Compare Neural Representations

    Authors: Yamini Bansal, Preetum Nakkiran, Boaz Barak

    Abstract: We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models $A$ and $B$, we consider a "stitched model" formed by connecting the bottom-layers of $A$ to the top-layers of $B$, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-apprec…

    Submitted 14 June, 2021; originally announced June 2021.

  11. arXiv:2010.13187

    stat.ML cs.CV cs.LG

    Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

    Authors: Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Lincoln Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Joshua B. Tenenbaum, Phuong Le, Arun Prakash R, Nengfeng Zhou, Joel Vaughan, Yaquan Wang, Anwesha Bhattacharyya, Kristjan Greenewald, David D. Cox, Dan Gutfreund

    Abstract: Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality since the model does not have enough capacity to learn correlated latent variables that capture…

    Submitted 3 April, 2024; v1 submitted 25 October, 2020; originally announced October 2020.

  12. arXiv:2010.08508

    cs.LG cs.NE stat.ML

    For self-supervised learning, Rationality implies generalization, provably

    Authors: Yamini Bansal, Gal Kaplun, Boaz Barak

    Abstract: We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels. Specifically, we show that (under the assumptions described below) the generalization gap of such classifiers tends to zero if $\mathsf{C}(g) \ll n$, where…

    Submitted 16 October, 2020; originally announced October 2020.

  13. arXiv:2009.11929

    cs.CV cs.LG

    Image-Based Sorghum Head Counting When You Only Look Once

    Authors: Lawrence Mosley, Hieu Pham, Yogesh Bansal, Eric Hare

    Abstract: Modern trends in digital agriculture have seen a shift towards artificial intelligence for crop quality assessment and yield estimation. In this work, we document how a parameter tuned single-shot object detection algorithm can be used to identify and count sorghum head from aerial drone images. Our approach involves a novel exploratory analysis that identified key structural elements of the sorgh…

    Submitted 13 June, 2022; v1 submitted 24 September, 2020; originally announced September 2020.

  14. arXiv:2009.08092

    cs.LG cs.NE math.ST stat.ML

    Distributional Generalization: A New Kind of Generalization

    Authors: Preetum Nakkiran, Yamini Bansal

    Abstract: We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error. For example, if we mislabel 30% of dogs as cats in the train set of CIFAR-10, then a ResNet trained to interpolation will in fact mislabel roughly 30% of dogs as cats o…

    Submitted 14 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Co-first authors. V2: Intro shortened; no new results

  15. arXiv:1912.02292

    cs.LG cs.CV cs.NE stat.ML

    Deep Double Descent: Where Bigger Models and More Data Hurt

    Authors: Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

    Abstract: We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effect…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: G.K. and Y.B. contributed equally

  16. arXiv:1806.00730

    stat.ML cs.LG cs.NE

    Minnorm training: an algorithm for training over-parameterized deep neural networks

    Authors: Yamini Bansal, Madhu Advani, David D Cox, Andrew M Saxe

    Abstract: In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting trai…

    Submitted 21 June, 2018; v1 submitted 2 June, 2018; originally announced June 2018.