Showing 1–9 of 9 results for author: Behl, S

  1. arXiv:2306.11644  [pdf, other]

    cs.CL cs.AI cs.LG

    Textbooks Are All You Need

    Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

    Abstract: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…

    Submitted 2 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 26 pages; changed color scheme of plot, fixed minor typos, and added a couple of clarifications

  2. arXiv:2101.05844  [pdf, other]

    cs.LG

    Scaling the Convex Barrier with Sparse Dual Algorithms

    Authors: Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H. S. Torr, M. Pawan Kumar

    Abstract: Tight and efficient neural network bounding is crucial to the scaling of neural network verification systems. Many efficient bounding algorithms have been presented recently, but they are often too loose to verify more challenging properties. This is due to the weakness of the employed relaxation, which is usually a linear program of size linear in the number of neurons. While a tighter linear rel…

    Submitted 26 February, 2024; v1 submitted 14 January, 2021; originally announced January 2021.

    Comments: Journal of Machine Learning Research, 2024 (extension of ICLR 2021 paper in [v1])

  3. arXiv:2008.08424  [pdf, other]

    cs.CV cs.GR cs.LG stat.ML

    AutoSimulate: (Quickly) Learning Synthetic Data Generation

    Authors: Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

    Abstract: Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive as they treat the entire data generation, model training, and valida…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

    Journal ref: European Conference on Computer Vision (ECCV) 2020

  4. arXiv:2006.10711  [pdf, other]

    cs.LG stat.ML

    STEER: Simple Temporal Regularization For Neural ODEs

    Authors: Arnab Ghosh, Harkirat Singh Behl, Emilien Dupont, Philip H. S. Torr, Vinay Namboodiri

    Abstract: Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end ti…

    Submitted 2 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  5. arXiv:2006.09081  [pdf, other]

    cs.CV cs.LG

    Progressive Skeletonization: Trimming more fat from a network at initialization

    Authors: Pau de Jorge, Amartya Sanyal, Harkirat S. Behl, Philip H. S. Torr, Gregory Rogez, Puneet K. Dokania

    Abstract: Recent studies have shown that skeletonization (pruning parameters) of networks at initialization provides all the practical benefits of sparsity both at inference and training time, while only marginally degrading their performance. However, we observe that beyond a certain level of sparsity (approx. 95%), these approaches fail to preserve the network performance, and to our surprise,…

    Submitted 19 March, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

  6. arXiv:1905.07435  [pdf, other]

    cs.LG cs.AI stat.ML

    Alpha MAML: Adaptive Model-Agnostic Meta-Learning

    Authors: Harkirat Singh Behl, Atılım Güneş Baydin, Philip H. S. Torr

    Abstract: Model-agnostic meta-learning (MAML) is a meta-learning technique to train a model on a multitude of learning tasks in a way that primes the model for few-shot learning of new tasks. The MAML algorithm performs well on few-shot learning problems in classification, regression, and fine-tuning of policy gradients in reinforcement learning, but comes with the need for costly hyperparameter tuning for…

    Submitted 17 May, 2019; originally announced May 2019.

    Comments: 6th ICML Workshop on Automated Machine Learning (2019)

    Journal ref: ICML Workshop on Automated Machine Learning (2019)

  7. arXiv:1812.01397  [pdf, other]

    cs.CV

    Meta Learning Deep Visual Words for Fast Video Object Segmentation

    Authors: Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H. S. Torr

    Abstract: Personal robots and driverless cars need to be able to operate in novel environments and thus quickly and efficiently learn to recognise new object classes. We address this problem by considering the task of video object segmentation. Previous accurate methods for this task finetune a model using the first annotated frame, and/or use additional inputs such as optical flow and complex post-processi…

    Submitted 16 August, 2020; v1 submitted 4 December, 2018; originally announced December 2018.

    Journal ref: In Proceedings of International Conference on Intelligent Robots and Systems (IROS) 2020

  8. arXiv:1704.01358  [pdf, other]

    cs.CV

    Incremental Tube Construction for Human Action Detection

    Authors: Harkirat Singh Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr

    Abstract: Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short, either because they only detect one action per video, or because they assume that the entire video is available ahead of time. In this work, we introduce a real-time and online joint-labelling and associ…

    Submitted 23 July, 2018; v1 submitted 5 April, 2017; originally announced April 2017.

    Comments: British Machine Vision Conference (BMVC) 2018

  9. arXiv:1603.07819  [pdf]

    cond-mat.str-el cond-mat.mtrl-sci cond-mat.supr-con

    Ultrafast Dynamics of Vibrational Symmetry Breaking in a Charge-ordered Nickelate

    Authors: Giacomo Coslovich, Alexander F. Kemper, Sascha Behl, Bernhard Huber, Hans A. Bechtel, Takao Sasagawa, Michael C. Martin, Alessandra Lanzara, Robert A. Kaindl

    Abstract: The ability to probe symmetry breaking transitions on their natural time scales is one of the key challenges in nonequilibrium physics. Stripe ordering represents an intriguing type of broken symmetry, where complex interactions result in atomic-scale lines of charge and spin density. Although phonon anomalies and periodic distortions attest to the importance of electron-phonon coupling in the format…

    Submitted 30 November, 2017; v1 submitted 25 March, 2016; originally announced March 2016.

    Comments: 21 pages, 4 figures; updated version with journal ref

    Journal ref: Science Advances 3, e1600735 (2017)