Showing 1–7 of 7 results for author: Poulton, A

  1. arXiv:2406.10229  [pdf, other]

    cs.LG cs.AI

    Quantifying Variance in Evaluation Benchmarks

    Authors: Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes

    Abstract: Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2307.09288  [pdf, other]

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  3. arXiv:2304.09871  [pdf, other]

    cs.LG cs.AI math.OC

    A Theory on Adam Instability in Large-Scale Machine Learning

    Authors: Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

    Abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent…

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

  4. arXiv:2211.09085  [pdf, other]

    cs.CL stat.ML

    Galactica: A Large Language Model for Science

    Authors: Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

    Abstract: Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can sto…

    Submitted 16 November, 2022; originally announced November 2022.

  5. arXiv:1312.4475  [pdf, ps, other]

    math.RT

    Almost split sequences for Knorr lattices

    Authors: Andrew Poulton

    Abstract: Let $O$ be a complete d.v.r. and $G$ a finite group. We give two applications of an adjunction in the stable category of $OG$. The first application gives necessary and sufficient conditions for the middle term of an almost split sequence terminating in a Knorr lattice to be indecomposable. The second characterises the stable endomorphism rings of Heller lattices of $kG$-modules.

    Submitted 15 March, 2014; v1 submitted 16 December, 2013; originally announced December 2013.

  6. arXiv:1204.4459  [pdf]

    cs.NI

    An Interference-Aware Virtual Clustering Paradigm for Resource Management in Cognitive Femtocell Networks

    Authors: Faisal Tariq, Laurence S. Dooley, Adrian S. Poulton

    Abstract: Femtocells represent a promising alternative solution for high quality wireless access in indoor scenarios where conventional cellular system coverage can be poor. Femtocell access points (FAP) are normally randomly deployed by the end user, so only post deployment network planning is possible. Furthermore, this uncoordinated deployment creates the potential for severe interference to co-located f…

    Submitted 19 April, 2012; originally announced April 2012.

  7. arXiv:0911.2672  [pdf, ps, other]

    math.CO math.GR

    Maps admitting trialities but not dualities

    Authors: Gareth A. Jones, Andrew Poulton

    Abstract: We use group theory to construct infinite families of maps on surfaces which are invariant under Wilson's map operations of order 3 but not under the operations of order 2, such as duality and Petrie duality.

    Submitted 13 November, 2009; originally announced November 2009.

    Comments: 19 pages, 1 figure

    MSC Class: 05C25; 05C10; 20B25