Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Sam Ade Jacobs,
Ammar Ahmad Awan,
Jyoti Aneja,
Ahmed Awadallah,
Hany Awadalla,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Qin Cai,
Martin Cai,
Caio César Teodoro Mendes,
Weizhu Chen,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Parul Chopra
, et al. (90 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…
▽ More
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.
△ Less
Submitted 23 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
Textbooks Are All You Need
Authors:
Suriya Gunasekar,
Yi Zhang,
Jyoti Aneja,
Caio César Teodoro Mendes,
Allie Del Giorno,
Sivakanth Gopi,
Mojan Javaheripi,
Piero Kauffmann,
Gustavo de Rosa,
Olli Saarikivi,
Adil Salim,
Shital Shah,
Harkirat Singh Behl,
Xin Wang,
Sébastien Bubeck,
Ronen Eldan,
Adam Tauman Kalai,
Yin Tat Lee,
Yuanzhi Li
Abstract:
We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…
▽ More
We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
△ Less
Submitted 2 October, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.