Showing 1–25 of 25 results for author: Awadalla, H

  1. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2403.08002  [pdf, other]

    cs.CL cs.CV

    Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

    Authors: Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Akshay Chaudhari, Serena Yeung-Levy, Curtis P. Langlotz, et al. (2 additional authors not shown)

    Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant…

    Submitted 26 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  3. arXiv:2402.11451  [pdf, other]

    cs.CL cs.AI

    SciAgent: Tool-augmented Language Models for Scientific Reasoning

    Authors: Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen

    Abstract: Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the…

    Submitted 20 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  4. arXiv:2310.15987  [pdf, other]

    cs.CL cs.AI

    Dissecting In-Context Learning of Translations in GPTs

    Authors: Vikas Raunak, Hany Hassan Awadalla, Arul Menezes

    Abstract: Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes for the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of t…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP Findings (+ Minor Updates over Camera-Ready)

  5. arXiv:2310.02410  [pdf, other]

    cs.LG cs.CL

    Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

    Authors: Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla

    Abstract: Large Mixture of Experts (MoE) models could achieve state-of-the-art quality on various language tasks, including machine translation task, thanks to the efficient model scaling capability with expert parallelism. However, it has brought a fundamental issue of larger memory consumption and increased memory bandwidth bottleneck at deployment time. In this paper, we propose Mixture of Quantized Expe…

    Submitted 3 October, 2023; originally announced October 2023.

  6. arXiv:2309.11674  [pdf, other]

    cs.CL

    A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

    Authors: Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

    Abstract: Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities…

    Submitted 6 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at ICLR 2024

  7. arXiv:2308.15772  [pdf, other]

    cs.CL

    Task-Based MoE for Multitask Multilingual Machine Translation

    Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

    Abstract: Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic…

    Submitted 24 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  8. arXiv:2308.09723  [pdf, other]

    cs.LG cs.CL

    FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

    Authors: Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

    Abstract: Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from high inference costs caused by the memory bandwidth bottleneck in the auto-regressive decoding process. To address these issues, we propose an efficient…

    Submitted 16 August, 2023; originally announced August 2023.

  9. arXiv:2305.16806  [pdf, other]

    cs.CL cs.AI

    Do GPTs Produce Less Literal Translations?

    Authors: Vikas Raunak, Arul Menezes, Matt Post, Hany Hassan Awadalla

    Abstract: Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs. However, there has been relatively little investigation on how such translations diff…

    Submitted 5 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  10. arXiv:2304.14802  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    ResiDual: Transformer with Dual Residual Connections

    Authors: Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

    Abstract: Transformer networks have become the preferred architecture for many tasks due to their state-of-the-art performance. However, the optimal way to implement residual connections in Transformer, which are essential for effective training, is still debated. Two widely used variants are the Post-Layer-Normalization (Post-LN) and Pre-Layer-Normalization (Pre-LN) Transformers, which apply layer normaliz…

    Submitted 28 April, 2023; originally announced April 2023.

  11. arXiv:2302.09210  [pdf, other]

    cs.CL

    How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

    Authors: Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla

    Abstract: Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for natural language generation, but their performance for machine translation has not been thoroughly investigated. In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as quality of different GPT models in comparison with state-of-the-art researc…

    Submitted 17 February, 2023; originally announced February 2023.

  12. arXiv:2211.10017  [pdf, other]

    cs.CL cs.AI cs.LG

    Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production

    Authors: Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

    Abstract: Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters. As a result, these models have achieved significantly better quality on various natural language processing tasks including machine translation. However, it remains challenging to deploy such models in real-life scenarios due to the large mem…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted to SustaiNLP 2022 (EMNLP 2022)

  13. arXiv:2208.09770  [pdf, other]

    cs.CL cs.AI

    Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

    Authors: Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang

    Abstract: This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state of the art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve model's performance on low-resource summarization tasks. The model is first pre-trained using text corpora for language understanding, and then is c…

    Submitted 7 June, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

    Comments: 16 pages, 3 figures. Accepted as long paper in main conference of ACL 2023

    MSC Class: cs.CL; cs.GL

    ACM Class: I.2; I.7

  14. arXiv:2208.05852  [pdf, other]

    cs.CL cs.LG

    Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation

    Authors: Muhammad ElNokrashy, Amr Hendy, Mohamed Maher, Mohamed Afify, Hany Hassan Awadalla

    Abstract: This paper proposes a simple yet effective method to improve direct (X-to-Y) translation for both cases: zero-shot and when direct data is available. We modify the input tokens at both the encoder and decoder to include signals for the source and target languages. We show a performance gain when training from scratch, or finetuning a pretrained model with the proposed setup. In the experiments, ou…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 10 pages, accepted at AMTA-2022 (Association for Machine Translation in the Americas Conference)

  15. arXiv:2206.14982  [pdf, other]

    cs.CL cs.AI

    Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations

    Authors: Akiko Eriguchi, Shufang Xie, Tao Qin, Hany Hassan Awadalla

    Abstract: Multilingual Neural Machine Translation (MNMT) enables one system to translate sentences from multiple source languages to multiple target languages, greatly reducing deployment costs compared with conventional bilingual systems. The MNMT training benefit, however, is often limited to many-to-one directions. The model suffers from poor performance in one-to-many and many-to-many with zero-shot set…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: NAACL 2022

  16. arXiv:2205.14336  [pdf, other]

    cs.LG

    Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

    Authors: Rui Liu, Young Jin Kim, Alexandre Muzio, Hany Hassan Awadalla

    Abstract: Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability which enables dramatical increases in model size without significant increases in computational cost. To achieve this, MoE models replace the feedforward sub-layer with Mixture-of-Experts sub-layer in transformers and use a gating network to route each token to…

    Submitted 4 July, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: Accepted to ICML 2022

  17. arXiv:2111.13284  [pdf, other]

    cs.CL

    Ensembling of Distilled Models from Multi-task Teachers for Constrained Resource Language Pairs

    Authors: Amr Hendy, Esraa A. Gad, Mohamed Abdelghaffar, Jailan S. ElMosalami, Mohamed Afify, Ahmed Y. Tawfik, Hany Hassan Awadalla

    Abstract: This paper describes our submission to the constrained track of WMT21 shared news translation task. We focus on the three relatively low resource language pairs Bengali to and from Hindi, English to and from Hausa, and Xhosa to and from Zulu. To overcome the limitation of relatively low parallel data we train a multilingual model using a multitask objective employing both parallel and monolingual…

    Submitted 25 November, 2021; originally announced November 2021.

  18. arXiv:2111.02086  [pdf, other]

    cs.CL

    Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

    Authors: Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

    Abstract: This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks including Large Track and two Small Tracks where the former one is unconstrained and the latter two are fully constrained. Our model submissions to the shared task were initialized with DeltaLM\footnote{\url{https://…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: WMT21

  19. arXiv:2109.10465  [pdf, other]

    cs.CL cs.AI cs.LG

    Scalable and Efficient MoE Training for Multitask Multilingual Models

    Authors: Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

    Abstract: The Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models that have sublinear compute costs with respect to their parameters. In contrast with dense models, the sparse architecture of MoE offers opportunities for drastically growing model size with significant accuracy gain while consuming much lower compute budget. However, supporting large scale MoE tra…

    Submitted 21 September, 2021; originally announced September 2021.

  20. arXiv:2106.13736  [pdf, other]

    cs.CL

    DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

    Authors: Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

    Abstract: While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model…

    Submitted 17 August, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: Work in progress

  21. arXiv:2012.15547  [pdf, other]

    cs.CL

    XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

    Authors: Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei

    Abstract: Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fi…

    Submitted 31 December, 2020; originally announced December 2020.

  22. arXiv:2011.07933  [pdf, other]

    cs.CL

    Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions

    Authors: Muhammad N. ElNokrashy, Amr Hendy, Mohamed Abdelghaffar, Mohamed Afify, Ahmed Tawfik, Hany Hassan Awadalla

    Abstract: This paper describes our submission to the WMT20 sentence filtering task. We combine scores from (1) a custom LASER built for each source language, (2) a classifier built to distinguish positive and negative pairs by semantic alignment, and (3) the original scores included in the task devkit. For the mBART finetuning setup, provided by the organizers, our method shows 7% and 5% relative improvemen…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: Accepted at WMT20 (EMNLP 2020 Fifth Conference on Machine Translation)

  23. arXiv:2010.13382  [pdf, other]

    cs.CL

    FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

    Authors: Young Jin Kim, Hany Hassan Awadalla

    Abstract: Transformer-based models are the state-of-the-art for Natural Language Understanding (NLU) applications. Models are getting bigger and better on various tasks. However, Transformer models remain computationally challenging since they are not efficient at inference-time compared to traditional approaches. In this paper, we present FastFormers, a set of recipes to achieve efficient inference-time pe…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted to SustaiNLP 2020 at EMNLP 2020

  24. arXiv:2010.02523  [pdf, other]

    cs.CL cs.LG

    Multi-task Learning for Multilingual Neural Machine Translation

    Authors: Yiren Wang, ChengXiang Zhai, Hany Hassan Awadalla

    Abstract: While monolingual data has been shown to be useful in improving bilingual neural machine translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual NMT (MNMT) systems is a less explored area. In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingua…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  25. arXiv:1511.01042  [pdf, other]

    cs.CL cs.LG cs.NE

    Detecting Interrogative Utterances with Recurrent Neural Networks

    Authors: Junyoung Chung, Jacob Devlin, Hany Hassan Awadalla

    Abstract: In this paper, we explore different neural network architectures that can predict if a speaker of a given utterance is asking a question or making a statement. We compare the outcomes of regularization methods that are popularly used to train deep neural networks and study how different context functions can affect the classification performance. We also compare the efficacy of gated activation…

    Submitted 15 November, 2015; v1 submitted 3 November, 2015; originally announced November 2015.

    Comments: 6 pages, accepted to NIPS 2015 Workshop on Machine Learning for Spoken Language Understanding and Interaction