Showing 1–6 of 6 results for author: Ngo, N T

  1. arXiv:2309.09400  [pdf, other]

    cs.CL cs.AI

    CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

    Authors: Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen

    Abstract: The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, es…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: Ongoing Work

  2. arXiv:2307.16039  [pdf, other]

    cs.CL cs.LG

    Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

    Authors: Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen

    Abstract: A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. Two major approaches for instruction tuning characterize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercia…

    Submitted 1 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

  3. arXiv:2304.05613  [pdf, other]

    cs.CL cs.AI

    ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

    Authors: Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen

    Abstract: Over the last few years, large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP) that fundamentally transform research and developments in the field. ChatGPT represents one of the most exciting LLM systems developed recently to showcase impressive skills for language generation and highly attract public attention. Among various exciting ap…

    Submitted 12 April, 2023; originally announced April 2023.

  4. arXiv:2206.01964  [pdf, ps, other]

    math.RT math.OA

    Indecomposable characters on direct limit of symmetric groups with diagonal embeddings

    Authors: N. Nessonov, N. T. S. Ngo

    Abstract: In this paper we obtain the complete description of all indecomposable characters (central positive-definite functions) of inductive limits of the symmetric groups under block diagonal embedding. As a corollary we obtain the full classification of the isomorphism classes of these inductive limits.

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: 34 pages

    MSC Class: 20C32 Representations of infinite symmetric groups

  5. arXiv:2202.08316  [pdf, other]

    cs.CL

    FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

    Authors: Minh Van Nguyen, Nghia Trung Ngo, Bonan Min, Thien Huu Nguyen

    Abstract: This paper presents FAMIE, a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction. FAMIE is designed to address a fundamental problem in existing AL frameworks where annotators need to wait for a long time between annotation batches due to the time-consuming nature of model training and data selection at each AL iteration. This hinders the engagement, pr…

    Submitted 4 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted to NAACL 2022 (System Demonstrations)

  6. arXiv:2111.08754  [pdf, ps, other]

    math.AG math.AC math.RT

    $\mathrm{GL}_n$-structure and principal $\mathfrak{sl}_2$-triple on the cohomology ring of complex Grassmannian

    Authors: Nhok Tkhai Shon Ngo

    Abstract: In this note we describe the cohomology ring of the Grassmannian of $k$-planes in $n$-dimensional complex vector space as an $\mathrm{GL}_n$-module. We give explicit formulas for the operators of its principal $\mathfrak{sl}_2$-triple. It is proved that one of these operators corresponds to the shifted cohomology degree operator and the second operator coincides with the Lefschetz map on cohomolog…

    Submitted 16 November, 2021; originally announced November 2021.