Showing 1–12 of 12 results for author: Do, X L

  1. arXiv:2403.16685

    cs.CL cs.CY

    ToXCL: A Unified Framework for Toxic Speech Detection and Explanation

    Authors: Nhat M. Hoang, Xuan Long Do, Duc Anh Do, Duc Anh Vu, Luu Anh Tuan

    Abstract: The proliferation of online toxic speech is a pertinent problem posing threats to demographic groups. While explicit toxic speech contains offensive lexical signals, implicit toxic speech consists of coded or indirect language. Therefore, it is crucial for models not only to detect implicit toxic speech but also to explain its toxicity. This draws a unique need for unified frameworks that can effectively d…

    Submitted 20 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted at NAACL 2024 (Main Conference)

  2. arXiv:2403.01251

    cs.CL

    Accelerating Greedy Coordinate Gradient via Probe Sampling

    Authors: Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

    Abstract: The safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) has been shown to be effective in constructing adversarial prompts to break aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called…

    Submitted 27 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  3. arXiv:2312.10610

    cs.CL

    Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization

    Authors: Xuan Long Do, Mohammad Hassanpour, Ahmed Masry, Parsa Kavehzadeh, Enamul Hoque, Shafiq Joty

    Abstract: A number of tasks have been proposed recently to facilitate easy access to charts such as chart QA and summarization. The dominant paradigm to solve these tasks has been to fine-tune a pretrained model on the task data. However, this approach is not only expensive but also not generalizable to unseen tasks. On the other hand, large language models (LLMs) have shown impressive generalization capabi…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 23 pages

  4. arXiv:2312.02614

    cs.LG cs.CL

    Prompt Optimization via Adversarial In-Context Learning

    Authors: Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He

    Abstract: We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough outp…

    Submitted 22 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  5. arXiv:2312.01661

    cs.CL cs.AI

    ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions

    Authors: Phuoc Pham Van Long, Duc Anh Vu, Nhat M. Hoang, Xuan Long Do, Anh Tuan Luu

    Abstract: Mathematical questioning is crucial for assessing students' problem-solving skills. Since manually creating such questions requires substantial effort, automatic methods have been explored. Existing state-of-the-art models rely on fine-tuning strategies and struggle to generate questions that heavily involve multiple steps of logical and arithmetic reasoning. Meanwhile, large language models (LLMs)…

    Submitted 27 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted at the 39th ACM/SIGAPP Symposium On Applied Computing (SAC 2024), Main Conference

  6. arXiv:2311.08385

    cs.CL

    ChOiRe: Characterizing and Predicting Human Opinions with Chain of Opinion Reasoning

    Authors: Xuan Long Do, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

    Abstract: Aligning language models (LMs) with human opinion is challenging yet vital to enhance their grasp of human values, preferences, and beliefs. We present ChOiRe, a four-step framework to predict human opinion which differentially models the user's explicit personae (i.e. demographic or ideological attributes) that are manually declared, and implicit personae inferred from the user's historical opinions. ChO…

    Submitted 27 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 22 pages

  7. arXiv:2305.14761

    cs.CL

    UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning

    Authors: Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty

    Abstract: Charts are very popular for analyzing data, visualizing key insights and answering complex reasoning questions about data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently such as chart question answering and chart summarization. However, most of the methods that solve these tasks use pretraining on language or vision-language t…

    Submitted 10 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  8. arXiv:2305.03088

    cs.CL cs.AI

    Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation

    Authors: Xuan Long Do, Bowei Zou, Shafiq Joty, Anh Tai Tran, Liangming Pan, Nancy F. Chen, Ai Ti Aw

    Abstract: Conversational Question Generation (CQG) is a critical task for machines to assist humans in fulfilling their information needs through conversations. The task is generally cast into two different settings: answer-aware and answer-unaware. While the former facilitates the models by exposing the expected answer, the latter is more realistic and has been receiving growing attention recently. What-to-ask and…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 17 pages, ACL 2023

  9. arXiv:2303.10868

    cs.CL

    Retrieving Multimodal Information for Augmented Generation: A Survey

    Authors: Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Xuan Long Do, Chengwei Qin, Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, Shafiq Joty

    Abstract: As Large Language Models (LLMs) become popular, an important trend has emerged of using multimodality to augment the LLMs' generation ability, which enables LLMs to better interact with the world. However, there is no unified view of at which stage and how to incorporate different modalities. In this survey, we review methods that assist and augment generative models by retrieving multim…

    Submitted 30 November, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  10. arXiv:2303.03004

    cs.CL

    xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

    Authors: Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty

    Abstract: Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating code from natural language descriptions, repairing buggy code, translating code between languages, and retrieving relevant code segments. However, the evaluation of these models has often been performed in a scattered way on only one or two specific tasks, in a few languages, at a partial granularit…

    Submitted 6 November, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Code & Data available at https://github.com/ntunlp/xCodeEval, https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval. Evaluation framework available at https://github.com/ntunlp/execeval

  11. arXiv:2210.06628

    cs.LG cs.CL

    OpenCQA: Open-ended Question Answering with Charts

    Authors: Shankar Kantharaj, Xuan Long Do, Rixie Tiffany Ko Leong, Jia Qing Tan, Enamul Hoque, Shafiq Joty

    Abstract: Charts are very popular for analyzing data and conveying important insights. People often analyze visualizations to answer open-ended questions that require explanatory answers. Answering such questions is often difficult and time-consuming as it requires a lot of cognitive and perceptual effort. To address this challenge, we introduce a new task called OpenCQA, where the goal is to answer an open-end…

    Submitted 12 October, 2022; originally announced October 2022.

  12. arXiv:2209.06652

    cs.CL

    CoHS-CQG: Context and History Selection for Conversational Question Generation

    Authors: Xuan Long Do, Bowei Zou, Liangming Pan, Nancy F. Chen, Shafiq Joty, Ai Ti Aw

    Abstract: Conversational question generation (CQG) serves as a vital task for machines to assist humans, such as interactive reading comprehension, through conversations. Compared to traditional single-turn question generation (SQG), CQG is more challenging in the sense that the generated question is required not only to be meaningful, but also to align with the preceding conversation history. While previous…

    Submitted 10 October, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: Accepted by 29th International Conference on Computational Linguistics (COLING 2022)