Showing 1–20 of 20 results for author: Chia, Y K

  1. arXiv:2405.20267  [pdf, other]

    cs.CL

    Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions

    Authors: Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing

    Abstract: As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustwort…

    Submitted 12 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2403.13315  [pdf, other]

    cs.CV

    PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

    Authors: Yew Ken Chia, Vernon Toh Yan Han, Deepanway Ghosal, Lidong Bing, Soujanya Poria

    Abstract: Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. Wit…

    Submitted 30 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  3. arXiv:2312.00738  [pdf, other]

    cs.CL

    SeaLLMs -- Large Language Models for Southeast Asia

    Authors: Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Zhiqiang Hu, Chenhui Shen, Yew Ken Chia, Xingxuan Li, Jianyu Wang, Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Lidong Bing

    Abstract: Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are buil…

    Submitted 1 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Technical report, ACL 2024 DEMO TRACK

  4. arXiv:2311.09277  [pdf, other]

    cs.CL

    Contrastive Chain-of-Thought Prompting

    Authors: Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing

    Abstract: Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mista…

    Submitted 15 November, 2023; originally announced November 2023.

  5. arXiv:2307.02053  [pdf, other]

    cs.CL

    Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

    Authors: Deepanway Ghosal, Yew Ken Chia, Navonil Majumder, Soujanya Poria

    Abstract: Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architecture. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skill…

    Submitted 5 July, 2023; originally announced July 2023.

  6. arXiv:2306.05179  [pdf, other]

    cs.CL cs.CV

    M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

    Authors: Wenxuan Zhang, Sharifah Mahani Aljunied, Chang Gao, Yew Ken Chia, Lidong Bing

    Abstract: Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchm…

    Submitted 9 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Datasets and Benchmarks)

  7. arXiv:2306.04757  [pdf, other]

    cs.CL cs.AI

    INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

    Authors: Yew Ken Chia, Pengfei Hong, Lidong Bing, Soujanya Poria

    Abstract: Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding r…

    Submitted 15 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Github: https://github.com/declare-lab/instruct-eval Leaderboard: https://declare-lab.github.io/instruct-eval/

  8. arXiv:2305.14434  [pdf, other]

    cs.CL

    Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction

    Authors: Yew Ken Chia, Hui Chen, Wei Han, Guizhen Chen, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, their expressed sentiment, and the corresponding aspect targets. However, existing methods are limited to the in-domain setting with two domains. Hence, we propose a domain-expanded benchmark to address the in-domain, out-of-domain and cross-domain settings. We suppor…

    Submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2305.13269  [pdf, other]

    cs.CL

    Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

    Authors: Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, Lidong Bing

    Abstract: We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-inten…

    Submitted 21 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by ICLR 2024

  10. arXiv:2304.11076  [pdf, other]

    cs.CL cs.AI

    Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines

    Authors: Ruochen Zhao, Xingxuan Li, Yew Ken Chia, Bosheng Ding, Lidong Bing

    Abstract: Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we sh…

    Submitted 2 March, 2023; originally announced April 2023.

  11. arXiv:2212.10450  [pdf, other]

    cs.CL

    Is GPT-3 a Good Data Annotator?

    Authors: Bosheng Ding, Chengwei Qin, Linlin Liu, Yew Ken Chia, Shafiq Joty, Boyang Li, Lidong Bing

    Abstract: Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefor…

    Submitted 14 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted by ACL 2023

  12. arXiv:2211.10018  [pdf, other]

    cs.CL

    A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach

    Authors: Yew Ken Chia, Lidong Bing, Sharifah Mahani Aljunied, Luo Si, Soujanya Poria

    Abstract: Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard Universi…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 19 pages, 6 figures, accepted by EMNLP 2022

  13. arXiv:2203.09101  [pdf, other]

    cs.CL

    RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction

    Authors: Yew Ken Chia, Lidong Bing, Soujanya Poria, Luo Si

    Abstract: Despite the importance of relation extraction in building and representing knowledge, less research is focused on generalizing to unseen relations types. We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE) to encourage further research in low-resource relation extraction methods. Given an input sentence, each extracted triplet consists of the head entity, relation labe…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 13 pages, 9 figures, to appear in ACL Findings 2022

  14. arXiv:2107.12214  [pdf, other]

    cs.CL

    Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction

    Authors: Lu Xu, Yew Ken Chia, Lidong Bing

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) is the most recent subtask of ABSA which outputs triplets of an aspect target, its associated sentiment, and the corresponding opinion term. Recent models perform the triplet extraction in an end-to-end manner but heavily rely on the interactions between each target word and opinion word. Thereby, they cannot perform well on targets and opinions which con…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: ACL 2021, long paper, main conference

  15. arXiv:2012.14164  [pdf, other]

    cs.CL cs.AI cs.IR

    Red Dragon AI at TextGraphs 2020 Shared Task: LIT : LSTM-Interleaved Transformer for Multi-Hop Explanation Ranking

    Authors: Yew Ken Chia, Sam Witteveen, Martin Andrews

    Abstract: Explainable question answering for science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. To counter the limitations of methods that view each query-document pair in isolation, we propose the LSTM-Interleaved Transformer which incorporates cross-document interactions for improved multi-hop ranking. The LIT architecture can leverage prior ranki…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: Accepted paper for TextGraphs-14 workshop at COLING 2020. (6 pages including references)

  16. arXiv:1911.08976  [pdf, other]

    cs.CL cs.AI cs.IR

    Red Dragon AI at TextGraphs 2019 Shared Task: Language Model Assisted Explanation Generation

    Authors: Yew Ken Chia, Sam Witteveen, Martin Andrews

    Abstract: The TextGraphs-13 Shared Task on Explanation Regeneration asked participants to develop methods to reconstruct gold explanations for elementary science questions. Red Dragon AI's entries used the language of the questions and explanation text directly, rather than constructing a separate graph-like representation. Our leaderboard submission placed us 3rd in the competition, but we present here t…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted paper for TextGraphs-13 workshop at EMNLP-IJCNLP 2019. (5 pages including references)

  17. arXiv:1909.06273  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Scene Graph Parsing by Attention Graph

    Authors: Martin Andrews, Yew Ken Chia, Sam Witteveen

    Abstract: Scene graph representations, which form a graph of visual object nodes together with their attributes and relations, have proved useful across a variety of vision and language applications. Recent work in the area has used Natural Language Processing dependency tree methods to automatically build scene graphs. In this work, we present an 'Attention Graph' mechanism that can be trained end-to-end…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: Accepted paper for the ViGIL workshop at NeurIPS 2018. (4 pages + references)

  18. arXiv:1909.03508  [pdf, ps, other]

    cs.LG cs.CL cs.IR stat.ML

    Transformer to CNN: Label-scarce distillation for efficient text classification

    Authors: Yew Ken Chia, Sam Witteveen, Martin Andrews

    Abstract: Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop…

    Submitted 8 September, 2019; originally announced September 2019.

    Comments: Accepted paper for CDNNRIA workshop at NeurIPS 2018. (3 pages + references)

  19. Energy-Efficient, Large-scale Distributed-Antenna System (L-DAS) for Multiple Users

    Authors: Jingon Joung, Yeow Khiang Chia, Sumei Sun

    Abstract: Large-scale distributed-antenna system (L-DAS) with very large number of distributed antennas, possibly up to a few hundred antennas, is considered. A few major issues of the L-DAS, such as high latency, energy consumption, computational complexity, and large feedback (signaling) overhead, are identified. The potential capability of the L-DAS is illuminated in terms of an energy efficiency (EE) th…

    Submitted 20 January, 2014; v1 submitted 6 December, 2013; originally announced December 2013.

    Comments: 29 pages, 7 figures, submitted to JSTSP

  20. arXiv:1010.3726  [pdf, ps, other]

    cs.IT

    Cascade, Triangular and Two Way Source Coding with degraded side information at the second user

    Authors: Yeow Khiang Chia, Haim Permuter, Tsachy Weissman

    Abstract: We consider the Cascade and Triangular rate-distortion problems where the same side information is available at the source node and User 1, and the side information available at User 2 is a degraded version of the side information at the source node and User 1. We characterize the rate-distortion region for these problems. For the Cascade setup, we showed that, at User 1, decoding and re-binning t…

    Submitted 18 October, 2010; originally announced October 2010.

    Comments: 29 pages, 9 figures