Showing 1–50 of 88 results for author: Xiong, D

Searching in archive cs.
  1. arXiv:2407.00875

    cs.CL cs.AI

    MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting

    Authors: Tianhao Li, Shangjie Li, Binbin Xie, Deyi Xiong, Baosong Yang

    Abstract: The advent of large language models (LLMs) has predominantly catered to high-resource languages, leaving a disparity in performance for low-resource languages. Conventional Continual Training (CT) approaches to bridge this gap often undermine a model's original linguistic proficiency when expanding to multilingual contexts. Addressing this issue, we introduce a novel MoE-CT architecture, a paradig…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 13 pages, 2 figures

  2. arXiv:2406.18406

    cs.CL cs.AI

    IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

    Authors: Dan Shi, Renren **, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong

    Abstract: It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on mass data. Recent studies disclose knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Ide…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, 5 tables

  3. arXiv:2406.07081

    cs.CL

    Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

    Authors: Menglong Cui, Jiangcun Du, Shaolin Zhu, Deyi Xiong

    Abstract: Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 long paper (Findings)

  4. arXiv:2406.04752

    cs.CL

    CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models

    Authors: Ling Shi, Deyi Xiong

    Abstract: Large language models (LLMs) are possessed of numerous beneficial capabilities, yet their potential inclination harbors unpredictable risks that may materialize in the future. We hence propose CRiskEval, a Chinese dataset meticulously designed for gauging the risk proclivities inherent in LLMs such as resource acquisition and malicious coordination, as part of efforts for proactive preparedness. T…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 28 pages, 5 figures

  5. arXiv:2405.17840

    cs.CL

    Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents

    Authors: Andrew H. Lee, Sina J. Semnani, Galo Castillo-López, Gaël de Chalendar, Monojit Choudhury, Ashna Dua, Kapil Rajesh Kavitha, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Alexis Lombard, Mehrad Moradshahi, Gihyun Park, Nasredine Semmar, Jiwon Seo, Tianhao Shen, Manish Shrivastava, Deyi Xiong, Monica S. Lam

    Abstract: Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show for the first time, that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down to simpler steps that are mor…

    Submitted 16 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2405.15208

    cs.CL cs.AI

    Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

    Authors: Chenxi Sun, Hongzhi Zhang, Zijia Lin, **gyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

    Abstract: Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the d…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at LREC-COLING 2024

  7. arXiv:2405.13578

    cs.CL

    ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation

    Authors: Weilong Dong, Xinwei Wu, Renren **, Shaoyang Xu, Deyi Xiong

    Abstract: Ensuring large language models (LLM) behave consistently with human goals, values, and intentions is crucial for their safety but yet computationally expensive. To reduce the computational cost of alignment training of LLMs, especially for those with a huge number of parameters, and to reutilize learned value alignment, we propose ConTrans, a novel framework that enables weak-to-strong alignment t…

    Submitted 22 May, 2024; originally announced May 2024.

  8. arXiv:2405.10166

    cs.CL cs.PF

    LFED: A Literary Fiction Evaluation Dataset for Large Language Models

    Authors: Linhao Yu, Qun Liu, Deyi Xiong

    Abstract: The rapid evolution of large language models (LLMs) has ushered in the need for comprehensive assessments of their performance across various dimensions. In this paper, we propose LFED, a Literary Fiction Evaluation Dataset, which aims to evaluate the capability of LLMs on the long fiction comprehension and reasoning. We collect 95 literary fictions that are either originally written in Chinese or…

    Submitted 16 May, 2024; originally announced May 2024.

  9. arXiv:2405.07673

    cs.CL

    An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation

    Authors: Supryadi, Leiyu Pan, Deyi Xiong

    Abstract: Massively multilingual neural machine translation (MMNMT) has been proven to enhance the translation quality of low-resource languages. In this paper, we empirically investigate the translation robustness of Indonesian-Chinese translation in the face of various naturally occurring noise. To assess this, we create a robustness evaluation benchmark dataset for Indonesian-Chinese translation. This da…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

  10. arXiv:2403.12601

    cs.CL

    LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models

    Authors: Chuang Liu, Renren **, Yuqi Ren, Deyi Xiong

    Abstract: Chinese Large Language Models (LLMs) have recently demonstrated impressive capabilities across various NLP benchmarks and real-world applications. However, the existing benchmarks for comprehensively evaluating these LLMs are still insufficient, particularly in terms of measuring knowledge that LLMs capture. Current datasets collect questions from Chinese examinations across different subjects and…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  11. arXiv:2403.12316

    cs.CL

    OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

    Authors: Chuang Liu, Linhao Yu, Jiaxuan Li, Renren **, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, **wang Song, Hongying Zan, Sun Li, Deyi Xiong

    Abstract: The rapid development of Chinese large language models (LLMs) poses big challenges for efficient LLM evaluation. While current initiatives have introduced new benchmarks or evaluation platforms for assessing Chinese LLMs, many of these focus primarily on capabilities, usually overlooking potential alignment and safety issues. To address this gap, we introduce OpenEval, an evaluation testbed that b…

    Submitted 18 March, 2024; originally announced March 2024.

  12. arXiv:2403.07747

    cs.CL cs.AI

    FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models

    Authors: Yan Liu, Renren **, Lin Shi, Zheng Yao, Deyi Xiong

    Abstract: To thoroughly assess the mathematical reasoning abilities of Large Language Models (LLMs), we need to carefully curate evaluation datasets covering diverse mathematical concepts and mathematical problems at different difficulty levels. In pursuit of this objective, we propose FineMath in this paper, a fine-grained mathematical evaluation benchmark dataset for assessing Chinese LLMs. FineMath is cr…

    Submitted 12 March, 2024; originally announced March 2024.

  13. arXiv:2402.18120

    cs.CL

    Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

    Authors: Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong

    Abstract: Prior research in representation engineering has revealed that LLMs encode concepts within their representation spaces, predominantly centered around English. In this study, we extend this philosophy to a multilingual scenario, delving into multilingual human value concepts in LLMs. Through our comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with distinct…

    Submitted 16 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  14. arXiv:2402.18023

    cs.AI cs.CL

    Do Large Language Models Mirror Cognitive Language Processing?

    Authors: Yuqi Ren, Renren **, Tongxuan Zhang, Deyi Xiong

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs can facilitate their language processing capabilities. In cognitive science, brain cognitive processing signals are typically utilized to study human language processing. Therefore, it is natural to ask how well the text embedding…

    Submitted 28 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  15. arXiv:2402.16775

    cs.CL cs.AI

    A Comprehensive Evaluation of Quantization Strategies for Large Language Models

    Authors: Renren **, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

    Abstract: Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques, which reduce the bits needed for model weights or activations with minimal performance loss, have become popular due to the rise of LLMs. However, most quantizatio…

    Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  16. arXiv:2402.08788

    cs.CL cs.SD eess.AS

    Syllable based DNN-HMM Cantonese Speech to Text System

    Authors: Timothy Wong, Claire Li, Sam Lam, Billy Chiu, Qin Lu, Minglei Li, Dan Xiong, Roy Shing Yu, Vincent T. Y. Ng

    Abstract: This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model. This is a part of an effort in building a STT system to aid dyslexic students who have cognitive deficiency in writing skills but have no problem expressing their ideas through speech. For Cantonese speech recognition, the basic unit of acoustic models can either be the conventi…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 7 pages, 3 figures, LREC 2016

    MSC Class: 94-06; ACM Class: I.2.7

  17. arXiv:2312.16132

    cs.CL

    RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models

    Authors: Tianhao Shen, Sun Li, Quan Tu, Deyi Xiong

    Abstract: The rapid evolution of large language models necessitates effective benchmarks for evaluating their role knowledge, which is essential for establishing connections with the real world and providing more immersive interactions. This paper introduces RoleEval, a bilingual benchmark designed to assess the memorization, utilization, and reasoning capabilities of role knowledge. RoleEval comprises Role…

    Submitted 16 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: Our dataset is available at https://github.com/Magnetic2014/RoleEval

  18. arXiv:2312.12853

    cs.CL

    CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

    Authors: Dan Shi, Chaobin You, Jiantao Huang, Taihao Li, Deyi Xiong

    Abstract: As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense k…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  19. arXiv:2312.03017

    cs.LG physics.optics

    AI-driven emergence of frequency information non-uniform distribution via THz metasurface spectrum prediction

    Authors: Xiaohua Xing, Yuqi Ren, Die Zou, Qiankun Zhang, Bingxuan Mao, Jianquan Yao, Deyi Xiong, Shuang Zhang, Liang Wu

    Abstract: Recently, artificial intelligence has been extensively deployed across various scientific disciplines, optimizing and guiding the progression of experiments through the integration of abundant datasets, whilst continuously probing the vast theoretical space encapsulated within the data. Particularly, deep learning models, due to their end-to-end adaptive learning capabilities, are capable of auton…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  20. arXiv:2311.09829

    cs.CL

    FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models

    Authors: Yimin **g, Renren **, Jiahao Hu, Huishi Qiu, Xiaohua Wang, Peng Wang, Deyi Xiong

    Abstract: The effective assessment of the instruction-following ability of large language models (LLMs) is of paramount importance. A model that cannot adhere to human instructions might be not able to provide reliable and helpful responses. In pursuit of this goal, various benchmarks have been constructed to evaluate the instruction-following capacity of these models. However, these benchmarks are limited…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Work in progress

  21. arXiv:2311.03788

    cs.CL

    Language Representation Projection: Can We Transfer Factual Knowledge across Languages in Multilingual Language Models?

    Authors: Shaoyang Xu, Junzhuo Li, Deyi Xiong

    Abstract: Multilingual pretrained language models serve as repositories of multilingual factual knowledge. Nevertheless, a substantial performance gap of factual knowledge probing exists between high-resource languages and low-resource languages, suggesting limited implicit factual knowledge transfer across languages in multilingual pretrained language models. This paper investigates the feasibility of expl…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted by EMNLP 2023

  22. arXiv:2310.20456

    cs.CL

    Towards a Deep Understanding of Multilingual End-to-End Speech Translation

    Authors: Haoran Sun, Xiaohu Zhao, Yikun Lei, Shaolin Zhu, Deyi Xiong

    Abstract: In this paper, we employ Singular Value Canonical Correlation Analysis (SVCCA) to analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages. SVCCA enables us to estimate representational similarity across languages and layers, enhancing our understanding of the functionality of multilingual speech translation and its potential connection to mult…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  23. arXiv:2310.20162

    cs.AI

    Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?

    Authors: Leiyu Pan, Supryadi, Deyi Xiong

    Abstract: Robustness, the ability of models to maintain performance in the face of perturbations, is critical for developing reliable NLP systems. Recent studies have shown promising results in improving the robustness of models through adversarial training and data augmentation. However, in machine translation, most of these studies have focused on bilingual machine translation with a single translation di…

    Submitted 31 October, 2023; originally announced October 2023.

  24. arXiv:2310.20138

    cs.CR cs.CL

    DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models

    Authors: Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong

    Abstract: Large language models pretrained on a huge amount of data capture rich knowledge and information in the training data. The ability of data memorization and regurgitation in pretrained language models, revealed in previous studies, brings the risk of data leakage. In order to effectively reduce these risks, we propose a framework DEPN to Detect and Edit Privacy Neurons in pretrained language models…

    Submitted 5 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  25. arXiv:2310.19736

    cs.CL cs.AI

    Evaluating Large Language Models: A Comprehensive Survey

    Authors: Zishan Guo, Renren **, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rap…

    Submitted 25 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 111 pages

  26. arXiv:2309.15025

    cs.CL cs.AI

    Large Language Model Alignment: A Survey

    Authors: Tianhao Shen, Renren **, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong

    Abstract: Recent years have witnessed remarkable progress made in large language models (LLMs). Such advancements, while garnering significant attention, have concurrently elicited various concerns. The potential of these models is undeniably vast; however, they may yield texts that are imprecise, misleading, or even detrimental. Consequently, it becomes paramount to employ alignment techniques to ensure th…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 76 pages

  27. arXiv:2307.13808

    cs.CL cs.CR

    Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

    Authors: Yu Fu, Deyi Xiong, Yue Dong

    Abstract: To mitigate potential risks associated with language models, recent AI detection research proposes incorporating watermarks into machine-generated text through random vocabulary restrictions and utilizing this information for detection. While these watermarks only induce a slight deterioration in perplexity, our empirical investigation reveals a significant detriment to the performance of conditio…

    Submitted 13 February, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: 8 pages, 6 figures (accepted to AAAI 2024)

  28. arXiv:2306.17674

    cs.CL

    X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

    Authors: Mehrad Moradshahi, Tianhao Shen, Kalika Bali, Monojit Choudhury, Gaël de Chalendar, Anmol Goel, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Nasredine Semmar, Sina J. Semnani, Jiwon Seo, Vivek Seshadri, Manish Shrivastava, Michael Sun, Aditya Yadavalli, Chaobin You, Deyi Xiong, Monica S. Lam

    Abstract: Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-H…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 Findings

  29. arXiv:2306.16244

    cs.CL cs.AI

    CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

    Authors: Yufei Huang, Deyi Xiong

    Abstract: Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture…

    Submitted 28 June, 2023; originally announced June 2023.

  30. arXiv:2305.10263

    cs.CL

    M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

    Authors: Chuang Liu, Renren **, Yuqi Ren, Linhao Yu, Tianyu Dong, Xiaohan Peng, Shuting Zhang, Jianxiang Peng, Peiyi Zhang, Qingqing Lyu, Xiaowen Su, Qun Liu, Deyi Xiong

    Abstract: Large language models have recently made tremendous progress in a variety of aspects, e.g., cross-task generalization, instruction following. Comprehensively evaluating the capability of large language models in multiple tasks is of great importance. In this paper, we propose M3KE, a Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark, which is developed to measure knowledge acquired…

    Submitted 20 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  31. Efficient Halftoning via Deep Reinforcement Learning

    Authors: Haitian Jiang, Dongliang Xiong, Xiaowen Jiang, Li Ding, Liang Chen, Kai Huang

    Abstract: Halftoning aims to reproduce a continuous-tone image with pixels whose intensities are constrained to two discrete levels. This technique has been deployed on every printer, and the majority of them adopt fast methods (e.g., ordered dithering, error diffusion) that fail to render structural details, which determine halftone's quality. Other prior methods of pursuing visual pleasure by searching fo…

    Submitted 12 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Journal ref: IEEE Transactions on Image Processing (TIP), 2023

  32. arXiv:2212.09917

    cs.CL

    Inverse Reinforcement Learning for Text Summarization

    Authors: Yu Fu, Deyi Xiong, Yue Dong

    Abstract: We introduce inverse reinforcement learning (IRL) as an effective paradigm for training abstractive summarization models, imitating human summarization behaviors. Our IRL model estimates the reward function using a suite of important sub-rewards for summarization and concurrently optimizes the policy network. Experimental results across datasets in different domains (CNN/DailyMail and WikiHow) and…

    Submitted 4 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 8 pages, 2 figures; accepted to Findings of EMNLP 2023

  33. arXiv:2212.08354

    cs.CL

    FewFedWeight: Few-shot Federated Learning Framework across Multiple NLP Tasks

    Authors: Weilong Dong, Xinwei Wu, Junzhuo Li, Shuangzhi Wu, Chao Bian, Deyi Xiong

    Abstract: Massively multi-task learning with large language models has recently made substantial progress on few-shot generalization. However, this is usually performed in a centralized learning fashion, ignoring the privacy sensitivity issue of (annotated) data used in multiple tasks. To mitigate this issue, we propose FewFedWeight, a few-shot federated learning framework across multiple tasks, to achieve…

    Submitted 16 December, 2022; originally announced December 2022.

  34. arXiv:2212.08349

    cs.LG cs.AI cs.CL cs.CR

    Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework

    Authors: Junzhuo Li, Xinwei Wu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong

    Abstract: Knowledge distillation (KD) has been widely used for model compression and knowledge transfer. Typically, a big teacher model trained on sufficient data transfers knowledge to a small student model. However, despite the success of KD, little effort has been made to study whether KD leaks the training data of the teacher model. In this paper, we experimentally reveal that KD suffers from the risk o…

    Submitted 16 December, 2022; originally announced December 2022.

  35. arXiv:2211.03462

    cs.CL

    NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering

    Authors: Tengxun Zhang, Hongfei Xu, Josef van Genabith, Deyi Xiong, Hongying Zan

    Abstract: Hybrid tabular-textual question answering (QA) requires reasoning from heterogeneous information, and the types of reasoning are mainly divided into numerical reasoning and span extraction. Current numerical reasoning methods autoregressively decode program sequences, and each decoding step produces either an operator or an operand. However, the step-by-step decoding suffers from exposure bias, an…

    Submitted 13 October, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  36. arXiv:2209.01530

    cs.CL

    Informative Language Representation Learning for Massively Multilingual Neural Machine Translation

    Authors: Renren **, Deyi Xiong

    Abstract: In a multilingual neural machine translation model that fully shares parameters across all languages, an artificial language token is usually used to guide translation into the desired target language. However, recent studies show that prepending language tokens sometimes fails to navigate the multilingual neural machine translation models into right translation directions, especially on zero-shot…

    Submitted 4 September, 2022; originally announced September 2022.

    Comments: Accepted by COLING 2022

  37. arXiv:2208.04524

    stat.ML cs.LG

    Multiple Instance Neural Networks Based on Sparse Attention for Cancer Detection using T-cell Receptor Sequences

    Authors: Younghoon Kim, Tao Wang, Danyi Xiong, Xinlei Wang, Seongoh Park

    Abstract: Early detection of cancers has been much explored due to its paramount importance in biomedical fields. Among different types of data used to answer this biological question, studies based on T cell receptors (TCRs) are under recent spotlight due to the growing appreciation of the roles of the host immunity system in tumor biology. However, the one-to-many correspondence between a patient and mult…

    Submitted 8 August, 2022; originally announced August 2022.

  38. Halftoning with Multi-Agent Deep Reinforcement Learning

    Authors: Haitian Jiang, Dongliang Xiong, Xiaowen Jiang, Aiguo Yin, Li Ding, Kai Huang

    Abstract: Deep neural networks have recently succeeded in digital halftoning using vanilla convolutional layers with high parallelism. However, existing deep methods fail to generate halftones with a satisfying blue-noise property and require complex training schemes. In this paper, we propose a halftoning method based on multi-agent deep reinforcement learning, called HALFTONERS, which learns a shared poli…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: ICIP 2022

  39. arXiv:2206.11249

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di **, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter , et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  40. arXiv:2206.04980

    cs.CL

    Unsupervised and Few-shot Parsing from Pretrained Language Models

    Authors: Zhiyuan Zeng, Deyi Xiong

    Abstract: Pretrained language models are generally acknowledged to be able to encode syntax [Tenney et al., 2019, Jawahar et al., 2019, Hewitt and Manning, 2019]. In this article, we propose UPOA, an Unsupervised constituent Parsing model that calculates an Out Association score solely based on the self-attention weight matrix learned in a pretrained language model as the syntactic distance for span segment…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Published in Artificial Intelligence

    Journal ref: Artificial Intelligence, Volume 305, April 2022, 103665

  41. arXiv:2204.06175

    cs.CL

    Efficient Cluster-Based k-Nearest-Neighbor Machine Translation

    Authors: Dexin Wang, Kai Fan, Boxing Chen, Deyi Xiong

    Abstract: k-Nearest-Neighbor Machine Translation (kNN-MT) has been recently proposed as a non-parametric solution for domain adaptation in neural machine translation (NMT). It aims to alleviate the performance degradation of advanced MT systems in translating out-of-domain sentences by coordinating with an additional token-level feature-based retrieval module constructed from in-domain data. Previous studie…

    Submitted 3 May, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: 8 pages, 6 figures, Accepted by ACL 2022 main conference

  42. Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension

    Authors: Linjuan Wu, Shaojuan Wu, Xiaowang Zhang, Deyi Xiong, Shizhan Chen, Zhiqiang Zhuang, Zhiyong Feng

    Abstract: Multilingual pre-trained models are able to zero-shot transfer knowledge from rich-resource to low-resource languages in machine reading comprehension (MRC). However, inherent linguistic discrepancies in different languages could make answer spans predicted by zero-shot transfer violate syntactic constraints of the target language. In this paper, we propose a novel multilingual MRC framework equip…

    Submitted 14 January, 2023; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted to ACL 2022 (main conference)

  43. arXiv:2112.08831  [pdf, other]

    cs.CL cs.AI

    Bridging between Cognitive Processing Signals and Linguistic Features via a Unified Attentional Network

    Authors: Yuqi Ren, Deyi Xiong

    Abstract: Cognitive processing signals can be used to improve natural language processing (NLP) tasks. However, it is not clear how these signals correlate with linguistic information. Bridging between human language processing and linguistic features has been widely studied in neurolinguistics, usually via single-variable controlled experiments with highly-controlled stimuli. Such methods not only compromi…

    Submitted 18 March, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  44. Adversarial Attacks Against Deep Generative Models on Data: A Survey

    Authors: Hui Sun, Tianqing Zhu, Zhiqiu Zhang, Dawei Jin, Ping Xiong, Wanlei Zhou

    Abstract: Deep generative models have gained much attention given their ability to generate data for applications as varied as healthcare to financial technology to surveillance, and many more - the most popular models being generative adversarial networks and variational auto-encoders. Yet, as with all machine learning models, ever is the concern over security breaches and privacy leaks and deep generative…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: To be published in IEEE Transactions on Knowledge and Data Engineering

  45. arXiv:2108.12137  [pdf, other]

    cs.CL

    Secoco: Self-Correcting Encoding for Neural Machine Translation

    Authors: Tao Wang, Chengqi Zhao, Mingxuan Wang, Lei Li, Hang Li, Deyi Xiong

    Abstract: This paper presents Self-correcting Encoding (Secoco), a framework that effectively deals with input noise for robust neural machine translation by introducing self-correcting predictors. Different from previous robust approaches, Secoco enables NMT to explicitly correct noisy inputs and delete specific errors simultaneously with the translation decoding process. Secoco is able to achieve signific…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: 6 pages, 2 figures, 3 tables

    MSC Class: 68T50 ACM Class: I.2.7

  46. arXiv:2106.05544  [pdf, other]

    cs.CL

    CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals

    Authors: Yuqi Ren, Deyi Xiong

    Abstract: Most previous studies integrate cognitive language processing signals (e.g., eye-tracking or EEG data) into neural models of natural language processing (NLP) just by directly concatenating word embeddings with cognitive features, ignoring the gap between the two modalities (i.e., textual vs. cognitive) and noise in cognitive features. In this paper, we propose a CogAlign approach to these issues,…

    Submitted 14 November, 2023; v1 submitted 10 June, 2021; originally announced June 2021.

  47. arXiv:2105.12887  [pdf]

    cs.CL cs.AI

    Multi-turn Dialog System on Single-turn Data in Medical Domain

    Authors: Nazib Sorathiya, Chuan-An Lin, Daniel Chen, Daniel Xiong, Scott Zin, Yi Zhang, He Sarina Yang, Sharon Xiaolei Huang

    Abstract: Recently there has been a huge interest in dialog systems. This interest has also been developed in the field of the medical domain where researchers are focusing on building a dialog system in the medical domain. This research is focused on the multi-turn dialog system trained on the multi-turn dialog data. It is difficult to gather a huge amount of multi-turn conversational data in the medical d…

    Submitted 26 May, 2021; originally announced May 2021.

  48. arXiv:2103.16189  [pdf, other]

    cs.CL

    Autocorrect in the Process of Translation -- Multi-task Learning Improves Dialogue Machine Translation

    Authors: Tao Wang, Chengqi Zhao, Mingxuan Wang, Lei Li, Deyi Xiong

    Abstract: Automatic translation of dialogue texts is a much needed demand in many real life scenarios. However, the currently existing neural machine translation delivers unsatisfying results. In this paper, we conduct a deep analysis of a dialogue corpus and summarize three major issues on dialogue translation, including pronoun dropping (\droppro), punctuation dropping (\droppun), and typos (\typo). In re…

    Submitted 21 April, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: 8 pages, 3 figures, 7 tables

    MSC Class: 68T50 ACM Class: I.2.7

  49. arXiv:2103.03446  [pdf, other]

    cs.CL

    Enhanced Aspect-Based Sentiment Analysis Models with Progressive Self-supervised Attention Learning

    Authors: Jinsong Su, Jialong Tang, Hui Jiang, Ziyao Lu, Yubin Ge, Linfeng Song, Deyi Xiong, Le Sun, Jiebo Luo

    Abstract: In aspect-based sentiment analysis (ABSA), many neural models are equipped with an attention mechanism to quantify the contribution of each context word to sentiment prediction. However, such a mechanism suffers from one drawback: only a few frequent words with sentiment polarities are tended to be taken into consideration for final sentiment decision while abundant infrequent sentiment words are…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: 31 pages. arXiv admin note: text overlap with arXiv:1906.01213

    Journal ref: Artificial Intelligence 2021

  50. arXiv:2102.08553  [pdf, ps, other]

    cs.CL cs.AI

    Integrating Pre-trained Model into Rule-based Dialogue Management

    Authors: Jun Quan, Meng Yang, Qiang Gan, Deyi Xiong, Yiming Liu, Yuchen Dong, Fangxin Ouyang, Jun Tian, Ruiling Deng, Yongzhi Li, Yang Yang, Daxin Jiang

    Abstract: Rule-based dialogue management is still the most popular solution for industrial task-oriented dialogue systems for their interpretability. However, it is hard for developers to maintain the dialogue logic when the scenarios get more and more complex. On the other hand, data-driven dialogue systems, usually with end-to-end structures, are popular in academic research and easier to deal with compl…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: AAAI 2021 Demo Paper