Showing 1–11 of 11 results for author: Xi, C

  1. arXiv:2407.01178  [pdf, other]

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled…

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  2. arXiv:2405.16933  [pdf, other]

    cs.CL cs.IR

    Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning

    Authors: Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Shichao Song, Hanyu Wang, Jiawei Yang, Feiyu Xiong, Bo Tang, Chenyang Xi

    Abstract: Retrieval-Augmented Generation (RAG) offers a cost-effective approach to injecting real-time knowledge into large language models (LLMs). Nevertheless, constructing and validating high-quality knowledge repositories require considerable effort. We propose a pre-retrieval framework named Pseudo-Graph Retrieval-Augmented Generation (PG-RAG), which conceptualizes LLMs as students by providing them wi…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2403.06221  [pdf, other]

    cs.AI cs.CL cs.IR

    TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

    Authors: Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang

    Abstract: Numerous large language model (LLM) agents have been built for different tasks like web navigation and online shopping due to LLM's wide knowledge and text-understanding ability. Among these works, many of them utilize in-context examples to achieve generalization without the need for fine-tuning, while few of them have considered the problem of how to select and effectively utilize these examples…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Codes available at: https://github.com/skyriver-2000/TRAD-Official

  4. arXiv:2403.02513  [pdf, other]

    cs.CL

    Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF

    Authors: Chen Zheng, Ke Sun, Hang Wu, Chenguang Xi, Xun Zhou

    Abstract: In recent advancements in Conversational Large Language Models (LLMs), a concerning trend has emerged, showing that many new base LLMs experience a knowledge reduction in their foundational capabilities following Supervised Fine-Tuning (SFT). This process often leads to issues such as forgetting or a decrease in the base model's abilities. Moreover, fine-tuned models struggle to align with user pr…

    Submitted 4 March, 2024; originally announced March 2024.

  5. arXiv:2401.02072  [pdf, other]

    cs.CL

    ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

    Authors: Chen Zheng, Ke Sun, Da Tang, Yukun Ma, Yuyu Zhang, Chenguang Xi, Xun Zhou

    Abstract: The emergence of Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks, with these models often lacking depth and accuracy in specialized areas, and exhibiting a decrease in general capabilities when fine-tuned, particularly analysis ability in small sized models. To address these gaps, we introduce ICE-GRT, utilizing Reinforcement Learning from Huma…

    Submitted 4 January, 2024; originally announced January 2024.

  6. arXiv:2309.16583  [pdf, other]

    cs.CL

    GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond

    Authors: Shen Zheng, Yuyu Zhang, Yijie Zhu, Chenguang Xi, Pengyang Gao, Xun Zhou, Kevin Chen-Chuan Chang

    Abstract: With the rapid advancement of large language models (LLMs), there is a pressing need for a comprehensive evaluation suite to assess their capabilities and limitations. Existing LLM leaderboards often reference scores reported in other papers without consistent settings and prompts, which may inadvertently encourage cherry-picking favored settings and prompts for better results. In this work, we in…

    Submitted 1 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted by NAACL 2024

  7. arXiv:2204.06240  [pdf, other]

    cs.LG cs.IR

    CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU

    Authors: Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qin, Youlong Cheng, Yang You

    Abstract: The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to ensuring an up-to-date model and reducing the training cost. One approach to increase the training speed is to apply large batch training. However, as shown in computer vis…

    Submitted 30 November, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: AAAI 2023

  8. arXiv:2106.05150  [pdf, other]

    cs.LG cs.SI

    Scaling Up Graph Neural Networks Via Graph Coarsening

    Authors: Zengfeng Huang, Shengzhong Zhang, Chong Xi, Tang Liu, Min Zhou

    Abstract: Scalability of graph neural networks remains one of the major challenges in graph machine learning. Since the representation of a node is computed by recursively aggregating and transforming representation vectors of its neighboring nodes from previous layers, the receptive fields grow exponentially, which makes standard stochastic optimization techniques ineffective. Various approaches have been…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: KDD 2021

  9. arXiv:2105.05473  [pdf, other]

    cs.LG cs.AI

    Interpretable performance analysis towards offline reinforcement learning: A dataset perspective

    Authors: Chenyang Xi, Bo Tang, Jiajun Shen, Xinfu Liu, Feiyu Xiong, Xueying Li

    Abstract: Offline reinforcement learning (RL) has increasingly become the focus of the artificial intelligent research due to its wide real-world applications where the collection of data may be difficult, time-consuming, or costly. In this paper, we first propose a two-fold taxonomy for existing offline RL algorithms from the perspective of exploration and exploitation tendency. Secondly, we derive the exp…

    Submitted 12 May, 2021; originally announced May 2021.

  10. arXiv:1904.08784  [pdf, other]

    cs.RO cs.LG stat.ML

    Efficient Motion Planning for Automated Lane Change based on Imitation Learning and Mixed-Integer Optimization

    Authors: Chenyang Xi, Tianyu Shi, Yuankai Wu, Lijun Sun

    Abstract: Intelligent motion planning is one of the core components in automated vehicles, which has received extensive interests. Traditional motion planning methods suffer from several drawbacks in terms of optimality, efficiency and generalization capability. Sampling based methods cannot guarantee the optimality of the generated trajectories. Whereas the optimization-based methods are not able to perfor…

    Submitted 8 May, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: Accepted by IEEE ITSC 2020

  11. arXiv:1412.5526

    math.OC cs.IT

    Distributed Mirror Descent over Directed Graphs

    Authors: Chenguang Xi, Qiong Wu, Usman A. Khan

    Abstract: In this paper, we propose Distributed Mirror Descent (DMD) algorithm for constrained convex optimization problems on a (strongly-)connected multi-agent network. We assume that each agent has a private objective function and a constraint set. The proposed DMD algorithm employs a locally designed Bregman distance function at each agent, and thus can be viewed as a generalization of the well-known Di…

    Submitted 27 April, 2015; v1 submitted 15 December, 2014; originally announced December 2014.

    Comments: This paper has been withdrawn by the author due to a crucial error