Skip to main content

Showing 1–50 of 395 results for author: Xiong, C

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01370  [pdf, other

    cs.CL

    Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

    Authors: Philippe Laban, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu

    Abstract: LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we argue that summarization can play a central role in such evaluation. We design a procedure to synthesize Haystacks of documents, ensuring that specifi… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.18518  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.11548  [pdf, other

    cs.RO cs.AI cs.CV

    AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation

    Authors: Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Rui** Wang, Hao Dong

    Abstract: The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects.Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly.However, these methods typically focus on high-level planning corrections using an ad… ▽ More

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.11271  [pdf, other

    cs.CV cs.LG

    MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

    Authors: Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Ye** Choi, Ludwig Schmidt

    Abstract: Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimo… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10291  [pdf, other

    cs.AI cs.CL cs.IR

    ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents

    Authors: Hao Kang, Chenyan Xiong

    Abstract: Large language models (LLMs) have exhibited remarkable performance across various tasks in natural language processing. Nevertheless, challenges still arise when these tasks demand domain-specific expertise and advanced analytical skills, such as conducting research surveys on a designated topic. In this research, we develop ResearchArena, a benchmark that measures LLM agents' ability to conduct a… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.10290  [pdf, other

    cs.CL cs.AI cs.LG

    MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

    Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understand… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.09696  [pdf, other

    eess.IV cs.CV

    MoME: Mixture of Multimodal Experts for Cancer Survival Prediction

    Authors: Conghao Xiong, Hao Chen, Hao Zheng, Dong Wei, Yefeng Zheng, Joseph J. Y. Sung, Irwin King

    Abstract: Survival analysis, as a challenging task, requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making. There are two main challenges in this task: significant heterogeneity and complex inter- and intra-modal interactions between the two modalities. Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separa… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 + 1/2 pages, early accepted to MICCAI2024

  8. arXiv:2406.06046  [pdf, other

    cs.CL cs.LG

    MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

    Authors: Zichun Yu, Spandan Das, Chenyan Xiong

    Abstract: Pretraining data selection has the potential to improve language model pretraining efficiency by utilizing higher-quality data from massive web data corpora. Current data selection methods, which rely on either hand-crafted rules or larger reference models, are conducted statically and do not capture the evolving data preferences during pretraining. In this paper, we introduce model-aware data sel… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: The code is open-sourced at https://github.com/cxcscmu/MATES

  9. arXiv:2406.04975  [pdf, other

    cs.LG cs.AI

    UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting

    Authors: Juncheng Liu, Chenghao Liu, Gerald Woo, Yiwei Wang, Bryan Hooi, Caiming Xiong, Doyen Sahoo

    Abstract: Transformer-based models have emerged as powerful tools for multivariate time series forecasting (MTSF). However, existing Transformer models often fall short of capturing both intricate dependencies across variate and temporal dimensions in MTS data. Some recent models are proposed to separately capture variate and temporal dependencies through either two sequential or parallel attention mechanis… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2405.20099  [pdf, other

    cs.CR

    Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks

    Authors: Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Safety, security, and compliance are essential requirements when aligning large language models (LLMs). However, many seemingly aligned LLMs are soon shown to be susceptible to jailbreak attacks. These attacks aim to circumvent the models' safety guardrails and security mechanisms by introducing jailbreak prompts into malicious queries. In response to these challenges, this paper introduces Defens… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.17418  [pdf, other

    cs.CV

    Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

    Authors: Jiaming Liu, Chenxuan Li, Guanqun Wang, Lily Lee, Kaichen Zhou, Sixiang Chen, Chuyan Xiong, Jiaxin Ge, Renrui Zhang, Shanghang Zhang

    Abstract: Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failure action is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities in va… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  12. arXiv:2405.15638  [pdf, other

    cs.CV cs.CL

    M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models

    Authors: Hongyu Wang, Jiayu Xu, Senwei Xie, Rui** Wang, Jialin Li, Zhaojie Xie, Bin Zhang, Chuyan Xiong, Xilin Chen

    Abstract: Multilingual multimodal reasoning is a core component in achieving human-level intelligence. However, most existing benchmarks for multilingual multimodal reasoning struggle to differentiate between models of varying performance; even language models without visual capabilities can easily achieve high scores. This leaves a comprehensive evaluation of leading multilingual multimodal models largely… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Work in progress

  13. arXiv:2405.07863  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    RLHF Workflow: From Reward Modeling to Online RLHF

    Authors: Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

    Abstract: We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill i… ▽ More

    Submitted 12 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  14. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

    Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, **gwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik , et al. (6 additional authors not shown)

    Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

  15. arXiv:2405.02826  [pdf, other

    cs.CR

    Nip in the Bud: Forecasting and Interpreting Post-exploitation Attacks in Real-time through Cyber Threat Intelligence Reports

    Authors: Tiantian Zhu, Jie Ying, Tieming Chen, Chunlin Xiong, Wenrui Cheng, Qixuan Yuan, Aohan Zheng, Mingqi Lv, Yan Chen

    Abstract: Advanced Persistent Threat (APT) attacks have caused significant damage worldwide. Various Endpoint Detection and Response (EDR) systems are deployed by enterprises to fight against potential threats. However, EDR suffers from high false positives. In order not to affect normal operations, analysts need to investigate and filter detection results before taking countermeasures, in which heavy manua… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  16. arXiv:2405.02629  [pdf, other

    cs.CR

    SPARSE: Semantic Tracking and Path Analysis for Attack Investigation in Real-time

    Authors: Jie Ying, Tiantian Zhu, Wenrui Cheng, Qixuan Yuan, Mingjun Ma, Chunlin Xiong, Tieming Chen, Mingqi Lv, Yan Chen

    Abstract: As the complexity and destructiveness of Advanced Persistent Threat (APT) increase, there is a growing tendency to identify a series of actions undertaken to achieve the attacker's target, called attack investigation. Currently, analysts construct the provenance graph to perform causality analysis on Point-Of-Interest (POI) event for capturing critical events (related to the attack). However, due… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  17. arXiv:2404.16251  [pdf, ps, other

    cs.CR cs.AI cs.CL

    Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and ope… ▽ More

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  18. arXiv:2404.07972  [pdf, other

    cs.AI cs.CL

    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Authors: Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh **g Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

    Abstract: Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature… ▽ More

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: 51 pages, 21 figures

  19. arXiv:2404.04163  [pdf, other

    cs.IR cs.CL

    Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval

    Authors: João Coelho, Bruno Martins, João Magalhães, Jamie Callan, Chenyan Xiong

    Abstract: This study investigates the existence of positional biases in Transformer-based models for text representation learning, particularly in the context of web document retrieval. We build on previous research that demonstrated loss of information in the middle of input sequences for causal language models, extending it to the domain of representation learning. We examine positional biases at various… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  20. arXiv:2404.02415  [pdf, other

    cs.CV

    What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

    Authors: Anthony Meng Huat Tiong, Junqi Zhao, Boyang Li, Junnan Li, Steven C. H. Hoi, Caiming Xiong

    Abstract: Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate. A common belief is that a small number of VL skills underlie the variety of VL tests. In this paper, we perform a large-scale transfer learning experiment aimed at discovering latent VL skills from data. We reveal interesting characteristics that have important… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2404.00699  [pdf, other

    cs.CL

    How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library

    Authors: Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty

    Abstract: With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars, placing high pressure on model int… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, 3 tables

  22. arXiv:2403.12504  [pdf, other

    cs.RO

    TON-VIO: Online Time Offset Modeling Networks for Robust Temporal Alignment in High Dynamic Motion VIO

    Authors: Chaoran Xiong, Guoqing Liu, Qi Wu, Songpengcheng Xia, Tong Hua, Kehui Ma, Zhen Sun, Yan Xiang, Ling Pei

    Abstract: Temporal misalignment (time offset) between sensors is common in low cost visual-inertial odometry (VIO) systems. Such temporal misalignment introduces inconsistent constraints for state estimation, leading to a significant positioning drift especially in high dynamic motion scenarios. In this article, we focus on online temporal calibration to reduce the positioning drift caused by the time offse… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  23. arXiv:2402.18667  [pdf, other

    cs.CL

    FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

    Authors: Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin, Caiming Xiong

    Abstract: This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents. Despite LLMs' advancements, existing benchmarks fail to assess their format-following proficiency adequately. FoFo fills this gap with a diverse range of real-world formats and in… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: The first two authors contributed equally

  24. arXiv:2402.18264  [pdf, other

    cs.CL

    Retrieval-based Full-length Wikipedia Generation for Emergent Events

    Authors: Jiebin Zhang, Eugene J. Yu, Qinyu Chen, Chenhao Xiong, Dawei Zhu, Han Qian, Mingbo Song, Xiaoguang Li, Qun Liu, Sujian Li

    Abstract: In today's fast-paced world, the growing demand to quickly generate comprehensive and accurate Wikipedia documents for emerging events is both crucial and challenging. However, previous efforts in Wikipedia generation have often fallen short of meeting real-world requirements. Some approaches focus solely on generating segments of a complete Wikipedia document, while others overlook the importance… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  25. arXiv:2402.16058  [pdf, other

    cs.CL

    Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

    Authors: Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu

    Abstract: Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions, a process that incurs extra costs during inference. In this paper, we propose the Gist COnditioned deCOding (Gist-COCO) model, introducing a novel method for compressing prompts which also can assist the prompt interpretation and engineering. Gist-COCO employs an encoder-decode… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  26. arXiv:2402.15538  [pdf, other

    cs.MA cs.AI

    AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

    Authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese

    Abstract: The booming success of LLMs initiates rapid development in LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise the optimal reasoning strategies and agent architectures. Accordingly, LLM agent research advances from the simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategy; agent architecture also evolves from singl… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: preprint. Library is available at https://github.com/SalesforceAIResearch/AgentLite

  27. arXiv:2402.15506  [pdf, other

    cs.AI cs.CL cs.LG

    AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

    Authors: Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

    Abstract: Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce \textbf{AgentOhana} as a comprehensive solution to address these challenges. \… ▽ More

    Submitted 20 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Add GitHub repo link at \url{https://github.com/SalesforceAIResearch/xLAM} and HuggingFace model link at \url{https://huggingface.co/Salesforce/xLAM-v0.1-r}

  28. arXiv:2402.14652  [pdf, other

    cs.CL

    Cleaner Pretraining Corpus Curation with Neural Web Scra**

    Authors: Zhipeng Xu, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Ge Yu, Chenyan Xiong

    Abstract: The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans. Through meticulous data collection, preprocessing, and curation, webpages can be used as a fundamental data resource for language model pretraining. However, when confronted with the progressively revolutionized and intricate nature of webpages, rule-based/feature-based web scrapers… ▽ More

    Submitted 14 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  29. arXiv:2402.13547  [pdf, other

    cs.CL

    ActiveRAG: Revealing the Treasures of Knowledge via Active Learning

    Authors: Zhipeng Xu, Zhenghao Liu, Yibin Liu, Chenyan Xiong, Yukun Yan, Shuo Wang, Shi Yu, Zhiyuan Liu, Ge Yu

    Abstract: Retrieval Augmented Generation (RAG) has introduced a new paradigm for Large Language Models (LLMs), aiding in the resolution of knowledge-intensive tasks. However, current RAG models position LLMs as passive knowledge receptors, thereby restricting their capacity for learning and comprehending external knowledge. In this paper, we present ActiveRAG, an innovative RAG framework that shifts from pa… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  30. arXiv:2402.13448  [pdf, other

    cs.CL cs.AI cs.LG

    ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance

    Authors: Liwen Sun, Abhineet Agarwal, Aaron Kornblith, Bin Yu, Chenyan Xiong

    Abstract: In the emergency department (ED), patients undergo triage and multiple laboratory tests before diagnosis. This time-consuming process causes ED crowding which impacts patient mortality, medical errors, staff burnout, etc. This work proposes (time) cost-effective diagnostic assistance that leverages artificial intelligence systems to help ED clinicians make efficient and accurate diagnoses. In coll… ▽ More

    Submitted 27 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  31. arXiv:2402.10941  [pdf, other

    cs.CL cs.AI cs.LG

    Text2Data: Low-Resource Data Generation with Textual Control

    Authors: Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data that is semantically coherent with textual instructions. While strides have been made in text-to-data generation spanning image editing, audio synthesi… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: We propose a method that can achieve text-to-data generation under low-resource situation

  32. arXiv:2402.02592  [pdf, other

    cs.LG cs.AI

    Unified Training of Universal Time Series Forecasting Transformers

    Authors: Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo

    Abstract: Deep learning for time series forecasting has traditionally operated within a one-model-per-dataset framework, limiting its potential to leverage the game-changing impact of large pre-trained models. The concept of universal forecasting, emerging from pre-training on a vast collection of time series datasets, envisions a single Large Time Series Model capable of addressing diverse downstream forec… ▽ More

    Submitted 22 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  33. arXiv:2401.10495  [pdf, ps, other

    cs.LG cs.AI stat.ME

    Causal Layering via Conditional Entropy

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle, when distributions are discrete. Our algorithms work by repeatedly removing sources or s… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

  34. arXiv:2401.07526  [pdf, other

    cs.CL cs.AI cs.LG

    Editing Arbitrary Propositions in LLMs without Subject Labels

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L\&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while not modifying its response to other related propositi… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

  35. arXiv:2401.06947  [pdf, other

    cs.CL cs.AI

    Parameter-Efficient Detoxification with Contrastive Decoding

    Authors: Tong Niu, Caiming Xiong, Semih Yavuz, Yingbo Zhou

    Abstract: The field of natural language generation has witnessed significant advancements in recent years, including the development of controllable text generation techniques. However, controlling the attributes of the generated text remains a challenge, especially when aiming to avoid undesirable behavior such as toxicity. In this work, we introduce Detoxification Generator (DETOXIGEN), an inference-time… ▽ More

    Submitted 12 January, 2024; originally announced January 2024.

  36. arXiv:2401.05561  [pdf, other

    cs.CL

    TrustLLM: Trustworthiness in Large Language Models

    Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang , et al. (45 additional authors not shown)

    Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in… ▽ More

    Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: This work is still under work and we welcome your contribution

  37. arXiv:2401.01827  [pdf, other

    cs.CV

    Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions

    Authors: David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo

    Abstract: Most existing video diffusion models (VDMs) are limited to mere text conditions. Thereby, they are usually lacking in control over visual appearance and geometry structure of the generated videos. This work presents Moonshot, a new video generation model that conditions simultaneously on multimodal inputs of image and text. The model builts upon a core module, called multimodal video block (MVB),… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: project page: https://showlab.github.io/Moonshot/

  38. arXiv:2312.14511  [pdf

    cs.RO eess.SY

    3D Programming of Patterned Heterogeneous Interface for 4D Smart Robotics

    Authors: Kewei Song, Chunfeng Xiong, Ze Zhang, Kunlin Wu, Weiyang Wan, Yifan Wang, Shinjiro Umezu, Hirotaka Sato

    Abstract: Shape memory structures are playing an important role in many cutting-edge intelligent fields. However, the existing technologies can only realize 4D printing of a single polymer or metal, which limits practical applications. Here, we report a construction strategy for TSMP/M heterointerface, which uses Pd2+-containing shape memory polymer (AP-SMR) to induce electroless plating reaction and relies… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: 37 Pages, 11 Figures

  39. arXiv:2312.11444  [pdf, other

    cs.CL cs.AI

    An In-depth Look at Gemini's Language Abilities

    Authors: Syeda Nahida Akter, Zichun Yu, Aashiq Muhamed, Tianyue Ou, Alex Bäuerle, Ángel Alexander Cabrera, Krish Dholakia, Chenyan Xiong, Graham Neubig

    Abstract: The recently released Google Gemini class of models are the first to comprehensively report results that rival the OpenAI GPT series across a wide variety of tasks. In this paper, we do an in-depth exploration of Gemini's language abilities, making two contributions. First, we provide a third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models with reproducible… ▽ More

    Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

  40. arXiv:2312.06149  [pdf, other

    cs.CL cs.AI

    Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding

    Authors: Lifu Tu, Semih Yavuz, ** Qu, Jiacheng Xu, Rui Meng, Caiming Xiong, Yingbo Zhou

    Abstract: Large Language Models (LLMs) have demonstrated a powerful ability for text generation. However, achieving optimal results with a given prompt or instruction can be challenging, especially for billion-sized models. Additionally, undesired behaviors such as toxicity or hallucinations can manifest. While much larger models (e.g., ChatGPT) may demonstrate strength in mitigating these issues, there is… ▽ More

    Submitted 25 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  41. arXiv:2312.00314  [pdf, other

    cs.RO

    A bilevel optimal motion planning (BOMP) model with application to autonomous parking

    Authors: Shenglei Shi, Youlun Xiong, Jiankui Chen, Caihua Xiong

    Abstract: In this paper, we present a bilevel optimal motion planning (BOMP) model for autonomous parking. The BOMP model treats motion planning as an optimal control problem, in which the upper level is designed for vehicle nonlinear dynamics, and the lower level is for geometry collision-free constraints. The significant feature of the BOMP model is that the lower level is a linear programming problem tha… ▽ More

    Submitted 30 November, 2023; originally announced December 2023.

  42. arXiv:2311.18799  [pdf, other

    cs.CV cs.CL

    X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

    Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific custo… ▽ More

    Submitted 30 November, 2023; originally announced November 2023.

  43. arXiv:2311.16989  [pdf, other

    cs.CL

    ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

    Authors: Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty

    Abstract: Upon its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interests in… ▽ More

    Submitted 15 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: version v4, included latest top-performing open-sourced LLMs

  44. arXiv:2311.12908  [pdf, other

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Model Alignment Using Direct Preference Optimization

    Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

    Abstract: Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality im… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

  45. arXiv:2311.09458  [pdf, other

    cs.CL

    Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries

    Authors: Prafulla Kumar Choubey, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu

    Abstract: Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote. However, a single average performance score on the entire test set is inadequate in determining such model competencies. We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with traini… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023-Findings

  46. arXiv:2311.08596  [pdf, other

    cs.CL

    Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment

    Authors: Philippe Laban, Lidiya Murakhovs'ka, Caiming Xiong, Chien-Sheng Wu

    Abstract: The interactive nature of Large Language Models (LLMs) theoretically allows models to refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs remains limited. In this paper, we propose the FlipFlop experiment: in the first round of the conversation, an LLM completes a classification task. In a second round, the LLM is challenged with a follow-up phrase like "Ar… ▽ More

    Submitted 21 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  47. arXiv:2311.07884  [pdf, other

    cs.CL

    Fair Abstractive Summarization of Diverse Perspectives

    Authors: Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang

    Abstract: People from different social and demographic groups express diverse perspectives and conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and politics. A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups. However, current work in summarization metrics and Large Language Models (LLMs) evaluation h… ▽ More

    Submitted 29 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  48. arXiv:2310.17749  [pdf, other

    cs.CL cs.AI

    Salespeople vs SalesBot: Exploring the Role of Educational Value in Conversational Recommender Systems

    Authors: Lidiya Murakhovs'ka, Philippe Laban, Tian Xie, Caiming Xiong, Chien-Sheng Wu

    Abstract: Making big purchases requires consumers to research or consult a salesperson to gain domain expertise. However, existing conversational recommender systems (CRS) often overlook users' lack of background knowledge, focusing solely on gathering preferences. In this work, we define a new problem space for conversational agents that aim to provide both product recommendations and educational value thr… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

  49. arXiv:2310.16605  [pdf, other

    cs.IR

    WebDRO: A Web-based Group-level Clustering and Reweighting Method for Unsupervised Dense Retrieval

    Authors: Peixuan Han, Zhenghao Liu, Zhiyuan Liu, Chenyan Xiong

    Abstract: The anchor-document data derived from web graphs offers a wealth of paired information for training dense retrieval models in an unsupervised manner. However, the presence of inherent noise invariably compromises the robustness of training dense retrieval models, consequently hurting the performance. In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and op… ▽ More

    Submitted 29 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  50. arXiv:2310.14037  [pdf, other

    cs.IR

    MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin

    Authors: Tianshuo Zhou, Sen Mei, Xinze Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Ge Yu

    Abstract: This paper proposes Multi-modAl Retrieval model via Visual modulE pLugin (MARVEL), which learns an embedding space for queries and multi-modal documents to conduct retrieval. MARVEL encodes queries and multi-modal documents with a unified encoder model, which helps to alleviate the modality gap between images and texts. Specifically, we enable the image understanding ability of the well-trained de… ▽ More

    Submitted 15 June, 2024; v1 submitted 21 October, 2023; originally announced October 2023.