Skip to main content

Showing 1–50 of 130 results for author: Cai, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.20095  [pdf, other

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, **ghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18020  [pdf, other

    cs.LG cs.AI physics.chem-ph

    MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

    Authors: Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

    Abstract: Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most ex… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  3. arXiv:2406.13793  [pdf, other

    cs.HC

    Exploring the Optimal Time Window for Predicting Cognitive Load Using Physiological Sensor Data

    Authors: Minghao Cai, Carrie Demmans Epp

    Abstract: Learning analytics has begun to use physiological signals because these have been linked with learners' cognitive and affective states. These signals, when interpreted through machine learning techniques, offer a nuanced understanding of the temporal dynamics of student learning experiences and processes. However, there is a lack of clear guidance on the optimal time window to use for analyzing ph… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Presented at PhysioCHI: Towards Best Practices for Integrating Physiological Signals in HCI, May 11, 2024, Honolulu, HI, USA

  4. arXiv:2406.09400  [pdf, other

    cs.CV cs.LG

    Yo'LLaVA: Your Personalized Language and Vision Assistant

    Authors: Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in o… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://thaoshibe.github.io/YoLLaVA

  5. arXiv:2406.02721  [pdf, other

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation pro… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 41 pages, 12 figures, 41 tables; Website: https://llm-self-control.github.io/

  6. arXiv:2405.20718  [pdf, other

    cs.IR cs.AI

    Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias

    Authors: Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang

    Abstract: Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias,… ▽ More

    Submitted 11 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  7. arXiv:2405.17430  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Matryoshka Multimodal Models

    Authors: Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed large number of visual tokens and then feed them into a Large Language Model (LLM). However, this design causes an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to great inefficiency. While… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://matryoshka-mm.github.io/

  8. Multimodality Invariant Learning for Multimedia-Based New Item Recommendation

    Authors: Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang

    Abstract: Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and APPs, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at the inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing(e.g., m… ▽ More

    Submitted 28 April, 2024; originally announced May 2024.

  9. arXiv:2405.10818  [pdf

    cs.SI

    Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System

    Authors: Jiawei Feng, Mengsi Cai, Fangze Dai, Tianci Bu, Xiaoyu Zhang, Huijun Zheng, Xin Lu

    Abstract: In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.10428 by other authors

  10. arXiv:2405.06670  [pdf, other

    cs.LO cs.LG

    TLINet: Differentiable Neural Network Temporal Logic Inference

    Authors: Danyang Li, Mingyu Cai, Cristian-Ioan Vasile, Roberto Tron

    Abstract: There has been a growing interest in extracting formal descriptions of the system behaviors from data. Signal Temporal Logic (STL) is an expressive formal language used to describe spatial-temporal properties with interpretability. This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. The computation in TLINet is differentiable, enabling the usage of off-the-shelf gr… ▽ More

    Submitted 14 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  11. arXiv:2405.05543  [pdf, ps, other

    cs.HC

    Predicting Cognitive Load Using Sensor Data in a Literacy Game

    Authors: Minghao Cai, Carrie Demmans Epp

    Abstract: Educational games are being increasingly used to support self-paced learning. However, educators and system designers often face challenges in monitoring student affect and cognitive load. Existing assessments in game-based learning environments (GBLEs) tend to focus more on outcomes rather than processes, potentially overlooking key aspects of the learning journey that include learner affect and… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This work has been accepted by the 17th International Conference on Educational Data Mining

  12. arXiv:2404.14219  [pdf, other

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset… ▽ More

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  13. arXiv:2403.19770  [pdf, other

    cs.RO cs.AI cs.LG

    Hierarchical Deep Learning for Intention Estimation of Teleoperation Manipulation in Assembly Tasks

    Authors: Mingyu Cai, Karankumar Patel, Soshi Iba, Songpo Li

    Abstract: In human-robot collaboration, shared control presents an opportunity to teleoperate robotic manipulation to improve the efficiency of manufacturing and assembly processes. Robots are expected to assist in executing the user's intentions. To this end, robust and prompt intention estimation is needed, relying on behavioral observations. The framework presents an intention estimation technique at hie… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: ICRA 2024

  14. arXiv:2403.18348  [pdf, other

    cs.IR

    Sequential Recommendation with Latent Relations based on Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang

    Abstract: Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeli… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  15. arXiv:2403.18325  [pdf, other

    cs.IR

    Common Sense Enhanced Knowledge-based Recommendation with Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Min Zhang, Qingyao Ai, Yiqun Liu, Mingchen Cai

    Abstract: Knowledge-based recommendation models effectively alleviate the data sparsity issue leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the for… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by DASFAA 2024

  16. arXiv:2403.15388  [pdf, other

    cs.CV cs.AI cs.CL

    LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

    Authors: Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

    Abstract: Large Multimodal Models (LMMs) have shown significant visual reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically take in a fixed and large amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which… ▽ More

    Submitted 22 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://llava-prumerge.github.io/

  17. arXiv:2403.14125  [pdf, other

    stat.ML cs.LG

    Learning causal graphs using variable grou** according to ancestral relationship

    Authors: Ming Cai, Hisayuki Hara

    Abstract: Several causal discovery algorithms have been proposed. However, when the sample size is small relative to the number of variables, the accuracy of estimating causal graphs using existing methods decreases. And some methods are not feasible when the sample size is smaller than the number of variables. To circumvent these problems, some researchers proposed causal structure learning algorithms usin… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  18. arXiv:2403.04369  [pdf, other

    cs.AI cs.CL

    From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction

    Authors: Ang Li, Qiangchao Chen, Yiquan Wu, Ming Cai, Xiang Zhou, Fei Wu, Kun Kuang

    Abstract: Confusing charge prediction is a challenging task in legal AI, which involves predicting confusing charges based on fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing c… ▽ More

    Submitted 24 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  19. arXiv:2403.04366  [pdf, other

    cs.AI

    Enhancing Court View Generation with Knowledge Injection and Guidance

    Authors: Ang Li, Yiquan Wu, Yifei Liu, Fei Wu, Ming Cai, Kun Kuang

    Abstract: Court View Generation (CVG) is a challenging task in the field of Legal Artificial Intelligence (LegalAI), which aims to generate court views based on the plaintiff claims and the fact descriptions. While Pretrained Language Models (PLMs) have showcased their prowess in natural language generation, their application to the complex, knowledge-intensive domain of CVG often reveals inherent limitatio… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

  20. arXiv:2403.03790  [pdf, other

    cs.CV

    Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Guoqiang Lei, Yin Zhuang, Xuerui Mao

    Abstract: Ship detection needs to identify ship locations from remote sensing (RS) scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. To address this challenge, in this article, leveraging the large language models (LLMs)'s powerful g… ▽ More

    Submitted 13 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  21. arXiv:2403.03730  [pdf, other

    cs.CV cs.AI cs.LG

    Learning 3D object-centric representation through prediction

    Authors: John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

    Abstract: As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning. While humans develop the ability of perceiving objects situated in 3D environments without supervision, models that learn the same set of abilities with similar constraints faced by human infants are lacking. Towards this end, we de… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 11 figures. Project webpage can be found at https://jday54.github.io/opple_site/

    ACM Class: I.2.10; I.4.8; I.4.6; I.4.10; I.2.6

  22. arXiv:2402.18166  [pdf, other

    cs.IR

    Sequence-level Semantic Representation Fusion for Recommender Systems

    Authors: Lanling Xu, Zhen Tian, Bingqian Li, Junjie Zhang, **peng Wang, Mingchen Cai, Wayne Xin Zhao

    Abstract: With the rapid development of recommender systems, there is increasing side information that can be employed to improve the recommendation performance. Specially, we focus on the utilization of the associated \emph{textual data} of items (eg product title) and study how text features can be effectively fused with ID features in sequential recommendation. However, there exists distinct data charact… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures

  23. arXiv:2402.13254  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

    Authors: Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee

    Abstract: We propose CounterCurate, a framework to comprehensively improve the visio-linguistic compositional reasoning capability for both contrastive and generative multimodal models. In particular, we identify two critical under-explored problems: the neglect of the physically grounded reasoning (counting and position understanding) and the potential of using highly capable text and image generation mode… ▽ More

    Submitted 12 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 15 pages, 6 figures, 12 tables, Project Page: https://countercurate.github.io/

  24. arXiv:2402.13194  [pdf, other

    quant-ph cs.IT

    Quantum Wiretap Channel Coding Assisted by Noisy Correlation

    Authors: Minglai Cai, Andreas Winter

    Abstract: We consider the private classical capacity of a quantum wiretap channel, where the users (sender Alice, receiver Bob, and eavesdropper Eve) have access to the resource of a shared quantum state, additionally to their channel inputs and outputs. An extreme case is maximal entanglement or a secret key between Alice and Bob, both of which would allow for onetime padding the message. But here both the… ▽ More

    Submitted 11 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Journal ref: Proc. ISIT 2024, Athens (Greece), 7-12 July 2024

  25. arXiv:2401.16822  [pdf, other

    cs.CV

    EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao

    Abstract: Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. Owing to the significant diversities between the natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in the infant stage. To fill the gap, a pioneer MLLM named EarthGPT integrating various multi-sensor RS interpre… ▽ More

    Submitted 8 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  26. arXiv:2401.04997  [pdf, other

    cs.IR

    Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis

    Authors: Lanling Xu, Junjie Zhang, Bingqian Li, **peng Wang, Mingchen Cai, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Recently, large language models such as ChatGPT have showcased remarkable abilities in solving general tasks, demonstrating the potential for applications in recommender systems. To assess how effectively LLMs can be used in recommendation tasks, our study primarily focuses on employing LLMs as recommender systems through prompting engineering. We propose a general framework for utilizing LLMs in… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 40 pages, under review

  27. arXiv:2312.10897  [pdf, other

    cs.CL cs.AI cs.LG

    Generalized Category Discovery with Large Language Models in the Loop

    Authors: Wenbin An, Wenkai Shi, Feng Tian, Haonan Lin, QianYing Wang, Yaqiang Wu, Mingxiang Cai, Luyan Wang, Yan Chen, Hai** Zhu, ** Chen

    Abstract: Generalized Category Discovery (GCD) is a crucial task that aims to recognize both known and novel categories from a set of unlabeled data by utilizing a few labeled data with only known categories. Due to the lack of supervision and category information, current methods usually perform poorly on novel categories and struggle to reveal semantic meanings of the discovered clusters, which limits the… ▽ More

    Submitted 26 May, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by ACL 2024 Findings, code and data are available at https://github.com/Lackel/LOOP

  28. arXiv:2312.08153  [pdf, other

    physics.comp-ph cs.LG

    $ρ$-Diffusion: A diffusion-based density estimation framework for computational physics

    Authors: Maxwell X. Cai, Kin Long Kelvin Lee

    Abstract: In physics, density $ρ(\cdot)$ is a fundamentally important scalar function to model, since it describes a scalar field or a probability density function that governs a physical process. Modeling $ρ(\cdot)$ typically scales poorly with parameter space, however, and quickly becomes prohibitively difficult and computationally expensive. One promising avenue to bypass this is to leverage the capabili… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 6 pages, 2 figures, accepted for publication at the NeurIPS 2023 workshop "Machine Learning and the Physical Sciences"

  29. arXiv:2312.00784  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

    Authors: Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee

    Abstract: While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual… ▽ More

    Submitted 26 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR2024. Project page: https://vip-llava.github.io/

  30. arXiv:2311.17318  [pdf

    cs.CY

    Impact of Indoor Mobility Behavior on the Respiratory Infectious Diseases Transmission Trends

    Authors: Ziwei Cui, Ming Cai, Zheng Zhu, Gongbo Chen, Yao Xiao

    Abstract: The importance of indoor human mobility in the transmission dynamics of respiratory infectious diseases has been acknowledged. Previous studies have predominantly addressed a single type of mobility behavior such as queueing and a series of behaviors under specific scenarios. However, these studies ignore the abstraction of mobility behavior in various scenes and the critical examination of how th… ▽ More

    Submitted 28 November, 2023; originally announced November 2023.

  31. arXiv:2311.01487  [pdf, other

    cs.CV cs.CL

    What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

    Authors: Yifan Du, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, **peng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen

    Abstract: Visual instruction tuning is an essential approach to improving the zero-shot generalization capability of Multi-modal Large Language Models (MLLMs). A surge of visual instruction datasets with various focuses and characteristics have been proposed recently, enabling MLLMs to achieve surprising results on evaluation benchmarks. To develop more capable MLLMs, in this paper, we aim to investigate a… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Work in progress

  32. arXiv:2310.20398  [pdf, other

    astro-ph.EP astro-ph.IM cs.LG physics.comp-ph

    A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks

    Authors: Veronica Saz Ulibarrena, Philipp Horn, Simon Portegies Zwart, Elena Sellentin, Barry Koren, Maxwell X. Cai

    Abstract: Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the l… ▽ More

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted for publication in the Journal of Computational Physics

  33. arXiv:2310.13610  [pdf, other

    cs.CL cs.AI

    Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

    Authors: Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, Muzhen Cai, Bing Qin

    Abstract: Explaining black-box model behavior with natural language has achieved impressive results in various NLP tasks. Recent research has explored the utilization of subsequences from the input text as a rationale, providing users with evidence to support the model decision. Although existing frameworks excel in generating high-quality rationales while achieving high task performance, they neglect to ac… ▽ More

    Submitted 20 October, 2023; originally announced October 2023.

  34. arXiv:2310.05036  [pdf, other

    cs.AI cs.CL

    AvalonBench: Evaluating LLMs Playing the Game of Avalon

    Authors: Jonathan Light, Min Cai, Sheng Shen, Ziniu Hu

    Abstract: In this paper, we explore the potential of Large Language Models (LLMs) Agents in playing the strategic social deduction game, Resistance Avalon. Players in Avalon are challenged not only to make informed decisions based on dynamically evolving game phases, but also to engage in discussions where they must deceive, deduce, and negotiate with other players. These characteristics make Avalon a compe… ▽ More

    Submitted 8 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  35. arXiv:2310.05035  [pdf, other

    cs.CL cs.AI

    Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection

    Authors: Haodi Zhang, Min Cai, Xinhe Zhang, Chen Jason Zhang, Rui Mao, Kaishun Wu

    Abstract: While large language models (LLMs) such as ChatGPT and PaLM have demonstrated remarkable performance in various language understanding and generation tasks, their capabilities in complex reasoning and intricate knowledge utilization still fall short of human-level proficiency. Recent studies have established the effectiveness of prompts in steering LLMs towards generating desired outputs. Building… ▽ More

    Submitted 10 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  36. arXiv:2310.04610  [pdf, other

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri , et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique… ▽ More

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  37. arXiv:2309.12530  [pdf, other

    cs.CV

    A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

    Authors: Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee

    Abstract: Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain. In this paper, we propose a novel approach for domain generalization that leverages recent advances in large vision-language models, specifically a CLIP teacher model, to train a smaller model that generalizes to unsee… ▽ More

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: to appear at ICCV2023

  38. arXiv:2309.10313  [pdf, other

    cs.CL cs.AI cs.LG

    Investigating the Catastrophic Forgetting in Multimodal Large Language Models

    Authors: Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

    Abstract: Following the success of GPT4, there has been a surge in interest in multimodal large language model (MLLM) research. This line of research focuses on develo** general-purpose LLMs through fine-tuning pre-trained LLMs and vision models. However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain similar performance compared to the pre-trained model, still… ▽ More

    Submitted 5 December, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  39. arXiv:2309.04198  [pdf, other

    cs.CL

    Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain

    Authors: Yanrui Du, Sendong Zhao, Muzhen Cai, Ming Ma, Danyang Zhao, Jiawei Cao, Bing Qin

    Abstract: Extensive studies have been devoted to privatizing general-domain Large Language Models (LLMs) as Domain-Specific LLMs via feeding specific-domain data. However, these privatization efforts often ignored a critical aspect: Dual Logic Ability, which is a core reasoning ability for LLMs. The dual logic ability of LLMs ensures that they can maintain a consistent stance when confronted with both posit… ▽ More

    Submitted 23 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  40. arXiv:2309.04175  [pdf, other

    cs.CL cs.AI

    Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

    Authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, LLMs sometimes generate responses with the hallucination about medical facts due to limited domain knowledge. Such shortcomings pose potential risks in the utilization of LLMs within medical contexts. To address this challenge, we propose knowledge-tunin… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 11 pages, 5 figures

  41. arXiv:2309.04174  [pdf, other

    cs.CL cs.AI

    Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification

    Authors: Haochun Wang, Sendong Zhao, Chi Liu, Nuwa Xi, Muzhen Cai, Bing Qin, Ting Liu

    Abstract: Prompt-based classification adapts tasks to a cloze question format utilizing the [MASK] token and the filled tokens are then mapped to labels through pre-defined verbalizers. Recent studies have explored the use of verbalizer embeddings to reduce labor in this process. However, all existing studies require a tuning process for either the pre-trained models or additional trainable embeddings. Mean… ▽ More

    Submitted 29 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted by AAAI 2024, 11 pages, 3 figures

  42. arXiv:2308.12033  [pdf, other

    cs.CL cs.AI

    PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

    Authors: Chenrui Zhang, Lin Liu, **peng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai

    Abstract: As an effective tool for eliciting the power of Large Language Models (LLMs), prompting has recently demonstrated unprecedented abilities across a variety of complex tasks. To further improve the performance, prompt ensemble has attracted substantial interest for tackling the hallucination and instability of LLMs. However, existing methods usually adopt a two-stage paradigm, which requires a pre-p… ▽ More

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 8 pages, 4 figures

  43. arXiv:2308.09370  [pdf, other

    cs.CL cs.SD eess.AS

    TrOMR:Transformer-Based Polyphonic Optical Music Recognition

    Authors: Yixuan Li, Hua** Liu, Qiang **, Miaomiao Cai, Peng Li

    Abstract: Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches for OMR are usually based on CNN for image understanding and RNN for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel con… ▽ More

    Submitted 18 August, 2023; originally announced August 2023.

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  44. arXiv:2306.06094  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

    Authors: Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee

    Abstract: Recently, large language models (LLMs) have made significant advancements in natural language understanding and generation. However, their potential in computer vision remains largely unexplored. In this paper, we introduce a new, exploratory approach that enables LLMs to process images using the Scalable Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions of SVG represe… ▽ More

    Submitted 9 June, 2023; originally announced June 2023.

  45. Three-way Imbalanced Learning based on Fuzzy Twin SVM

    Authors: Wanting Cai, Mingjie Cai, Qingguo Li, Qiong Liu

    Abstract: Three-way decision (3WD) is a powerful tool for granular computing to deal with uncertain data, commonly used in information systems, decision-making, and medical care. Three-way decision gets much research in traditional rough set models. However, three-way decision is rarely combined with the currently popular field of machine learning to expand its research. In this paper, three-way decision is… ▽ More

    Submitted 19 May, 2023; originally announced June 2023.

  46. Enhancing Language Representation with Constructional Information for Natural Language Understanding

    Authors: Lvxiaowei Xu, Jianwang Wu, Jiawei Peng, Zhilin Gong, Ming Cai, Tianxiang Wang

    Abstract: Natural language understanding (NLU) is an essential branch of natural language processing, which relies on representations generated by pre-trained language models (PLMs). However, PLMs primarily focus on acquiring lexico-semantic information, while they may be unable to adequately handle the meaning of constructions. To address this issue, we introduce construction grammar (CxG), which highlight… ▽ More

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Long paper, accepted at the ACL 2023

  47. arXiv:2305.12114  [pdf, other

    cs.LG cs.AI cs.DC cs.IT

    GFDC: A Granule Fusion Density-Based Clustering with Evidential Reasoning

    Authors: Mingjie Cai, Zhishan Wu, Qingguo Li, Feng Xu, Jie Zhou

    Abstract: Currently, density-based clustering algorithms are widely applied because they can detect clusters with arbitrary shapes. However, they perform poorly in measuring global density, determining reasonable cluster centers or structures, assigning samples accurately and handling data with large density differences among clusters. To overcome their drawbacks, this paper proposes a granule fusion densit… ▽ More

    Submitted 20 May, 2023; originally announced May 2023.

  48. arXiv:2305.00561  [pdf, other

    cs.AI cs.FL cs.MA cs.RO eess.SY

    Model-free Motion Planning of Autonomous Agents for Complex Tasks in Partially Observable Environments

    Authors: Junchao Li, Mingyu Cai, Zhen Kan, Abstract: Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL)… ▽ More

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 32 pages, 22 figures, submitted to Autonomous Agents and Multi-Agent Systems

  49. arXiv:2304.09176  [pdf, other

    cs.LG cs.AI

    Enhancing Personalized Ranking With Differentiable Group AUC Optimization

    Authors: Xiao Sun, Bo Zhang, Chenrui Zhang, Han Ren, Mingchen Cai

    Abstract: AUC is a common metric for evaluating the performance of a classifier. However, most classifiers are trained with cross entropy, and it does not optimize the AUC metric directly, which leaves a gap between the training and evaluation stage. In this paper, we propose the PDAOM loss, a Personalized and Differentiable AUC Optimization method with Maximum violation, which can be directly applied when… ▽ More

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: NeuRec@ICDM 2022

  50. arXiv:2304.00790  [pdf, other

    cs.RO eess.SY

    LQR-CBF-RRT*: Safe and Optimal Motion Planning

    Authors: Guang Yang, Mingyu Cai, Ahmad Ahmad, Amanda Prorok, Roberto Tron, Calin Belta

    Abstract: We present LQR-CBF-RRT*, an incremental sampling-based algorithm for offline motion planning. Our framework leverages the strength of Control Barrier Functions (CBFs) and Linear Quadratic Regulators (LQR) to generate safety-critical and optimal trajectories for a robot with dynamics described by an affine control system. CBFs are used for safety guarantees, while LQRs are employed for optimal cont… ▽ More

    Submitted 27 September, 2023; v1 submitted 3 April, 2023; originally announced April 2023.