Showing 1–50 of 175 results for author: Ji, J

  1. arXiv:2406.20087  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY cs.HC

    ProgressGym: Alignment with a Millennium of Moral Progress

    Authors: Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang

    Abstract: Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigat…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18259  [pdf, other]

    cs.CL cs.AI

    Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated

    Authors: Jiazhou Ji, Ruizhe Li, Shujun Li, Jie Guo, Weidong Qiu, Zheng Huang, Chiyu Chen, Xiaoyu Jiang, Xinru Lu

    Abstract: As LLMs rapidly advance, increasing concerns arise regarding risks about actual authorship of texts we see online and in real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of considering LLM-generated text detection a binary classification task of differentia…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 2 figures

  3. arXiv:2406.18197  [pdf, other]

    cs.CV

    Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme

    Authors: Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Chao-Chun Chen

    Abstract: Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.16449  [pdf, other]

    cs.CV

    Evaluating and Analyzing Relationship Hallucinations in LVLMs

    Authors: Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

    Abstract: The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which is essential for visual comprehension. In this work, we introduce R-Benc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML2024

  5. arXiv:2406.15513  [pdf, other]

    cs.AI cs.CL

    PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

    Authors: Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

    Abstract: In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs). As a sibling project to SafeRLHF and BeaverTails, we separate annotations of helpfulness and harmlessness for question-answering pairs, providing distinct perspectives on these coupled attributes. Overall, we provide 44.6k refined prompts and 265k question-answer p…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: a sibling project to SafeRLHF and BeaverTails

  6. arXiv:2406.15283  [pdf, other]

    cs.LG

    FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

    Authors: Austin Coursey, Junyi Ji, Marcos Quinones-Grueiro, William Barbour, Yuhang Zhang, Tyler Derr, Gautam Biswas, Daniel B. Work

    Abstract: Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, existing delays and errors in event identification and reporting make it a difficult problem to solve. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-s…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.14477  [pdf, other]

    cs.CV cs.AI cs.DB

    SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

    Authors: Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang

    Abstract: To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the SafeSora dataset to promote research on aligning text-to-video generation with human values. This dataset encompasses human preferences in text-to-video generation tasks along two primary dimensions: helpfulness and harmlessness. To capture in-depth human preferences and facilitate structured reasoning by cro…

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.08607  [pdf, other]

    cs.CL cs.AI

    Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

    Authors: Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, Sijia Liu, Shiyu Chang

    Abstract: As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM unlearning task typically involves two goals: (1) The target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the oth…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 21 pages, 11 figures

  9. arXiv:2406.06144  [pdf, other]

    cs.CL cs.AI

    Language Models Resist Alignment

    Authors: Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang

    Abstract: Large language models (LLMs) may exhibit undesirable behaviors. Recent efforts have focused on aligning these models to prevent harmful generation. Despite these efforts, studies have shown that even a well-conducted alignment process can be easily circumvented, whether intentionally or accidentally. Do alignment fine-tuning have robust effects on models, or are merely superficial? In this work, w…

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 21 pages

  10. arXiv:2406.05620  [pdf, other]

    cs.CV

    Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval

    Authors: Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji

    Abstract: Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific individual based on a textual description. Despite considerable efforts to bridge the gap between vision and language, the significant differences between these modalities continue to pose a challenge. Previous methods have attempted to align text and image samples in a modal-shared space, but they face unc…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: ACM MM2023

  11. arXiv:2406.05127  [pdf, other]

    cs.CV

    Towards Semantic Equivalence of Tokenization in Multimodal LLM

    Authors: Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in processing vision-language tasks. One of the crux of MLLMs lies in vision tokenization, which involves efficiently transforming input visual signals into feature representations that are most beneficial for LLMs. However, existing vision tokenizers, essential for semantic alignment between vision and language, r…

    Submitted 27 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Technical Report. The project page: https://chocowu.github.io/SeTok-web/

  12. arXiv:2406.04428  [pdf, other]

    cs.CL cs.AI

    MoralBench: Moral Evaluation of LLMs

    Authors: Jianchao Ji, Yutong Chen, Mingyu Jin, Wujiang Xu, Wenyue Hua, Yongfeng Zhang

    Abstract: In the rapidly evolving field of artificial intelligence, large language models (LLMs) have emerged as powerful tools for a myriad of applications, from natural language processing to decision-making support systems. However, as these models become increasingly integrated into societal frameworks, the imperative to ensure they operate within ethical and moral boundaries has never been more critica…

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03367  [pdf, other]

    cs.AI

    CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning

    Authors: Xinrui Lin, Yangfan Wu, Huanyu Yang, Yu Zhang, Yanyong Zhang, Jianmin Ji

    Abstract: Large Language Models (LLMs) possess extensive foundational knowledge and moderate reasoning abilities, making them suitable for general task planning in open-world scenarios. However, it is challenging to ground a LLM-generated plan to be executable for the specified robot with certain restrictions. This paper introduces CLMASP, an approach that couples LLMs with Answer Set Programming (ASP) to o…

    Submitted 5 June, 2024; originally announced June 2024.

  14. arXiv:2406.01636  [pdf]

    q-bio.QM cs.AI

    COVID-19: post infection implications in different age groups, mechanism, diagnosis, effective prevention, treatment, and recommendations

    Authors: Muhammad Akmal Raheem, Muhammad Ajwad Rahim, Ijaz Gul, Md. Reyad-ul-Ferdous, Liyan Le, Junguo Hui, Shuiwei Xia, Minjiang Chen, Dongmei Yu, Vijay Pandey, Peiwu Qin, Jiansong Ji

    Abstract: SARS-CoV-2, the highly contagious pathogen responsible for the COVID-19 pandemic, has persistent effects that begin four weeks after initial infection and last for an undetermined duration. These chronic effects are more harmful than acute ones. This review explores the long-term impact of the virus on various human organs, including the pulmonary, cardiovascular, neurological, reproductive, gastr…

    Submitted 2 June, 2024; originally announced June 2024.

  15. arXiv:2406.01451  [pdf, other]

    cs.CV cs.MM

    SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

    Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarca…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML2024

  16. arXiv:2406.00334  [pdf, other]

    cs.CV

    Image Captioning via Dynamic Path Customization

    Authors: Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji

    Abstract: This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art approaches are static and hand-crafted networks, which not only heavily rely on expert knowledge, but also ignore the semantic diversity of input samples, therefore resulting in suboptimal performance. To address thes…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: TNNLS24

  17. arXiv:2405.13715  [pdf, other]

    cs.LO cs.AI

    Traffic Scenario Logic: A Spatial-Temporal Logic for Modeling and Reasoning of Urban Traffic Scenarios

    Authors: Ruolin Wang, Yuejiao Xu, Jianmin Ji

    Abstract: Formal representations of traffic scenarios can be used to generate test cases for the safety verification of autonomous driving. However, most existing methods are limited in highway or highly simplified intersection scenarios due to the intricacy and diversity of traffic scenarios. In response, we propose Traffic Scenario Logic (TSL), which is a spatial-temporal logic designed for modeling and r…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Submitted to KR 2024

  18. arXiv:2405.13704  [pdf, other]

    cs.RO cs.LO

    Safe and Personalizable Logical Guidance for Trajectory Planning of Autonomous Driving

    Authors: Yuejiao Xu, Ruolin Wang, Chengpeng Xu, Jianmin Ji

    Abstract: Autonomous vehicles necessitate a delicate balance between safety, efficiency, and user preferences in trajectory planning. Existing traditional or learning-based methods face challenges in adequately addressing all these aspects. In response, this paper proposes a novel component termed the Logical Guidance Layer (LGL), designed for seamless integration into autonomous driving trajectory planning…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Submitted to ITSC 2024

  19. arXiv:2405.12496  [pdf, other]

    eess.AS cs.NI cs.SD eess.SP

    A Survey of Integrating Wireless Technology into Active Noise Control

    Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

    Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead…

    Submitted 21 May, 2024; originally announced May 2024.

  20. arXiv:2405.05589  [pdf, other]

    cs.RO

    Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

    Authors: Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng, Yu Zhang, Jianmin Ji, Yanyong Zhang

    Abstract: Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 19 figures

  21. arXiv:2405.00954  [pdf, other]

    cs.CV

    X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

    Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplif…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: ICML2024

  22. arXiv:2404.19531  [pdf, other]

    cs.CV

    MoST: Multi-modality Scene Tokenization for Motion Prediction

    Authors: Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou

    Abstract: Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e.g., failures in detecting open-vocabulary obstacles) while missing…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  23. arXiv:2404.15532  [pdf, other]

    cs.HC cs.AI cs.CL cs.CV cs.MA

    BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

    Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang

    Abstract: This paper presents BattleAgent, an emulation system that combines the Large Vision-Language Model and Multi-agent System. This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldie…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 26 pages, 14 figures. The data and code for this project are accessible at https://github.com/agiresearch/battleagent

  24. arXiv:2404.14668  [pdf, other]

    cs.SI

    Source Localization for Cross Network Information Diffusion

    Authors: Chen Ling, Tanmoy Chowdhury, Jie Ji, Sirui Li, Andreas Züfle, Liang Zhao

    Abstract: Source localization aims to locate information diffusion sources only given the diffusion observation, which has attracted extensive attention in the past few years. Existing methods are mostly tailored for single networks and may not be generalized to handle more complex networks like cross-networks. Cross-network is defined as two interconnected networks, where one network's functionality depend…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Code and data are available at: https://github.com/tanmoysr/CNSL/

  25. arXiv:2404.12274  [pdf, other]

    cs.CL cs.AI

    Advancing the Robustness of Large Language Models through Self-Denoised Smoothing

    Authors: Jiabao Ji, Bairu Hou, Zhen Zhang, Guanhua Zhang, Wenqi Fan, Qing Li, Yang Zhang, Gaowen Liu, Sijia Liu, Shiyu Chang

    Abstract: Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024. Jiabao, Bairu, Zhen, Guanhua contributed equally. This is an updated version of the paper: arXiv:2307.07171

  26. arXiv:2404.05164  [pdf, other]

    cs.RO

    Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes

    Authors: Yu Sheng, Lu Zhang, Xingchen Li, Yifan Duan, Yanyong Zhang, Yu Zhang, Jianmin Ji

    Abstract: Prior point cloud provides 3D environmental context, which enhances the capabilities of monocular camera in downstream vision tasks, such as 3D object detection, via data fusion. However, the absence of accurate and automated registration methods for estimating camera extrinsic parameters in roadside scene point clouds notably constrains the potential applications of roadside cameras. This paper p…

    Submitted 7 April, 2024; originally announced April 2024.

  27. arXiv:2404.04026  [pdf, other]

    cs.RO cs.CV

    MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

    Authors: Chenyang Wu, Yifan Duan, Xinran Zhang, Yu Sheng, Jianmin Ji, Yanyong Zhang

    Abstract: Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently de…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures

  28. arXiv:2404.03191  [pdf, other]

    cs.CV

    CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks

    Authors: Beibei Wang, Shuang Meng, Lu Zhang, Chenjie Wang, Jingjing Huang, Yao Li, Haojie Ren, Yuxuan Xiao, Yuru Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang

    Abstract: Numerous roadside perception datasets have been introduced to propel advancements in autonomous driving and intelligent transportation systems research and development. However, it has been observed that the majority of their concentrates is on urban arterial roads, inadvertently overlooking residential areas such as parks and campuses that exhibit entirely distinct characteristics. In light of th…

    Submitted 6 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  29. arXiv:2403.15183  [pdf, other]

    cs.RO

    CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition

    Authors: Shaowei Fu, Yifan Duan, Yao Li, Chengzhen Meng, Yingjie Wang, Jianmin Ji, Yanyong Zhang

    Abstract: The integration of complementary characteristics from camera and radar data has emerged as an effective approach in 3D object detection. However, such fusion-based methods remain unexplored for place recognition, an equally important task for autonomous systems. Given that place recognition relies on the similarity between a query scene and the corresponding candidate scene, the stationary backgro…

    Submitted 22 March, 2024; originally announced March 2024.

  30. arXiv:2403.12457  [pdf, other]

    cs.CV

    Privacy-Preserving Face Recognition Using Trainable Feature Subtraction

    Authors: Yuxi Mi, Zhizhou Zhong, Yuge Huang, Jiazhen Ji, Jianqing Xu, Jun Wang, Shaoming Wang, Shouhong Ding, Shuigeng Zhou

    Abstract: The widespread adoption of face recognition has led to increasing privacy concerns, as unauthorized access to face images can expose sensitive personal information. This paper explores face image protection against viewing and recovery attacks. Inspired by image compression, we propose creating a visually uninformative face image through feature subtraction between an original face and its model-p…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  31. arXiv:2403.02977  [pdf, other]

    cs.RO

    Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space

    Authors: Qianhao Wang, Zhepei Wang, Mingyang Wang, Jialin Ji, Zhichao Han, Tianyue Wu, Rui Jin, Yuman Gao, Chao Xu, Fei Gao

    Abstract: Convex polytopes have compact representations and exhibit convexity, which makes them suitable for abstracting obstacle-free spaces from various environments. Existing methods for generating convex polytopes always struggle to strike a balance between two requirements, producing high-quality polytope and efficiency. Moreover, another crucial requirement for convex polytopes to accurately contain c…

    Submitted 6 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  32. arXiv:2402.16192  [pdf, other]

    cs.CL

    Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

    Authors: Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

    Abstract: Aligned large language models (LLMs) are vulnerable to jailbreaking attacks, which bypass the safeguards of targeted LLMs and fool them into generating objectionable content. While initial defenses show promise against token-based threat models, there do not exist defenses that provide robustness against semantic attacks and avoid unfavorable trade-offs between robustness and nominal performance.…

    Submitted 28 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: 37 pages

  33. arXiv:2402.10189  [pdf, other]

    cs.CL cs.LG

    Uncertainty Quantification for In-Context Learning of Large Language Models

    Authors: Chen Ling, Xujiang Zhao, Xuchao Zhang, Wei Cheng, Yanchi Liu, Yiyou Sun, Mika Oishi, Takao Osaki, Katsushi Matsuda, Jie Ji, Guangji Bai, Liang Zhao, Haifeng Chen

    Abstract: In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs) and revolutionized various fields by providing a few task-relevant demonstrations in the prompt. However, trustworthy issues with LLM's response, such as hallucination, have also been actively discussed. Existing works have been devoted to quantifying the uncertainty in LLM's response, but they often overlo…

    Submitted 28 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted to the main conference of NAACL 2024

  34. arXiv:2402.10184  [pdf, other]

    cs.LG cs.AI cs.CL cs.DM

    Reward Generalization in RLHF: A Topological Perspective

    Authors: Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

    Abstract: Existing alignment methods share a common topology of information flow, where reward information is collected from humans, modeled with preference learning, and used to tune language models. However, this shared topology has not been systematically characterized, nor have its alternatives been thoroughly explored, leaving the problems of low data efficiency and unreliable generalization unaddresse…

    Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  35. arXiv:2402.03784  [pdf, other]

    cs.LG cs.AI physics.app-ph

    AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction

    Authors: Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang, Cheng Long, Gao Cong, Jingyuan Wang

    Abstract: Air quality prediction and modelling plays a pivotal role in public health and environment management, for individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data and they often rely on black-box deep learning structur…

    Submitted 6 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by the 12th International Conference on Learning Representations (ICLR 2024)

  36. arXiv:2402.02416  [pdf, other]

    cs.CL cs.AI cs.LG

    Aligner: Efficient Alignment by Learning to Correct

    Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang

    Abstract: With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate un…

    Submitted 24 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  37. arXiv:2402.00284  [pdf, other]

    cs.IR cs.AI cs.LG

    PAP-REC: Personalized Automatic Prompt for Recommendation Language Model

    Authors: Zelong Li, Jianchao Ji, Yingqiang Ge, Wenyue Hua, Yongfeng Zhang

    Abstract: Recently emerged prompt-based Recommendation Language Models (RLM) can solve multiple recommendation tasks uniformly. The RLMs make full use of the inherited knowledge learned from the abundant pre-training data to solve the downstream recommendation tasks by prompts, without introducing additional parameters or network training. However, handcrafted prompts require significant expertise and human…

    Submitted 31 January, 2024; originally announced February 2024.

  38. arXiv:2401.15818  [pdf, other]

    cs.RO

    A Middle Way to Traffic Enlightenment

    Authors: Matthew W. Nice, George Gunter, Junyi Ji, Yuhang Zhang, Matthew Bunting, Will Barbour, Jonathan Sprinkle, Dan Work

    Abstract: This paper introduces a novel approach that seeks a middle ground for traffic control in multi-lane congestion, where prevailing traffic speeds are too fast, and speed recommendations designed to dampen traffic waves are too slow. Advanced controllers that modify the speed of an automated car for wave-dampening, eco-driving, or other goals, typically are designed with forward collision safety in m…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: To Appear, ICCPS 2024

  39. arXiv:2401.15555  [pdf, other]

    cs.CL

    Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion

    Authors: Yujian Liu, Jiabao Ji, Tong Yu, Ryan Rossi, Sungchul Kim, Handong Zhao, Ritwik Sinha, Yang Zhang, Shiyu Chang

    Abstract: Table question answering is a popular task that assesses a model's ability to understand and interact with structured data. However, the given table often does not contain sufficient information for answering the question, necessitating the integration of external knowledge. Existing methods either convert both the table and external knowledge into text, which neglects the structured nature of the…

    Submitted 27 January, 2024; originally announced January 2024.

  40. arXiv:2401.11768  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    ADA-GNN: Atom-Distance-Angle Graph Neural Network for Crystal Material Property Prediction

    Authors: Jiao Huang, Qianli Xing, Jinglong Ji, Bo Yang

    Abstract: Property prediction is a fundamental task in crystal material research. To model atoms and structures, structures represented as graphs are widely used and graph learning-based methods have achieved significant progress. Bond angles and bond distances are two key structural information that greatly influence crystal properties. However, most of the existing works only consider bond distances and o…

    Submitted 22 January, 2024; originally announced January 2024.

  41. arXiv:2401.02402  [pdf, other]

    cs.CV

    3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation

    Authors: Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng

    Abstract: 3D panoptic segmentation is a challenging perception task, especially in autonomous driving. It aims to predict both semantic and instance annotations for 3D points in a scene. Although prior 3D panoptic segmentation approaches have achieved great performance on closed-set benchmarks, generalizing these approaches to unseen things and unseen stuff categories remains an open problem. For unseen obj…

    Submitted 2 April, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  42. arXiv:2312.14936  [pdf, other]

    cond-mat.mtrl-sci cs.AI cs.LG

    PerCNet: Periodic Complete Representation for Crystal Graphs

    Authors: Jiao Huang, Qianli Xing, Jinglong Ji, Bo Yang

    Abstract: Crystal material representation is the foundation of crystal material research. Existing works consider crystal molecules as graph data with different representation methods and leverage the advantages of techniques in graph learning. A reasonable crystal representation method should capture the local and global information. However, existing methods only consider the local information of crystal…

    Submitted 3 December, 2023; originally announced December 2023.

  43. arXiv:2312.12470  [pdf, other]

    cs.CV

    Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

    Authors: Sihan Liu, Yiwei Ma, Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

    Abstract: Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing, delineating specific regions in aerial images as described by textual queries. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery, leading to suboptimal segmentation resu…

    Submitted 2 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  44. Adaptive Tracking and Perching for Quadrotor in Dynamic Scenarios

    Authors: Yuman Gao, Jialin Ji, Qianhao Wang, Rui Jin, Yi Lin, Zhimeng Shang, Yanjun Cao, Shaojie Shen, Chao Xu, Fei Gao

    Abstract: Perching on the moving platforms is a promising solution to enhance the endurance and operational range of quadrotors, which could benefit the efficiency of a variety of air-ground cooperative tasks. To ensure robust perching, tracking with a steady relative state and reliable perception is a prerequisite. This paper presents an adaptive dynamic tracking and perching scheme for autonomous quadroto…

    Submitted 17 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  45. arXiv:2312.08914  [pdf, other]

    cs.CV

    CogAgent: A Visual Language Model for GUI Agents

    Authors: Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

    Abstract: People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billi…

    Submitted 21 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 27 pages, 19 figures

  46. arXiv:2312.08751  [pdf, other]

    cs.LG

    Improve Robustness of Reinforcement Learning against Observation Perturbations via $l_\infty$ Lipschitz Policy Networks

    Authors: Buqing Nie, Jingtian Ji, Yangqing Fu, Yue Gao

    Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks. However, recent works have revealed that DRL agents are susceptible to slight perturbations in observations. This vulnerability raises concerns regarding the effectiveness and robustness of deploying such agents in real-world applications. In this work, we propose a novel robust reinforcement learning…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted paper on AAAI2024

  47. arXiv:2312.00085  [pdf, other]

    cs.CV

    X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

    Authors: Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

    Abstract: In recent times, automatic text-to-3D content creation has made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize the 3D representation to ensure that the rendered image aligns well with the given text, as evaluated by the pretrained 2D diffusion model. Nevertheless, a substantial domain gap exists between 2D images an…

    Submitted 25 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Technical report

  48. arXiv:2311.17227  [pdf, other]

    cs.AI cs.CL cs.CY

    War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars

    Authors: Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang

    Abstract: Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the partic…

    Submitted 30 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 47 pages, 9 figures, 5 tables

  49. arXiv:2311.15241  [pdf, other]

    cs.CV cs.RO

    CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network

    Authors: Yuxuan Xiao, Yao Li, Chengzhen Meng, Xingchen Li, Jianmin Ji, Yanyong Zhang

    Abstract: The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual inter…

    Submitted 17 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  50. arXiv:2311.12472  [pdf, other]

    cs.AI

    Self-Supervised Deconfounding Against Spatio-Temporal Shifts: Theory and Modeling

    Authors: Jiahao Ji, Wentao Zhang, Jingyuan Wang, Yue He, Chao Huang

    Abstract: As an important application of spatio-temporal (ST) data, ST traffic forecasting plays a crucial role in improving urban travel efficiency and promoting sustainable development. In practice, the dynamics of traffic data frequently undergo distributional shifts attributed to external factors such as time evolution and spatial differences. This entails forecasting models to handle the out-of-distrib…

    Submitted 6 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 14 pages, 9 figures