
Showing 1–50 of 176 results for author: Qi, Z

Searching in archive cs.
  1. arXiv:2406.16855  [pdf, other]

    cs.CV

    DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

    Authors: Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia

    Abstract: Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content. However, current evaluations either are automated but misalign with humans or require human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advan…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project page: https://dreambenchplus.github.io/

  2. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.02884  [pdf, other]

    cs.CV

    PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

    Authors: Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen

    Abstract: Layout generation is the keystone in achieving automated graphic design, requiring arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic…

    Submitted 1 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages; typos corrected, appendix added

  5. arXiv:2406.00589  [pdf, other]

    cs.CV cs.AI

    Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection

    Authors: Zhuang Qi, Junlin Zhang, Xin Qi

    Abstract: Visual tracking fundamentally involves regressing the state of the target in each frame of a video. Despite significant progress, existing regression-based trackers still tend to experience failures and inaccuracies. To enhance the precision of target estimation, this paper proposes a tracking technique based on robust regression. Firstly, we introduce a novel robust linear regression estimator, w…

    Submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2405.20902  [pdf, other]

    cs.CL cs.AI cs.CR

    Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

    Authors: Rongwu Xu, Zehan Qi, Wei Xu

    Abstract: Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or induced by malicious users…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL'24 (Findings). Camera-ready version

  7. arXiv:2405.20046  [pdf, other]

    cs.AI

    Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning

    Authors: Zhuang Qi, Lei Meng, Weihao He, Ruohan Zhang, Yu Wang, Xin Qi, Xiangxu Meng

    Abstract: Federated learning benefits from cross-training strategies, which enables models to train on data from distinct sources to improve the generalization capability. However, the data heterogeneity between sources may lead models to gradually forget previously acquired knowledge when undergoing cross-training to adapt to new tasks or data sources. We argue that integrating personalized and global know…

    Submitted 30 May, 2024; originally announced May 2024.

  8. arXiv:2405.19606  [pdf, other]

    cs.AI

    Relation Modeling and Distillation for Learning with Noisy Labels

    Authors: Xiaming Chen, Junlin Zhang, Zhuang Qi, Xin Qi

    Abstract: Learning with noisy labels has become an effective strategy for enhancing the robustness of models, which enables models to better tolerate inaccurate data. Existing methods either focus on optimizing the loss function to mitigate the interference from noise, or design procedures to detect potential noise and correct errors. However, their effectiveness is often compromised in representation learn…

    Submitted 1 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  9. arXiv:2405.19247  [pdf, other]

    cs.LG

    Comparative Study of Neighbor-based Methods for Local Outlier Detection

    Authors: Zhuang Qi, Junlin Zhang, Xiaming Chen, Xin Qi

    Abstract: The neighbor-based method has become a powerful tool to handle the outlier detection problem, which aims to infer the abnormal degree of the sample based on the compactness of the sample and its neighbors. However, the existing methods commonly focus on designing different processes to locate outliers in the dataset, while the contributions of different types neighbors to outlier detection has not…

    Submitted 29 May, 2024; originally announced May 2024.

  10. arXiv:2405.14171  [pdf, other]

    cs.CV

    Multi-view Remote Sensing Image Segmentation With SAM priors

    Authors: Zipeng Qi, Chenyang Liu, Zili Liu, Hao Chen, Yongchang Wu, Zhengxia Zou, Zhenwei Shi

    Abstract: Multi-view segmentation in Remote Sensing (RS) seeks to segment images from diverse perspectives within a scene. Recent methods leverage 3D information extracted from an Implicit Neural Field (INF), bolstering result consistency across multiple views while using limited accounts of labels (even within 3-5 labels) to streamline labor. Nonetheless, achieving superior performance within the constrain…

    Submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.13055  [pdf, other]

    cs.CL cs.AI cs.CY

    Large Language Models for Medicine: A Survey

    Authors: Yanxin Zheng, Wensheng Gan, Zefeng Chen, Zhenlian Qi, Qian Liang, Philip S. Yu

    Abstract: To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, w…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Preprint. 5 figures, 5 tables

  12. arXiv:2405.13001  [pdf, other]

    cs.CL cs.AI cs.CY

    Large Language Models for Education: A Survey

    Authors: Hanyi Xu, Wensheng Gan, Zhenlian Qi, Jiayang Wu, Philip S. Yu

    Abstract: Artificial intelligence (AI) has a profound impact on traditional education. In recent years, large language models (LLMs) have been increasingly used in various applications such as natural language processing, computer vision, speech recognition, and autonomous driving. LLMs have also been applied in many fields, including recommendation, finance, government, education, legal affairs, and financ…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Journal of Machine Learning and Cybernetics. 4 tables, 6 figures

  13. arXiv:2405.09054  [pdf, other]

    cs.CV

    Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association

    Authors: Weihua Gao, Wenlong Niu, Wenlong Lu, Pengcheng Wang, Zhaoyuan Qi, Xiaodong Peng, Zhen Yang

    Abstract: The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features an…

    Submitted 14 May, 2024; originally announced May 2024.

  14. arXiv:2405.04520  [pdf, other]

    cs.CL cs.LG cs.SE

    NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

    Authors: Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models (LLMs) have manifested strong ability to generate codes for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on algorithm and data science, insufficiently satisfying challenging requirements prevalent in real-world coding. To fill this gap, we propose NaturalCodeB…

    Submitted 7 May, 2024; originally announced May 2024.

  15. arXiv:2405.00204  [pdf, other]

    cs.CL cs.AI

    General Purpose Verification for Chain of Thought Prompting

    Authors: Robert Vacareanu, Anurag Pratik, Evangelia Spiliopoulou, Zheng Qi, Giovanni Paolini, Neha Anna John, Jie Ma, Yassine Benajiba, Miguel Ballesteros

    Abstract: Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 22 pages, preprint

  16. arXiv:2404.18648  [pdf, other]

    cs.CV

    Uncertainty-boosted Robust Video Activity Anticipation

    Authors: Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang

    Abstract: Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision and autonomous driving. Despite the recent progress, the data uncertainty issue, reflected as the content evolution process and dynamic correlation in event labels, has been somehow ignored. This reduces the model generalization ability and deep understanding…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by T-PAMI

  17. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  18. arXiv:2404.12634  [pdf]

    cs.CV cs.AI cs.LG

    Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment

    Authors: Danqing Ma, Meng Wang, Ao Xiang, Zongqing Qi, Qin Yang

    Abstract: This study proposes a multi-modal fusion framework Multitrans based on the Transformer architecture and self-attention mechanism. This architecture combines the study of non-contrast computed tomography (NCCT) images and discharge diagnosis reports of patients undergoing stroke treatment, using a variety of methods based on Transformer architecture approach to predicting functional outcomes of str…

    Submitted 19 April, 2024; originally announced April 2024.

  19. arXiv:2404.12407  [pdf, other]

    cs.CV cs.LG

    TV100: A TV Series Dataset that Pre-Trained CLIP Has Not Seen

    Authors: Da-Wei Zhou, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan

    Abstract: The era of pre-trained models has ushered in a wealth of new insights for the machine learning community. Among the myriad of questions that arise, one of paramount importance is: 'Do pre-trained models possess comprehensive knowledge?' This paper seeks to address this crucial inquiry. In line with our objective, we have made publicly available a novel dataset comprised of images from TV series re…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project page: https://tv-100.github.io/

  20. arXiv:2404.10147  [pdf, other]

    cs.CV

    Eyes on the Streets: Leveraging Street-Level Imaging to Model Urban Crime Dynamics

    Authors: Zhixuan Qi, Huaiying Luo, Chen Chi

    Abstract: This study addresses the challenge of urban safety in New York City by examining the relationship between the built environment and crime rates using machine learning and a comprehensive dataset of street view images. We aim to identify how urban landscapes correlate with crime statistics, focusing on the characteristics of street views and their association with crime rates. The findings offer in…

    Submitted 15 April, 2024; originally announced April 2024.

  21. arXiv:2404.01780  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV

    CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

    Authors: Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, Yuedong Fang, Qi Guo, Dezi Liu, Guoliang Li, Lin Lin, Ming Li, Ran Li, Xiaobo Li, Yu Luo, Xianmin Meng, Jundan Nie, Zhaoxiang Qi, Yisheng Qiu, Li Shao, Hao Tian, et al. (7 additional authors not shown)

    Abstract: Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: The paper is accepted by the AJ. The complete code could be downloaded with DOI of: 10.12149/101393. Comments are welcome

  22. arXiv:2403.19646  [pdf, other]

    cs.CV

    Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis

    Authors: Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi

    Abstract: Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, necessitating precise and comprehensive interpretation methodologies. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, leading to the emergence of remote sensing image change interpretation (RSICI) as a significant research focus. Current RSICI techn…

    Submitted 1 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  23. arXiv:2403.10065  [pdf, other]

    cs.CL

    Triple GNNs: Introducing Syntactic and Semantic Information for Conversational Aspect-Based Quadruple Sentiment Analysis

    Authors: Binbin Li, Yuqing Li, Siyu Jia, Bingnan Ma, Yu Ding, Zisen Qi, Xingbang Tan, Menghan Guo, Shenghui Liu

    Abstract: Conversational Aspect-Based Sentiment Analysis (DiaASQ) aims to detect quadruples {target, aspect, opinion, sentiment polarity} from given dialogues. In DiaASQ, elements constituting these quadruples are not necessarily confined to individual sentences but may span across multiple utterances within a dialogue. This necessitates a dual focus on both the syntactic information of individual utteran…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CSCWD2024

  24. arXiv:2403.10044  [pdf, other]

    cs.CV

    SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

    Authors: Tao Wu, Xuewei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

    Abstract: Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges, for better generating high-…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI2024

  25. arXiv:2403.08715  [pdf, other]

    cs.CL

    SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents

    Authors: Ruiyi Wang, Haofei Yu, Wenxin Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu

    Abstract: Humans learn social skills through both imitation and social interaction. This social learning process is largely understudied by existing research on building language agents. Motivated by this gap, we propose an interactive learning method, SOTOPIA-$\pi$, improving the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on filtered social…

    Submitted 25 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  26. arXiv:2403.08511  [pdf]

    cs.CV

    A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product

    Authors: Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, Danqing Ma

    Abstract: This paper introduces a new multi-modal model based on the Transformer architecture and tensor product fusion strategy, combining BERT's text vectors and ViT's image vectors to classify students' psychological conditions, with an accuracy of 93.65%. The purpose of the study is to accurately analyze the mental health status of students from various data sources. This paper discusses modal fusion me…

    Submitted 19 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  27. arXiv:2403.08499  [pdf]

    cs.CV

    Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks

    Authors: Zongqing Qi, Danqing Ma, Jingyu Xu, Ao Xiang, Hedi Qu

    Abstract: In recent years, there have been frequent incidents of foreign objects intruding into railway and airport runways. These objects can include pedestrians, vehicles, animals, and debris. This paper introduces an improved YOLOv5 architecture incorporating FasterNet and attention mechanisms to enhance the detection of foreign objects on railways and airport runways. This study proposes a new dataset,…

    Submitted 13 March, 2024; originally announced March 2024.

  28. arXiv:2403.08319  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Knowledge Conflicts for LLMs: A Survey

    Authors: Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu

    Abstract: This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs…

    Submitted 22 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Our GitHub repo is available at https://github.com/pillowsofwind/Knowledge-Conflicts-Survey

  29. arXiv:2403.01694  [pdf, other]

    cs.RO

    Tac-Man: Tactile-Informed Prior-Free Manipulation of Articulated Objects

    Authors: Zihang Zhao, Yuyang Li, Wanlin Li, Zhenghao Qi, Lecheng Ruan, Yixin Zhu, Kaspar Althoefer

    Abstract: Integrating robotics into human-centric environments such as homes, necessitates advanced manipulation skills as robotic devices will need to engage with articulated objects like doors and drawers. Key challenges in robotic manipulation are the unpredictability and diversity of these objects' internal structures, which render models based on priors, both explicit and implicit, inadequate. Their re…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 18 pages, 13 figures, 5 tables

  30. arXiv:2402.17840  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

    Authors: Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju

    Abstract: Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with ins…

    Submitted 20 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  31. arXiv:2402.17766  [pdf, other]

    cs.CV

    ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

    Authors: Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma

    Abstract: This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point clo…

    Submitted 6 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Project page: https://qizekun.github.io/shapellm/

  32. arXiv:2402.15481  [pdf, other]

    cs.CL cs.CY

    Prejudice and Volatility: A Statistical Framework for Measuring Social Discrimination in Large Language Models

    Authors: Y Liu, K Yang, Z Qi, X Liu, Y Yu, C Zhai

    Abstract: This study investigates why and how inconsistency in the generation of Large Language Models (LLMs) might induce or exacerbate societal injustice. For instance, LLMs frequently exhibit contrasting gender stereotypes regarding the same career depending on varied contexts, highlighting the arguably harmful unpredictability of LLMs' behavioral patterns. To augment the existing discrimination assessme…

    Submitted 24 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  33. arXiv:2402.04195  [pdf, other]

    cs.CV

    Instance by Instance: An Iterative Framework for Multi-instance 3D Registration

    Authors: Xinyue Cao, Xiyu Zhang, Yuxin Cheng, Zhaoshuai Qi, Yanning Zhang, Jiaqi Yang

    Abstract: Multi-instance registration is a challenging problem in computer vision and robotics, where multiple instances of an object need to be registered in a standard coordinate system. In this work, we propose the first iterative framework called instance-by-instance (IBI) for multi-instance 3D registration (MI-3DReg). It successively registers all instances in a given scenario, starting from the easies…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 14 pages, 12 figures, 10 tables

  34. arXiv:2402.01900  [pdf, other]

    stat.ML cs.LG

    Distributional Off-policy Evaluation with Bellman Residual Minimization

    Authors: Sungee Hong, Zhengling Qi, Raymond K. W. Wong

    Abstract: We consider the problem of distributional off-policy evaluation which serves as the foundation of many distributional reinforcement learning (DRL) algorithms. In contrast to most existing works (that rely on supremum-extended statistical distances such as supremum-Wasserstein distance), we study the expectation-extended statistical distance for quantifying the distributional Bellman residuals and…

    Submitted 2 February, 2024; originally announced February 2024.

  35. RecDCL: Dual Contrastive Learning for Recommendation

    Authors: Dan Zhang, Yangliao Geng, Wenwen Gong, Zhongang Qi, Zhiyu Chen, Xing Tang, Ying Shan, Yuxiao Dong, Jie Tang

    Abstract: Self-supervised learning (SSL) has recently achieved great success in mining the user-item interactions for collaborative filtering. As a major paradigm, contrastive learning (CL) based SSL helps address data sparsity in Web platforms by contrasting the embeddings between raw and augmented data. However, existing CL-based methods mostly focus on contrasting in a batch-wise way, failing to exploit…

    Submitted 18 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted to WWW 2024

    Journal ref: Proceedings of TheWebConf 2024 (WWW '24), May 13--17, 2024, Singapore

  36. arXiv:2401.08932  [pdf, other]

    cs.CV cs.AI

    Learning to detect cloud and snow in remote sensing images from noisy labels

    Authors: Zili Liu, Hao Chen, Wenyuan Li, Keyan Chen, Zipeng Qi, Chenyang Liu, Zhengxia Zou, Zhenwei Shi

    Abstract: Detecting clouds and snow in remote sensing images is an essential preprocessing task for remote sensing imagery. Previous works draw inspiration from semantic segmentation models in computer vision, with most research focusing on improving model architectures to enhance detection performance. However, unlike natural images, the complexity of scenes and the diversity of cloud types in remote sensi…

    Submitted 16 January, 2024; originally announced January 2024.

  37. arXiv:2401.07567  [pdf, other]

    cs.CV

    Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

    Authors: Zhaobo Qi, Yibo Yuan, Xiaowen Ruan, Shuhui Wang, Weigang Zhang, Qingming Huang

    Abstract: Temporal Sentence Grounding in Video (TSGV) is troubled by dataset bias issue, which is caused by the uneven temporal distribution of the target moments for samples with similar semantic components in input videos or query texts. Existing methods resort to utilizing prior knowledge about bias to artificially break this uneven distribution, which only removes a limited amount of significant languag…

    Submitted 19 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: accepted by AAAI 2024

  38. arXiv:2312.15311  [pdf, other]

    cs.CV

    Pixel-Level Change Detection Pseudo-Label Learning for Remote Sensing Change Captioning

    Authors: Chenyang Liu, Keyan Chen, Zipeng Qi, Haotian Zhang, Zhengxia Zou, Zhenwei Shi

    Abstract: The existing methods for Remote Sensing Image Change Captioning (RSICC) perform well in simple scenes but exhibit poorer performance in complex scenes. This limitation is primarily attributed to the model's constrained visual ability to distinguish and locate changes. Acknowledging the inherent correlation between change detection (CD) and RSICC tasks, we believe pixel-level CD is significant for…

    Submitted 21 May, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

  39. arXiv:2312.15011  [pdf, other]

    cs.CV

    Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

    Authors: Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

    Abstract: The rapidly evolving sector of Multi-modal Large Language Models (MLLMs) is at the forefront of integrating linguistic and visual processing in artificial intelligence. This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision). Our study involves a multi-faceted evaluation of both models across key dimensions such as Vision-Language Capa…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Project Page: https://github.com/Qi-Zhangyang/Gemini-vs-GPT4V. arXiv admin note: substantial text overlap with arXiv:2309.17421

  40. arXiv:2312.13913  [pdf, other]

    cs.CV

    Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

    Authors: Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

    Abstract: This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, which allows the textures to be re-lighted or re-edited within mod…

    Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project Website: https://github.com/OpenTexture/Paint3D

  41. arXiv:2312.12808  [pdf, other]

    cs.CL

    Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario

    Authors: Hiroki Onozeki, Zhiyang Qi, Kazuma Akiyama, Ryutaro Asahara, Takumasa Kaneko, Michimasa Inaba

    Abstract: This paper describes our dialogue system submitted to Dialogue Robot Competition 2023. The system's task is to help a user at a travel agency decide on a plan for visiting two sightseeing spots in Kyoto City that satisfy the user. Our dialogue system is flexible and stable and responds to user requirements by controlling dialogue flow according to dialogue scenarios. We also improved user satisfac…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2023

  42. arXiv:2312.05621  [pdf, other]

    cs.CL

    PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching

    Authors: Zhenting Qi, Xiaoyu Tan, Shaojie Shi, Chao Qu, Yinghui Xu, Yuan Qi

    Abstract: Instruction fine-tuning has conventionally been employed to adapt Large Language Models (LLMs) to a variety of tasks. Nonetheless, this technique often necessitates substantial computational resources, making it impractical for deployment by individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has become a promising alternative, offering high capabilities on par with full tuni…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: Accepted by EMNLP 2023 (Industry Track), Oral Presentation

  43. arXiv:2312.04461  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

    Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan

    Abstract: Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized t…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Tech report; Project page: https://photo-maker.github.io/

  44. arXiv:2312.03718  [pdf, other]

    cs.CL cs.AI

    Large Language Models in Law: A Survey

    Authors: Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, Philip S. Yu

    Abstract: The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI wi…

    Submitted 25 November, 2023; originally announced December 2023.

    Comments: Preprint

  45. arXiv:2312.02980  [pdf, other]

    cs.CV

    GPT4Point: A Unified Framework for Point-Language Understanding and Generation

    Authors: Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao

    Abstract: Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation. To solve this problem, we introduce GPT4Point, an innovative groundbreaking point-language multimodal model designed specifically for unified 3D object understanding a…

    Submitted 5 December, 2023; originally announced December 2023.

  46. arXiv:2311.18435  [pdf, other]

    cs.CV

    Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

    Authors: Zipeng Qi, Guoxi Huang, Zebin Huang, Qin Guo, Jinwen Chen, Junyu Han, Jian Wang, Gang Zhang, Lufei Liu, Errui Ding, Jingdong Wang

    Abstract: This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We present two key innovations: Vision Guidance and the Layered Rendering Diffusion (LRDiff) framework. Vision Guidance, a spatial layout condition, acts as a clue in the perturbed distribution, greatly narrowing down the search space, to focus on the image sampling process ad…

    Submitted 30 November, 2023; originally announced November 2023.

  47. arXiv:2311.13160  [pdf, other]

    cs.AI

    Large Language Models in Education: Vision and Opportunities

    Authors: Wensheng Gan, Zhenlian Qi, Jiayang Wu, Jerry Chun-Wei Lin

    Abstract: With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: IEEE BigData 2023. 10 pages

  48. arXiv:2311.08225  [pdf, other]

    eess.IV cs.CV

    Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images

    Authors: Zhiyun Song, Zengxin Qi, Xin Wang, Xiangyu Zhao, Zhenrong Shen, Sheng Wang, Manman Fei, Zhe Wang, Di Zang, Dongdong Chen, Linlin Yao, Qian Wang, Xuehai Wu, Lichi Zhang

    Abstract: Cross-modality synthesis (CMS), super-resolution (SR), and their combination (CMSR) have been extensively studied for magnetic resonance imaging (MRI). Their primary goals are to enhance the imaging quality by synthesizing the desired modality and reducing the slice thickness. Despite the promising synthetic results, these techniques are often tailored to specific tasks, thereby limiting their ada…

    Submitted 14 November, 2023; originally announced November 2023.

  49. arXiv:2311.05720  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-Horizon Dialogue Understanding for Role Identification in the Game of Avalon with Large Language Models

    Authors: Simon Stepputtis, Joseph Campbell, Yaqi Xie, Zhengyang Qi, Wenxin Sharon Zhang, Ruiyi Wang, Sanketh Rangreji, Michael Lewis, Katia Sycara

    Abstract: Deception and persuasion play a critical role in long-horizon dialogues between multiple parties, especially when the interests, goals, and motivations of the participants are not aligned. Such complex tasks pose challenges for current Large Language Models (LLM) as deception and persuasion can easily mislead them, especially in long-horizon multi-party dialogues. To this end, we explore the game…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP, Findings of the Association for Computational Linguistics)

  50. arXiv:2310.19784  [pdf, other]

    cs.CV cs.AI

    CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models

    Authors: Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan

    Abstract: Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization a…

    Submitted 7 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Project webpage available at https://jiangyzy.github.io/CustomNet/