
Showing 1–50 of 2,680 results for author: Xu, Y

Searching in archive cs.
  1. arXiv:2407.01303  [pdf, other]

    cs.RO

    RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields

    Authors: Haochen Jiang, Yueming Xu, Kejie Li, Jianfeng Feng, Li Zhang

    Abstract: Leveraging neural implicit representation to conduct dense RGB-D SLAM has been studied in recent years. However, this approach relies on a static environment assumption and does not work robustly within a dynamic environment due to the inconsistent observation of geometry and photometry. To address the challenges presented in dynamic environments, we propose a novel dynamic SLAM framework with neu…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: IEEE RAL 2024

  2. arXiv:2407.01284  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  3. arXiv:2407.01080  [pdf, other]

    cs.CL cs.AI

    Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese

    Authors: Yunqi Xu, Tianchi Cai, Jiyan Jiang, Xierui Song

    Abstract: The prevailing issue of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE). Despite the various FCE methods proposed earlier, these methods are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on o…

    Submitted 1 July, 2024; originally announced July 2024.

  4. Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation

    Authors: Yuting Zhang, Yiqing Wu, Ruidong Han, Ying Sun, Yongchun Zhu, Xiang Li, Wei Lin, Fuzhen Zhuang, Zhulin An, Yongjun Xu

    Abstract: Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winte…

    Submitted 30 June, 2024; originally announced July 2024.

  5. arXiv:2407.00608  [pdf, other]

    cs.AI cs.CL cs.CV

    Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

    Authors: Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

    Abstract: Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However, previous methods solely focus on the performance of the reconstruction task, degrading its ability to combine with different textual prompt. Besides, optimizin…

    Submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2407.00569  [pdf, other]

    cs.CV cs.AI cs.CL

    Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

    Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

    Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, w…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

  7. arXiv:2406.19898  [pdf, other]

    cs.CL

    Paraphrase Types Elicit Prompt Engineering Capabilities

    Authors: Jan Philip Wahle, Terry Ruas, Yang Xu, Bela Gipp

    Abstract: Much of the success of modern language models depends on finding a suitable prompt to instruct the model. Until now, it has been largely unknown how variations in the linguistic expression of prompts affect these models. This study systematically and empirically evaluates which linguistic features influence models through paraphrase types, i.e., different linguistic changes at particular positions…

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.19874  [pdf, other]

    cs.CL cs.AI

    Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

    Authors: Yang Xu, Yu Wang, Hao An, Zhichen Liu, Yongyuan Li

    Abstract: Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, it is becoming increasingly difficult as language model's capabilities of generating human-like texts keep evolving. This study provides a new perspective by using the relative likelihood values instead of absolute ones, and extracting useful features from the spectrum-view of likeli…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 13 pages, 12 figures

    ACM Class: I.2.7

  9. arXiv:2406.19611  [pdf, other]

    q-bio.QM cs.AI

    Multimodal Data Integration for Precision Oncology: Challenges and Future Directions

    Authors: Huajun Zhou, Fengtao Zhou, Chenyu Zhao, Yingxue Xu, Luyang Luo, Hao Chen

    Abstract: The essence of precision oncology lies in its commitment to tailor targeted treatments and care measures to each patient based on the individual characteristics of the tumor. The inherent heterogeneity of tumors necessitates gathering information from diverse data sources to provide valuable insights from various perspectives, fostering a holistic comprehension of the tumor. Over the past decade,…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  10. arXiv:2406.18962  [pdf, other]

    cs.IR

    Multi-modal Food Recommendation using Clustering and Self-supervised Learning

    Authors: Yixin Zhang, Xin Zhou, Qianwen Meng, Fanglin Zhu, Yonghui Xu, Zhiqi Shen, Lizhen Cui

    Abstract: Food recommendation systems serve as pivotal components in the realm of digital lifestyle services, designed to assist users in discovering recipes and food items that resonate with their unique dietary predilections. Typically, multi-modal descriptions offer an exhaustive profile for each recipe, thereby ensuring recommendations that are both personalized and accurate. Our preliminary investigati…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Working paper

  11. arXiv:2406.18868  [pdf, other]

    cs.CV

    Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models

    Authors: Yicheng Xu, Yuxin Chen, Jiahao Nie, Yusong Wang, Huiping Zhuang, Manabu Okumura

    Abstract: Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent the catastrophic forgetting on incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to mainta…

    Submitted 26 June, 2024; originally announced June 2024.

  12. arXiv:2406.18841  [pdf]

    cs.CY cs.AI cs.CL

    Navigating LLM Ethics: Advancements, Challenges, and Future Directions

    Authors: Junfeng Jiao, Saleh Afroogh, Yiming Xu, Connor Phillips

    Abstract: This study addresses ethical issues surrounding Large Language Models (LLMs) within the field of artificial intelligence. It explores the common ethical challenges posed by both LLMs and other AI systems, such as privacy and fairness, as well as ethical challenges uniquely arising from LLMs. It highlights challenges such as hallucination, verifiable accountability, and decoding censorship complexi…

    Submitted 27 June, 2024; v1 submitted 14 May, 2024; originally announced June 2024.

  13. arXiv:2406.18742  [pdf, other]

    cs.CV cs.RO

    3D Feature Distillation with Object-Centric Priors

    Authors: Georgios Tziafas, Yucheng Xu, Zhibin Li, Hamidreza Kasaei

    Abstract: Grounding natural language to the physical world is a ubiquitous topic with a wide range of applications in computer vision and robotics. Recently, 2D vision-language models such as CLIP have been widely popularized, due to their impressive capabilities for open-vocabulary grounding in 2D images. Recent works aim to elevate 2D CLIP features to 3D via feature distillation, but either learn neural f…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted CoRL-24

  14. arXiv:2406.18522  [pdf, other]

    cs.CV cs.CL

    ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

    Authors: Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan

    Abstract: We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of the T2V models (e.g. Sora and Lumiere) in time-lapse video generation. In contrast to existing benchmarks that focus on the visual quality and textual relevance of generated videos, ChronoMagic-Bench focuses on the model's ability to generate time-lapse videos wi…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 31 pages, 15 figures

  15. Start from Zero: Triple Set Prediction for Automatic Knowledge Graph Completion

    Authors: Wen Zhang, Yajing Xu, Peng Ye, Zhiwei Huang, Zezhong Xu, Jiaoyan Chen, Jeff Z. Pan, Huajun Chen

    Abstract: Knowledge graph (KG) completion aims to find out missing triples in a KG. Some tasks, such as link prediction and instance completion, have been proposed for KG completion. They are triple-level tasks with some elements in a missing triple given to predict the missing element of the triple. However, knowing some elements of the missing triple in advance is not always a realistic setting. In this p…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Paper accepted by TKDE in 2024

  16. arXiv:2406.17605  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV cs.IR

    NativE: Multi-modal Knowledge Graph Completion in the Wild

    Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen

    Abstract: Multi-modal knowledge graph completion (MMKGC) aims to automatically discover the unobserved factual knowledge from a given multi-modal knowledge graph by collaboratively modeling the triple structure and multi-modal information from entities. However, real-world MMKGs present challenges due to their diverse and imbalanced nature, which means that the modality information can span various types (e…

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: Accepted by SIGIR 2024 as a full paper

  17. arXiv:2406.17483  [pdf, other]

    cs.CV eess.IV

    TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision

    Authors: Cina Arjmand, Yingfu Xu, Kevin Shidqi, Alexandra F. Dobrita, Kanishkan Vadivel, Paul Detterer, Manolis Sifalakis, Amirreza Yousefzadeh, Guangzhi Tang

    Abstract: Neuromorphic processors are well-suited for efficiently handling sparse events from event-based cameras. However, they face significant challenges in the growth of computing demand and hardware costs as the input resolution increases. This paper proposes the Trainable Region-of-Interest Prediction (TRIP), the first hardware-efficient hard attention framework for event-based vision processing on a…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted in ICONS 2024

  18. arXiv:2406.17233  [pdf, other]

    cs.SE cs.CL

    Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement

    Authors: Yunlong Feng, Yang Xu, Dechuan Teng, Honglin Mu, Xiao Xu, Libo Qin, Wanxiang Che, Qingfu Zhu

    Abstract: Decompilation transforms compiled code back into a high-level programming language for analysis when source code is unavailable. Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training. Based on the characteristics of the decompilation task, we propose two methods: (1) Without fine-tuning, the Self-Con…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under Review

  19. arXiv:2406.17148  [pdf, other]

    cs.CV

    Unambiguous Recognition Should Not Rely Solely on Natural Language Training

    Authors: Renqing Luo, Yuhan Xu

    Abstract: In LaTeX text recognition using Transformer-based architectures, this paper identifies certain "bias" issues. For instance, $e-t$ is frequently misrecognized as $e^{-t}$. This bias stems from the inherent characteristics of the dataset. To mitigate this bias, we propose a LaTeX printed text recognition model trained on a mixed dataset of pseudo-formulas and pseudo-text. The model employs a Swin Tr…

    Submitted 24 June, 2024; originally announced June 2024.

  20. arXiv:2406.16835  [pdf, other]

    cs.HC

    Preserving Real-World Finger Dexterity Using a Lightweight Fingertip Haptic Device for Virtual Dexterous Manipulation

    Authors: Yunxiu XU, Siyu Wang, Shoichi Hasegawa

    Abstract: This study presents a lightweight, wearable fingertip haptic device that provides physics-based haptic feedback for dexterous manipulation in virtual environments without hindering real-world interactions. The device's design utilizes thin strings and actuators attached to the fingernails, minimizing the weight (1.76g each finger) while preserving finger flexibility. Multiple types of haptic feedb…

    Submitted 24 June, 2024; originally announced June 2024.

  21. arXiv:2406.15718  [pdf, other]

    cs.CL

    Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

    Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

    Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can lis…

    Submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.15691  [pdf, other]

    math.OC cs.DS cs.GT

    Stochastic Scheduling with Abandonments via Greedy Strategies

    Authors: Yihua Xu, Rohan Ghuge, Sebastian Perez-Salazar

    Abstract: Motivated by applications where impatience is pervasive and service times are uncertain, we study a scheduling model where jobs may depart at an unknown point in time and service times are stochastic. Initially, we have access to a single server and $n$ jobs with known non-negative values: these jobs have unknown stochastic service and departure times with known distributional information, which w…

    Submitted 21 June, 2024; originally announced June 2024.

  23. arXiv:2406.15657  [pdf, other]

    cs.IR

    FIRST: Faster Improved Listwise Reranking with Single Token Decoding

    Authors: Revanth Gangi Reddy, JaeHyeok Doo, Yifei Xu, Md Arafat Sultan, Deevya Swain, Avirup Sil, Heng Ji

    Abstract: Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approaches. However, conventional listwise LLM reranking methods lack efficiency as they provide ranking output in the form of a generated ordered sequence of candidat…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Preprint

  24. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  25. arXiv:2406.14064  [pdf, other]

    cs.IT eess.SP

    PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiple

    Authors: Haozhi Yuan, Yin Xu, Xinghao Guo, Tianyao Ma, Haoyang Li, Dazhi He, Wenjun Zhang

    Abstract: Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique based on discrete affine Fourier transform (DAFT). By properly tuning pre-chirp parameter and post-chirp parameter in the DAFT, the effective channel in the DAFT domain can completely avoid overlap of different paths, thus constitutes a full representation of delay-Doppler profile, which significantly improves…

    Submitted 20 June, 2024; originally announced June 2024.

  26. FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

    Authors: Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

    Abstract: Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability of utilizing search engine to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved, i.e., the lack of factuality and clear logic in the g…

    Submitted 19 June, 2024; originally announced June 2024.

    Report number: 30th

    Journal ref: KDD 2024

  27. arXiv:2406.13375  [pdf, other]

    cs.CL

    ALiiCE: Evaluating Positional Fine-grained Citation Generation

    Authors: Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng

    Abstract: Large Language Models (LLMs) can enhance the credibility and verifiability by generating text with citations. However, existing tasks and evaluation methods are predominantly limited to sentence-level statement, neglecting the significance of positional fine-grained citations that can appear anywhere within sentences. To facilitate further exploration of the fine-grained citation generation, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  28. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  29. arXiv:2406.12468  [pdf, other]

    cs.CL cs.AI

    Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities

    Authors: Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei, Hongcheng Gao, Yilong Xu, Xueqi Cheng

    Abstract: The parametric knowledge memorized by large language models (LLMs) becomes outdated quickly. In-context editing (ICE) is currently the most effective method for updating the knowledge of LLMs. Recent advancements involve enhancing ICE by modifying the decoding strategy, obviating the need for altering internal model structures or adjusting external prompts. However, this enhancement operates acros…

    Submitted 18 June, 2024; originally announced June 2024.

  30. arXiv:2406.12382  [pdf, other]

    cs.CL

    From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

    Authors: Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Contrary to LLMs, humans acquire skills…

    Submitted 18 June, 2024; originally announced June 2024.

  31. arXiv:2406.12225  [pdf, other]

    cs.CV

    The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

    Authors: Hongpeng Pan, Shifeng Yi, Shouwei Yang, Lei Qi, Bing Hu, Yi Xu, Yang Yang

    Abstract: This report introduces an enhanced method for the Foundational Few-Shot Object Detection (FSOD) task, leveraging the vision-language model (VLM) for object detection. However, on specific datasets, VLM may encounter the problem where the detected targets are misaligned with the target concepts of interest. This misalignment hinders the zero-shot performance of VLM and the application of fine-tunin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR2024 Foundational Few-Shot Object Detection Challenge

  32. arXiv:2406.12178  [pdf, other]

    cs.CV

    FCA-RAC: First Cycle Annotated Repetitive Action Counting

    Authors: Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao

    Abstract: Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains 4 parts: 1) a labeling technique…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  34. arXiv:2406.11371  [pdf, other]

    cs.CV physics.optics

    Video Frame Interpolation for Polarization via Swin-Transformer

    Authors: Feng Huang, Xin Zhang, Yixuan Xu, Xuesong Wang, Xianyu Wu

    Abstract: Video Frame Interpolation (VFI) has been extensively explored and demonstrated, yet its application to polarization remains largely unexplored. Due to the selective transmission of light by polarized filters, longer exposure times are typically required to ensure sufficient light intensity, which consequently lower the temporal sample rates. Furthermore, because polarization reflected by objects v…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages, 10 figures, 7 tables, 73 citations

  35. arXiv:2406.11301  [pdf, other]

    cs.AI cs.CL cs.LG

    Optimizing and Testing Instruction-Following: Analyzing the Impact of Fine-Grained Instruction Variants on instruction-tuned LLMs

    Authors: Jiuding Yang, Weidong Guo, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu

    Abstract: The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation techniqu…

    Submitted 17 June, 2024; originally announced June 2024.

  36. arXiv:2406.11285  [pdf, other]

    cs.CR cs.CL

    Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment

    Authors: Jie Li, Yi Liu, Chongyang Liu, Xiaoning Ren, Ling Shi, Weisong Sun, Yinxing Xue

    Abstract: Large Language Models (LLMs) like OpenAI's GPT series, Anthropic's Claude, and Meta's LLaMa have shown remarkable capabilities in text generation. However, their susceptibility to toxic prompts presents significant security challenges. This paper investigates alignment techniques, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), to mitigate these risks.…

    Submitted 17 June, 2024; originally announced June 2024.

  37. arXiv:2406.11131  [pdf, other]

    cs.CL cs.AI cs.DB

    Are Large Language Models a Good Replacement of Taxonomies?

    Authors: Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

    Abstract: Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by VLDB 2024

  38. arXiv:2406.10563  [pdf, other]

    cs.LG cs.AI cs.CR

    Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data

    Authors: Yukai Xu, Jingfeng Zhang, Yujie Gu

    Abstract: In the realm of healthcare where decentralized facilities are prevalent, machine learning faces two major challenges concerning the protection of data and models. The data-level challenge concerns the data privacy leakage when centralizing data with sensitive personal information. While the model-level challenge arises from the heterogeneity of local models, which need to be collaboratively traine…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to the 2024 IEEE Conference on Artificial Intelligence (IEEE CAI 2024)

  39. arXiv:2406.10445  [pdf, other]

    cs.LG

    Optimal Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

    Authors: Yinglun Xu, David Zhu, Rohan Gumastate, Gagandeep Singh

    Abstract: Offline reinforcement learning has become one of the most practical RL settings. A recent success story has been RLHF, offline preference-based RL (PBRL) with preference from humans. However, most existing works on offline RL focus on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the existing rich understanding of offline RL from the reward-based…

    Submitted 14 June, 2024; originally announced June 2024.

  40. arXiv:2406.10118  [pdf, other]

    cs.CL

    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

    Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

    Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due t…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: https://github.com/SEACrowd

  41. arXiv:2406.10000  [pdf, other]

    cs.CV

    OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

    Authors: Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

    Abstract: In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus i…

    Submitted 14 June, 2024; originally announced June 2024.

  42. arXiv:2406.09782  [pdf, other]

    cs.CV

    Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

    Authors: Runze Liu, Dongchen Zhu, Guanghui Zhang, Yue Xu, Wenjun Shi, Xiaolin Zhang, Lei Wang, Jiamao Li

    Abstract: Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. In real-world scenarios, the images may be blurry or noisy due to the influence of weather conditions and inherent limitations of the camera. Therefore, it is particularly important to develop a robust depth estimation model. Benefiting from the training strategies of…

    Submitted 14 June, 2024; originally announced June 2024.

  43. arXiv:2406.09469  [pdf, other]

    cs.DB

    Conformance Testing of Relational DBMS Against SQL Specifications

    Authors: Shuang Liu, Chenglin Tian, Jun Sun, Ruifeng Wang, Wei Lu, Yongxin Zhao, Yinxing Xue, Junjie Wang, Xiaoyong Du

    Abstract: A Relational Database Management System (RDBMS) is one of the fundamental software that supports a wide range of applications, making it critical to identify bugs within these systems. There has been active research on testing RDBMS, most of which employ crash or use metamorphic relations as the oracle. Although existing approaches can detect bugs in RDBMS, they are far from comprehensively evalua…

    Submitted 13 June, 2024; originally announced June 2024.

  44. arXiv:2406.09089  [pdf, other]

    cs.LG

    DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

    Authors: Xuemin Hu, Shen Li, Yingfen Xu, Bo Tang, Long Chen

    Abstract: Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent cannot often cover the action distribution under a given state, resulting in the extrapolation error issue. Recent works address this issue by employing generative adversarial networks (GANs). However, these methods often…

    Submitted 13 June, 2024; originally announced June 2024.

  45. arXiv:2406.08907  [pdf, other]

    cs.CV cs.MM

    Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

    Authors: Yue Xu, Kaizhi Yang, Jiebo Luo, Xue** Chen

    Abstract: 3D visual grounding is an emerging research area dedicated to making connections between the 3D physical world and natural language, which is crucial for achieving embodied intelligence. In this paper, we propose DASANet, a Dual Attribute-Spatial relation Alignment Network that separately models and aligns object attributes and spatial relation features between language and 3D vision modalities. W…

    Submitted 13 June, 2024; originally announced June 2024.

  46. arXiv:2406.08903  [pdf, other]

    cs.CL

    Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

    Authors: Bowen **, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

    Abstract: Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 12 pages

  47. arXiv:2406.08765  [pdf, other]

    cs.LG

    LLM-based Knowledge Pruning for Time Series Data Analytics on Edge-computing Devices

    Authors: Ruibing **, Qing Xu, Min Wu, Yuecong Xu, Dan Li, Xiaoli Li, Zhenghua Chen

    Abstract: Limited by the scale and diversity of time series data, the neural networks trained on time series data often overfit and show unsatisfactory performances. In comparison, large language models (LLMs) recently exhibit impressive generalization in diverse fields. Although massive LLM based approaches are proposed for time series tasks, these methods require to load the whole LLM in both training and…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  48. arXiv:2406.08475  [pdf, other]

    cs.CV

    Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

    Authors: Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll

    Abstract: Creating realistic avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful prior from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot provide multi-view shape priors with guaranteed 3D consistency. We propose Human 3Diffusion: Realis…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Project Page: https://yuxuan-xue.com/human-3diffusion

  49. arXiv:2406.08386  [pdf, other]

    cs.CY

    Banal Deception Human-AI Ecosystems: A Study of People's Perceptions of LLM-generated Deceptive Behaviour

    Authors: Xiao Zhan, Yifan Xu, Noura Abdi, Joe Collenette, Ruba Abu-Salma, Stefan Sarkadi

    Abstract: Large language models (LLMs) can provide users with false, inaccurate, or misleading information, and we consider the output of this type of information as what Natale (2021) calls 'banal' deceptive behaviour. Here, we investigate people's perceptions of ChatGPT-generated deceptive behaviour and how this affects people's own behaviour and trust. To do this, we use a mixed-methods approach comprisi…

    Submitted 12 June, 2024; originally announced June 2024.

  50. arXiv:2406.08310  [pdf, other]

    cs.LG

    GraphFM: A Comprehensive Benchmark for Graph Foundation Model

    Authors: Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, Qiaoyu Tan

    Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogeniza…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.