Skip to main content

Showing 1–50 of 531 results for author: Gao, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00896  [pdf, other

    eess.SP cs.AI

    Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions

    Authors: Yupeng Li, Gang Li, Zirui Wen, Shuangfeng Han, Shijian Gao, Guangyi Liu, Jiangzhou Wang

    Abstract: The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation metho… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.19966  [pdf, other

    cs.CL

    Simulating Financial Market via Large Language Model based Agents

    Authors: Shen Gao, Yuntao Wen, Minghang Zhu, Jianing Wei, Yuhan Cheng, Qunzi Zhang, Shuo Shang

    Abstract: Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose \textbf{A}gent-based \textbf{S}imulated \textbf{F}inancial \textbf{M}… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.19644  [pdf, other

    cs.AI

    Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

    Authors: Zichao Shen, Tianchen Zhu, Qingyun Sun, Shiqi Gao, Jianxin Li

    Abstract: Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks due to the difficulty in designing comprehensive and precise reward functions. This inherent difficulty curtails the broader application of RL within game environments characterized by diverse constraints. Preference-based reinforcement learning (PbRL) presents a pioneering framework that cap… ▽ More

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: accepted by IJCAI 2024 GAAMAL

  4. arXiv:2406.19126  [pdf, other

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, the existing super-oscillatory lens for the spatial super-resolution imaging system still confronts critical limitations in performance due to the lack of a more advanced design method and the limited design degree of freedom. Here, we propose an optical super-oscillatory diffractive neural net… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  5. arXiv:2406.19014  [pdf, other

    cs.GT

    The Impact of Autonomous Vehicles on Ride-Hailing Platforms with Strategic Human Drivers

    Authors: Shuqin Gao, Xinyuan Wu, Antonis Dimakis, Costas Courcoubetis

    Abstract: Motivated by the rapid development of autonomous vehicle technology, this work focuses on the challenges of introducing them in ride-hailing platforms with conventional strategic human drivers. We consider a ride-hailing platform that operates a mixed fleet of autonomous vehicles (AVs) and conventional vehicles (CVs), where AVs are fully controlled by the platform and CVs are operated by self-inte… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: This is a working paper. 60 pages

  6. arXiv:2406.18181  [pdf, ps, other

    cs.SE

    An Empirical Study of Unit Test Generation with Large Language Models

    Authors: Lin Yang, Chen Yang, Shutao Gao, Wei**g Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen

    Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting s… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  7. arXiv:2406.18085  [pdf, other

    cs.CL

    Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

    Authors: Ran Song, Shizhu He, Shengxiang Gao, Li Cai, Kang Liu, Zhengtao Yu, Jun Zhao

    Abstract: Multilingual Knowledge Graph Completion (mKGC) aim at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, ACL 2023

  8. arXiv:2406.18062  [pdf, other

    cs.LG cs.AI

    Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

    Authors: Chung-En Sun, Sicun Gao, Tsui-Wei Weng

    Abstract: Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Published in ICML 2024

  9. arXiv:2406.16473  [pdf, other

    cs.CV cs.AI

    Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, Ziheng Zhou, Shuyong Gao, Wenqiang Zhang

    Abstract: The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional map**s of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2406.16459  [pdf, other

    cs.CV

    Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

    Authors: Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang

    Abstract: The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.15848  [pdf, other

    cs.CV

    Quality-guided Skin Tone Enhancement for Portrait Photography

    Authors: Shiqi Gao, Huiyu Duan, Xinyue Li, Kang Fu, Yicong Peng, Qihang Xu, Yuanyuan Chang, Jia Wang, Xiongkuo Min, Guangtao Zhai

    Abstract: In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a map** from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image contin… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  12. arXiv:2406.14891  [pdf, other

    cs.CL cs.IR

    Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering

    Authors: Zhengliang Shi, Shuo Zhang, Weiwei Sun, Shen Gao, Pengjie Ren, Zhumin Chen, Zhaochun Ren

    Abstract: Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable no… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (main conference)

  13. arXiv:2406.12178  [pdf, other

    cs.CV

    FCA-RAC: First Cycle Annotated Repetitive Action Counting

    Authors: Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao

    Abstract: Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains 4 parts: 1) a labeling technique… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.12042  [pdf, other

    cs.CV cs.LG

    Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

    Authors: Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang

    Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pru… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  15. arXiv:2406.11228  [pdf, other

    cs.CL

    ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark

    Authors: Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, Antoine Bosselut

    Abstract: We propose a new benchmark, ComperDial, which facilitates the training and evaluation of evaluation metrics for open-domain dialogue systems. ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents submitted to the Commonsense Persona-grounded Dialogue (CPD) challenge. As a result, for any dialogue, our benchmark includes mul… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.10813  [pdf, other

    cs.CL

    Self-Evolution Fine-Tuning for Policy Optimization

    Authors: Ruijun Chen, Jiehao Liang, Shi** Gao, Fanqi Wan, Xiaojun Quan

    Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement lea… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  17. arXiv:2406.07843  [pdf, other

    cs.CV q-bio.NC

    Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification

    Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

    Abstract: Convolutional neural networks (CNNs) have been shown to be the state-of-the-art approach for modeling the transfer functions of visual cortical neurons. Cortical neurons in the primary visual cortex are are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs can integrate global spatial image information to model such contextual modulation v… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint NeurIPS 2024

  18. arXiv:2406.05361  [pdf, other

    cs.CL

    Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization

    Authors: Xiuying Chen, Shen Gao, Mingzhe Li, Qingqing Zhu, Xin Gao, Xiangliang Zhang

    Abstract: Nowadays, neural text generation has made tremendous progress in abstractive summarization tasks. However, most of the existing summarization models take in the whole document all at once, which sometimes cannot meet the needs in practice. Practically, social text streams such as news events and tweets keep growing from time to time, and can only be fed to the summarization system step by step. He… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures, published in TASLP

  19. arXiv:2406.05360  [pdf, other

    cs.CL

    Flexible and Adaptable Summarization via Expertise Separation

    Authors: Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

    Abstract: A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests o… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, published in SIGIR 2024

  20. arXiv:2406.04938  [pdf, other

    cs.LG cs.AI

    SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training

    Authors: Xizhi Gu, Hongzheng Li, Shihong Gao, Xinyan Zhang, Lei Chen, Yingxia Shao

    Abstract: Graph Neural Networks (GNNs) have superior capability in learning graph data. Full-graph GNN training generally has high accuracy, however, it suffers from large peak memory usage and encounters the Out-of-Memory problem when handling large graphs. To address this memory problem, a popular solution is mini-batch GNN training. However, mini-batch GNN training increases the training variance and sac… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  21. arXiv:2406.04151  [pdf, other

    cs.AI cs.CL

    AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

    Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuan**g Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project site: https://agentgym.github.io

  22. arXiv:2406.00617  [pdf, other

    cs.DB cs.SI

    Maximum $k$-Plex Search: An Alternated Reduction-and-Bound Method

    Authors: Shuohao Gao, Kaiqiang Yu, Shengxin Liu, Cheng Long

    Abstract: $k$-plexes relax cliques by allowing each vertex to disconnect to at most $k$ vertices. Finding a maximum $k… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  23. arXiv:2406.00587  [pdf, other

    cs.CV

    Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024

    Authors: Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang

    Abstract: Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction,because the real-world is actually video-based rather th… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Champion Solution for CVPR 2024 PVUW VSS Track. arXiv admin note: text overlap with arXiv:2306.02894

  24. arXiv:2406.00500  [pdf, other

    cs.CV

    2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation

    Authors: Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang

    Abstract: Video Panoptic Segmentation (VPS) is a challenging task that is extends from image panoptic segmentation.VPS aims to simultaneously classify, track, segment all objects in a video, including both things and stuff. Due to its wide application in many downstream tasks such as video understanding, video editing, and autonomous driving. In order to deal with the task of video panoptic segmentation in… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 2nd Place Solution for CVPR 2024 PVUW VPS Track

  25. arXiv:2406.00494  [pdf, other

    cs.LG cs.AI

    Activation-Descent Regularization for Input Optimization of ReLU Networks

    Authors: Hongzhan Yu, Sicun Gao

    Abstract: We present a new approach for input optimization of ReLU networks that explicitly takes into account the effect of changes in activation patterns. We analyze local optimization steps in both the input space and the space of activation patterns to propose methods with superior local descent properties. To accomplish this, we convert the discrete space of activation patterns into differentiable repr… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ICML'24 Proceedings

  26. arXiv:2406.00378  [pdf

    physics.app-ph cs.NE

    Real-Time State Modulation and Acquisition Circuit in Neuromorphic Memristive Systems

    Authors: Shengbo Wang, Cong Li, Tongming Pu, Jian Zhang, Weihao Ma, Luigi Occhipinti, Arokia Nathan, Shuo Gao

    Abstract: Memristive neuromorphic systems are designed to emulate human perception and cognition, where the memristor states represent essential historical information to perform both low-level and high-level tasks. However, current systems face challenges with the separation of state modulation and acquisition, leading to undesired time delays that impact real-time performance. To overcome this issue, we i… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  27. arXiv:2405.19327  [pdf, other

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kai**g Ma, Minghao Liu, Morry Niu , et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl… ▽ More

    Submitted 2 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  28. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  29. arXiv:2405.18416  [pdf, other

    cs.CV

    3D StreetUnveiler with Semantic-Aware 2DGS

    Authors: **gwei Xu, Yikai Wang, Yiqun Zhao, Yanwei Fu, Shenghua Gao

    Abstract: Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporarily static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scene cases involve long trajectories that differ fr… ▽ More

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://streetunveiler.github.io

  30. arXiv:2405.17832  [pdf, other

    cs.LG cs.AI cs.RO

    Mollification Effects of Policy Gradient Methods

    Authors: Tao Wang, Sylvia Herbert, Sicun Gao

    Abstract: Policy gradient methods have enabled deep reinforcement learning (RL) to approach challenging continuous control problems, even when the underlying systems involve highly nonlinear dynamics that generate complex non-smooth optimization landscapes. We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes to enable effective policy sear… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 41 figures

  31. arXiv:2405.17398  [pdf, other

    cs.CV cs.AI

    Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

    Authors: Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang Li

    Abstract: World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application. In this paper, we present Vista, a generalizable driving world model with high f… ▽ More

    Submitted 6 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code and model: https://github.com/OpenDriveLab/Vista, video demos: https://vista-demo.github.io

  32. arXiv:2405.16533  [pdf, other

    cs.CL

    Chain of Tools: Large Language Model is an Automatic Multi-tool Learner

    Authors: Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, Zhaochun Ren

    Abstract: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, empowering them to solve practical tasks. Existing work typically empowers LLMs as tool users with a manually designed workflow, where the LLM plans a series of tools in a step-by-step manner, and sequentially executes each tool to obtain intermediate results until deriving the… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Work in progress

  33. arXiv:2405.16437  [pdf, other

    cs.CV

    Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation

    Authors: Yawen Zou, Chunzhi Gu, Jun Yu, Shangce Gao, Chao Zhang

    Abstract: Black-Box unsupervised domain adaptation (BBUDA) learns knowledge only with the prediction of target data from the source model without access to the source data and source model, which attempts to alleviate concerns about the privacy and security of data. However, incorrect pseudo-labels are prevalent in the prediction generated by the source model due to the cross-domain discrepancy, which may s… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  34. arXiv:2405.14347  [pdf, other

    eess.SP cs.AI

    Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng

    Abstract: Integrated sensing and communication (ISAC) technology is essential for enabling the vehicular networks. However, the communication channel in this scenario exhibits time-varying characteristics, and the potential targets may move rapidly, creating a doubly-dynamic phenomenon. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, the… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  35. arXiv:2405.14169  [pdf, other

    cs.CV

    Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography

    Authors: Nhat Chung, Sensen Gao, Tuan-Anh Vu, Jie Zhang, Aishan Liu, Yun Lin, ** Song Dong, Qing Guo

    Abstract: Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems due to their advanced visual-language reasoning capabilities, targeting the perception, prediction, planning, and control mechanisms. However, Vision-LLMs have demonstrated susceptibilities against various types of adversarial attacks, which would compromise their reliability and safet… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 tables, 5 figures, work in progress

  36. arXiv:2405.13872  [pdf, other

    cs.AI cs.CL cs.CV

    Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

    Authors: Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT ha… ▽ More

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Correct the case title

  37. arXiv:2405.10504  [pdf

    cs.CV

    Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image

    Authors: Jianshun Zeng, Wang Li, Yanjie Lv, Shuai Gao, YuChu Qin

    Abstract: Street-view image has been widely applied as a crucial mobile map** data source. The inpainting of street-view images is a critical step for street-view image processing, not only for the privacy protection, but also for the urban environment map** applications. This paper presents a novel Deep Neural Network (DNN), multi-scale semantic prior Feature guided image inpainting Network (MFN) for i… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  38. arXiv:2405.05708  [pdf

    cs.IT

    Characteristic-Mode Based Conformal Design of Ultra-Wideband Antenna Array

    Authors: Zhan Chen, Wei Hu, Yuchen Gao, Qi Luo, Xiangbo Wang, Steven Gao

    Abstract: An innovative design method of conformal array antennas is presented by utilizing characteristic mode analysis (CMA) in this work. A single-layer continuous perfect electric conductor under bending conditions is conducted by CMA to evaluate the variations in operating performance. By using this method, the design process of a conformal array is simplified. The results indicate that the operating p… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  39. arXiv:2405.02982  [pdf, other

    cs.CV

    Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

    Authors: Xin **, Qianqian Qiao, Yi Lu, Shan Gao, Heng Huang, Guangdong Li

    Abstract: Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aes… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  40. arXiv:2405.02942  [pdf, other

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  41. arXiv:2405.02561  [pdf, other

    cs.LG math.NA

    Understanding the Difficulty of Solving Cauchy Problems with PINNs

    Authors: Tao Wang, Bo Zhao, Sicun Gao, Rose Yu

    Abstract: Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural netwo… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages and 18 figures

  42. arXiv:2404.19326  [pdf, other

    cs.CV

    LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation

    Authors: Lingyi Hong, Zhongying Liu, Wenchao Chen, Chenzhi Tan, Yuang Feng, Xinyu Zhou, Pinxue Guo, **glun Li, Zhaoyu Chen, Shuyong Gao, Wei Zhang, Wenqiang Zhang

    Abstract: Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shell VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets rest… ▽ More

    Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: LVOS V2

  43. arXiv:2404.19232  [pdf, other

    cs.CL cs.AI

    GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model

    Authors: Xinzhe Li, Ming Liu, Shang Gao

    Abstract: Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem… ▽ More

    Submitted 29 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  44. arXiv:2404.19201  [pdf, other

    eess.IV cs.CV cs.RO physics.optics

    Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

    Authors: Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

    Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO

  45. arXiv:2404.18114  [pdf, other

    cs.CV cs.MM

    Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

    Authors: Haiwen Diao, Ying Zhang, Shang Gao, Xiang Ruan, Huchuan Lu

    Abstract: Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we aim to leverage the knowledge transfer between peer branches in a b… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 12 pages, 9 figures, Accepted by TIP2024

  46. arXiv:2404.17381  [pdf, other

    cs.CV

    Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

    Authors: Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

    Abstract: We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalo… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  47. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  48. arXiv:2404.16280  [pdf, ps, other

    cs.NE cs.AI cs.LG

    An Efficient Reconstructed Differential Evolution Variant by Some of the Current State-of-the-art Strategies for Solving Single Objective Bound Constrained Problems

    Authors: Sichen Tao, Ruihan Zhao, Kaiyu Wang, Shangce Gao

    Abstract: Complex single-objective bounded problems are often difficult to solve. In evolutionary computation methods, since the proposal of differential evolution algorithm in 1997, it has been widely studied and developed due to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. After 2014, research ba… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  49. arXiv:2404.16271  [pdf

    cs.CR cond-mat.mtrl-sci

    True random number generation using metastable 1T' molybdenum ditelluride

    Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, **gfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

    Abstract: True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizatio… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  50. arXiv:2404.16127  [pdf, other

    cs.LG stat.ML

    Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection

    Authors: Elena Albu, Shan Gao, Pieter Stijnen, Frank Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster

    Abstract: Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data fro… ▽ More

    Submitted 24 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.