Showing 1–50 of 240 results for author: Ji, X

Searching in archive cs.
  1. arXiv:2407.00608  [pdf, other]

    cs.AI cs.CL cs.CV

    Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

    Authors: Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

    Abstract: Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However, previous methods solely focus on the performance of the reconstruction task, degrading its ability to combine with different textual prompt. Besides, optimizin…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.18284  [pdf, other]

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.14969  [pdf, other]

    cs.LG cs.AI

    Uni-Mol2: Exploring Molecular Pretraining Model at Scale

    Authors: Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E

    Abstract: In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, research exploring scaling law in molecular pretraining mo…

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  5. arXiv:2406.07828  [pdf, other]

    cs.CV

    Spatial Annealing Smoothing for Efficient Few-shot Neural Rendering

    Authors: Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji

    Abstract: Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities in reconstructing scenes for view synthesis, delivering high efficiency. Nonetheless, their performance significantly drops with sparse view inputs, due to the issue of overfitting. While various regularization strategies have been devised to address these challenges, they often depend on inefficient assum…

    Submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2406.04274  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models

    Authors: Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie

    Abstract: This work studies the challenge of aligning large language models (LLMs) with offline preference data. We focus on alignment by Reinforcement Learning from Human Feedback (RLHF) in particular. While popular preference optimization methods exhibit good empirical performance in practice, they are not theoretically guaranteed to converge to the optimal policy and can provably fail when the data cover…

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2405.18300  [pdf, other]

    cs.AI

    CompetEvo: Towards Morphological Evolution from Competition

    Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

    Abstract: Training an agent to adapt to specific tasks through co-optimization of morphology and control has widely attracted attention. However, whether there exists an optimal configuration and tactics for agents in a multiagent competition scenario is still an issue that is challenging to definitively conclude. In this context, we propose competitive evolution (CompetEvo), which co-evolves agents' design…

    Submitted 28 May, 2024; originally announced May 2024.

  8. arXiv:2405.08886  [pdf, other]

    cs.LG stat.ML

    The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

    Authors: Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

    Abstract: In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness th…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: ICML2024

  9. arXiv:2405.07626  [pdf, other]

    cs.LG cs.AI

    AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models

    Authors: Shuo Liu, Di Yao, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, XiaoWen Ji, Jingping Bi

    Abstract: Detecting anomaly edges for dynamic graphs aims to identify edges significantly deviating from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. With the evolving of time, the types of anomaly edges are emerging and the labeled anomaly samples are few for each type. Current methods are either designed to detect randomly inserted edge…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 13 pages

  10. arXiv:2405.06536  [pdf, other]

    cs.CV

    Mesh Denoising Transformer

    Authors: Wenbo Zhao, Xianming Liu, Deming Zhai, Junjun Jiang, Xiangyang Ji

    Abstract: Mesh denoising, aimed at removing noise from input meshes while preserving their feature structures, is a practical yet challenging task. Despite the remarkable progress in learning-based mesh denoising methodologies in recent years, their network designs often encounter two principal drawbacks: a dependence on single-modal geometric representations, which fall short in capturing the multifaceted…

    Submitted 10 May, 2024; originally announced May 2024.

  11. arXiv:2405.00718  [pdf, other]

    cs.CL cs.AI

    Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models

    Authors: Xu Ji, Jianyi Zhang, Ziyin Zhou, Zhangchi Zhao, Qianqian Qiao, Kaiying Han, Md Imran Hossen, Xiali Hei

    Abstract: Ensuring the resilience of Large Language Models (LLMs) against malicious exploitation is paramount, with recent focus on mitigating offensive responses. Yet, the understanding of cant or dark jargon remains unexplored. This paper introduces a domain-specific Cant dataset and CantCounter evaluation framework, employing Fine-Tuning, Co-Tuning, Data-Diffusion, and Data-Analysis stages. Experiments r…

    Submitted 25 April, 2024; originally announced May 2024.

  12. arXiv:2404.12768  [pdf, other]

    cs.CV cs.AI cs.GR

    MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

    Authors: Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

    Abstract: Accurately estimating scene lighting is critical for applications such as mixed reality. Existing works estimate illumination by generating illumination maps or regressing illumination parameters. However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-freq…

    Submitted 19 April, 2024; originally announced April 2024.

  13. arXiv:2404.06666  [pdf, other]

    cs.CV cs.AI cs.CL cs.CR

    SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

    Authors: Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu

    Abstract: Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexual scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing impro…

    Submitted 9 April, 2024; originally announced April 2024.

    Journal ref: ACM Conference on Computer and Communications Security (CCS 2024)

  14. arXiv:2404.03962  [pdf, other]

    cs.CV

    RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications

    Authors: Xingyu Liu, Chenyangguang Zhang, Gu Wang, Ruida Zhang, Xiangyang Ji

    Abstract: In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim). In particular, high-fidelity depth data i…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: accepted by ICRA'24

  15. arXiv:2404.01120  [pdf, other]

    cs.CV

    Motion Blur Decomposition with Cross-shutter Guidance

    Authors: Xiang Ji, Haiyang Jiang, Yinqiang Zheng

    Abstract: Motion blur is a frequently observed image artifact, especially under insufficient illumination where exposure time has to be prolonged so as to collect more photons for a bright enough image. Rather than simply removing such blurring effects, recent researches have aimed at decomposing a blurry image into multiple sharp images with spatial and temporal coherence. Since motion blur decomposition i…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  16. arXiv:2404.00992  [pdf, other]

    cs.CV

    SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance

    Authors: Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji

    Abstract: Neural Radiance Field (NeRF) technology has made significant strides in creating novel viewpoints. However, its effectiveness is hampered when working with sparsely available views, often leading to performance dips due to overfitting. FreeNeRF attempts to overcome this limitation by integrating implicit geometry regularization, which incrementally improves both geometry and textures. Nonetheless,…

    Submitted 17 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  17. arXiv:2403.18512  [pdf, other]

    cs.CV

    ParCo: Part-Coordinating Text-to-Motion Synthesis

    Authors: Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji

    Abstract: We study a challenging task: text-to-motion synthesis, aiming to generate motions that align with textual descriptions and exhibit coordinated movements. Currently, the part-based methods introduce part partition into the motion synthesis process to achieve finer-grained generation. However, these methods encounter challenges such as the lack of coordination between different part motions and diff…

    Submitted 27 March, 2024; originally announced March 2024.

  18. arXiv:2403.17994  [pdf, other]

    cs.CV cs.LG

    Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

    Authors: Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji

    Abstract: This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any physical surface through a video. Several existing approaches have explored the TAP by considering the temporal relationships to obtain smooth point motion trajectories, however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet eff…

    Submitted 26 March, 2024; originally announced March 2024.

  19. arXiv:2403.17094  [pdf, other]

    cs.CV cs.LG

    SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving

    Authors: Yiming Xie, Henglu Wei, Zhenyi Liu, Xiaoyu Wang, Xiangyang Ji

    Abstract: To advance research in learning-based defogging algorithms, various synthetic fog datasets have been developed. However, existing datasets created using the Atmospheric Scattering Model (ASM) or real-time rendering engines often struggle to produce photo-realistic foggy images that accurately mimic the actual imaging process. This limitation hinders the effective generalization of models from synt…

    Submitted 25 March, 2024; originally announced March 2024.

  20. arXiv:2403.16561  [pdf, other]

    cs.LG cs.AI

    FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

    Authors: Xinyuan Ji, Zhaowei Zhu, Wei Xi, Olga Gadyatskaya, Zilong Song, Yong Cai, Yang Liu

    Abstract: Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: accepted by AAAI24

  21. arXiv:2403.16034  [pdf, other]

    cs.CV

    V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c…

    Submitted 24 March, 2024; originally announced March 2024.

  22. arXiv:2403.12316  [pdf, other]

    cs.CL

    OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

    Authors: Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong

    Abstract: The rapid development of Chinese large language models (LLMs) poses big challenges for efficient LLM evaluation. While current initiatives have introduced new benchmarks or evaluation platforms for assessing Chinese LLMs, many of these focus primarily on capabilities, usually overlooking potential alignment and safety issues. To address this gap, we introduce OpenEval, an evaluation testbed that b…

    Submitted 18 March, 2024; originally announced March 2024.

  23. arXiv:2403.10487  [pdf, other]

    cs.RO cs.AI

    Stimulate the Potential of Robots via Competition

    Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

    Abstract: It is common for us to feel pressure in a competition environment, which arises from the desire to obtain success comparing with other individuals or opponents. Although we might get anxious under the pressure, it could also be a drive for us to stimulate our potentials to the best in order to keep up with others. Inspired by this, we propose a competitive learning framework which is able to help…

    Submitted 15 March, 2024; originally announced March 2024.

  24. arXiv:2403.10099  [pdf, other]

    cs.CV

    KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation

    Authors: Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji

    Abstract: In this paper, we present KP-RED, a unified KeyPoint-driven REtrieval and Deformation framework that takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models from a pre-processed database to tightly match the target. Unlike existing dense matching based methods that typically struggle with noisy partial scans, we propose to leverage category-consisten…

    Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  25. arXiv:2403.06775  [pdf, other]

    cs.CV

    FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

    Authors: Pengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen

    Abstract: Subject-driven generation has garnered significant interest recently due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. However, an important fact has not been taken seriously that a subject is not an isolated new concept but should be a specialization of a certain category in the pre-trained model. This results in the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR2024

  26. arXiv:2403.06461  [pdf, other]

    cs.CV

    Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie

    Abstract: Multi-modal test-time adaptation (MM-TTA) is proposed to adapt models to an unlabeled target domain by leveraging the complementary multi-modal inputs in an online manner. Previous MM-TTA methods rely on predictions of cross-modal information in each input frame, while they ignore the fact that predictions of geometric neighborhoods within consecutive frames are highly correlated, leading to unsta…

    Submitted 15 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  27. arXiv:2403.06202  [pdf, other]

    eess.SY cs.GT

    Pursuit Winning Strategies for Reach-Avoid Games with Polygonal Obstacles

    Authors: Rui Yan, Shuai Mi, Xiaoming Duan, Jintao Chen, Xiangyang Ji

    Abstract: This paper studies a multiplayer reach-avoid differential game in the presence of general polygonal obstacles that block the players' motions. The pursuers cooperate to protect a convex region from the evaders who try to reach the region. We propose a multiplayer onsite and close-to-goal (MOCG) pursuit strategy that can tell and achieve an increasing lower bound on the number of guaranteed defeate…

    Submitted 22 May, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures

  28. arXiv:2403.06168  [pdf, other]

    cs.CV cs.AI

    DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

    Authors: Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

    Abstract: Due to the difficulty and labor-consuming nature of getting highly accurate or matting annotations, there only exists a limited amount of highly accurate labels available to the public. To tackle this challenge, we propose a DiffuMatting which inherits the strong Everything generation ability of diffusion and endows the power of "matting anything". Our DiffuMatting can 1). act as an anything matti…

    Submitted 10 March, 2024; originally announced March 2024.

  29. arXiv:2403.05438  [pdf, other]

    cs.CV

    VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

    Authors: Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo

    Abstract: Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images. On the contrary, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, owing to insufficient quality and quantity of training videos. In this paper, we introduce VideoElevator, a training-free and plug-and-play method, which elevates…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project page: https://videoelevator.github.io Code: https://github.com/YBYBZhang/VideoElevator

  30. arXiv:2403.05160  [pdf, other]

    cs.CV

    MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models

    Authors: Zijie Fang, Yifeng Wang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing Zhang

    Abstract: Recently, pathological diagnosis, the gold standard for cancer diagnosis, has achieved superior performance by combining the Transformer with the multiple instance learning (MIL) framework using whole slide images (WSIs). However, the giga-pixel nature of WSIs poses a great challenge for the quadratic-complexity self-attention mechanism in Transformer to be applied in MIL. Existing studies usually…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 11 pages, 2 figures

  31. arXiv:2403.02905  [pdf, other]

    cs.MM

    MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

    Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

    Abstract: The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based o…

    Submitted 17 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  32. arXiv:2402.11826  [pdf, other]

    cs.CV

    Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

    Authors: Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng, Xiangyang Ji

    Abstract: Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such challenging conditions, they are inherently low-resolution, lacking rich texture and semantics as delivered by the RGB image. Current methods foc…

    Submitted 18 February, 2024; originally announced February 2024.

  33. arXiv:2402.06131  [pdf, other]

    cs.RO cs.CV

    PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes

    Authors: Xinggang Hu, Yanmin Wu, Mingyuan Zhao, Linghao Yang, Xiangkui Zhang, Xiangyang Ji

    Abstract: Visual SLAM (Simultaneous Localization and Mapping) based on planar features has found widespread applications in fields such as environmental structure perception and augmented reality. However, current research faces challenges in accurately localizing and mapping in planar ambiguous scenes, primarily due to the poor accuracy of the employed planar features and data association methods. In this…

    Submitted 8 February, 2024; originally announced February 2024.

  34. arXiv:2402.00002  [pdf, other]

    cs.NI

    Raptor Encoding for Low-Latency Concurrent Multi-PDU Session Transmission with Security Consideration in B5G Edge Network

    Authors: Zhongfu Guo, Xinsheng Ji, Wei You, Mingyan Xu, Yu Zhao, Zhimo Cheng, Deqiang Zhou

    Abstract: In B5G edge networks, end-to-end low-latency and high-reliability transmissions between edge computing nodes and terminal devices are essential. This paper investigates the queue-aware coding scheduling transmission of randomly arriving data packets, taking into account potential eavesdroppers in edge networks. To address these concerns, we introduce SCLER, a Protocol Data Units (PDU) Raptor-encod…

    Submitted 4 October, 2023; originally announced February 2024.

  35. arXiv:2401.11395  [pdf, other]

    cs.CV

    UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation

    Authors: Qingdong He, Jinlong Peng, Zhengkai Jiang, Kai Wu, Xiaozhong Ji, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Mingang Chen, Yunsheng Wu

    Abstract: 3D open-vocabulary scene understanding aims to recognize arbitrary novel categories beyond the base label space. However, existing works not only fail to fully utilize all the available modal information in the 3D domain but also lack sufficient granularity in representing the features of each modality. In this paper, we propose a unified multimodal 3D open-vocabulary scene understanding network,…

    Submitted 20 April, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Accepted by IJCAI 2024

  36. arXiv:2401.06548  [pdf, other]

    cs.CV

    Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning

    Authors: Chenyang Wang, Junjun Jiang, Xingyu Hu, Xianming Liu, Xiangyang Ji

    Abstract: Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks, where old data from experienced tasks is unavailable when learning from a new task. To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks. These methods usually adopt an extra memory to store the data for replay. However, it is not expected…

    Submitted 12 January, 2024; originally announced January 2024.

  37. arXiv:2401.02673  [pdf, other]

    eess.AS cs.AI cs.SD

    A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

    Authors: Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

    Abstract: Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen…

    Submitted 5 January, 2024; originally announced January 2024.

  38. arXiv:2401.00151  [pdf, other]

    cs.CV cs.CR

    CamPro: Camera-based Anti-Facial Recognition

    Authors: Wenjun Zhu, Yuan Sun, Jiani Liu, Yushi Cheng, Xiaoyu Ji, Wenyuan Xu

    Abstract: The proliferation of images captured from millions of cameras and the advancement of facial recognition (FR) technology have made the abuse of FR a severe privacy threat. Existing works typically rely on obfuscation, synthesis, or adversarial examples to modify faces in images to achieve anti-facial recognition (AFR). However, the unmodified images captured by camera modules that contain sensitive…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted by NDSS Symposium 2024

  39. arXiv:2401.00148  [pdf, other]

    cs.CR cs.CV

    TPatch: A Triggered Physical Adversarial Patch

    Authors: Wenjun Zhu, Xiaoyu Ji, Yushi Cheng, Shibo Zhang, Wenyuan Xu

    Abstract: Autonomous vehicles increasingly utilize the vision-based perception module to acquire information about driving environments and detect obstacles. Correct detection and classification are important to ensure safe driving decisions. Existing works have demonstrated the feasibility of fooling the perception models such as object detectors and image classifiers with printed adversarial patches. Howe…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Appeared in 32nd USENIX Security Symposium (USENIX Security 23)

  40. arXiv:2312.17266  [pdf]

    eess.IV cs.AI cs.CV cs.RO

    Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery

    Authors: Zhuofu Li, Yonghong Zhang, Chengxia Wang, Shanshan Liu, Xiongkang Song, Xuquan Ji, Shuai Jiang, Woquan Zhong, Lei Hu, Weishi Li

    Abstract: Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by…

    Submitted 25 December, 2023; originally announced December 2023.

  41. arXiv:2312.07841  [pdf, other]

    cs.LG

    On the Dynamics Under the Unhinged Loss and Beyond

    Authors: Xiong Zhou, Xianming Liu, Hanzhang Wang, Deming Zhai, Junjun Jiang, Xiangyang Ji

    Abstract: Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the intermediate dynamics under gradient flow or gradient descent due to the intractability of loss functions and model architectures. In this paper, we introduce the unhinged loss, a concise loss function, that offers more mathem…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 62 pages, 19 figures

  42. arXiv:2312.01841  [pdf, other]

    cs.CV

    VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior

    Authors: Xusen Sun, Longhao Zhang, Hao Zhu, Peng Zhang, Bang Zhang, Xinya Ji, Kangneng Zhou, Daiheng Gao, Liefeng Bo, Xun Cao

    Abstract: Audio-driven talking head generation has drawn much attention in recent years, and many efforts have been made in lip-sync, expressive facial expressions, natural head pose generation, and high video quality. However, no model has yet led or tied on all these metrics due to the one-to-many mapping between audio and motion. In this paper, we propose VividTalk, a two-stage generic framework that sup…

    Submitted 6 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 10 pages, 8 figures

  43. arXiv:2312.01468  [pdf, other]

    cs.RO cs.AI

    Exploring Adversarial Robustness of LiDAR-Camera Fusion Model in Autonomous Driving

    Authors: Bo Yang, Xiaoyu Ji, Zizhi Jin, Yushi Cheng, Wenyuan Xu

    Abstract: Our study assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection. We introduce an attack technique that, by simply adding a limited number of physically constrained adversarial points above a car, can make the car undetectable by the fusion model. Experimental results reveal that even without changes to the image data channel, the fusion model can be deceived sole…

    Submitted 9 January, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

  44. arXiv:2311.14189  [pdf, other

    cs.CV

    D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

    Authors: Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

    Abstract: Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. In the core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction…

    Submitted 22 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  45. Systematic Analysis of Security and Vulnerabilities in Miniapps

    Authors: Yuyang Han, Xu Ji, Zhiqiang Wang, Jianyi Zhang

    Abstract: The past few years have witnessed a boom of miniapps, as lightweight applications, miniapps are of great importance in the mobile internet sector. Consequently, the security of miniapps can directly impact compromising the integrity of sensitive data, posing a potential threat to user privacy. However, after a thorough review of the various research efforts in miniapp security, we found that their…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 9 pages

  46. arXiv:2311.11106  [pdf, other

    cs.CV

    ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

    Authors: Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao

    Abstract: In this paper, we present ShapeMatcher, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation. Given a partially-observed object in an arbitrary pose, we first canonicalize the object by extracting point-wise affine-invariant features, disentangling inherent structure of the object with its pose and size. These learned features are t…

    Submitted 11 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: CVPR2024

  47. arXiv:2311.08935  [pdf, other

    cs.LG cs.AI

    Supported Trust Region Optimization for Offline Reinforcement Learning

    Authors: Yixiu Mao, Hongchang Zhang, Chen Chen, Yi Xu, Xiangyang Ji

    Abstract: Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most policy constraint methods regularize the density of the trained policy towards the behavior policy, which is too restrictive in most cases. We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the b…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted at ICML 2023

  48. arXiv:2311.04464  [pdf, other

    cs.CV

    Enhancing Few-shot CLIP with Semantic-Aware Fine-Tuning

    Authors: Yao Zhu, Yuefeng Chen, Wei Wang, Xiaofeng Mao, Xiu Yan, Yue Wang, Zhigang Li, Wang Lu, Jindong Wang, Xiangyang Ji

    Abstract: Learning generalized representations from limited training samples is crucial for applying deep neural networks in low-resource scenarios. Recently, methods based on Contrastive Language-Image Pre-training (CLIP) have exhibited promising performance in few-shot adaptation tasks. To avoid catastrophic forgetting and overfitting caused by few-shot fine-tuning, existing works usually freeze the param…

    Submitted 6 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  49. arXiv:2310.13819  [pdf, other

    cs.RO

    LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly

    Authors: Bowen Fu, Sek Kun Leong, Yan Di, Jiwen Tang, Xiangyang Ji

    Abstract: Comprehending natural language instructions is a critical skill for robots to cooperate effectively with humans. In this paper, we aim to learn 6D poses for robotic assembly by natural language instructions. For this purpose, Language-Instructed 6D Pose Regression Network (LanPose) is proposed to jointly predict the 6D poses of the observed object and the corresponding assembly position. Our propos…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 8 pages

  50. arXiv:2310.11696  [pdf, other

    cs.CV

    MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

    Authors: Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji

    Abstract: Previous works concerning single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in real world. In contrast, readily accessible hand-object videos offer a promising training data source, but they only give heavily occluded object observations. In this paper, we present a novel synthetic-to-real framework to exploit Multi-vie…

    Submitted 13 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: CVPR 2024