-
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
Authors:
Zeyu Zhang,
Yiran Wang,
Biao Wu,
Shuo Chen,
Zhiyuan Zhang,
Shiya Huang,
Wenbo Zhang,
Meng Fang,
Ling Chen,
Yang Zhao
Abstract:
In recent years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas like film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, and integrating these two aspects remains a persistent challenge. Additionally, while avatar and motion generation predominantly target humans, extending these techniques to animals remains a significant challenge due to inadequate training data and methods. To bridge these gaps, our paper presents three key contributions. First, we propose a novel agent-based approach named Motion Avatar, which allows for the automatic generation of high-quality customizable human and animal avatars with motions through text queries. The method significantly advances dynamic 3D character generation. Second, we introduce an LLM planner that coordinates both motion and avatar generation, which transforms discriminative planning into a customizable Q&A fashion. Lastly, we present an animal motion dataset named Zoo-300K, comprising approximately 300,000 text-motion pairs across 65 animal categories, and its building pipeline ZooGen, which serves as a valuable resource for the community. See project website https://steve-zeyu-zhang.github.io/MotionAvatar/
Submitted 18 May, 2024;
originally announced May 2024.
-
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
Authors:
Weitao Feng,
Wenbo Zhou,
Jiyan He,
Jie Zhang,
Tianyi Wei,
Guanlin Li,
Tianwei Zhang,
Weiming Zhang,
Nenghai Yu
Abstract:
Diffusion models have achieved remarkable success in generating high-quality images. Recently, the open-source models represented by Stable Diffusion (SD) are thriving and are accessible for customization, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has led to copyright concerns, like unauthorized model distribution and unconsented commercial use. To address this, recent works aim to let SD models output watermarked content for post-hoc forensics. Unfortunately, none of them can achieve the challenging white-box protection, wherein the malicious user can easily remove or replace the watermarking module to fail the subsequent verification. For this, we propose AquaLoRA as the first implementation under this scenario. Briefly, we merge watermark information into the U-Net of Stable Diffusion Models via a watermark Low-Rank Adaptation (LoRA) module in a two-stage manner. For the watermark LoRA module, we devise a scaling matrix to achieve flexible message updates without retraining. To guarantee fidelity, we design Prior Preserving Fine-Tuning (PPFT) to ensure watermark learning with minimal impacts on model distribution, validated by proofs. Finally, we conduct extensive experiments and ablation studies to verify our design.
Submitted 17 May, 2024;
originally announced May 2024.
-
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
Authors:
Lingdong Kong,
Shaoyuan Xie,
Hanjiang Hu,
Yaru Niu,
Wei Tsang Ooi,
Benoit R. Cottereau,
Lai Xing Ng,
Yuexin Ma,
Wenwei Zhang,
Liang Pan,
Kai Chen,
Ziwei Liu,
Weichao Qiu,
Wei Zhang,
Xu Cao,
Hao Lu,
Ying-Cong Chen,
Caixin Kang,
Xinning Zhou,
Chengyang Ying,
Wentao Shang,
Xingxing Wei,
Yinpeng Dong,
Bo Yang,
Shengyin Jiang
, et al. (66 additional authors not shown)
Abstract:
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.
Submitted 29 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Comprehensive Analysis of Access Control Models in Edge Computing: Challenges, Solutions, and Future Directions
Authors:
Tao Xue,
Ying Zhang,
Yanbin Wang,
Wenbo Wang,
Shuailou Li,
Haibin Zhang
Abstract:
Many contemporary applications, including smart homes and autonomous vehicles, rely on the Internet of Things technology. While cloud computing provides a multitude of valuable services for these applications, it generally imposes constraints on latency-sensitive applications due to the significant propagation delays. As a complementary technique to cloud computing, edge computing situates computing resources closer to the data sources, which reduces latency, alleviates bandwidth pressure on the cloud, and enhances data security. While edge computing offers significant advantages, it also presents substantial challenges in access control -- a critical component for safeguarding data. For instance, it is crucial to implement access control mechanisms that are both effective and efficient on resource-constrained devices, ensuring high security without compromising the inherent low-latency benefits of edge computing. These challenges drive the development of innovative access control solutions tailored to meet the unique requirements of edge computing environments. We classify related references from the perspective of the data lifecycle (including data collection, storage, and usage), thoroughly investigating the access control techniques and helping readers understand them systematically. Finally, we reflect on the classification and envisage future research directions.
Submitted 22 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution
Authors:
Long Peng,
Yang Cao,
Renjing Pei,
Wenbo Li,
Jiaming Guo,
Xueyang Fu,
Yang Wang,
Zheng-Jun Zha
Abstract:
Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have achieved impressive results in detail recovery, they still fall short when addressing regions with complex gradient arrangements because they extract features through intensity-based linear weighting. Moreover, the stochastic artifacts introduced by degradation cues during the imaging process in real LR increase the disorder of the overall image details, further complicating the perception of intrinsic gradient arrangement. To address these challenges, we innovatively introduce kernel-wise differential operations within the convolutional kernel and develop several learnable directional gradient convolutions. These convolutions are integrated in parallel with a novel linear weighting mechanism to form an Adaptive Directional Gradient Convolution (DGConv), which adaptively weights and fuses the basic directional gradients to improve the gradient arrangement perception capability for both regular and irregular textures. Coupled with DGConv, we further devise a novel equivalent parameter fusion method for DGConv that maintains its rich representational capabilities while keeping computational costs consistent with a single Vanilla Convolution (VConv), enabling DGConv to improve the performance of existing super-resolution networks without incurring additional computational expenses. To better leverage the superiority of DGConv, we further develop an Adaptive Information Interaction Block (AIIBlock) to adeptly balance the enhancement of texture and contrast while meticulously investigating the interdependencies, culminating in the creation of a DGPNet for Real-SR through simple stacking. Comparative results with 15 SOTA methods across three public datasets underscore the effectiveness and efficiency of our proposed approach.
Submitted 11 May, 2024;
originally announced May 2024.
-
Mesh Denoising Transformer
Authors:
Wenbo Zhao,
Xianming Liu,
Deming Zhai,
Junjun Jiang,
Xiangyang Ji
Abstract:
Mesh denoising, aimed at removing noise from input meshes while preserving their feature structures, is a practical yet challenging task. Despite the remarkable progress in learning-based mesh denoising methodologies in recent years, their network designs often encounter two principal drawbacks: a dependence on single-modal geometric representations, which fall short in capturing the multifaceted attributes of meshes, and a lack of effective global feature aggregation, hindering their ability to fully understand the mesh's comprehensive structure. To tackle these issues, we propose SurfaceFormer, a pioneering Transformer-based mesh denoising framework. Our first contribution is the development of a new representation known as Local Surface Descriptor, which is crafted by establishing polar systems on each mesh face, followed by sampling points from adjacent surfaces using geodesics. The normals of these points are organized into 2D patches, mimicking images to capture local geometric intricacies, whereas the poles and vertex coordinates are consolidated into a point cloud to embody spatial information. This advancement surmounts the hurdles posed by the irregular and non-Euclidean characteristics of mesh data, facilitating a smooth integration with Transformer architecture. Next, we propose a dual-stream structure consisting of a Geometric Encoder branch and a Spatial Encoder branch, which jointly encode local geometry details and spatial information to fully explore multimodal information for mesh denoising. A subsequent Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation through self-attention operators. Our experimental evaluations demonstrate that this novel approach outperforms existing state-of-the-art methods in both objective and subjective assessments, marking a significant leap forward in mesh denoising.
Submitted 10 May, 2024;
originally announced May 2024.
-
CourseGPT-zh: an Educational Large Language Model Based on Knowledge Distillation Incorporating Prompt Optimization
Authors:
Zheyan Qu,
Lu Yin,
Zitong Yu,
Wenbo Wang,
Xing Zhang
Abstract:
Large language models (LLMs) have demonstrated astonishing capabilities in natural language processing (NLP) tasks, sparking interest in their application to professional domains with higher specialized requirements. However, restricted access to closed-source LLMs via APIs and the difficulty in collecting massive high-quality datasets pose obstacles to the development of large language models in education fields of various courses. Given these challenges, we propose CourseGPT-zh, a course-oriented education LLM that supports customization and low-cost deployment. To address the comprehensiveness and diversity requirements of course-specific corpora, we design a high-quality question-answering corpus distillation framework incorporating prompt optimization, which effectively mines textbook knowledge and enhances its diversity. Moreover, considering the alignment of LLM responses with user needs, a novel method for discrete prompt optimization based on LLM-as-Judge is introduced. During optimization, this framework leverages the LLM's ability to reflect on and exploit error feedback and patterns, allowing for prompts that meet user needs and preferences while saving response length. Lastly, we obtain CourseGPT-zh based on the open-source LLM using parameter-efficient fine-tuning. Experimental results show that our discrete prompt optimization framework effectively improves the response quality of ChatGPT, and CourseGPT-zh exhibits strong professional capabilities in specialized knowledge question-answering, significantly outperforming comparable open-source models.
Submitted 7 May, 2024;
originally announced May 2024.
-
New Interpretation for error propagation of data-driven Reynolds stress closures via global stability analysis
Authors:
Xianglin Shan,
Wenbo Cao,
Weiwei Zhang
Abstract:
In light of the challenges surrounding convergence and error propagation encountered in Reynolds-averaged Navier-Stokes (RANS) equations with data-driven Reynolds stress closures, researchers commonly attribute these issues to ill-conditioning through condition number analysis. This paper delves into an additional factor, numerical instability, contributing to these challenges. We conduct global stability analysis for the RANS equations, closed by the Reynolds stress of direct numerical simulation (DNS), with the time-averaged solution of DNS as the base flow. Our findings reveal that, for turbulent channel flow at high Reynolds numbers, significant ill-conditioning exists, yet the system remains stable. Conversely, for separated flow over periodic hills, notable ill-conditioning is absent, but unstable eigenvalues are present, indicating that error propagation arises from the mechanism of numerical instability. Furthermore, the effectiveness of the decomposition method employing eddy viscosity is compared; the results show that the spatial distribution and amplitude of eddy viscosity influence the numerical stability.
Submitted 4 May, 2024;
originally announced May 2024.
-
Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids
Authors:
Junchen Liu,
Wenbo Hu,
Zhuo Yang,
Jianteng Chen,
Guoliang Wang,
Xiaoxue Chen,
Yantong Cai,
Huan-ang Gao,
Hao Zhao
Abstract:
Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotropic areas, achieving high-fidelity anti-aliasing renderings. Central to our approach are two key components: Platonic Solid Projection and Ripmap encoding. The Platonic Solid Projection factorizes the 3D space onto the non-parallel faces of a certain Platonic solid, such that the anisotropic 3D areas can be projected onto planes with distinguishable characterization. Meanwhile, each face of the Platonic solid is encoded by the Ripmap encoding, which is constructed by anisotropically pre-filtering a learnable feature grid, to enable featurizing the projected anisotropic areas both precisely and efficiently by anisotropic area-sampling. Extensive experiments on both well-established synthetic datasets and a newly captured real-world dataset demonstrate that our Rip-NeRF attains state-of-the-art rendering quality, particularly excelling in the fine details of repetitive structures and textures, while maintaining relatively swift training times.
Submitted 3 May, 2024;
originally announced May 2024.
-
An analysis and solution of ill-conditioning in physics-informed neural networks
Authors:
Wenbo Cao,
Weiwei Zhang
Abstract:
Physics-informed neural networks (PINNs) have recently emerged as a novel and popular approach for solving forward and inverse problems involving partial differential equations (PDEs). However, achieving stable training and obtaining correct results remain a challenge in many cases, often attributed to the ill-conditioning of PINNs. Nonetheless, further analysis is still lacking, severely limiting the progress and applications of PINNs in complex engineering problems. Drawing inspiration from the ill-conditioning analysis in traditional numerical methods, we establish a connection between the ill-conditioning of PINNs and the ill-conditioning of the Jacobian matrix of the PDE system. Specifically, for any given PDE system, we construct its controlled system. This controlled system allows for adjustment of the condition number of the Jacobian matrix while retaining the same solution as the original system. Our numerical findings suggest that the ill-conditioning observed in PINNs predominantly stems from the Jacobian matrix. As the condition number of the Jacobian matrix decreases, PINNs exhibit faster convergence rates and higher accuracy. Building upon this understanding and the natural extension of controlled systems, we present a general approach to mitigate the ill-conditioning of PINNs, leading to successful simulations of the three-dimensional flow around the M6 wing at a Reynolds number of 5,000. To the best of our knowledge, this is the first time that PINNs have been successful in simulating such complex systems, offering a promising new technique for addressing industrial complexity problems. Our findings also offer valuable insights guiding the future development of PINNs.
Submitted 24 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Computational Electromagnetics Meets Spin Qubits: Controlling Noise Effects in Quantum Sensing and Computing
Authors:
Wenbo Sun,
Sathwik Bharadwaj,
Runwei Zhou,
Dan Jiao,
Zubin Jacob
Abstract:
Solid-state spin qubits have emerged as promising quantum information platforms but are susceptible to magnetic noise. Despite extensive efforts in controlling noise in spin qubit quantum applications, one important but less controlled noise source is near-field electromagnetic fluctuations. Low-frequency (MHz and GHz) electromagnetic fluctuations are significantly enhanced near nanostructured lossy material components essential in quantum applications, including metallic/superconducting gates necessary for controlling spin qubits in quantum computing devices and materials/nanostructures to be probed in quantum sensing. Although controlling this low-frequency electromagnetic fluctuation noise is crucial for improving the performance of quantum sensing and computing, current efforts are hindered by computational challenges. In this paper, we leverage advanced computational electromagnetics techniques, especially fast and accurate volume integral equation based solvers, to overcome the computational obstacle. We introduce a theoretical and computational framework to control low-frequency magnetic fluctuation noise for enhancing spin qubit quantum sensing and computing performance. Our framework extends the application of computational electromagnetics to spin qubit quantum devices. We further apply our theoretical framework to control noise effects in realistic quantum computing devices and quantum sensing applications. Our work paves the way for device engineering to control magnetic fluctuations and improve the performance of spin qubit quantum sensing and computing.
Submitted 2 May, 2024;
originally announced May 2024.
-
A Distributed Model Identification Algorithm for Multi-Agent Systems
Authors:
Vivek Khatana,
Chin-Yao Chang,
Wenbo Wang
Abstract:
In this study, we investigate an agent-based approach for system model identification with an emphasis on power distribution system applications. Departing from conventional practices of relying on historical data for offline model identification, we adopt an online update approach utilizing real-time data by employing the latest data points for gradient computation. This methodology offers advantages including a large reduction in the communication network's bandwidth requirements by minimizing the data exchanged at each iteration and enabling the model to adapt in real time to disturbances. Furthermore, we extend our model identification process from linear frameworks to more complex non-linear convex models. This extension is validated through numerical studies demonstrating improved control performance for a synthetic IEEE test case.
Submitted 1 May, 2024;
originally announced May 2024.
-
4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations
Authors:
Wenbo Wang,
Hsuan-I Ho,
Chen Guo,
Boxiang Rong,
Artur Grigorev,
Jie Song,
Juan Jose Zarate,
Otmar Hilliges
Abstract:
The studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics. Addressing this gap, we introduce 4D-DRESS, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes. 4D-DRESS captures 64 outfits in 520 human motion sequences, amounting to 78k textured scans. Creating a real-world clothing dataset is challenging, particularly in annotating and segmenting the extensive and complex 4D human scans. To address this, we develop a semi-automatic 4D human parsing pipeline. We efficiently combine a human-in-the-loop process with automation to accurately label 4D scans in diverse garments and body movements. Leveraging precise annotations and high-quality garment meshes, we establish several benchmarks for clothing simulation and reconstruction. 4D-DRESS offers realistic and challenging data that complements synthetic sources, paving the way for advancements in research of lifelike human clothing. Website: https://ait.ethz.ch/4d-dress.
Submitted 29 April, 2024;
originally announced April 2024.
-
Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model
Authors:
Yongqi Zhao,
Wenbo Xiao,
Tomislav Mihalj,
Jia Hu,
Arno Eichberger
Abstract:
The advent of Large Language Models (LLMs) provides new insights to validate Automated Driving Systems (ADS). In this work, a novel approach to extracting scenarios from naturalistic driving datasets is presented. A framework called Chat2Scenario is proposed, leveraging the advanced Natural Language Processing (NLP) capabilities of LLMs to understand and identify different driving scenarios. By inputting descriptive texts of driving conditions and specifying the criticality metric thresholds, the framework efficiently searches for desired scenarios and converts them into ASAM OpenSCENARIO and IPG CarMaker text files. This methodology streamlines the scenario extraction process and enhances efficiency. Simulations are executed to validate the efficiency of the approach. The framework is presented as a user-friendly web app and is accessible via the following link: https://github.com/ftgTUGraz/Chat2Scenario.
Submitted 26 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Authors:
Kaining Ying,
Fanqing Meng,
Jin Wang,
Zhiqian Li,
Han Lin,
Yue Yang,
Hao Zhang,
Wenbo Zhang,
Yuqi Lin,
Shuo Liu,
Jiayi Lei,
Quanfeng Lu,
Runjian Chen,
Peng Xu,
Renrui Zhang,
Haozhe Zhang,
Peng Gao,
Yali Wang,
Yu Qiao,
Ping Luo,
Kaipeng Zhang,
Wenqi Shao
Abstract:
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, reasoning, and planning. MMT-Bench comprises $31,325$ meticulously curated multi-choice visual questions from various multimodal scenarios such as vehicle driving and embodied navigation, covering $32$ core meta-tasks and $162$ subtasks in multimodal understanding. Due to its extensive task coverage, MMT-Bench enables the evaluation of LVLMs using a task map, facilitating the discovery of in- and out-of-domain tasks. Evaluation results involving $30$ LVLMs such as the proprietary GPT-4V, GeminiProVision, and open-sourced InternVL-Chat, underscore the significant challenges posed by MMT-Bench. We anticipate that MMT-Bench will inspire the community to develop next-generation multimodal foundation models aimed at achieving general-purpose multimodal intelligence.
Submitted 24 April, 2024;
originally announced April 2024.
-
FL-TAC: Enhanced Fine-Tuning in Federated Learning via Low-Rank, Task-Specific Adapter Clustering
Authors:
Siqi Ping,
Yuzhu Mao,
Yang Liu,
Xiao-Ping Zhang,
Wenbo Ding
Abstract:
Although large-scale pre-trained models hold great potential for adapting to downstream tasks through fine-tuning, the performance of such fine-tuned models is often limited by the difficulty of collecting sufficient high-quality, task-specific data. Federated Learning (FL) offers a promising solution by enabling fine-tuning across large-scale clients with a variety of task data, but it is bottlenecked by significant communication overhead due to the extensive size of the pre-trained models. This paper addresses the high communication cost of fine-tuning large pre-trained models within FL frameworks through low-rank fine-tuning. Specifically, we train a low-rank adapter for each individual task on the client side, followed by server-side clustering of similar groups of adapters to achieve task-specific aggregation. Extensive experiments on various language and vision tasks, such as GLUE and CIFAR-10/100, reveal the evolution of task-specific adapters throughout the FL training process and verify the effectiveness of the proposed low-rank task-specific adapter clustering (TAC) method.
Submitted 23 April, 2024;
originally announced April 2024.
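The server-side step described in the FL-TAC abstract (cluster similar adapters, then aggregate per cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: adapters are treated as flat lists of floats, the clustering is plain k-means with a farthest-point initialisation, and the function names are hypothetical. The abstract does not state which clustering algorithm the authors use.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two flattened adapters."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_and_aggregate(adapters, num_tasks, iters=10):
    """Group client adapters into `num_tasks` clusters and average each group.

    Assumption: k-means stands in for whatever clustering the paper uses.
    Returns (labels, centroids); each centroid is the aggregated adapter
    for one cluster.
    """
    # Deterministic farthest-point initialisation of the centroids.
    centroids = [list(adapters[0])]
    while len(centroids) < num_tasks:
        far = max(adapters, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(adapters)
    for _ in range(iters):
        # Assign every adapter to its nearest centroid.
        labels = [
            min(range(num_tasks), key=lambda k: sq_dist(v, centroids[k]))
            for v in adapters
        ]
        # Task-specific aggregation: average the adapters in each cluster.
        for k in range(num_tasks):
            members = [v for v, lab in zip(adapters, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels, centroids
```

In this toy setting, each client would upload only its small flattened adapter rather than the full model, which is the communication saving the abstract refers to.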
-
Data-Driven Knowledge Transfer in Batch $Q^*$ Learning
Authors:
Elynn Chen,
Xi Chen,
Wenbo Jing
Abstract:
In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation, enabling the direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on the impact of source and target sample sizes and task discrepancy on the effectiveness of knowledge transfer. We show that the final learning error of the $Q^*$ function is significantly improved from the single task rate both theoretically and empirically.
Submitted 31 March, 2024;
originally announced April 2024.
-
A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications
Authors:
Wenbo Shang,
Xin Huang
Abstract:
A graph is a fundamental data model to represent various entities and their complex relationships in society and nature, such as social networks, transportation networks, financial networks, and biomedical systems. Recently, large language models (LLMs) have showcased a strong generalization ability to handle various NLP and multi-mode tasks to answer users' arbitrary questions and specific-domain content generation. Compared with graph learning models, LLMs enjoy superior advantages in addressing the challenges of generalizing graph tasks by eliminating the need for training graph learning models and reducing the cost of manual annotation. In this survey, we conduct a comprehensive investigation of existing LLM studies on graph data, which summarizes the relevant graph analytics tasks solved by advanced LLM models and points out the existing remaining challenges and future directions. Specifically, we study the key problems of LLM-based generative graph analytics (LLM-GGA) with three categories: LLM-based graph query processing (LLM-GQP), LLM-based graph inference and learning (LLM-GIL), and graph-LLM-based applications. LLM-GQP focuses on an integration of graph analytics techniques and LLM prompts, including graph understanding and knowledge graph (KG) based augmented retrieval, while LLM-GIL focuses on learning and reasoning over graphs, including graph learning, graph-formed reasoning and graph representation. We summarize the useful prompts incorporated into LLM to handle different graph downstream tasks. Moreover, we give a summary of LLM model evaluation, benchmark datasets/tasks, and a deep pro and cons analysis of LLM models. We also explore open problems and future directions in this exciting interdisciplinary research area of LLMs and graph analytics.
Submitted 23 April, 2024;
originally announced April 2024.
-
Talk Too Much: Poisoning Large Language Models under Token Limit
Authors:
Jiaming He,
Wenbo Jiang,
Guanyu Hou,
Wenshu Fan,
Rui Zhang,
Hongwei Li
Abstract:
Mainstream poisoning attacks on large language models (LLMs) typically set a fixed trigger in the input instance and specific responses for triggered queries. However, a fixed trigger (e.g., unusual words) may be easily detected by humans, limiting the effectiveness and practicality of such attacks in real-world scenarios. To enhance the stealthiness of the trigger, we present a poisoning attack against LLMs that is triggered by a generation/output condition: token limitation, a strategy commonly adopted by users to reduce costs. The poisoned model performs normally for output without token limitation, but becomes harmful for output with limited tokens. To achieve this objective, we introduce BrieFool, an efficient attack framework. It leverages the characteristics of generation limitation by efficient instruction sampling and poisoning data generation, thereby influencing the behavior of LLMs under target conditions. Our experiments demonstrate that BrieFool is effective across safety domains and knowledge domains. For instance, with only 20 generated poisoning examples against GPT-3.5-turbo, BrieFool achieves a 100% Attack Success Rate (ASR) and a 9.28/10 average Harmfulness Score (HS) under token limitation conditions while maintaining the benign performance.
Submitted 11 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
Authors:
Haoyi Qiu,
Wenbo Hu,
Zi-Yi Dou,
Nanyun Peng
Abstract:
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucinations. Furthermore, current evaluation methods struggle to effectively address the subtle semantic distinctions between model outputs and reference data, as well as the balance between hallucination and informativeness. To address these issues, we introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases. Moreover, we propose a large language model (LLM)-based two-stage evaluation framework that generalizes the popular CHAIR metric and incorporates both faithfulness and coverage into the evaluation. Experiments on 10 established LVLMs demonstrate that our evaluation metric is more comprehensive and better correlated with humans than existing work when evaluating on our challenging human-annotated benchmark dataset. Our work also highlights the critical balance between faithfulness and coverage of model outputs, and encourages future works to address hallucinations in LVLMs while keeping their outputs informative.
Submitted 5 June, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Angle-Resolved Magneto-Chiral Anisotropy in a Non-Centrosymmetric Atomic Layer Superlattice
Authors:
Long Cheng,
Mingrui Bao,
Jingxian Zhang,
Xue Zhang,
Qun Yang,
Qiang Li,
Hui Cao,
Dawei Qiu,
Jia Liu,
Fei Ye,
Qing Wang,
Genhao Liang,
Hui Li,
Guanglei Cheng,
Hua Zhou,
Jian-Min Zuo,
Xiaodong Zhou,
Jian Shen,
Zhifeng Zhu,
Sai Mu,
Wenbo Wang,
Xiaofang Zhai
Abstract:
Chirality in solid-state materials has sparked significant interest due to potential applications of topologically-protected chiral states in next-generation information technology. The electrical magneto-chiral effect (eMChE), arising from relativistic spin-orbit interactions, shows great promise for developing chiral materials and devices for electronic integration. Here we demonstrate an angle-resolved eMChE in an A-B-C-C type atomic-layer superlattice lacking time and space inversion symmetry. We observe non-superimposable enantiomers of left-handed and right-handed tilted uniaxial magnetic anisotropy as the sample rotates under static fields, with the tilting angle reaching a striking 45 degrees. Magnetic force microscopy and atomistic simulations correlate the tilt to the emergence and evolution of chiral spin textures. The Dzyaloshinskii-Moriya interaction lock effect, in competition with the Zeeman effect, is demonstrated to be responsible for the angle-resolved eMChE. Our findings open up a new horizon for engineering angle-resolved magneto-chiral anisotropy, shedding light on the development of novel angle-resolved sensing or writing techniques in chiral spintronics.
Submitted 20 April, 2024;
originally announced April 2024.
-
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Authors:
Wenbo Zhang,
Yifan Zhang,
Jianfeng Lin,
Binqiang Huang,
Jinlu Zhang,
Wenhao Yu
Abstract:
Pre-trained vision-language (V-L) models such as CLIP have shown excellent performance in many downstream cross-modal tasks. However, most of them are only applicable to the English context. Subsequent research has focused on this problem and proposed improved models, such as CN-CLIP and AltCLIP, to facilitate their applicability to Chinese and even other languages. Nevertheless, these models suffer from high latency and a large memory footprint in inference, which limits their further deployment on resource-constrained edge devices. In this work, we propose a conceptually simple yet effective multilingual CLIP Compression framework and train a lightweight multilingual vision-language model, called DC-CLIP, for both Chinese and English context. In this framework, we collect high-quality Chinese and English text-image pairs and design two training stages, including multilingual vision-language feature distillation and alignment. During the first stage, lightweight image/text student models are designed to learn robust visual/multilingual textual feature representation ability from corresponding teacher models, respectively. Subsequently, the multilingual vision-language alignment stage enables effective alignment of visual and multilingual textual features to further improve the model's multilingual performance. Comprehensive experiments in zero-shot image classification, conducted based on the ELEVATER benchmark, showcase that DC-CLIP achieves superior performance in the English context and competitive performance in the Chinese context, even with less training data, when compared to existing models of similar parameter magnitude. The evaluation demonstrates the effectiveness of our designed training mechanism.
Submitted 17 April, 2024;
originally announced April 2024.
-
Optimal design of ride-pooling as on-demand feeder services
Authors:
Wenbo Fan,
Weihua Gu,
Meng Xu
Abstract:
The technology-enabled ride-pooling (RP) is designed as an on-demand feeder service to connect remote areas to transit terminals (or activity centers). We propose the so-called ``hold-dispatch'' operation strategy, which imposes a target number of shared rides (termed the ride-pooling size) for each vehicle to enhance RP's transportation efficiency. Analytical models are formulated at the planning level to estimate the costs of the RP operator and the patrons. Accordingly, the design problem is constructed to minimize the total system cost concerning the system layout (i.e., in terms of service zone partitioning), resource deployment (i.e., fleet size), and operational decision (i.e., ride-pooling size). The proposed models admit spatial heterogeneity arising from the non-uniformity of demand distributions and service locations, and can furnish heterogeneous designs. Closed-form formulas for the optimal zoning and fleet size are developed, which unveil fundamental insights regarding the impacts of key operating factors (e.g., demand density and distance to the terminal). Extensive numerical experiments demonstrate (i) the effectiveness of heterogeneous service designs and (ii) the advantage of the proposed RP service with hold-dispatch strategy over alternative designs studied in the literature, i.e., RP with a ``quick-dispatch'' strategy and flexible-route transit, in a wide range of operating scenarios. These findings can assist transportation network companies and transit agencies in successfully integrating RP and transit services.
Submitted 15 April, 2024;
originally announced April 2024.
-
Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms
Authors:
Ting Dong,
Haoxin Wang,
Hengxi Zhang,
Wenbo Ding
Abstract:
When addressing the challenge of complex multi-objective optimization problems, particularly those with non-convex and non-uniform Pareto fronts, Decomposition-based Multi-Objective Evolutionary Algorithms (MOEADs) often converge to local optima, thereby limiting solution diversity. Despite its significance, this issue has received limited theoretical exploration. Through a comprehensive geometric analysis, we identify that the traditional method of Reference Point (RP) selection fundamentally contributes to this challenge. In response, we introduce an innovative RP selection strategy, the Weight Vector-Guided and Gaussian-Hybrid method, designed to overcome the local optima issue. This approach employs a novel RP type that aligns with weight vector directions and integrates a Gaussian distribution to combine three distinct RP categories. Our research comprises two main experimental components: an ablation study involving 14 algorithms within the MOEADs framework, spanning from 2014 to 2022, to validate our theoretical framework, and a series of empirical tests to evaluate the effectiveness of our proposed method against both traditional and cutting-edge alternatives. Results demonstrate that our method achieves remarkable improvements in both population diversity and convergence.
Submitted 12 April, 2024;
originally announced April 2024.
-
On testing mean of high dimensional compositional data
Authors:
Qianqian Jiang,
Wenbo Li,
Zeng Li
Abstract:
We investigate one/two-sample mean tests for high-dimensional compositional data when the number of variables is comparable with the sample size, as commonly encountered in microbiome research. Existing methods mainly focus on max-type test statistics which are suitable for detecting sparse signals. However, in this paper, we introduce a novel approach using sum-type test statistics which are capable of detecting weak but dense signals. By establishing the asymptotic independence between the max-type and sum-type test statistics, we further propose a combined max-sum type test to cover both cases. We derived the asymptotic null distributions and power functions for these test statistics. Simulation studies demonstrate the superiority of our max-sum type test statistics which exhibit robust performance regardless of data sparsity.
Submitted 12 April, 2024;
originally announced April 2024.
-
Joint transitivity for linear iterates
Authors:
Sebastián Donoso,
Andreas Koutsogiannis,
Wenbo Sun
Abstract:
We establish sufficient and necessary conditions for the joint transitivity of linear iterates in a minimal topological dynamical system with commuting transformations. This result provides the first topological analogue of the classical Berend and Bergelson joint ergodicity criterion in measure-preserving systems.
Submitted 11 April, 2024;
originally announced April 2024.
-
Discovery of universal phonon thermal Hall effect in crystals
Authors:
Xiaobo Jin,
Xu Zhang,
Wenbo Wan,
Hanru Wang,
Yihan Jiao,
Shiyan Li
Abstract:
The thermal Hall effect (THE) in insulators is a remarkable phenomenon that arises from the motion of chargeless quasi-particles under a magnetic field. While magnons or exotic spin excitations were considered as the origin of THE in some magnetic materials, there is mounting evidence that phonons play a significant role. However, the mechanism behind phonon THE is still unknown. Here we report the observation of THE, including planar THE, in a broad range of non-magnetic insulators and semiconductors: SrTiO3, SiO2 (quartz), MgO, MgAl2O4, Si and Ge. While the presence of antiferrodistortive domains in SrTiO3 and chiral phonons in SiO2 may complicate the interpretation of THE, the striking observations of THE in trivial insulators MgO and MgAl2O4, as well as in high-purity intrinsic semiconductors Si and Ge, demonstrate that phonon THE is a universal property of crystals. Without other effects on phonons such as from magnons, this universal phonon THE is characterized by a scaling law of $|\kappa_{xy}| \sim \kappa_{xx}^2$. Our results experimentally reveal fundamental physics of phonons in a magnetic field, which should come from the direct coupling between atomic vibrations and the field. Starting from this universal phonon THE in crystals, all previous interpretations of THE in magnetic or non-magnetic materials need to be reconsidered.
Submitted 2 May, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Ground-to-UAV sub-Terahertz channel measurement and modeling
Authors:
Da Li,
Peian Li,
Jiabiao Zhao,
Jianjian Liang,
Jiacheng Liu,
Guohao Liu,
Yuanshuai Lei,
Wenbo Liu,
Jianqin Deng,
Fuyong Liu,
Jianjun Ma
Abstract:
Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications are expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless channels leveraging UAVs remain underexplored. This work delves into a ground-to-UAV channel at 140 GHz, with a specific focus on the influence of UAV hovering behavior on channel performance. Employing experimental measurements through an unmodulated channel setup and a geometry-based stochastic model (GBSM) that integrates three-dimensional positional coordinates and beamwidth, this work evaluates the impact of UAV dynamic movements and antenna orientation on channel performance. Our findings highlight the minimal impact of UAV orientation adjustments on channel performance and underscore the diminishing necessity for precise alignment between UAVs and ground stations as beamwidth increases.
Submitted 28 June, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models
Authors:
Keyan Guo,
Ayush Utkarsh,
Wenbo Ding,
Isabelle Ondracek,
Ziming Zhao,
Guo Freeman,
Nishant Vishwamitra,
Hongxin Hu
Abstract:
Online user-generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.
Submitted 27 March, 2024;
originally announced March 2024.
-
Naive Bayes-based Context Extension for Large Language Models
Authors:
Jianlin Su,
Murtadha Ahmed,
Wenbo,
Luo Ao,
Mingren Zhu,
Yunfeng Liu
Abstract:
Large Language Models (LLMs) have shown promising in-context learning abilities. However, conventional In-Context Learning (ICL) approaches are often impeded by the length limitations of the transformer architecture, which pose challenges when attempting to effectively integrate supervision from a substantial number of demonstration examples. In this paper, we introduce a novel framework, called Naive Bayes-based Context Extension (NBCE), to enable existing LLMs to perform ICL with an increased number of demonstrations by significantly expanding their context size. Importantly, this expansion does not require fine-tuning or dependence on particular model architectures, all the while preserving linear efficiency. NBCE initially splits the context into equal-sized windows fitting the target LLM's maximum length. Then, it introduces a voting mechanism to select the most relevant window, regarded as the posterior context. Finally, it employs Bayes' theorem to generate the test task. Our experimental results demonstrate that NBCE substantially enhances performance, particularly as the number of demonstration examples increases, consistently outperforming alternative methods. The NBCE code is available at: https://github.com/amurtadha/NBCE-master
Submitted 26 March, 2024;
originally announced March 2024.
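The three NBCE steps named in the abstract (split into windows, vote for one window, combine via Bayes' theorem) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the lowest-entropy voting rule, the weight `beta`, and all function names are hypothetical choices, and the next-token log-probabilities are assumed to be finite and supplied by some language model that is not shown.

```python
import math


def split_into_windows(tokens, window_size):
    """Split the demonstration tokens into consecutive windows.

    The last window may be shorter when the length is not a multiple
    of `window_size`.
    """
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def entropy(log_probs):
    """Shannon entropy of a distribution given as finite log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def log_softmax(scores):
    """Normalise unnormalised log-scores into log-probabilities."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def nbce_combine(window_log_probs, prior_log_probs, beta=0.25):
    """Combine per-window next-token predictions into one distribution.

    window_log_probs: one list of log p(token | window_k) per window.
    prior_log_probs:  log p(token) computed without any demonstrations.
    Assumption: the "vote" picks the lowest-entropy window, and the Bayes
    step is (beta + 1) * log p(token | window) - beta * log p(token).
    """
    best = min(window_log_probs, key=entropy)  # most confident window wins
    combined = [(beta + 1.0) * w - beta * p for w, p in zip(best, prior_log_probs)]
    return log_softmax(combined)
```

Because each window is scored by the model independently, the cost grows linearly with the number of windows, which matches the linear efficiency the abstract mentions.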
-
Active Admittance Control with Iterative Learning for General-Purpose Contact-Rich Manipulation
Authors:
Bo Zhou,
Yuyao Sun,
Wenbo Liu,
Ruixuan Jiao,
Fang Fang,
Shihua Li
Abstract:
Force interaction is inevitable when robots face multiple operation scenarios. Making a robot competent in force control for generalized operations such as multi-task scenarios remains a challenging problem. To address the reproducibility of interaction tasks and the lack of a generalized force control framework for multi-task scenarios, this paper proposes a novel hybrid control framework based on active admittance control with an iterative learning parameter-tuning mechanism. The method adopts admittance control as the underlying algorithm to ensure flexibility, and iterative learning as the high-level algorithm to regulate the parameters of the admittance model. The resulting algorithm combines flexibility with learning ability, which gives it broad versatility. Four representative interactive robot manipulation tasks are chosen to investigate the consistency and generalisability of the proposed method. Experiments are designed to verify the effectiveness of the whole framework, and an average of 98.21% and 91.52% improvement of RMSE is obtained relative to the traditional admittance control as well as the model-free adaptive control, respectively.
Submitted 25 March, 2024;
originally announced March 2024.
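The underlying admittance layer named in the abstract can be illustrated with a one-dimensional sketch. It assumes the textbook admittance model M*a + B*v + K*x = f_err and semi-implicit Euler integration; the parameter values and function names are hypothetical. The iterative-learning layer that tunes M, B and K between trials is not shown, because the abstract does not give its update rule.

```python
def admittance_step(x, v, force_error, mass, damping, stiffness, dt):
    """Advance the virtual mass-spring-damper by one time step.

    x, v:        current position offset and velocity of the reference.
    force_error: measured contact force minus desired contact force.
    Returns the updated (x, v).
    """
    accel = (force_error - damping * v - stiffness * x) / mass
    v_next = v + accel * dt   # semi-implicit Euler: update velocity first,
    x_next = x + v_next * dt  # then position with the new velocity
    return x_next, v_next


def simulate(force_error, mass=1.0, damping=20.0, stiffness=100.0,
             dt=0.001, steps=5000):
    """Apply a constant force error and return the final position offset."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        x, v = admittance_step(x, v, force_error, mass, damping, stiffness, dt)
    return x
```

With a constant force error the offset settles near force_error / stiffness, so the stiffness and damping values decide how compliantly the robot yields to contact. These are the parameters a higher-level learning loop would adjust.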
-
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Authors:
Haoyuan Wang,
Wenbo Hu,
Lei Zhu,
Rynson W. H. Lau
Abstract:
Inverse rendering aims at recovering both geometry and materials of objects. It provides a more compatible reconstruction for conventional rendering engines, compared with the neural radiance fields (NeRFs). On the other hand, existing NeRF-based inverse rendering methods cannot handle glossy objects with local light interactions well, as they typically oversimplify the illumination as a 2D environmental map, which assumes infinite lights only. Observing the superiority of NeRFs in recovering radiance fields, we propose a novel 5D Neural Plenoptic Function (NeP) based on NeRFs and ray tracing, such that more accurate lighting-object interactions can be formulated via the rendering equation. We also design a material-aware cone sampling strategy to efficiently integrate lights inside the BRDF lobes with the help of pre-filtered radiance fields. Our method has two stages: the geometry of the target object and the pre-filtered environmental radiance fields are reconstructed in the first stage, and materials of the target object are estimated in the second stage with the proposed NeP and material-aware cone sampling strategy. Extensive experiments on the proposed real-world and synthetic datasets demonstrate that our method can reconstruct high-fidelity geometry/materials of challenging glossy objects with complex lighting interactions from nearby objects. Project webpage: https://whyy.site/paper/nep
Submitted 24 March, 2024;
originally announced March 2024.
-
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Authors:
Zheng Zhang,
Wenbo Hu,
Yixing Lao,
Tong He,
Hengshuang Zhao
Abstract:
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow for large Gaussians that are observable for many viewpoints while many of them are only covered in the boundaries. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets.
Submitted 22 March, 2024;
originally announced March 2024.
-
Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Authors:
Zhihao Liang,
Qi Zhang,
Wenbo Hu,
Ying Feng,
Lei Zhu,
Kui Jia
Abstract:
The 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity.
Submitted 3 April, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
Authors:
Tian-Xing Xu,
Wenbo Hu,
Yu-Kun Lai,
Ying Shan,
Song-Hai Zhang
Abstract:
3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, e.g. a single RTX 2080 Ti GPU.
Submitted 15 March, 2024;
originally announced March 2024.
-
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Authors:
Jiaqing Zhang,
Mingxiang Cao,
Xue Yang,
Weiying Xie,
Jie Lei,
Daixun Li,
Wenbo Huang,
Yunsong Li
Abstract:
Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high performance with a single training phase. It employs synchronous joint optimization across components to avoid suboptimal solutions tied to individual tasks. Furthermore, it implements a comprehensive optimization strategy in the gradient matrix for shared parameters, ensuring convergence to an optimal fusion detection configuration. Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities, showcasing not only visually appealing image fusion but also impressive detection outcomes, such as a 3.9% and 2.0% mAP50 increase on horizontal object detection dataset M3FD and oriented object detection dataset DroneVehicle, respectively, compared to state-of-the-art approaches. The code is released at https://github.com/icey-zhang/E2E-MFD.
Submitted 23 May, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Search for cosmic-ray boosted sub-MeV dark matter-electron scatterings in PandaX-4T
Authors:
Xiaofeng Shang,
Abdusalam Abdukerim,
Zihao Bo,
Wei Chen,
Xun Chen,
Chen Cheng,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Lisheng Geng,
Karl Giboni,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Junting Huang,
Zhou Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji,
Yonglin Ju,
Chenxiang Li
, et al. (67 additional authors not shown)
Abstract:
We report the first search for the elastic scatterings between cosmic-ray boosted sub-MeV dark matter and electrons in the PandaX-4T liquid xenon experiment. Sub-MeV dark matter particles can be accelerated by scattering with electrons in the cosmic rays and produce detectable electron recoil signals in the detector. Using the commissioning data from PandaX-4T of 0.63~tonne$\cdot$year exposure, we set new constraints on DM-electron scattering cross sections for DM masses ranging from 10~eV/$c^2$ to 3~keV/$c^2$.
Submitted 13 March, 2024;
originally announced March 2024.
-
SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot
Authors:
Wenbo Zhao,
Shengjie Wang,
Yixuan Fan,
Yang Gao,
Tao Zhang
Abstract:
Space robots have played a critical role in autonomous maintenance and space junk removal. Multi-arm space robots can efficiently complete the target capture and base reorientation tasks due to their flexibility and the collaborative capabilities between the arms. However, the complex coupling properties arising from both the multiple arms and the free-floating base present challenges to the motion planning problems of multi-arm space robots. We observe that the octopus elegantly achieves similar goals when grabbing prey and escaping from danger. Inspired by the distributed control of octopuses' limbs, we develop a multi-level decentralized motion planning framework to manage the movement of different arms of space robots. This motion planning framework integrates naturally with the multi-agent reinforcement learning (MARL) paradigm. The results indicate that our method outperforms the previous method (centralized training). Leveraging the flexibility of the decentralized framework, we reassemble policies trained for different tasks, enabling the space robot to complete trajectory planning tasks while adjusting the base attitude without further learning. Furthermore, our experiments confirm the superior robustness of our method in the face of external disturbances, changing base masses, and even the failure of one arm.
Submitted 12 March, 2024;
originally announced March 2024.
-
Error-Mitigated Quantum Random Access Memory
Authors:
Wenbo Shi,
Neel Kanth Kundu,
Matthew R. McKay,
Robert Malaney
Abstract:
As an alternative to quantum error correction, quantum error mitigation methods, including Zero-Noise Extrapolation (ZNE), have been proposed to alleviate run-time errors in current noisy quantum devices. In this work, we propose a modified version of ZNE that provides for a significant performance enhancement on current noisy devices. Our modified ZNE method extrapolates to zero-noise data by evaluating groups of noisy data obtained from noise-scaled circuits and selecting extrapolation functions for each group with the assistance of estimated noisy simulation results. To quantify enhancement in a real-world quantum application, we embed our modified ZNE in Quantum Random Access Memory (QRAM) - a memory system important for future quantum networks and computers. Our new ZNE-enhanced QRAM designs are experimentally implemented on a 27-qubit noisy superconducting quantum device, the results of which demonstrate that with reasonable estimated simulation results, QRAM fidelity is improved significantly relative to traditional ZNE usage. Our results demonstrate the critical role the extrapolation function plays in ZNE - judicious choice of that function on a per-measurement basis can make the difference between a quantum application being functional or non-functional.
Submitted 10 March, 2024;
originally announced March 2024.
-
Detecting Neutrinos from Supernova Bursts in PandaX-4T
Authors:
Binyu Pang,
Abdusalam Abdukerim,
Zihao Bo,
Wei Chen,
Xun Chen,
Chen Cheng,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Changbo Fu,
Mengting Fu,
Lisheng Geng,
Karl Giboni,
Linhui Gu,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Yanlin Huang,
Junting Huang,
Zhou Huang,
Ruquan Hou
, et al. (71 additional authors not shown)
Abstract:
Neutrinos from core-collapse supernovae are essential for the understanding of neutrino physics and stellar evolution. The dual-phase xenon dark matter detectors can provide a way to track explosions of galactic supernovae by detecting neutrinos through coherent elastic neutrino-nucleus scatterings. In this study, a variation of progenitor masses as well as explosion models are assumed to predict the neutrino fluxes and spectra, which result in the number of expected neutrino events ranging from 6.6 to 13.7 at a distance of 10 kpc over a 10-second duration with negligible backgrounds at PandaX-4T. Two specialized triggering alarms for monitoring supernova burst neutrinos are built. The efficiency of detecting supernova explosions at various distances in the Milky Way is estimated. These alarms will be implemented in the real-time supernova monitoring system at PandaX-4T in the near future, providing the astronomical communities with supernova early warnings.
Submitted 10 March, 2024;
originally announced March 2024.
-
Reframe Anything: LLM Agent for Open World Video Reframing
Authors:
Jiawang Cao,
Yongliang Wu,
Weiheng Chi,
Wenbo Zhu,
Ziyue Su,
Jay Wu
Abstract:
The proliferation of mobile devices and social media has revolutionized content dissemination, with short-form video becoming increasingly prevalent. This shift has introduced the challenge of video reframing to fit various screen aspect ratios, a process that highlights the most compelling parts of a video. Traditionally, video reframing is a manual, time-consuming task requiring professional expertise, which incurs high production costs. A potential solution is to adopt machine learning models, such as video salient object detection, to automate the process. However, these methods often lack generalizability due to their reliance on specific training data. The advent of powerful large language models (LLMs) opens new avenues for AI capabilities. Building on this, we introduce Reframe Any Video Agent (RAVA), an LLM-based agent that leverages visual foundation models and human instructions to restructure visual content for video reframing. RAVA operates in three stages: perception, where it interprets user instructions and video content; planning, where it determines aspect ratios and reframing strategies; and execution, where it invokes the editing tools to produce the final video. Our experiments validate the effectiveness of RAVA in video salient object detection and real-world reframing tasks, demonstrating its potential as a tool for AI-powered video editing.
Submitted 9 March, 2024;
originally announced March 2024.
-
REPS: Reconstruction-based Point Cloud Sampling
Authors:
Guoqing Zhang,
Wenbo Zhao,
Jian Liu,
Xianming Liu
Abstract:
Sampling is widely used in various point cloud tasks as it can effectively reduce resource consumption. Recently, some methods have proposed utilizing neural networks to optimize the sampling process for various task requirements. Currently, deep downsampling methods can be categorized into two main types: generative-based and score-based. Generative-based methods directly generate sampled point clouds using networks, whereas score-based methods assess the importance of points according to specific rules and then select sampled point clouds based on their scores. However, these methods often result in noticeable clustering effects in high-intensity feature areas, compromising their ability to preserve small-scale features and leading to the loss of some structures, thereby affecting the performance of subsequent tasks. In this paper, we propose REPS, a reconstruction-based scoring strategy that evaluates the importance of each vertex by removing and reconstructing them using surrounding vertices. Our reconstruction process comprises point reconstruction and shape reconstruction. The two aforementioned reconstruction methods effectively evaluate the importance of vertices by removing them at different scales for reconstruction. These reconstructions ensure that our method maintains the overall geometric features of the point cloud and avoids disturbing small-scale structures during sampling. Additionally, we propose the Global-Local Fusion Attention (GLFA) module, which aggregates local and global attention features of point clouds, ensuring high-quality reconstruction and sampling effects. Our method outperforms previous approaches in preserving the structural features of the sampled point clouds. Furthermore, abundant experimental results demonstrate the superior performance of our method across various common tasks.
Submitted 7 March, 2024;
originally announced March 2024.
-
Signal Response Model in PandaX-4T
Authors:
Yunyang Luo,
Zihao Bo,
Shibo Zhang,
Abdusalam Abdukerim,
Chen Cheng,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Changbo Fu,
Mengting Fu,
Lisheng Geng,
Karl Giboni,
Linhui Gu,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Yanlin Huang,
Zhou Huang
, et al. (66 additional authors not shown)
Abstract:
PandaX-4T experiment is a deep-underground dark matter direct search experiment that employs a dual-phase time projection chamber with a sensitive volume containing 3.7 tonne of liquid xenon. The detector of PandaX-4T is capable of simultaneously collecting the primary scintillation and ionization signals, utilizing their ratio to discriminate dark matter signals from background sources such as gamma rays and beta particles. The signal response model plays a crucial role in interpreting the data obtained by PandaX-4T. It describes the conversion from the deposited energy by dark matter interactions to the detectable signals within the detector. The signal response model is utilized in various PandaX-4T results. This work provides a comprehensive description of the procedures involved in constructing and parameter-fitting the signal response model for the energy range of approximately 1 keV to 25 keV for electronic recoils and 6 keV to 90 keV for nuclear recoils. It also covers the signal reconstruction, selection, and correction methods, which are crucial components integrated into the signal response model.
Submitted 14 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Authors:
Cheng Huang,
Shoudong Han,
Mengyu He,
Wenbo Zheng,
Yuhao Wei
Abstract:
Accurate data association is crucial in reducing confusion, such as ID switches and assignment errors, in multi-object tracking (MOT). However, existing advanced methods often overlook the diversity among trajectories and the ambiguity and conflicts present in motion and appearance cues, leading to confusion among detections, trajectories, and associations when performing simple global data association. To address this issue, we propose a simple, versatile, and highly interpretable data association approach called Decomposed Data Association (DDA). DDA decomposes the traditional association problem into multiple sub-problems using a series of non-learning-based modules and selectively addresses the confusion in each sub-problem by incorporating targeted exploitation of new cues. Additionally, we introduce Occlusion-aware Non-Maximum Suppression (ONMS) to retain more occluded detections, thereby increasing opportunities for association with trajectories and indirectly reducing the confusion caused by missed detections. Finally, based on DDA and ONMS, we design a powerful multi-object tracker named DeconfuseTrack, specifically focused on resolving confusion in MOT. Extensive experiments conducted on the MOT17 and MOT20 datasets demonstrate that our proposed DDA and ONMS significantly enhance the performance of several popular trackers. Moreover, DeconfuseTrack achieves state-of-the-art performance on the MOT17 and MOT20 test sets and significantly outperforms the baseline tracker ByteTrack in metrics such as HOTA, IDF1, and AssA. This validates that our tracking design effectively reduces confusion caused by simple global association.
Submitted 5 March, 2024;
originally announced March 2024.
-
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Authors:
Haoyu Chen,
Wenbo Li,
Jinjin Gu,
Jingjing Ren,
Haoze Sun,
Xueyi Zou,
Zhensong Zhang,
Youliang Yan,
Lei Zhu
Abstract:
For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Leveraging unseen LR images for self-supervised learning guides the model to adapt its modeling space to the target domain, facilitating fine-tuning of SR models without requiring paired high-resolution (HR) images. The integration of Discrete Wavelet Transform (DWT) further refines the focus on high-frequency details. Extensive evaluations show that our method significantly improves the generalization and detail restoration capabilities of SR models on unseen real-world datasets, outperforming existing methods. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.
Submitted 4 March, 2024;
originally announced March 2024.
-
Uncertainty-Aware Prediction and Application in Planning for Autonomous Driving: Definitions, Methods, and Comparison
Authors:
Wenbo Shao,
Jiahui Xu,
Zhong Cao,
Hong Wang,
Jun Li
Abstract:
Autonomous driving systems face the formidable challenge of navigating intricate and dynamic environments with uncertainty. This study presents a unified prediction and planning framework that concurrently models short-term aleatoric uncertainty (SAU), long-term aleatoric uncertainty (LAU), and epistemic uncertainty (EU) to predict and establish a robust foundation for planning in dynamic contexts. The framework uses Gaussian mixture models and deep ensemble methods, to concurrently capture and assess SAU, LAU, and EU, where traditional methods do not integrate these uncertainties simultaneously. Additionally, uncertainty-aware planning is introduced, considering various uncertainties. The study's contributions include comparisons of uncertainty estimations, risk modeling, and planning methods in comparison to existing approaches. The proposed methods were rigorously evaluated using the CommonRoad benchmark and settings with limited perception. These experiments illuminated the advantages and roles of different uncertainty factors in autonomous driving processes. In addition, comparative assessments of various uncertainty modeling strategies underscore the benefits of modeling multiple types of uncertainties, thus enhancing planning accuracy and reliability. The proposed framework facilitates the development of methods for UAP and surpasses existing uncertainty-aware risk models, particularly when considering diverse traffic scenarios. Project page: https://swb19.github.io/UAP/.
Submitted 4 March, 2024;
originally announced March 2024.
-
A Novel Shortest Path Query Algorithm Based on Optimized Adaptive Topology Structure
Authors:
Xiao Fang,
Xuyang Song,
Jiyuan Ma,
Guanhua Liu,
Shurong Pang,
Wenbo Zhao,
Cong Cao,
Ling Fan
Abstract:
Urban rail transit is a fundamental component of public transportation; however, commonly used station-based path search algorithms often overlook the impact of transfer times on search results, leading to decreased accuracy. To solve this problem, this paper proposes a novel shortest path query algorithm based on adaptive topology optimization called the Adaptive Topology Extension Road Network Structure (ATEN). This algorithm categorizes transfer stations into different types and treats travel time and transfer time equivalently as weights for edges in the topological graph. The proposed algorithm introduces virtual stations to differentiate between pedestrian paths and train paths, eliminating the need for additional operations on transfer stations. The algorithm controls the extent of expansion in the urban rail transit topology, overcoming query errors caused by mishandling of transfer stations in the existing algorithm. Finally, a series of simulation experiments were conducted on Beijing's urban rail transit network to validate both the correctness and efficiency of the proposed adaptive topology optimization algorithm. The results demonstrate significant advantages compared to existing similar algorithms.
Submitted 4 March, 2024;
originally announced March 2024.
-
3D Hand Reconstruction via Aggregating Intra and Inter Graphs Guided by Prior Knowledge for Hand-Object Interaction Scenario
Authors:
Feng Shuang,
Wenbo He,
Shaodong Li
Abstract:
Recently, 3D hand reconstruction has gained more attention in human-computer cooperation, especially for hand-object interaction scenarios. However, it remains a huge challenge due to severe hand occlusion caused by interaction, which involves balancing accuracy and physical plausibility, the highly nonlinear mapping of model parameters, and occlusion feature enhancement. To overcome these issues, we propose a 3D hand reconstruction network combining the benefits of model-based and model-free approaches to balance accuracy and physical plausibility for hand-object interaction scenarios. Firstly, we present a novel module that regresses MANO pose parameters directly from 2D joints, which avoids the highly nonlinear mapping from abstract image features and no longer depends on accurate 3D joints. Moreover, we further propose a vertex-joint mutual graph-attention model guided by MANO to jointly refine hand meshes and joints, which models the vertex-vertex and joint-joint dependencies and captures the vertex-joint correlation for aggregating intra-graph and inter-graph node features, respectively. The experimental results demonstrate that our method achieves competitive performance on the recent benchmark datasets HO3DV2 and Dex-YCB, and outperforms all purely model-based and model-free approaches.
Submitted 4 March, 2024;
originally announced March 2024.
-
Eavesdropping risk evaluation for wavy-surface-assisted terahertz channel in emulated rain
Authors:
Peian Li,
Wenbo Liu,
Da Li,
Mingxia Zhang,
Xiaopeng Wang,
Houjun Sun,
Jianjun Ma
Abstract:
The advancement of non-line-of-sight (NLOS) data transmission through reflective methods plays a pivotal role in enhancing communication efficiency and expanding user reach. However, this innovation introduces significant eavesdropping risks, particularly magnified by the complex scattering effects encountered under adverse weather conditions. This study delves into the assessment of eavesdropping vulnerabilities within a metallic wavy-surface-assisted NLOS terahertz (THz) channel, emphasizing the dynamics of bistatic scattering during rainfall. Our observations reveal the feasibility of successful signal interception under these conditions, highlighting a prevalent security concern for outdoor terahertz communication networks utilizing NLOS channels to broaden coverage. This insight underscores the critical need for addressing and mitigating potential eavesdropping threats to ensure secure and reliable terahertz communications in varied environmental conditions.
Submitted 1 March, 2024;
originally announced March 2024.
-
Decentralized Uncoded Storage Elastic Computing with Heterogeneous Computation Speeds
Authors:
Wenbo Huang,
Xudong You,
Kai Wan,
Robert Caiming Qiu,
Mingyue Ji
Abstract:
Elasticity plays an important role in modern cloud computing systems. Elastic computing allows virtual machines (i.e., computing nodes) to be preempted when high-priority jobs arise, and also allows new virtual machines to participate in the computation. In 2018, Yang et al. introduced Coded Storage Elastic Computing (CSEC) to address elasticity using coding technology, with lower storage and computation load requirements. However, CSEC is limited to certain types of computations (e.g., linear) because its coded data storage is based on linear coding. Subsequently, Centralized Uncoded Storage Elastic Computing (CUSEC) with heterogeneous computation speeds was proposed, which directly copies parts of the data into the virtual machines. In all existing works on elastic computing, the storage assignment is centralized, meaning that the number and identity of all virtual machines possibly used in the whole computation process are known during the storage assignment. In this paper, we consider Decentralized Uncoded Storage Elastic Computing (DUSEC) with heterogeneous computation speeds, where any available virtual machine can join the computation without being predicted in advance, so coordination among different virtual machines' storage assignments is not allowed. Under a decentralized storage assignment originally proposed in coded caching by Maddah-Ali and Niesen, we propose a computing scheme with closed-form optimal computation time. We also run experiments on the MNIST dataset with a Softmax regression model on the Tencent cloud platform, and the experimental results demonstrate that the computation time of the proposed DUSEC system approaches that of the state-of-the-art best storage assignment in the CUSEC system.
Submitted 1 March, 2024;
originally announced March 2024.