Search | arXiv e-print repository

StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

Authors: Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

Abstract: This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy, which starts with a one-step normal estimator (YOSO) to derive an initial normal guess that is relatively coarse but reliable, followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the "stability" and "sharpness" needed for accurate normal estimation. StableNormal represents an early attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models have been made publicly available at hf.co/Stable-X.

Submitted 24 June, 2024; originally announced June 2024.

Comments: HF Demo: hf.co/Stable-X, Video: https://www.youtube.com/watch?v=sylXTxG_U2U

arXiv:2406.14927

Gaussian-Informed Continuum for Physical Property Identification and Simulation

Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as implicit shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 19 pages, 8 figures

arXiv:2405.15176

MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method

Authors: Pan Liao, Feng Yang, Di Wu, Liu Bo

Abstract: Monocular vision-based 3D object detection is crucial in various sectors, yet existing methods face significant challenges in terms of accuracy and computational efficiency. Building on the successful strategies in 2D detection and depth estimation, we propose MonoDETRNext, which seeks to optimally balance precision and processing speed. Our methodology includes the development of an efficient hybrid visual encoder, enhancement of depth prediction mechanisms, and introduction of an innovative query generation strategy, augmented by an advanced depth predictor. Building on MonoDETR, MonoDETRNext introduces two variants: MonoDETRNext-F, which emphasizes speed, and MonoDETRNext-A, which focuses on precision. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A demonstrated a 4.60% improvement in the AP3D metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-F showed a 2.21% increase. Additionally, the computational efficiency of MonoDETRNext-F slightly exceeds that of its predecessor.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14641

doi 10.1103/PhysRevResearch.6.023199

Suppression of the skyrmion Hall effect in synthetic ferrimagnets with gradient magnetization

Authors: Lan Bo, Xichao Zhang, Masahito Mochizuki, Xuefeng Zhang

Abstract: Magnetic skyrmions are promising building blocks for future spintronic devices. However, the skyrmion Hall effect (SkHE) remains an obstacle for practical applications based on the in-line transport of skyrmions. Here, we numerically study the static properties and current-driven dynamics of synthetic ferrimagnetic skyrmions. Inspired by graded-index magnonics, we introduce a linear gradient of saturation magnetization (Ms) in the skyrmion-hosting sample, which effectively modulates the skyrmion Hall angle and suppresses the SkHE. Micromagnetic simulations reveal that ferrimagnetic skyrmions could exhibit greater susceptibility to the variation of Ms as compared to their ferromagnetic counterparts. The Thiele analysis is also applied to support the simulation results, which elucidates that the Ms gradient dynamically modifies the intrinsic normalized size of skyrmions, consequently impacting the SkHE. Our results pave the way to the graded-index skyrmionics, which offers novel insights for designing ferrimagnet-based skyrmionic devices.

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 9 pages, 5 figures

Journal ref: Physical Review Research 6, 023199 (2024)

arXiv:2404.02514

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

Authors: Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

Abstract: This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes but suffer from blurry results and fail to capture detailed structures because of the inconsistency between 2D edits. Our critical insight is that low-frequency components of images are more multiview-consistent after editing compared with their high-frequency parts. Moreover, the appearance style is mainly exhibited on the low-frequency components, and the content details especially reside in high-frequency parts. This motivates us to perform editing on low-frequency components, which results in high-fidelity edited scenes. In addition, the editing is performed in the low-frequency feature space, enabling stable intensity control and novel scene transfer. Comprehensive experiments conducted on photorealistic datasets demonstrate the superior performance of high-fidelity and transferable NeRF editing. The project page is at https://aigc3d.github.io/freditor.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.00269

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

Authors: Yushuang Wu, Luyue Shi, Junhao Cai, Weihao Yuan, Lingteng Qiu, Zilong Dong, Liefeng Bo, Shuguang Cui, Xiaoguang Han

Abstract: Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task, particularly with real-world data. Current state-of-the-art methods develop Transformer-based implicit field learning, necessitating an intensive learning paradigm that requires dense query-supervision uniformly sampled throughout the entire space. We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion. This approach treats the query points for implicit field learning as a noisy point cloud for iterative denoising, allowing for their dynamic adaptation to the target object shape. Such adaptive query points harness diffusion learning's capability for coarse shape recovery and also enhance the implicit representation's ability to delineate finer details. Besides, an additional self-conditioning mechanism is designed to use implicit predictions as the guidance of diffusion learning, leading to a cooperative system. Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods. The generalizability of IPoD is also demonstrated on the MVImgNet dataset. Our project page is at https://yushuang-wu.github.io/IPoD.

Submitted 30 March, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2403.15559

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models

Authors: Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Zilong Dong, Liefeng Bo, Qixing Huang

Abstract: A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.12396

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

Authors: Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

Abstract: This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models to infer the normalized object coordinate space (NOCS) maps of the target instances. This framework fully leverages the visual semantic prior from DinoV2 and the aligned visual and language knowledge within the text-to-image diffusion model, which enables generalization to various text descriptions of novel categories. Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on our large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories. The project page is at https://ov9d.github.io.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.12010

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

Authors: Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

Abstract: Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Project page: aigc3d.github.io/VideoMV/

arXiv:2402.17485

EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Authors: Linrui Tian, Qi Wang, Bang Zhang, Liefeng Bo

Abstract: In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonstrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.15219

doi 10.1063/5.0187825

Global Rotation of Skyrmion Bags under Vertical Microwave Fields

Authors: Lan Bo, Rongzhi Zhao, Xichao Zhang, Masahito Mochizuki, Xuefeng Zhang

Abstract: Magnetic skyrmion bags are composite topological spin textures with arbitrary topological charges. Here, we computationally study the transient rotational motion of skyrmion bags, which is characterized by a global rotation of the inner skyrmions around the central point. Distinct from conventional rotational modes found in skyrmions, the observed rotation is a forced motion associated with the breathing mode induced solely by vertical microwave fields. The driving force behind this rotation originates from the interactions between outer and inner skyrmions, with the angular velocity determined by the phase difference resulting from their asynchronous breathing behaviors. It is also found that skyrmion bags with larger skyrmion numbers are more conducive to the occurrence of the rotation. Our results are useful for understanding the cluster dynamics of complex topological spin textures driven by dynamic fields.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures

Journal ref: J. Appl. Phys. 135, 063905 (2024)

arXiv:2401.14886

Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems

Authors: Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, Wei Liu

Abstract: Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, the lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements positively contributing to its predictions. Unfortunately, due to the weakly-robust detection models and suboptimal explanation strategy, they risk revealing spurious correlations and suffer from redundancy issues. In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to avoid spurious explanations; and 2) provide both concise and effective explanations to reason about the detected vulnerabilities. Coca consists of two core parts referred to as Trainer and Explainer. The former aims to train a detection model which is robust to random perturbation based on combinatorial contrastive learning, while the latter builds an explainer to derive crucial code statements that are most decisive to the detected vulnerability via dual-view causal inference as explanations. We apply Coca over three typical GNN-based vulnerability detectors. Experimental results show that Coca can effectively mitigate the spurious correlation issue, and provide more useful high-quality explanations.

Submitted 26 January, 2024; originally announced January 2024.

Comments: To appear in the Technical Track of ICSE 2024

arXiv:2401.14617

A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research

Authors: Sicong Cao, Xiaobing Sun, Ratnadira Widyasari, David Lo, Xiaoxue Wu, Lili Bo, Jiale Zhang, Bin Li, Wei Liu, Di Wu, Yixin Chen

Abstract: The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 63 papers across 21 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a roadmap highlighting potential opportunities we deemed appropriate and important for future work.

Submitted 25 January, 2024; originally announced January 2024.

Comments: submitted to ACM Computing Surveys. arXiv admin note: text overlap with arXiv:2202.06840 by other authors

arXiv:2401.14257

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

Authors: Minglin Chen, Weihao Yuan, Yukun Wang, Zhe Sheng, Yisheng He, Zilong Dong, Liefeng Bo, Yulan Guo

Abstract: Recently, text-to-3D approaches have achieved high-fidelity 3D content generation using text description. However, the generated objects are stochastic and lack fine-grained control. Sketches provide a cheap approach to introduce such fine-grained control. Nevertheless, it is challenging to achieve flexible control from these sketches due to their abstraction and ambiguity. In this paper, we present a multi-view sketch-guided text-to-3D generation framework (namely, Sketch2NeRF) to add sketch control to 3D generation. Specifically, our method leverages pretrained 2D diffusion models (e.g., Stable Diffusion and ControlNet) to supervise the optimization of a 3D scene represented by a neural radiance field (NeRF). We propose a novel synchronized generation and reconstruction method to effectively optimize the NeRF. In the experiments, we collected two kinds of multi-view sketch datasets to evaluate the proposed method. We demonstrate that our method can synthesize 3D consistent contents with fine-grained sketch control while being high-fidelity to text prompts. Extensive results show that our method achieves state-of-the-art performance in terms of sketch similarity and text alignment.

Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 11 pages, 9 figures

arXiv:2401.10242

DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis

Authors: Xin Gao, Li Hu, Peng Zhang, Bang Zhang, Liefeng Bo

Abstract: In the realm of 3D digital human applications, music-to-dance presents a challenging task. Given the one-to-many relationship between music and dance, previous methods have been limited in their approach, relying solely on matching and generating corresponding dance movements based on music rhythm. In the professional field of choreography, a dance phrase consists of several dance poses and dance movements. Dance poses are composed of a series of basic meaningful body postures, while dance movements can reflect dynamic changes such as the rhythm, melody, and style of dance. Taking inspiration from these concepts, we introduce an innovative dance generation pipeline called DanceMeld, which comprises two stages, i.e., the dance decouple stage and the dance generation stage. In the decouple stage, a hierarchical VQ-VAE is used to disentangle dance poses and dance movements in different feature space levels, where the bottom code represents dance poses, and the top code represents dance movements. In the generation stage, we utilize a diffusion model as a prior to model the distribution and generate latent codes conditioned on music features. We have experimentally demonstrated the representational capabilities of top code and bottom code, enabling the explicit decoupling expression of dance poses and dance movements. This disentanglement not only provides control over motion details, styles, and rhythm but also facilitates applications such as dance style transfer and dance unit editing. Our approach has undergone qualitative and quantitative experiments on the AIST++ dataset, demonstrating its superiority over other methods.

Submitted 30 November, 2023; originally announced January 2024.

Comments: 10 pages, 8 figures

arXiv:2312.17641

Motion State: A New Benchmark Multiple Object Tracking

Authors: Yang Feng, Liao Pan, Wu Di, Liu Bo, Zhang Xingle

Abstract: In the realm of video analysis, the field of multiple object tracking (MOT) assumes paramount importance, with the motion state of objects, whether static or dynamic relative to the ground, holding practical significance across diverse scenarios. However, the extant literature exhibits a notable dearth in the exploration of this aspect. Deep learning methodologies encounter challenges in accurately discerning object motion states, while conventional approaches reliant on comprehensive mathematical modeling may yield suboptimal tracking accuracy. To address these challenges, we introduce a Model-Data-Driven Motion State Judgment Object Tracking Method (MoD2T). This innovative architecture adeptly amalgamates traditional mathematical modeling with deep learning-based multi-object tracking frameworks. The integration of mathematical modeling and deep learning within MoD2T enhances the precision of object motion state determination, thereby elevating tracking accuracy. Our empirical investigations comprehensively validate the efficacy of MoD2T across varied scenarios, encompassing unmanned aerial vehicle surveillance and street-level tracking. Furthermore, to gauge the method's adeptness in discerning object motion states, we introduce the Motion State Validation F1 (MVF1) metric. This novel performance metric aims to quantitatively assess the accuracy of motion state classification, furnishing a comprehensive evaluation of MoD2T's performance. Elaborate experimental validations corroborate the rationality of MVF1. In order to holistically appraise MoD2T's performance, we meticulously annotate several renowned datasets and subject MoD2T to stringent testing. Remarkably, under conditions characterized by minimal or moderate camera motion, the achieved MVF1 values are particularly noteworthy, with exemplars including 0.774 for the KITTI dataset, 0.521 for MOT17, and 0.827 for UAVDT.

Submitted 7 May, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

arXiv:2312.15430 [pdf, other]

Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

Authors: Jianqiang Ren, Chao He, Lin Liu, Jiahao Chen, Yutong Wang, Yafei Song, Jianfang Li, Tangli Xue, Siqi Hu, Tao Chen, Kunkun Zheng, Jianjing Xiang, Liefeng Bo

Abstract: There is a growing demand for customized and expressive 3D characters with the emergence of AI agents and the Metaverse, but creating 3D characters using traditional computer graphics tools is a complex and time-consuming task. To address these challenges, we propose a user-friendly framework named Make-A-Character (Mach) to create lifelike 3D avatars from text descriptions. The framework leverages the power of large language and vision models for textual intention understanding and intermediate image generation, followed by a series of human-oriented visual perception and 3D generation modules. Our system offers an intuitive approach for users to craft controllable, realistic, fully-realized 3D characters that meet their expectations within 2 minutes, while also enabling easy integration with existing CG pipelines for dynamic expressiveness. For more information, please visit the project page at https://human3daigc.github.io/MACH/.

Submitted 24 December, 2023; originally announced December 2023.

Comments: Technical Report

arXiv:2312.13309 [pdf, other]

Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style

Authors: Haohan Wang, Wei Feng, Yang Lu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Lixing Bo, Jingping Shao

Abstract: The state-of-the-art methods for e-commerce product background generation suffer from the inefficiency of designing product-wise prompts when scaling up the production, as well as the ineffectiveness of describing fine-grained styles when customizing personalized backgrounds for some specific brands. To address these obstacles, we integrate the category commonality and personalized style into diffusion models. Concretely, we propose a Category-Wise Generator to enable large-scale background generation for the first time. A unique identifier in the prompt is assigned to each category, whose attention is located on the background by a mask-guided cross-attention layer to learn the category-wise style. Furthermore, for products with specific and fine-grained requirements in layout, elements, etc., a Personality-Wise Generator is devised to learn such personalized style directly from a reference image to resolve textual ambiguities, and is trained in a self-supervised manner for more efficient training data usage. To advance research in this field, the first large-scale e-commerce product background generation dataset BG60k is constructed, which covers more than 60k product images from over 2k categories. Experiments demonstrate that our method could generate high-quality backgrounds for different categories, and maintain the personalized background style of reference images. The link to BG60k and codes will be available soon.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 12 pages, 11 figures

arXiv:2312.12726 [pdf, other]

Reducing Shape-Radiance Ambiguity in Radiance Fields with a Closed-Form Color Estimation Method

Authors: Qihang Fang, Yafei Song, Keqiang Li, Liefeng Bo

Abstract: Neural radiance field (NeRF) enables the synthesis of cutting-edge realistic novel view images of a 3D scene. It includes density and color fields to model the shape and radiance of a scene, respectively. Supervised by the photometric loss in an end-to-end training manner, NeRF inherently suffers from the shape-radiance ambiguity problem, i.e., it can perfectly fit training views but does not guarantee decoupling the two fields correctly. To deal with this issue, existing works have incorporated prior knowledge to provide an independent supervision signal for the density field, including total variation loss, sparsity loss, distortion loss, etc. These losses are based on general assumptions about the density field, e.g., it should be smooth, sparse, or compact, which are not adaptive to a specific scene. In this paper, we propose a more adaptive method to reduce the shape-radiance ambiguity. The key is a rendering method that is only based on the density field. Specifically, we first estimate the color field based on the density field and posed images in a closed form. Then NeRF's rendering process can proceed. We address the problems in estimating the color field, including occlusion and non-uniformly distributed views. Afterward, it is applied to regularize NeRF's density field. As our regularization is guided by photometric loss, it is more adaptive compared to existing ones. Experimental results show that our method improves the density field of NeRF both qualitatively and quantitatively.
Our code is available at https://github.com/qihangGH/Closed-form-color-field.

Submitted 19 December, 2023; originally announced December 2023.

Comments: This work has been published in NeurIPS 2023

arXiv:2312.06947 [pdf, other]

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing

Authors: Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang, Ming-Ming Cheng

Abstract: 3D-aware portrait editing has a wide range of applications in multiple fields. However, current approaches are limited in that they can only perform mask-guided or text-based editing. Even by fusing the two procedures into a model, the editing quality and stability cannot be ensured. To address this limitation, we propose MaTe3D: mask-guided text-based 3D-aware portrait editing. In this framework, first, we introduce a new SDF-based 3D generator which learns local and global representations with proposed SDF and density consistency losses. This enhances mask-based editing in local areas; second, we present a novel distillation strategy: Conditional Distillation on Geometry and Texture (CDGT). Compared to existing distillation strategies, it mitigates visual ambiguity and avoids mismatch between texture and geometry, thereby producing stable texture and convincing geometry while editing. Additionally, we create the CatMask-HQ dataset, a large-scale high-resolution cat face annotation for exploration of model generalization and expansion. We perform extensive experiments on both the FFHQ and CatMask-HQ datasets to demonstrate the editing quality and stability of the proposed method. Our method faithfully generates a 3D-aware edited face image based on a modified mask and a text prompt. Our code and models will be publicly released.

Submitted 3 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 13 pages, 13 figures

arXiv:2312.01841 [pdf, other]

VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior

Authors: Xusen Sun, Longhao Zhang, Hao Zhu, Peng Zhang, Bang Zhang, Xinya Ji, Kangneng Zhou, Daiheng Gao, Liefeng Bo, Xun Cao

Abstract: Audio-driven talking head generation has drawn much attention in recent years, and many efforts have been made in lip-sync, expressive facial expressions, natural head pose generation, and high video quality. However, no model has yet led or tied on all these metrics due to the one-to-many mapping between audio and motion. In this paper, we propose VividTalk, a two-stage generic framework that supports generating high-visual quality talking head videos with all the above properties. Specifically, in the first stage, we map the audio to mesh by learning two motions, including non-rigid expression motion and rigid head motion. For expression motion, both blendshape and vertex are adopted as the intermediate representation to maximize the representation ability of the model. For natural head motion, a novel learnable head pose codebook with a two-phase training mechanism is proposed. In the second stage, we propose a dual branch motion-VAE and a generator to transform the meshes into dense motion and synthesize high-quality video frame-by-frame. Extensive experiments show that the proposed VividTalk can generate high-visual quality talking head videos with lip-sync and realism enhanced by a large margin, and outperforms previous state-of-the-art works in objective and subjective comparisons.

Submitted 6 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 10 pages, 8 figures

arXiv:2311.17117 [pdf, other]

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Authors: Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, Liefeng Bo

Abstract: Character Animation aims to generate character videos from still images through driving signals. Currently, diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. However, challenges persist in the realm of image-to-video, especially in character animation, where temporally maintaining consistency with detailed information from the character remains a formidable problem. In this paper, we leverage the power of diffusion models and propose a novel framework tailored for character animation. To preserve consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detail features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guider to direct the character's movements and employ an effective temporal modeling approach to ensure smooth inter-frame transitions between video frames. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods. Furthermore, we evaluate our method on benchmarks for fashion video and human dance synthesis, achieving state-of-the-art results.

Submitted 13 June, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Page: https://humanaigc.github.io/animate-anyone/

arXiv:2311.16918 [pdf, other]

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Authors: Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han

Abstract: Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normal maps, leading to instability in optimization. In this paper, recognizing that the normal and depth information effectively describe scene geometry and can be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with the generalizable image-to-depth and normal prior models. In an attempt to alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that when integrated into existing text-to-3D pipelines, our models significantly enhance the detail richness, achieving state-of-the-art results. Our project page is https://aigc3d.github.io/richdreamer/.

Submitted 24 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Project Page: https://aigc3d.github.io/richdreamer/

arXiv:2311.14318 [pdf, ps, other]

On optimal tracking portfolio in incomplete markets: The classical control and the reinforcement learning approaches

Authors: Lijun Bo, Yijie Huang, Xiang Yu

Abstract: This paper studies an infinite horizon optimal tracking portfolio problem using capital injection in incomplete market models. We consider the benchmark process modelled by a geometric Brownian motion with zero drift driven by some unhedgeable risk. The relaxed tracking formulation is adopted where the portfolio value compensated by the injected capital needs to outperform the benchmark process at any time, and the goal is to minimize the cost of the discounted total capital injection. In the first part, we solve the stochastic control problem when the market model is known, for which the equivalent auxiliary control problem with reflections and the associated HJB equation with a Neumann boundary condition are studied. In the second part, the market model is assumed to be unknown, for which we consider the exploratory formulation of the control problem with entropy regularizer and develop the continuous-time q-learning algorithm for the stochastic control problem with state reflections. In an illustrative example, we show the satisfactory performance of the q-learning algorithm.

Submitted 24 November, 2023; originally announced November 2023.

Comments: Optimal tracking portfolio, capital injection, incomplete market, stochastic control with reflection, continuous-time reinforcement learning, q-learning

arXiv:2311.04555 [pdf, ps, other]

De Finetti's Control Problem with Poisson Observations under Spectrally Positive Markov Additive Process

Authors: Lijun Bo, Wenyuan Wang, Kaixin Yan

Abstract: We study De Finetti's optimal dividend and capital injection problem under a Markov additive model. The surplus process before dividend and capital injection is assumed to follow a spectrally positive Markov additive process (MAP). Dividend payments are made only at the jump times of an independent Poisson process and capital is injected to avoid bankruptcy. The aim of the paper is to characterize an optimal periodic dividend and capital injection strategy that maximizes the expected total discounted dividends minus the total discounted costs of capital injection. Applying the fluctuation and excursion theory for Lévy processes and the stochastic control theory, we first address an auxiliary periodic dividend and capital injection control problem with a terminal payoff under the spectrally positive Lévy process. Using results obtained for this auxiliary problem and a fixed point argument for iterations induced by dynamic programming, we characterize the optimal strategy of our prime control problem as a regime-modulated double-barrier periodic-continuous-reflection dividend and capital injection strategy.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 33 pages. arXiv admin note: substantial text overlap with arXiv:2210.07549

MSC Class: 60G51; 93E20; 91G80

arXiv:2310.17170 [pdf, other]

DecoderTracker: Decoder-Only Method for Multiple-Object Tracking

Authors: Liao Pan, Yang Feng, Wu Di, Liu Bo, Zhang Xingle

Abstract: Decoder-only models, such as GPT, have demonstrated superior performance in many areas compared to traditional encoder-decoder structure transformer models. Over the years, end-to-end models based on the traditional transformer structure, like MOTR, have achieved remarkable performance in multi-object tracking. However, the significant computational resource consumption of these models leads to less friendly inference speeds and training times. To address these issues, this paper attempts to construct a lightweight Decoder-only model: DecoderTracker for end-to-end multi-object tracking. Specifically, drawing on some real-time detection models, we have developed an image feature extraction network which can efficiently extract features from images to replace the encoder structure. In addition to minor innovations in the network, we analyze the potential reasons for the slow training of MOTR-like models and propose an effective training strategy to mitigate the issue of prolonged training times. On the DanceTrack dataset, without any bells and whistles, DecoderTracker's tracking performance slightly surpasses that of MOTR, with approximately twice the inference speed. Furthermore, DecoderTracker requires significantly less training time compared to MOTR.

Submitted 23 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2309.09602 [pdf, other]

Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark

Authors: Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu

Abstract: Existing propositions often rely on logical constants for classification. Compared with Western languages that lean towards hypotaxis such as English, Chinese often relies on semantic or logical understanding rather than logical connectives in daily expressions, exhibiting the characteristics of parataxis. However, existing research has rarely paid attention to this issue. Accurately classifying these propositions is crucial for natural language understanding and reasoning. In this paper, we put forward the concepts of explicit and implicit propositions and propose a comprehensive multi-level proposition classification system based on linguistics and logic. Correspondingly, we create a large-scale Chinese proposition dataset PEACE from multiple domains, covering all categories related to propositions. To evaluate the Chinese proposition classification ability of existing models and explore their limitations, we conduct evaluations on PEACE using several different methods including the Rule-based method, SVM, BERT, RoBERTa, and ChatGPT. Results show the importance of properly modeling the semantic features of propositions. BERT has relatively good proposition classification capability, but lacks cross-domain transferability. ChatGPT performs poorly, but its classification ability can be improved by providing more proposition information. Many issues are still far from being resolved and require further study.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.04288 [pdf, other]

Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On

Authors: Daiheng Gao, Xu Chen, Xindi Zhang, Qi Wang, Ke Sun, Bang Zhang, Liefeng Bo, Qixing Huang

Abstract: Fabricating and designing 3D garments has become extremely demanding with the increasing need for synthesizing realistic dressed persons for a variety of applications, e.g. 3D virtual try-on, digitalization of 2D clothes into 3D apparel, and cloth animation. It thus necessitates a simple and straightforward pipeline to obtain high-quality texture from simple input, such as 2D reference images. Since traditional warping-based texture generation methods require a significant number of control points to be manually selected for each type of garment, which can be a time-consuming and tedious process, we propose a novel method, called Cloth2Tex, which eliminates the human burden in this process. Cloth2Tex is a self-supervised method that generates texture maps with reasonable layout and structural consistency. Another key feature of Cloth2Tex is that it can be used to support high-fidelity texture inpainting. This is done by combining Cloth2Tex with a prevailing latent diffusion model. We evaluate our approach both qualitatively and quantitatively and demonstrate that Cloth2Tex can generate high-quality texture maps and achieve the best visual effects in comparison to other methods. Project page: tomguluson92.github.io/projects/cloth2tex/

Submitted 8 August, 2023; originally announced August 2023.

Comments: 15 pages, 15 figures

arXiv:2307.10583 [pdf]

Deep fused flow and topology features for botnet detection basing on pretrained GCN

Authors: Meng Xiaoyuan, Lang Bo, Yanxi Liu, Yuhao Yan

Abstract: Nowadays, botnets have become one of the major threats to cyber security. The characteristics of botnets are mainly reflected in bots' network behavior and their intercommunication relationships. Existing botnet detection methods use flow features or topology features individually, which overlooks the other type of feature. This affects model performance. In this paper, we propose a botnet detection model which uses a graph convolutional network (GCN) to deeply fuse flow features and topology features for the first time. We construct communication graphs from network traffic and represent nodes with flow features. Due to the imbalance of existing public traffic flow datasets, it is impossible to train a GCN model on these datasets. Therefore, we use a balanced public communication graph dataset to pretrain a GCN model, thereby guaranteeing its capacity to identify topology features. We then feed the communication graph with flow features into the pretrained GCN. The output from the last hidden layer is treated as the fusion of flow and topology features. Additionally, by adjusting the number of layers in the GCN network, the model can effectively detect botnets under both C2 and P2P structures. Validated on the public ISCX2014 dataset, our approach achieves a recall of 92.90% and an F1-score of 92.76% for C2 botnets, alongside a recall of 94.66% and an F1-score of 92.35% for P2P botnets. These results not only demonstrate the effectiveness of our method, but also outperform the currently leading detection models.

Submitted 24 March, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.09528 [pdf, other]

doi 10.1103/PhysRevB.107.224431

Controllable Creation of Skyrmion Bags in a Ferromagnetic Nanodisk

Authors: Lan Bo, Rongzhi Zhao, Chenglong Hu, Xichao Zhang, Xuefeng Zhang, Masahito Mochizuki

Abstract: Skyrmion bags are composed of an outer skyrmion and arbitrary inner skyrmions, which have recently been observed in bulk chiral magnets, but still remain elusive in magnetic films. Here, we propose a method of creating skyrmion bags in a thin-film nanodisk, which includes three steps. First, the size of the outer skyrmion is enlarged by a vertical magnetic field, then inner skyrmions are nucleated at an off-center area by local current injection, and the system is finally reconstructed due to multiple inter-skyrmion potentials. Thus, skyrmion bags with topological charge up to forty can be created. Simulated Lorentz transmission electron microscopy images are given to facilitate the experimental demonstration. Our proposal is expected to inspire relevant experiments in magnetic films, and pave the way for potential spintronic applications based on skyrmion bags.

Submitted 13 July, 2023; originally announced July 2023.

Journal ref: Physical Review B 107, 224431 (2023)

arXiv:2306.08312 [pdf, ps, other]

A decomposition-homogenization method for Robin boundary problems on the nonnegative orthant

Authors: Lijun Bo, Yijie Huang, Xiang Yu

Abstract: This paper studies the existence and uniqueness of a classical solution to a type of Robin boundary problems on the nonnegative orthant. We propose a new decomposition-homogenization method for the Robin boundary problem based on probabilistic representations, which leads to two auxiliary Robin boundary problems admitting some simplified probabilistic representations. The auxiliary probabilistic representations allow us to establish the existence of a unique classical solution to the original Robin boundary problem using some stochastic flow analysis.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Keywords: Robin boundary problem, decomposition-homogenization method, probabilistic representation, classical solution, stochastic flow analysis

arXiv:2305.13705 [pdf, other]

DiffHand: End-to-End Hand Mesh Reconstruction via Diffusion Models

Authors: Lijun Li, Li'an Zhuo, Bang Zhang, Liefeng Bo, Chen Chen

Abstract: Hand mesh reconstruction from a monocular image is a challenging task due to its depth ambiguity and severe occlusion; there remains a non-unique mapping between the monocular image and the hand mesh. To address this, we develop DiffHand, the first diffusion-based framework that approaches hand mesh reconstruction as a denoising diffusion process. Our one-stage pipeline utilizes noise to model the uncertainty distribution of the intermediate hand mesh in a forward process. We reformulate the denoising diffusion process to gradually refine the noisy hand mesh and then select the mesh with the highest probability of being correct based on the image itself, rather than relying on 2D joints extracted beforehand. To better model the connectivity of hand vertices, we design a novel network module called the cross-modality decoder. Extensive experiments on the popular benchmarks demonstrate that our method outperforms the state-of-the-art hand mesh reconstruction approaches by achieving 5.8mm PA-MPJPE on the Freihand test set and 4.98mm PA-MPJPE on the DexYCB test set.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.12497 [pdf, other]

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Authors: Yuan Dong, Chuan Fang, Liefeng Bo, Zilong Dong, ** Tan

Abstract: Panoramic image enables deeper understanding and more holistic perception of $360^\circ$ surrounding environment, which can naturally encode enriched scene context information compared to standard perspective image. Previous work has made lots of effort to solve the scene understanding task in a bottom-up form, thus each sub-task is processed separately and few correlations are explored in this procedure. In this paper, we propose a novel method using depth prior for holistic indoor scene understanding which recovers the objects' shapes, oriented bounding boxes and the 3D room layout simultaneously from a single panorama. In order to fully utilize the rich context information, we design a transformer-based context module to predict the representation and relationship among each component of the scene. In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes. Experiments on the synthetic and real-world datasets demonstrate that our method outperforms previous panoramic scene understanding methods in terms of both layout estimation and 3D object detection.

Submitted 5 June, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2305.04808 [pdf, other]

CAT: A Contextualized Conceptualization and Instantiation Framework for Commonsense Reasoning

Authors: Weiqi Wang, Tianqing Fang, Baixuan Xu, Chun Yi Louis Bo, Yangqiu Song, Lei Chen

Abstract: Commonsense reasoning, aiming at endowing machines with a human-like ability to make situational presumptions, is extremely challenging to generalize. For someone who barely knows about "meditation," while is knowledgeable about "singing," he can still infer that "meditation makes people relaxed" from the existing knowledge that "singing makes people relaxed" by first conceptualizing "singing" as a "relaxing event" and then instantiating that event to "meditation." This process, known as conceptual induction and deduction, is fundamental to commonsense reasoning while lacking both labeled data and methodologies to enhance commonsense modeling. To fill such a research gap, we propose CAT (Contextualized ConceptuAlization and InsTantiation), a semi-supervised learning framework that integrates event conceptualization and instantiation to conceptualize commonsense knowledge bases at scale. Extensive experiments show that our framework achieves state-of-the-art performances on two conceptualization tasks, and the acquired abstract commonsense knowledge can significantly improve commonsense inference modeling. Our code, data, and fine-tuned models are publicly available at https://github.com/HKUST-KnowComp/CAT.

Submitted 10 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: ACL2023 Main Conference

arXiv:2304.10802 [pdf, other]

An extended Merton problem with relaxed benchmark tracking

Authors: Lijun Bo, Yijie Huang, Xiang Yu

Abstract: This paper studies a Merton's optimal portfolio and consumption problem in an extended formulation incorporating the tracking of a benchmark process described by a geometric Brownian motion. We consider a relaxed tracking formulation such that the wealth process compensated by a fictitious capital injection outperforms the benchmark at all times. The fund manager aims to maximize the expected utility of consumption deducted by the cost of the capital injection, where the latter term can also be regarded as the expected largest shortfall of the wealth with reference to the benchmark. By introducing an auxiliary state process with reflection, we formulate and tackle an equivalent stochastic control problem by means of the dual transform and probabilistic representation, where the dual PDE can be solved explicitly. On the strength of the closed-form results, we can derive and verify the optimal feedback control for the primal control problem, allowing us to discuss some new and interesting financial implications induced by the additional risk-taking from the capital injection and the goal of tracking.

Submitted 7 March, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: Keywords: Benchmark tracking, capital injection, expected largest shortfall, consumption and portfolio choice, Neumann boundary condition

arXiv:2304.05097 [pdf, other]

One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

Authors: Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li

Abstract: Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus will inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression is not so desirable, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene into a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity from two aspects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/

Submitted 11 April, 2023; originally announced April 2023.

Comments: Accepted by CVPR 2023

arXiv:2304.04351 [pdf, other]

Evaluate Geometry of Radiance Fields with Low-frequency Color Prior

Authors: Qihang Fang, Yafei Song, Keqiang Li, Li Shen, Huaiyu Wu, Gang Xiong, Liefeng Bo

Abstract: A radiance field is an effective representation of 3D scenes, which has been widely adopted in novel-view synthesis and 3D reconstruction. It is still an open and challenging problem to evaluate the geometry, i.e., the density field, as the ground-truth is almost impossible to obtain. One alternative indirect solution is to transform the density field into a point-cloud and compute its Chamfer Distance with the scanned ground-truth. However, many widely-used datasets have no point-cloud ground-truth since the scanning process along with the equipment is expensive and complicated. To this end, we propose a novel metric, named Inverse Mean Residual Color (IMRC), which can evaluate the geometry only with the observation images. Our key insight is that the better the geometry, the lower-frequency the computed color field. From this insight, given a reconstructed density field and observation images, we design a closed-form method to approximate the color field with low-frequency spherical harmonics, and compute the inverse mean residual color. Then the higher the IMRC, the better the geometry. Qualitative and quantitative experimental results verify the effectiveness of our proposed IMRC metric. We also benchmark several state-of-the-art methods using IMRC to promote future related research. Our code is available at https://github.com/qihangGH/IMRC.

Submitted 17 January, 2024; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: This paper has been accepted by AAAI 2024

arXiv:2304.04233 [pdf, other]

ODDFUZZ: Discovering Java Deserialization Vulnerabilities via Structure-Aware Directed Greybox Fuzzing

Authors: Sicong Cao, Biao He, Xiaobing Sun, Yu Ouyang, Chao Zhang, Xiaoxue Wu, Ting Su, Lili Bo, Bin Li, Chuanlei Ma, Jiajia Li, Tao Wei

Abstract: Java deserialization vulnerability is a severe threat in practice. Researchers have proposed static analysis solutions to locate candidate vulnerabilities and fuzzing solutions to generate proof-of-concept (PoC) serialized objects to trigger them. However, existing solutions have limited effectiveness and efficiency. In this paper, we propose a novel hybrid solution ODDFUZZ to efficiently discover Java deserialization vulnerabilities. First, ODDFUZZ performs lightweight static taint analysis to identify candidate gadget chains that may cause deserialization vulnerabilities. In this step, ODDFUZZ tries to locate all candidates and avoid false negatives. Then, ODDFUZZ performs directed greybox fuzzing (DGF) to explore those candidates and generate PoC testcases to mitigate false positives. Specifically, ODDFUZZ applies a structure-aware seed generation method to guarantee the validity of the testcases, and adopts a novel hybrid feedback and a step-forward strategy to guide the directed fuzzing. We implemented a prototype of ODDFUZZ and evaluated it on the popular Java deserialization repository ysoserial. Results show that, ODDFUZZ could discover 16 out of 34 known gadget chains, while two state-of-the-art baselines only identify three of them. In addition, we evaluated ODDFUZZ on real-world applications including Oracle WebLogic Server, Apache Dubbo, Sonatype Nexus, and protostuff, and found six previously unreported exploitable gadget chains with five CVEs assigned.

Submitted 9 April, 2023; originally announced April 2023.

Comments: To appear in the Main Track of IEEE S&P 2023

arXiv:2303.07593 [pdf, other]

Improving Java Deserialization Gadget Chain Mining via Overriding-Guided Object Generation

Authors: Sicong Cao, Xiaobing Sun, Xiaoxue Wu, Lili Bo, Bin Li, Rongxin Wu, Wei Liu, Biao He, Yu Ouyang, Jiajia Li

Abstract: Java (de)serialization is prone to causing security-critical vulnerabilities that attackers can invoke existing methods (gadgets) on the application's classpath to construct a gadget chain to perform malicious behaviors. Several techniques have been proposed to statically identify suspicious gadget chains and dynamically generate injection objects for fuzzing. However, due to their incomplete support for dynamic program features (e.g., Java runtime polymorphism) and ineffective injection object generation for fuzzing, the existing techniques are still far from satisfactory. In this paper, we first performed an empirical study to investigate the characteristics of Java deserialization vulnerabilities based on our manually collected 86 publicly known gadget chains. The empirical results show that 1) Java deserialization gadgets are usually exploited by abusing runtime polymorphism, which enables attackers to reuse serializable overridden methods; and 2) attackers usually invoke exploitable overridden methods (gadgets) via dynamic binding to generate injection objects for gadget chain construction. Based on our empirical findings, we propose a novel gadget chain mining approach, GCMiner, which captures both explicit and implicit method calls to identify more gadget chains, and adopts an overriding-guided object generation approach to generate valid injection objects for fuzzing. The evaluation results show that GCMiner significantly outperforms the state-of-the-art techniques, and discovers 56 unique gadget chains that cannot be identified by the baseline approaches.

Submitted 3 April, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: To appear in the Technical Track of ICSE 2023

arXiv:2303.06095 [pdf, other]

HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction

Authors: Jie Zhou, Xianshuai Cao, Wenhao Li, Lin Bo, Kun Zhang, Chuan Luo, Qian Yu

Abstract: Multi-scenario & multi-task learning has been widely applied to many recommendation systems in industrial applications, wherein an effective and practical approach is to carry out multi-scenario transfer learning on the basis of the Mixture-of-Expert (MoE) architecture. However, the MoE-based method, which aims to project all information in the same feature space, cannot effectively deal with the complex relationships inherent among various scenarios and tasks, resulting in unsatisfactory performance. To tackle the problem, we propose a Hierarchical information extraction Network (HiNet) for multi-scenario and multi-task recommendation, which achieves hierarchical extraction based on coarse-to-fine knowledge transfer scheme. The multiple extraction layers of the hierarchical network enable the model to enhance the capability of transferring valuable information across scenarios while preserving specific features of scenarios and tasks. Furthermore, a novel scenario-aware attentive network module is proposed to model correlations between scenarios explicitly. Comprehensive experiments conducted on real-world industrial datasets from Meituan Meishi platform demonstrate that HiNet achieves a new state-of-the-art performance and significantly outperforms existing solutions. HiNet is currently fully deployed in two scenarios and has achieved 2.87% and 1.75% order quantity gain respectively.

Submitted 13 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.08678 [pdf, other]

doi 10.1109/TNNLS.2022.3204775

Multi-Behavior Graph Neural Networks for Recommender System

Authors: Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo

Abstract: Recommender systems have been demonstrated to be effective to meet user's personalized interests for many online services (e.g., E-commerce and online advertising platforms). Recent years have witnessed the emerging success of many deep learning-based recommendation models for augmenting collaborative filtering architectures with various neural network architectures, such as multi-layer perceptron and autoencoder. However, the majority of them model the user-item relationship with single type of interaction, while overlooking the diversity of user behaviors on interacting with items, which can be click, add-to-cart, tag-as-favorite and purchase. Such various types of interaction behaviors have great potential in providing rich information for understanding the user preferences. In this paper, we pay special attention on user-item relationships with the exploration of multi-typed user behaviors. Technically, we contribute a new multi-behavior graph neural network (MBRec), which specially accounts for diverse interaction patterns as well as the underlying cross-type behavior inter-dependencies. In the MBRec framework, we develop a graph-structured learning framework to perform expressive modeling of high-order connectivity in behavior-aware user-item interaction graph. After that, a mutual relation encoder is proposed to adaptively uncover complex relational structures and make aggregations across layer-specific behavior representations. Through comprehensive evaluation on real-world datasets, the advantages of our MBRec method have been validated under different experimental settings. Further analysis verifies the positive effects of incorporating the multi-behavioral context into the recommendation paradigm. Additionally, the conducted case studies offer insights into the interpretability of user multi-behavior representations.

Submitted 16 February, 2023; originally announced February 2023.

Comments: Published at IEEE Transactions on Neural Networks and Learning Systems, 2022

arXiv:2302.08302 [pdf, other]

Stochastic control problems with state-reflections arising from relaxed benchmark tracking

Authors: Lijun Bo, Yijie Huang, Xiang Yu

Abstract: This paper studies stochastic control problems motivated by optimal consumption with wealth benchmark tracking. The benchmark process is modeled by a combination of a geometric Brownian motion and a running maximum process, indicating its increasing trend in the long run. We consider a relaxed tracking formulation such that the wealth compensated by the injected capital always dominates the benchmark process. The stochastic control problem is to maximize the expected utility of consumption deducted by the cost of the capital injection under the dynamic floor constraint. By introducing two auxiliary state processes with reflections, an equivalent auxiliary control problem is formulated and studied, which leads to the HJB equation with two Neumann boundary conditions. We establish the existence of a unique classical solution to the dual PDE using some novel probabilistic representations involving the local time of some dual processes together with a tailor-made decomposition-homogenization technique. The proof of the verification theorem on the optimal feedback control can be carried out by some stochastic flow analysis and technical estimations of the optimal control.

Submitted 25 April, 2024; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: Keywords: Relaxed benchmark tracking, optimal consumption, Neumann boundary conditions, probabilistic representation, reflected diffusion process

arXiv:2212.04701 [pdf, other]

4K-NeRF: High Fidelity Neural Radiance Fields at Ultra High Resolutions

Authors: Zhongshu Wang, Lingzhi Li, Zhen Shen, Li Shen, Liefeng Bo

Abstract: In this paper, we present a novel and effective framework, named 4K-NeRF, to pursue high fidelity view synthesis on the challenging scenarios of ultra high resolutions, building on the methodology of neural radiance fields (NeRF). The rendering procedure of NeRF-based methods typically relies on a pixel-wise manner in which rays (or pixels) are treated independently on both training and inference phases, limiting its representational ability on describing subtle details, especially when lifting to an extremely high resolution. We address the issue by exploring ray correlation to enhance high-frequency details recovery. Particularly, we use the 3D-aware encoder to model geometric information effectively in a lower resolution space and recover fine details through the 3D-aware decoder, conditioned on ray features and depths estimated by the encoder. Joint training with patch-based sampling further facilitates our method incorporating the supervision from perception oriented regularization beyond pixel-wise loss. Benefiting from the use of geometry-aware local context, our method can significantly boost rendering quality on high-frequency details compared with modern NeRF methods, and achieve the state-of-the-art visual quality on 4K ultra-high-resolution scenarios. Code available at https://github.com/frozoul/4K-NeRF

Submitted 3 April, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

arXiv:2211.16386 [pdf, other]

Compressing Volumetric Radiance Fields to 1 MB

Authors: Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, Liefeng Bo

Abstract: Approximating radiance fields with volumetric grids is one of promising directions for improving NeRF, represented by methods like Plenoxels and DVGO, which achieve super-fast training convergence and real-time rendering. However, these methods typically require a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue in this paper by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. We first present a robust and adaptive metric for estimating redundancy in grid models and performing voxel pruning by better exploring intermediate outputs of volumetric rendering. A trainable vector quantization is further proposed to improve the compactness of grid models. In combination with an efficient joint tuning strategy and post-processing, our method can achieve a compression ratio of 100$\times$ by reducing the overall model size to 1 MB with negligible loss on visual quality. Extensive experiments demonstrate that the proposed framework is capable of achieving unrivaled performance and well generalization across multiple methods with distinct volumetric structures, facilitating the wide use of volumetric radiance fields methods in real-world applications. Code available at https://github.com/AlgoHunt/VQRF

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.09035 [pdf, other]

A Creative Industry Image Generation Dataset Based on Captions

Authors: Xiang Yuejia, Lv Chuanhao, Liu Qingdazhu, Yang Xiaocui, Liu Bo, Ju Meizhi

Abstract: Most image generation methods are difficult to precisely control the properties of the generated images, such as structure, scale, shape, etc., which limits its large-scale application in creative industries such as conceptual design and graphic design, and so on. Using the prompt and the sketch is a practical solution for controllability. Existing datasets lack either prompt or sketch and are not designed for the creative industry. Here is the main contribution of our work. a) This is the first dataset that covers the 4 most important areas of creative industry domains and is labeled with prompt and sketch. b) We provide multiple reference images in the test set and fine-grained scores for each reference which are useful for measurement. c) We apply two state-of-the-art models to our dataset and then find some shortcomings, such as the prompt is more highly valued than the sketch.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2210.17129 [pdf]

Observation of tungsten impurity suppression with ECRH by an X-ray Crystal Spectrometer on EAST

Authors: Lin Zichao, Zhang Hongming, Wang Fudi, Bae Chenonho, Fu Jia, Shen Yongcai, Lu Dian, ** Yifei, He Liang, Wang Minrui, Lin Guangle, Ye Kaixuan, Wang Shouxin, Zhao Hailin, Lyu Bo

Abstract: Impurity degrades tokamak plasmas confinement by causing energy loss, diluting the fuel concentration, even terminating the discharges in some extreme cases. Previously, the suppression effects of on-axis Electron Cyclotron Resonance Heating (ECRH) on the impurity accumulation have been investigated on EAST by the extreme ultraviolet (EUV) spectroscopy. However, it is difficult to quantify the changes in impurity tungsten (W) profile since the W line emissions in the EUV range could not be easily resolved. The X-ray Crystal Spectroscopy (XCS), that used to provide the ion temperature and the rotation velocity by measuring lines emissions in the soft X-ray range, also can be used to study the behavior of impurity W emissions. To begin with, in-situ absolute intensity calibration for Tangential XCS (TXCS) is conducted by analyzing the measurements of the bremsstrahlung radiation intensity. After obtaining the calibration coefficient, W44+ ion density profiles are evaluated by Abel inversion using the spectral line of W XLV (3.9095 Å). Thus, a direct observation of W44+ impurity concentration suppressed by ECRH is accomplished. The obtained W density profiles can be used to analyze the W transport by combining with the impurity transport codes in the future.

Submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.07549 [pdf, ps, other]

On De Finetti's control under Poisson observations: optimality of a double barrier strategy in a Markov additive model

Authors: Lijun Bo, Wenyuan Wang, Kaixin Yan

Abstract: In this paper we consider the De Finetti's optimal dividend and capital injection problem under a Markov additive model. We assume that the surplus process before dividends and capital injections follows a spectrally positive Markov additive process. Dividend payments are made only at the jump times of an independent Poisson process. Capitals are required to be injected whenever needed to ensure a non-negative surplus process to avoid bankruptcy. Our purpose is to characterize the optimal periodic dividend and capital injection strategy that maximizes the expected total discounted dividends subtracted by the total discounted costs of capital injection. To this end, we first consider an auxiliary optimal periodic dividend and capital injection problem with final payoff under a single spectrally positive Lévy process and conjecture that the optimal strategy is a double barrier strategy. Using the fluctuation theory and excursion-theoretical approach of the spectrally positive Lévy process and the Hamilton-Jacobi-Bellman inequality approach of the control theory, we are able to verify the conjecture that some double barrier periodic dividend and capital injection strategy solves the auxiliary problem. With the results for the auxiliary control problem and a fixed point argument for recursive iterations induced by the dynamic programming principle, the optimality of a regime-modulated double barrier periodic dividend and capital injection strategy is proved for our target control problem.

Submitted 26 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.02661

arXiv:2209.12304 [pdf]

Issues in Implementing Regression Calibration Analyses

Authors: Lillian Boe, Pamela A. Shaw, Douglas Midthune, Paul Gustafson, Victor Kipnis, Eunyoung Park, Daniela Sotres-Alvarez, Laurence Freedman

Abstract: Regression calibration is a popular approach for correcting biases in estimated regression parameters when exposure variables are measured with error. This approach involves building a calibration equation to estimate the value of the unknown true exposure given the error-prone measurement and other confounding covariates. The estimated, or calibrated, exposure is then substituted for the true exposure in the health outcome regression model. When used properly, regression calibration can greatly reduce the bias induced by exposure measurement error. Here, we first provide an overview of the statistical framework for regression calibration, specifically discussing how a special type of error, called Berkson error, arises in the estimated exposure. We then present practical issues to consider when applying regression calibration, including: (1) how to develop the calibration equation and which covariates to include; (2) valid ways to calculate standard errors (SE) of estimated regression coefficients; and (3) problems arising if one of the covariates in the calibration model is a mediator of the relationship between the exposure and outcome. Throughout the paper, we provide illustrative examples using data from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) and simulations. We conclude with recommendations for how to perform regression calibration.

Submitted 25 September, 2022; originally announced September 2022.

arXiv:2209.10061 [pdf, ps, other]

Practical considerations for sandwich variance estimation in two-stage regression settings

Authors: Lillian A. Boe, Thomas Lumley, Pamela A. Shaw

Abstract: We present a practical approach for computing the sandwich variance estimator in two-stage regression model settings. As a motivating example for two-stage regression, we consider regression calibration, a popular approach for addressing covariate measurement error. The sandwich variance approach has rarely been applied in regression calibration, even though it requires less computation time than popular resampling approaches for variance estimation, specifically the bootstrap. This is likely because it requires specialized statistical coding. In practice, a simple bootstrap approach with Wald confidence intervals is often applied, but this approach can yield confidence intervals that do not achieve the nominal coverage level. We first outline the steps needed to compute the sandwich variance estimator. We then develop a convenient method of computation in R for sandwich variance estimation, which leverages standard regression model outputs and existing R functions and can be applied in the case of a simple random sample or complex survey design. We use a simulation study to compare the performance of the sandwich estimator to a resampling variance approach for both data settings. Finally, we further compare these two variance estimation approaches for data examples from the Women's Health Initiative (WHI) and Hispanic Community Health Study/Study of Latinos (HCHS/SOL).

Submitted 20 September, 2022; originally announced September 2022.

Comments: 18 pages of main manuscript including 2 figures and 4 tables; 14 pages of supplementary materials and references (including 2 tables)

arXiv:2206.13341 [pdf, other]

A mean field game approach to equilibrium consumption under external habit formation

Authors: Lijun Bo, Shihua Wang, Xiang Yu

Abstract: This paper studies equilibrium consumption under external habit formation in a large population of agents. We first formulate problems under two types of conventional habit formation preferences, namely linear and multiplicative external habit formation, in a mean field game framework. In a log-normal market model with asset specialization, we characterize one mean field equilibrium in analytical form in each problem, allowing us to understand some quantitative properties of the equilibrium strategy and to draw some financial implications of consumption habits from a mean-field perspective. In each problem with n agents, we construct an approximate Nash equilibrium for the n-player game using the obtained mean field equilibrium when n is sufficiently large. The explicit convergence order in each problem can also be obtained.

Submitted 8 March, 2024; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Keywords: Catching up with the Joneses, linear habit formation, multiplicative habit formation, mean field equilibrium, approximate Nash equilibrium

Showing 1–50 of 124 results for author: Bo, L