-
StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal
Authors:
Chongjie Ye,
Lingteng Qiu,
Xiaodong Gu,
Qi Zuo,
Yushuang Wu,
Zilong Dong,
Liefeng Bo,
Yuliang Xiu,
Xiaoguang Han
Abstract:
This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy: a one-step normal estimator (YOSO) first derives an initial normal guess that is relatively coarse but reliable, and a semantic-guided refinement process (SG-DRN) then refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both "stability" and "sharpness" for accurate normal estimation. StableNormal represents an early attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models are publicly available at hf.co/Stable-X.
Submitted 24 June, 2024;
originally announced June 2024.
-
Gaussian-Informed Continuum for Physical Property Identification and Simulation
Authors:
Junhao Cai,
Yuji Yang,
Weihao Yuan,
Yisheng He,
Zilong Dong,
Liefeng Bo,
Hui Cheng,
Qifeng Chen
Abstract:
This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as implicit shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic.
Submitted 21 June, 2024;
originally announced June 2024.
-
MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method
Authors:
Pan Liao,
Feng Yang,
Di Wu,
Liu Bo
Abstract:
Monocular vision-based 3D object detection is crucial in various sectors, yet existing methods face significant challenges in terms of accuracy and computational efficiency. Building on the successful strategies in 2D detection and depth estimation, we propose MonoDETRNext, which seeks to optimally balance precision and processing speed. Our methodology includes the development of an efficient hybrid visual encoder, enhancement of depth prediction mechanisms, and introduction of an innovative query generation strategy, augmented by an advanced depth predictor. Building on MonoDETR, MonoDETRNext introduces two variants: MonoDETRNext-F, which emphasizes speed, and MonoDETRNext-A, which focuses on precision. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A demonstrated a 4.60% improvement in the AP3D metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-F showed a 2.21% increase. Additionally, the computational efficiency of MonoDETRNext-F slightly exceeds that of its predecessor.
Submitted 23 May, 2024;
originally announced May 2024.
-
Suppression of the skyrmion Hall effect in synthetic ferrimagnets with gradient magnetization
Authors:
Lan Bo,
Xichao Zhang,
Masahito Mochizuki,
Xuefeng Zhang
Abstract:
Magnetic skyrmions are promising building blocks for future spintronic devices. However, the skyrmion Hall effect (SkHE) remains an obstacle for practical applications based on the in-line transport of skyrmions. Here, we numerically study the static properties and current-driven dynamics of synthetic ferrimagnetic skyrmions. Inspired by graded-index magnonics, we introduce a linear gradient of saturation magnetization (Ms) in the skyrmion-hosting sample, which effectively modulates the skyrmion Hall angle and suppresses the SkHE. Micromagnetic simulations reveal that ferrimagnetic skyrmions could exhibit greater susceptibility to the variation of Ms as compared to their ferromagnetic counterparts. The Thiele analysis is also applied to support the simulation results, which elucidates that the Ms gradient dynamically modifies the intrinsic normalized size of skyrmions, consequently impacting the SkHE. Our results pave the way to the graded-index skyrmionics, which offers novel insights for designing ferrimagnet-based skyrmionic devices.
Submitted 24 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Authors:
Yisheng He,
Weihao Yuan,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes but suffer from blurry results and fail to capture detailed structures because of inconsistency between the 2D edits. Our critical insight is that low-frequency components of images are more multiview-consistent after editing compared with their high-frequency parts. Moreover, the appearance style is mainly exhibited on the low-frequency components, and the content details especially reside in high-frequency parts. This motivates us to perform editing on low-frequency components, which results in high-fidelity edited scenes. In addition, the editing is performed in the low-frequency feature space, enabling stable intensity control and novel scene transfer. Comprehensive experiments conducted on photorealistic datasets demonstrate the superior performance of high-fidelity and transferable NeRF editing. The project page is at https://aigc3d.github.io/freditor.
Submitted 3 April, 2024;
originally announced April 2024.
-
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Authors:
Yushuang Wu,
Luyue Shi,
Junhao Cai,
Weihao Yuan,
Lingteng Qiu,
Zilong Dong,
Liefeng Bo,
Shuguang Cui,
Xiaoguang Han
Abstract:
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task, particularly with real-world data. Current state-of-the-art methods develop Transformer-based implicit field learning, necessitating an intensive learning paradigm that requires dense query-supervision uniformly sampled throughout the entire space. We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion. This approach treats the query points for implicit field learning as a noisy point cloud for iterative denoising, allowing for their dynamic adaptation to the target object shape. Such adaptive query points harness diffusion learning's capability for coarse shape recovery and also enhance the implicit representation's ability to delineate finer details. Besides, an additional self-conditioning mechanism is designed to use implicit predictions as the guidance of diffusion learning, leading to a cooperative system. Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving a 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods. The generalizability of IPoD is also demonstrated on the MVImgNet dataset. Our project page is at https://yushuang-wu.github.io/IPoD.
Submitted 30 March, 2024;
originally announced April 2024.
-
An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models
Authors:
Zhengyi Zhao,
Chen Song,
Xiaodong Gu,
Yuan Dong,
Qi Zuo,
Weihao Yuan,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively.
Submitted 22 March, 2024;
originally announced March 2024.
-
OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
Authors:
Junhao Cai,
Yisheng He,
Weihao Yuan,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qifeng Chen
Abstract:
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models to infer the normalized object coordinate space (NOCS) maps of the target instances. This framework fully leverages the visual semantic prior from DinoV2 and the aligned visual and language knowledge within the text-to-image diffusion model, which enables generalization to various text descriptions of novel categories. Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on our large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories. The project page is at https://ov9d.github.io.
Submitted 18 March, 2024;
originally announced March 2024.
-
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
Authors:
Qi Zuo,
Xiaodong Gu,
Lingteng Qiu,
Yuan Dong,
Zhengyi Zhao,
Weihao Yuan,
Rui Peng,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.
Submitted 18 March, 2024;
originally announced March 2024.
-
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Authors:
Linrui Tian,
Qi Wang,
Bang Zhang,
Liefeng Bo
Abstract:
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonstrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.
Submitted 27 February, 2024;
originally announced February 2024.
-
Global Rotation of Skyrmion Bags under Vertical Microwave Fields
Authors:
Lan Bo,
Rongzhi Zhao,
Xichao Zhang,
Masahito Mochizuki,
Xuefeng Zhang
Abstract:
Magnetic skyrmion bags are composite topological spin textures with arbitrary topological charges. Here, we computationally study the transient rotational motion of skyrmion bags, which is characterized by a global rotation of the inner skyrmions around the central point. Distinct from conventional rotational modes found in skyrmions, the observed rotation is a forced motion associated with the breathing mode induced solely by vertical microwave fields. The driving force behind this rotation originates from the interactions between outer and inner skyrmions, with the angular velocity determined by the phase difference resulting from their asynchronous breathing behaviors. It is also found that skyrmion bags with larger skyrmion numbers are more conducive to the occurrence of the rotation. Our results are useful for understanding the cluster dynamics of complex topological spin textures driven by dynamic fields.
Submitted 23 February, 2024;
originally announced February 2024.
-
Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems
Authors:
Sicong Cao,
Xiaobing Sun,
Xiaoxue Wu,
David Lo,
Lili Bo,
Bin Li,
Wei Liu
Abstract:
Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, the lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements positively contributing to its predictions. Unfortunately, due to the weakly-robust detection models and suboptimal explanation strategy, they risk revealing spurious correlations and suffer from redundancy issues.
In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to avoid spurious explanations; and 2) provide both concise and effective explanations to reason about the detected vulnerabilities. Coca consists of two core parts referred to as Trainer and Explainer. The former aims to train a detection model which is robust to random perturbation based on combinatorial contrastive learning, while the latter builds an explainer to derive crucial code statements that are most decisive to the detected vulnerability via dual-view causal inference as explanations. We apply Coca over three typical GNN-based vulnerability detectors. Experimental results show that Coca can effectively mitigate the spurious correlation issue, and provide more useful high-quality explanations.
Submitted 26 January, 2024;
originally announced January 2024.
-
A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research
Authors:
Sicong Cao,
Xiaobing Sun,
Ratnadira Widyasari,
David Lo,
Xiaoxue Wu,
Lili Bo,
Jiale Zhang,
Bin Li,
Wei Liu,
Di Wu,
Yixin Chen
Abstract:
The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 63 papers across 21 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a roadmap highlighting potential opportunities we deemed appropriate and important for future work.
Submitted 25 January, 2024;
originally announced January 2024.
-
Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation
Authors:
Minglin Chen,
Weihao Yuan,
Yukun Wang,
Zhe Sheng,
Yisheng He,
Zilong Dong,
Liefeng Bo,
Yulan Guo
Abstract:
Recently, text-to-3D approaches have achieved high-fidelity 3D content generation using text description. However, the generated objects are stochastic and lack fine-grained control. Sketches provide a cheap approach to introduce such fine-grained control. Nevertheless, it is challenging to achieve flexible control from these sketches due to their abstraction and ambiguity. In this paper, we present a multi-view sketch-guided text-to-3D generation framework (namely, Sketch2NeRF) to add sketch control to 3D generation. Specifically, our method leverages pretrained 2D diffusion models (e.g., Stable Diffusion and ControlNet) to supervise the optimization of a 3D scene represented by a neural radiance field (NeRF). We propose a novel synchronized generation and reconstruction method to effectively optimize the NeRF. In the experiments, we collected two kinds of multi-view sketch datasets to evaluate the proposed method. We demonstrate that our method can synthesize 3D consistent contents with fine-grained sketch control while being high-fidelity to text prompts. Extensive results show that our method achieves state-of-the-art performance in terms of sketch similarity and text alignment.
Submitted 27 January, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis
Authors:
Xin Gao,
Li Hu,
Peng Zhang,
Bang Zhang,
Liefeng Bo
Abstract:
In the realm of 3D digital human applications, music-to-dance presents a challenging task. Given the one-to-many relationship between music and dance, previous methods have been limited in their approach, relying solely on matching and generating corresponding dance movements based on music rhythm. In the professional field of choreography, a dance phrase consists of several dance poses and dance movements. Dance poses are composed of a series of basic meaningful body postures, while dance movements can reflect dynamic changes such as the rhythm, melody, and style of dance. Taking inspiration from these concepts, we introduce an innovative dance generation pipeline called DanceMeld, which comprises two stages, i.e., the dance decouple stage and the dance generation stage. In the decouple stage, a hierarchical VQ-VAE is used to disentangle dance poses and dance movements in different feature space levels, where the bottom code represents dance poses, and the top code represents dance movements. In the generation stage, we utilize a diffusion model as a prior to model the distribution and generate latent codes conditioned on music features. We have experimentally demonstrated the representational capabilities of top code and bottom code, enabling the explicit decoupling expression of dance poses and dance movements. This disentanglement not only provides control over motion details, styles, and rhythm but also facilitates applications such as dance style transfer and dance unit editing. Our approach has undergone qualitative and quantitative experiments on the AIST++ dataset, demonstrating its superiority over other methods.
Submitted 30 November, 2023;
originally announced January 2024.
-
Motion State: A New Benchmark Multiple Object Tracking
Authors:
Yang Feng,
Liao Pan,
Wu Di,
Liu Bo,
Zhang Xingle
Abstract:
In the realm of video analysis, the field of multiple object tracking (MOT) assumes paramount importance, with the motion state of objects, whether static or dynamic relative to the ground, holding practical significance across diverse scenarios. However, the extant literature exhibits a notable dearth in the exploration of this aspect. Deep learning methodologies encounter challenges in accurately discerning object motion states, while conventional approaches reliant on comprehensive mathematical modeling may yield suboptimal tracking accuracy. To address these challenges, we introduce a Model-Data-Driven Motion State Judgment Object Tracking Method (MoD2T). This innovative architecture adeptly amalgamates traditional mathematical modeling with deep learning-based multi-object tracking frameworks. The integration of mathematical modeling and deep learning within MoD2T enhances the precision of object motion state determination, thereby elevating tracking accuracy. Our empirical investigations comprehensively validate the efficacy of MoD2T across varied scenarios, encompassing unmanned aerial vehicle surveillance and street-level tracking. Furthermore, to gauge the method's adeptness in discerning object motion states, we introduce the Motion State Validation F1 (MVF1) metric. This novel performance metric aims to quantitatively assess the accuracy of motion state classification, furnishing a comprehensive evaluation of MoD2T's performance. Elaborate experimental validations corroborate the rationality of MVF1. In order to holistically appraise MoD2T's performance, we meticulously annotate several renowned datasets and subject MoD2T to stringent testing. Remarkably, under conditions characterized by minimal or moderate camera motion, the achieved MVF1 values are particularly noteworthy, with exemplars including 0.774 for the KITTI dataset, 0.521 for MOT17, and 0.827 for UAVDT.
Submitted 7 May, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes
Authors:
Jianqiang Ren,
Chao He,
Lin Liu,
Jiahao Chen,
Yutong Wang,
Yafei Song,
Jianfang Li,
Tangli Xue,
Siqi Hu,
Tao Chen,
Kunkun Zheng,
Jianjing Xiang,
Liefeng Bo
Abstract:
There is a growing demand for customized and expressive 3D characters with the emergence of AI agents and the Metaverse, but creating 3D characters using traditional computer graphics tools is a complex and time-consuming task. To address these challenges, we propose a user-friendly framework named Make-A-Character (Mach) to create lifelike 3D avatars from text descriptions. The framework leverages the power of large language and vision models for textual intention understanding and intermediate image generation, followed by a series of human-oriented visual perception and 3D generation modules. Our system offers an intuitive approach for users to craft controllable, realistic, fully-realized 3D characters that meet their expectations within 2 minutes, while also enabling easy integration with existing CG pipelines for dynamic expressiveness. For more information, please visit the project page at https://human3daigc.github.io/MACH/.
Submitted 24 December, 2023;
originally announced December 2023.
-
Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style
Authors:
Haohan Wang,
Wei Feng,
Yang Lu,
Yaoyu Li,
Zheng Zhang,
Jingjing Lv,
Xin Zhu,
Junjie Shen,
Zhangang Lin,
Lixing Bo,
Jingping Shao
Abstract:
The state-of-the-art methods for e-commerce product background generation suffer from the inefficiency of designing product-wise prompts when scaling up the production, as well as the ineffectiveness of describing fine-grained styles when customizing personalized backgrounds for some specific brands. To address these obstacles, we integrate the category commonality and personalized style into diffusion models. Concretely, we propose a Category-Wise Generator to enable large-scale background generation for the first time. A unique identifier in the prompt is assigned to each category, whose attention is located on the background by a mask-guided cross attention layer to learn the category-wise style. Furthermore, for products with specific and fine-grained requirements in layout, elements, etc., a Personality-Wise Generator is devised to learn such personalized style directly from a reference image to resolve textual ambiguities, and is trained in a self-supervised manner for more efficient training data usage. To advance research in this field, the first large-scale e-commerce product background generation dataset BG60k is constructed, which covers more than 60k product images from over 2k categories. Experiments demonstrate that our method can generate high-quality backgrounds for different categories, and maintain the personalized background style of reference images. The link to BG60k and codes will be available soon.
Submitted 19 December, 2023;
originally announced December 2023.
-
Reducing Shape-Radiance Ambiguity in Radiance Fields with a Closed-Form Color Estimation Method
Authors:
Qihang Fang,
Yafei Song,
Keqiang Li,
Liefeng Bo
Abstract:
Neural radiance field (NeRF) enables the synthesis of cutting-edge realistic novel view images of a 3D scene. It includes density and color fields to model the shape and radiance of a scene, respectively. Supervised by the photometric loss in an end-to-end training manner, NeRF inherently suffers from the shape-radiance ambiguity problem, i.e., it can perfectly fit training views but does not guarantee decoupling the two fields correctly. To deal with this issue, existing works have incorporated prior knowledge to provide an independent supervision signal for the density field, including total variation loss, sparsity loss, distortion loss, etc. These losses are based on general assumptions about the density field, e.g., it should be smooth, sparse, or compact, which are not adaptive to a specific scene. In this paper, we propose a more adaptive method to reduce the shape-radiance ambiguity. The key is a rendering method that is only based on the density field. Specifically, we first estimate the color field based on the density field and posed images in a closed form. Then NeRF's rendering process can proceed. We address the problems in estimating the color field, including occlusion and non-uniformly distributed views. Afterward, it is applied to regularize NeRF's density field. As our regularization is guided by photometric loss, it is more adaptive compared to existing ones. Experimental results show that our method improves the density field of NeRF both qualitatively and quantitatively. Our code is available at https://github.com/qihangGH/Closed-form-color-field.
Submitted 19 December, 2023;
originally announced December 2023.
-
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
Authors:
Kangneng Zhou,
Daiheng Gao,
Xuan Wang,
Jie Zhang,
Peng Zhang,
Xusen Sun,
Longhao Zhang,
Shiqi Yang,
Bang Zhang,
Liefeng Bo,
Yaxing Wang,
Ming-Ming Cheng
Abstract:
3D-aware portrait editing has a wide range of applications in multiple fields. However, current approaches are limited in that they can only perform mask-guided or text-based editing. Even when the two procedures are fused into one model, the editing quality and stability cannot be ensured. To address this limitation, we propose MaTe3D: mask-guided text-based 3D-aware portrait editing. In this framework, first, we introduce a new SDF-based 3D generator which learns local and global representations with proposed SDF and density consistency losses. This enhances mask-based editing in local areas; second, we present a novel distillation strategy: Conditional Distillation on Geometry and Texture (CDGT). Compared to existing distillation strategies, it mitigates visual ambiguity and avoids mismatch between texture and geometry, thereby producing stable texture and convincing geometry while editing. Additionally, we create the CatMask-HQ dataset, a large-scale high-resolution cat face annotation for exploration of model generalization and expansion. We perform extensive experiments on both the FFHQ and CatMask-HQ datasets to demonstrate the editing quality and stability of the proposed method. Our method faithfully generates a 3D-aware edited face image based on a modified mask and a text prompt. Our code and models will be publicly released.
Submitted 3 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
Authors:
Xusen Sun,
Longhao Zhang,
Hao Zhu,
Peng Zhang,
Bang Zhang,
Xinya Ji,
Kangneng Zhou,
Daiheng Gao,
Liefeng Bo,
Xun Cao
Abstract:
Audio-driven talking head generation has drawn much attention in recent years, and many efforts have been made in lip-sync, expressive facial expressions, natural head pose generation, and high video quality. However, no model has yet led or tied on all these metrics due to the one-to-many mapping between audio and motion. In this paper, we propose VividTalk, a two-stage generic framework that supports generating high-visual quality talking head videos with all the above properties. Specifically, in the first stage, we map the audio to mesh by learning two motions, including non-rigid expression motion and rigid head motion. For expression motion, both blendshape and vertex are adopted as the intermediate representation to maximize the representation ability of the model. For natural head motion, a novel learnable head pose codebook with a two-phase training mechanism is proposed. In the second stage, we propose a dual branch motion-vae and a generator to transform the meshes into dense motion and synthesize high-quality video frame-by-frame. Extensive experiments show that the proposed VividTalk can generate high-visual quality talking head videos with lip-sync and realism enhanced by a large margin, and outperforms previous state-of-the-art works in objective and subjective comparisons.
Submitted 6 December, 2023; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Authors:
Li Hu,
Xin Gao,
Peng Zhang,
Ke Sun,
Bang Zhang,
Liefeng Bo
Abstract:
Character Animation aims to generate character videos from still images through driving signals. Currently, diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. However, challenges persist in the realm of image-to-video, especially in character animation, where temporally maintaining consistency with detailed information from the character remains a formidable problem. In this paper, we leverage the power of diffusion models and propose a novel framework tailored for character animation. To preserve consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detail features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guider to direct the character's movements and employ an effective temporal modeling approach to ensure smooth inter-frame transitions between video frames. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods. Furthermore, we evaluate our method on benchmarks for fashion video and human dance synthesis, achieving state-of-the-art results.
Submitted 13 June, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Authors:
Lingteng Qiu,
Guanying Chen,
Xiaodong Gu,
Qi Zuo,
Mutian Xu,
Yushuang Wu,
Weihao Yuan,
Zilong Dong,
Liefeng Bo,
Xiaoguang Han
Abstract:
Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normal maps, leading to instability in optimization. In this paper, recognizing that normal and depth information effectively describe scene geometry and can be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with the generalizable image-to-depth and normal prior models. In an attempt to alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that when integrated into existing text-to-3D pipelines, our models significantly enhance the detail richness, achieving state-of-the-art results. Our project page is https://aigc3d.github.io/richdreamer/.
Submitted 24 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
On optimal tracking portfolio in incomplete markets: The classical control and the reinforcement learning approaches
Authors:
Lijun Bo,
Yijie Huang,
Xiang Yu
Abstract:
This paper studies an infinite horizon optimal tracking portfolio problem using capital injection in incomplete market models. We consider the benchmark process modelled by a geometric Brownian motion with zero drift driven by some unhedgeable risk. The relaxed tracking formulation is adopted where the portfolio value compensated by the injected capital needs to outperform the benchmark process at any time, and the goal is to minimize the cost of the discounted total capital injection. In the first part, we solve the stochastic control problem when the market model is known, for which the equivalent auxiliary control problem with reflections and the associated HJB equation with a Neumann boundary condition are studied. In the second part, the market model is assumed to be unknown, for which we consider the exploratory formulation of the control problem with entropy regularizer and develop the continuous-time q-learning algorithm for the stochastic control problem with state reflections. In an illustrative example, we show the satisfactory performance of the q-learning algorithm.
Submitted 24 November, 2023;
originally announced November 2023.
-
De Finetti's Control Problem with Poisson Observations under Spectrally Positive Markov Additive Process
Authors:
Lijun Bo,
Wenyuan Wang,
Kaixin Yan
Abstract:
We study De Finetti's optimal dividend and capital injection problem under a Markov additive model. The surplus process before dividend and capital injection is assumed to follow a spectrally positive Markov additive process (MAP). Dividend payments are made only at the jump times of an independent Poisson process and capital is injected to avoid bankruptcy. The aim of the paper is to characterize an optimal periodic dividend and capital injection strategy that maximizes the expected total discounted dividends minus the total discounted costs of capital injection. Applying the fluctuation and excursion theory for Lévy processes and the stochastic control theory, we first address an auxiliary periodic dividend and capital injection control problem with a terminal payoff under the spectrally positive Lévy process. Using results obtained for this auxiliary problem and a fixed point argument for iterations induced by dynamic programming, we characterize the optimal strategy of our prime control problem as a regime-modulated double-barrier periodic-continuous-reflection dividend and capital injection strategy.
Submitted 8 November, 2023;
originally announced November 2023.
-
DecoderTracker: Decoder-Only Method for Multiple-Object Tracking
Authors:
Liao Pan,
Yang Feng,
Wu Di,
Liu Bo,
Zhang Xingle
Abstract:
Decoder-only models, such as GPT, have demonstrated superior performance in many areas compared to traditional encoder-decoder structure transformer models. Over the years, end-to-end models based on the traditional transformer structure, like MOTR, have achieved remarkable performance in multi-object tracking. However, the significant computational resource consumption of these models leads to less friendly inference speeds and training times. To address these issues, this paper attempts to construct a lightweight Decoder-only model: DecoderTracker for end-to-end multi-object tracking. Specifically, drawing on some real-time detection models, we have developed an image feature extraction network which can efficiently extract features from images to replace the encoder structure. In addition to minor innovations in the network, we analyze the potential reasons for the slow training of MOTR-like models and propose an effective training strategy to mitigate the issue of prolonged training times. On the DanceTrack dataset, without any bells and whistles, DecoderTracker's tracking performance slightly surpasses that of MOTR, with approximately twice the inference speed. Furthermore, DecoderTracker requires significantly less training time compared to MOTR.
Submitted 23 May, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark
Authors:
Conghui Niu,
Mengyang Hu,
Lin Bo,
Xiaoli He,
Dong Yu,
Pengyuan Liu
Abstract:
Existing propositions often rely on logical constants for classification. Compared with Western languages that lean towards hypotaxis such as English, Chinese often relies on semantic or logical understanding rather than logical connectives in daily expressions, exhibiting the characteristics of parataxis. However, existing research has rarely paid attention to this issue, even though accurately classifying these propositions is crucial for natural language understanding and reasoning. In this paper, we put forward the concepts of explicit and implicit propositions and propose a comprehensive multi-level proposition classification system based on linguistics and logic. Correspondingly, we create a large-scale Chinese proposition dataset PEACE from multiple domains, covering all categories related to propositions. To evaluate the Chinese proposition classification ability of existing models and explore their limitations, we conduct evaluations on PEACE using several different methods including the Rule-based method, SVM, BERT, RoBERTa, and ChatGPT. Results show the importance of properly modeling the semantic features of propositions. BERT has relatively good proposition classification capability, but lacks cross-domain transferability. ChatGPT performs poorly, but its classification ability can be improved by providing more proposition information. Many issues are still far from being resolved and require further study.
Submitted 18 September, 2023;
originally announced September 2023.
-
Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On
Authors:
Daiheng Gao,
Xu Chen,
Xindi Zhang,
Qi Wang,
Ke Sun,
Bang Zhang,
Liefeng Bo,
Qixing Huang
Abstract:
Fabricating and designing 3D garments has become extremely demanding with the increasing need for synthesizing realistic dressed persons for a variety of applications, e.g. 3D virtual try-on, digitalization of 2D clothes into 3D apparel, and cloth animation. It thus necessitates a simple and straightforward pipeline to obtain high-quality texture from simple input, such as 2D reference images. Traditional warping-based texture generation methods require a significant number of control points to be manually selected for each type of garment, which can be a time-consuming and tedious process. We propose a novel method, called Cloth2Tex, which eliminates the human burden in this process. Cloth2Tex is a self-supervised method that generates texture maps with reasonable layout and structural consistency. Another key feature of Cloth2Tex is that it can be used to support high-fidelity texture inpainting. This is done by combining Cloth2Tex with a prevailing latent diffusion model. We evaluate our approach both qualitatively and quantitatively and demonstrate that Cloth2Tex can generate high-quality texture maps and achieve the best visual effects in comparison to other methods. Project page: tomguluson92.github.io/projects/cloth2tex/
Submitted 8 August, 2023;
originally announced August 2023.
-
Deep fused flow and topology features for botnet detection basing on pretrained GCN
Authors:
Meng Xiaoyuan,
Lang Bo,
Yanxi Liu,
Yuhao Yan
Abstract:
Nowadays, botnets have become one of the major threats to cyber security. The characteristics of botnets are mainly reflected in bots' network behavior and their intercommunication relationships. Existing botnet detection methods use flow features or topology features individually, overlooking the other type of feature. This affects model performance. In this paper, we propose a botnet detection model which uses a graph convolutional network (GCN) to deeply fuse flow features and topology features for the first time. We construct communication graphs from network traffic and represent nodes with flow features. Due to the imbalance of existing public traffic flow datasets, it is impossible to train a GCN model on these datasets. Therefore, we use a balanced public communication graph dataset to pretrain a GCN model, thereby guaranteeing its capacity to identify topology features. We then feed the communication graph with flow features into the pretrained GCN. The output from the last hidden layer is treated as the fusion of flow and topology features. Additionally, by adjusting the number of layers in the GCN network, the model can effectively detect botnets under both C2 and P2P structures. Validated on the public ISCX2014 dataset, our approach achieves a recall of 92.90% and an F1-score of 92.76% for C2 botnets, alongside a recall of 94.66% and an F1-score of 92.35% for P2P botnets. These results not only demonstrate the effectiveness of our method, but also outperform the currently leading detection models.
Submitted 24 March, 2024; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Controllable Creation of Skyrmion Bags in a Ferromagnetic Nanodisk
Authors:
Lan Bo,
Rongzhi Zhao,
Chenglong Hu,
Xichao Zhang,
Xuefeng Zhang,
Masahito Mochizuki
Abstract:
Skyrmion bags are composed of an outer skyrmion and arbitrary inner skyrmions, which have recently been observed in bulk chiral magnets, but still remain elusive in magnetic films. Here, we propose a method of creating skyrmion bags in a thin-film nanodisk, which includes three steps. Firstly, the size of outer skyrmion is enlarged by a vertical magnetic field, then inner skyrmions are nucleated at an off-center area by local current injection, and the system is finally reconstructed due to multiple inter-skyrmion potentials. Thus, skyrmion bags with topological charge up to forty can be created. Simulated Lorentz transmission electron microscopy images are given to facilitate the experimental demonstration. Our proposal is expected to inspire relevant experiments in magnetic films, and pave the way for potential spintronic applications based on skyrmion bags.
Submitted 13 July, 2023;
originally announced July 2023.
-
A decomposition-homogenization method for Robin boundary problems on the nonnegative orthant
Authors:
Lijun Bo,
Yijie Huang,
Xiang Yu
Abstract:
This paper studies the existence and uniqueness of a classical solution to a type of Robin boundary problems on the nonnegative orthant. We propose a new decomposition-homogenization method for the Robin boundary problem based on probabilistic representations, which leads to two auxiliary Robin boundary problems admitting some simplified probabilistic representations. The auxiliary probabilistic representations allow us to establish the existence of a unique classical solution to the original Robin boundary problem using some stochastic flow analysis.
Submitted 14 June, 2023;
originally announced June 2023.
-
DiffHand: End-to-End Hand Mesh Reconstruction via Diffusion Models
Authors:
Lijun Li,
Li'an Zhuo,
Bang Zhang,
Liefeng Bo,
Chen Chen
Abstract:
Hand mesh reconstruction from a monocular image is a challenging task due to depth ambiguity and severe occlusion; there remains a non-unique mapping between the monocular image and the hand mesh. To address this, we develop DiffHand, the first diffusion-based framework that approaches hand mesh reconstruction as a denoising diffusion process. Our one-stage pipeline utilizes noise to model the uncertainty distribution of the intermediate hand mesh in a forward process. We reformulate the denoising diffusion process to gradually refine the noisy hand mesh and then select the mesh with the highest probability of being correct based on the image itself, rather than relying on 2D joints extracted beforehand. To better model the connectivity of hand vertices, we design a novel network module called the cross-modality decoder. Extensive experiments on the popular benchmarks demonstrate that our method outperforms the state-of-the-art hand mesh reconstruction approaches by achieving 5.8mm PA-MPJPE on the Freihand test set and 4.98mm PA-MPJPE on the DexYCB test set.
Submitted 23 May, 2023;
originally announced May 2023.
-
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Authors:
Yuan Dong,
Chuan Fang,
Liefeng Bo,
Zilong Dong,
Ping Tan
Abstract:
A panoramic image enables deeper understanding and more holistic perception of the $360^\circ$ surrounding environment, and can naturally encode enriched scene context information compared to a standard perspective image. Previous work has made considerable effort to solve the scene understanding task in a bottom-up form, so each sub-task is processed separately and few correlations are explored in this procedure. In this paper, we propose a novel method using depth prior for holistic indoor scene understanding which recovers the objects' shapes, oriented bounding boxes and the 3D room layout simultaneously from a single panorama. In order to fully utilize the rich context information, we design a transformer-based context module to predict the representation and relationship among each component of the scene. In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes. Experiments on the synthetic and real-world datasets demonstrate that our method outperforms previous panoramic scene understanding methods in terms of both layout estimation and 3D object detection.
Submitted 5 June, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
CAT: A Contextualized Conceptualization and Instantiation Framework for Commonsense Reasoning
Authors:
Weiqi Wang,
Tianqing Fang,
Baixuan Xu,
Chun Yi Louis Bo,
Yangqiu Song,
Lei Chen
Abstract:
Commonsense reasoning, which aims to endow machines with a human-like ability to make situational presumptions, is extremely challenging to generalize. Someone who barely knows about "meditation" but is knowledgeable about "singing" can still infer that "meditation makes people relaxed" from the existing knowledge that "singing makes people relaxed" by first conceptualizing "singing" as a "relaxing event" and then instantiating that event to "meditation." This process, known as conceptual induction and deduction, is fundamental to commonsense reasoning, yet it lacks both labeled data and methodologies to enhance commonsense modeling. To fill this research gap, we propose CAT (Contextualized ConceptuAlization and InsTantiation), a semi-supervised learning framework that integrates event conceptualization and instantiation to conceptualize commonsense knowledge bases at scale. Extensive experiments show that our framework achieves state-of-the-art performance on two conceptualization tasks, and the acquired abstract commonsense knowledge can significantly improve commonsense inference modeling. Our code, data, and fine-tuned models are publicly available at https://github.com/HKUST-KnowComp/CAT.
Submitted 10 May, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
An extended Merton problem with relaxed benchmark tracking
Authors:
Lijun Bo,
Yijie Huang,
Xiang Yu
Abstract:
This paper studies Merton's optimal portfolio and consumption problem in an extended formulation that incorporates the tracking of a benchmark process described by a geometric Brownian motion. We consider a relaxed tracking formulation such that the wealth process compensated by a fictitious capital injection outperforms the benchmark at all times. The fund manager aims to maximize the expected utility of consumption minus the cost of the capital injection, where the latter term can also be regarded as the expected largest shortfall of the wealth with reference to the benchmark. By introducing an auxiliary state process with reflection, we formulate and tackle an equivalent stochastic control problem by means of the dual transform and probabilistic representation, where the dual PDE can be solved explicitly. On the strength of the closed-form results, we can derive and verify the optimal feedback control for the primal control problem, allowing us to discuss some new and interesting financial implications induced by the additional risk-taking from the capital injection and the goal of tracking.
Submitted 7 March, 2024; v1 submitted 21 April, 2023;
originally announced April 2023.
-
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
Authors:
Weichuang Li,
Longhao Zhang,
Dong Wang,
Bin Zhao,
Zhigang Wang,
Mulin Chen,
Bang Zhang,
Zhongjian Wang,
Liefeng Bo,
Xuelong Li
Abstract:
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression remains unsatisfactory, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene as a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity in two respects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/
Submitted 11 April, 2023;
originally announced April 2023.
-
Evaluate Geometry of Radiance Fields with Low-frequency Color Prior
Authors:
Qihang Fang,
Yafei Song,
Keqiang Li,
Li Shen,
Huaiyu Wu,
Gang Xiong,
Liefeng Bo
Abstract:
A radiance field is an effective representation of 3D scenes, which has been widely adopted in novel-view synthesis and 3D reconstruction. It is still an open and challenging problem to evaluate the geometry, i.e., the density field, as the ground-truth is almost impossible to obtain. One alternative indirect solution is to transform the density field into a point-cloud and compute its Chamfer Distance with the scanned ground-truth. However, many widely-used datasets have no point-cloud ground-truth since the scanning process along with the equipment is expensive and complicated. To this end, we propose a novel metric, named Inverse Mean Residual Color (IMRC), which can evaluate the geometry only with the observation images. Our key insight is that the better the geometry, the lower-frequency the computed color field. From this insight, given a reconstructed density field and observation images, we design a closed-form method to approximate the color field with low-frequency spherical harmonics, and compute the inverse mean residual color. Then the higher the IMRC, the better the geometry. Qualitative and quantitative experimental results verify the effectiveness of our proposed IMRC metric. We also benchmark several state-of-the-art methods using IMRC to promote future related research. Our code is available at https://github.com/qihangGH/IMRC.
Submitted 17 January, 2024; v1 submitted 9 April, 2023;
originally announced April 2023.
-
ODDFUZZ: Discovering Java Deserialization Vulnerabilities via Structure-Aware Directed Greybox Fuzzing
Authors:
Sicong Cao,
Biao He,
Xiaobing Sun,
Yu Ouyang,
Chao Zhang,
Xiaoxue Wu,
Ting Su,
Lili Bo,
Bin Li,
Chuanlei Ma,
Jiajia Li,
Tao Wei
Abstract:
Java deserialization vulnerabilities are a severe threat in practice. Researchers have proposed static analysis solutions to locate candidate vulnerabilities and fuzzing solutions to generate proof-of-concept (PoC) serialized objects to trigger them. However, existing solutions have limited effectiveness and efficiency. In this paper, we propose a novel hybrid solution, ODDFUZZ, to efficiently discover Java deserialization vulnerabilities. First, ODDFUZZ performs lightweight static taint analysis to identify candidate gadget chains that may cause deserialization vulnerabilities. In this step, ODDFUZZ tries to locate all candidates and avoid false negatives. Then, ODDFUZZ performs directed greybox fuzzing (DGF) to explore those candidates and generate PoC testcases to mitigate false positives. Specifically, ODDFUZZ applies a structure-aware seed generation method to guarantee the validity of the testcases, and adopts a novel hybrid feedback and a step-forward strategy to guide the directed fuzzing. We implemented a prototype of ODDFUZZ and evaluated it on the popular Java deserialization repository ysoserial. Results show that ODDFUZZ discovers 16 out of 34 known gadget chains, while two state-of-the-art baselines identify only three of them. In addition, we evaluated ODDFUZZ on real-world applications including Oracle WebLogic Server, Apache Dubbo, Sonatype Nexus, and protostuff, and found six previously unreported exploitable gadget chains with five CVEs assigned.
Submitted 9 April, 2023;
originally announced April 2023.
-
Improving Java Deserialization Gadget Chain Mining via Overriding-Guided Object Generation
Authors:
Sicong Cao,
Xiaobing Sun,
Xiaoxue Wu,
Lili Bo,
Bin Li,
Rongxin Wu,
Wei Liu,
Biao He,
Yu Ouyang,
Jiajia Li
Abstract:
Java (de)serialization is prone to security-critical vulnerabilities in which attackers invoke existing methods (gadgets) on the application's classpath to construct a gadget chain that performs malicious behaviors. Several techniques have been proposed to statically identify suspicious gadget chains and dynamically generate injection objects for fuzzing. However, due to their incomplete support for dynamic program features (e.g., Java runtime polymorphism) and ineffective injection object generation for fuzzing, the existing techniques are still far from satisfactory.
In this paper, we first performed an empirical study to investigate the characteristics of Java deserialization vulnerabilities based on 86 publicly known gadget chains that we collected manually. The empirical results show that 1) Java deserialization gadgets are usually exploited by abusing runtime polymorphism, which enables attackers to reuse serializable overridden methods; and 2) attackers usually invoke exploitable overridden methods (gadgets) via dynamic binding to generate injection objects for gadget chain construction. Based on our empirical findings, we propose a novel gadget chain mining approach, GCMiner, which captures both explicit and implicit method calls to identify more gadget chains, and adopts an overriding-guided object generation approach to generate valid injection objects for fuzzing. The evaluation results show that GCMiner significantly outperforms the state-of-the-art techniques, and discovers 56 unique gadget chains that cannot be identified by the baseline approaches.
Submitted 3 April, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction
Authors:
Jie Zhou,
Xianshuai Cao,
Wenhao Li,
Lin Bo,
Kun Zhang,
Chuan Luo,
Qian Yu
Abstract:
Multi-scenario & multi-task learning has been widely applied to many recommendation systems in industrial applications, wherein an effective and practical approach is to carry out multi-scenario transfer learning on the basis of the Mixture-of-Expert (MoE) architecture. However, the MoE-based method, which aims to project all information in the same feature space, cannot effectively deal with the complex relationships inherent among various scenarios and tasks, resulting in unsatisfactory performance. To tackle the problem, we propose a Hierarchical information extraction Network (HiNet) for multi-scenario and multi-task recommendation, which achieves hierarchical extraction based on coarse-to-fine knowledge transfer scheme. The multiple extraction layers of the hierarchical network enable the model to enhance the capability of transferring valuable information across scenarios while preserving specific features of scenarios and tasks. Furthermore, a novel scenario-aware attentive network module is proposed to model correlations between scenarios explicitly. Comprehensive experiments conducted on real-world industrial datasets from Meituan Meishi platform demonstrate that HiNet achieves a new state-of-the-art performance and significantly outperforms existing solutions. HiNet is currently fully deployed in two scenarios and has achieved 2.87% and 1.75% order quantity gain respectively.
Submitted 13 March, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Multi-Behavior Graph Neural Networks for Recommender System
Authors:
Lianghao Xia,
Chao Huang,
Yong Xu,
Peng Dai,
Liefeng Bo
Abstract:
Recommender systems have been demonstrated to be effective in meeting users' personalized interests for many online services (e.g., E-commerce and online advertising platforms). Recent years have witnessed the emerging success of many deep learning-based recommendation models that augment collaborative filtering architectures with various neural network architectures, such as the multi-layer perceptron and the autoencoder. However, the majority of them model the user-item relationship with a single type of interaction, while overlooking the diversity of user behaviors when interacting with items, which can be click, add-to-cart, tag-as-favorite and purchase. Such various types of interaction behaviors have great potential in providing rich information for understanding user preferences. In this paper, we pay special attention to user-item relationships with the exploration of multi-typed user behaviors. Technically, we contribute a new multi-behavior graph neural network (MBRec), which specially accounts for diverse interaction patterns as well as the underlying cross-type behavior inter-dependencies. In the MBRec framework, we develop a graph-structured learning framework to perform expressive modeling of high-order connectivity in the behavior-aware user-item interaction graph. After that, a mutual relation encoder is proposed to adaptively uncover complex relational structures and make aggregations across layer-specific behavior representations. Through comprehensive evaluation on real-world datasets, the advantages of our MBRec method have been validated under different experimental settings. Further analysis verifies the positive effects of incorporating the multi-behavioral context into the recommendation paradigm. Additionally, the conducted case studies offer insights into the interpretability of user multi-behavior representations.
Submitted 16 February, 2023;
originally announced February 2023.
-
Stochastic control problems with state-reflections arising from relaxed benchmark tracking
Authors:
Lijun Bo,
Yijie Huang,
Xiang Yu
Abstract:
This paper studies stochastic control problems motivated by optimal consumption with wealth benchmark tracking. The benchmark process is modeled by a combination of a geometric Brownian motion and a running maximum process, indicating its increasing trend in the long run. We consider a relaxed tracking formulation such that the wealth compensated by the injected capital always dominates the benchmark process. The stochastic control problem is to maximize the expected utility of consumption deducted by the cost of the capital injection under the dynamic floor constraint. By introducing two auxiliary state processes with reflections, an equivalent auxiliary control problem is formulated and studied, which leads to the HJB equation with two Neumann boundary conditions. We establish the existence of a unique classical solution to the dual PDE using some novel probabilistic representations involving the local time of some dual processes together with a tailor-made decomposition-homogenization technique. The proof of the verification theorem on the optimal feedback control can be carried out by some stochastic flow analysis and technical estimations of the optimal control.
Submitted 25 April, 2024; v1 submitted 16 February, 2023;
originally announced February 2023.
-
4K-NeRF: High Fidelity Neural Radiance Fields at Ultra High Resolutions
Authors:
Zhongshu Wang,
Lingzhi Li,
Zhen Shen,
Li Shen,
Liefeng Bo
Abstract:
In this paper, we present a novel and effective framework, named 4K-NeRF, to pursue high fidelity view synthesis in the challenging scenarios of ultra high resolutions, building on the methodology of neural radiance fields (NeRF). The rendering procedure of NeRF-based methods typically operates in a pixel-wise manner in which rays (or pixels) are treated independently in both the training and inference phases, limiting its representational ability to describe subtle details, especially when lifting to an extremely high resolution. We address the issue by exploring ray correlation to enhance the recovery of high-frequency details. In particular, we use a 3D-aware encoder to model geometric information effectively in a lower resolution space and recover fine details through a 3D-aware decoder, conditioned on ray features and depths estimated by the encoder. Joint training with patch-based sampling further allows our method to incorporate supervision from perception-oriented regularization beyond pixel-wise loss. Benefiting from the use of geometry-aware local context, our method can significantly boost rendering quality on high-frequency details compared with modern NeRF methods, and achieve state-of-the-art visual quality on 4K ultra-high-resolution scenarios. Code is available at https://github.com/frozoul/4K-NeRF
Submitted 3 April, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Compressing Volumetric Radiance Fields to 1 MB
Authors:
Lingzhi Li,
Zhen Shen,
Zhongshu Wang,
Li Shen,
Liefeng Bo
Abstract:
Approximating radiance fields with volumetric grids is one of the promising directions for improving NeRF, represented by methods like Plenoxels and DVGO, which achieve super-fast training convergence and real-time rendering. However, these methods typically require a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue in this paper by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. We first present a robust and adaptive metric for estimating redundancy in grid models and performing voxel pruning by better exploring intermediate outputs of volumetric rendering. A trainable vector quantization is further proposed to improve the compactness of grid models. In combination with an efficient joint tuning strategy and post-processing, our method can achieve a compression ratio of 100$\times$ by reducing the overall model size to 1 MB with negligible loss in visual quality. Extensive experiments demonstrate that the proposed framework achieves unrivaled performance and generalizes well across multiple methods with distinct volumetric structures, facilitating the wide use of volumetric radiance field methods in real-world applications. Code is available at https://github.com/AlgoHunt/VQRF
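The storage saving from vector quantization can be illustrated with a minimal sketch. It is not the VQRF implementation: it assumes a fixed, hand-written codebook and plain nearest-neighbour assignment, whereas the abstract describes a trainable quantization combined with pruning and joint tuning. The names `quantize` and `dequantize` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each voxel feature by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Recover approximate features by looking the indices up in the codebook."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    # Hypothetical toy data: three voxel features and a two-entry codebook.
    codebook = [(0.0, 0.0), (1.0, 1.0)]
    features = [(0.1, -0.1), (0.9, 1.2), (0.4, 0.4)]
    indices = quantize(features, codebook)
    print(indices, dequantize(indices, codebook))
```

Only the small integer indices and the shared codebook need to be stored, which is where the compression comes from in this simplified picture.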
Submitted 29 November, 2022;
originally announced November 2022.
-
A Creative Industry Image Generation Dataset Based on Captions
Authors:
Xiang Yuejia,
Lv Chuanhao,
Liu Qingdazhu,
Yang Xiaocui,
Liu Bo,
Ju Meizhi
Abstract:
Most image generation methods struggle to precisely control the properties of the generated images, such as structure, scale, and shape, which limits their large-scale application in creative industries such as conceptual design and graphic design. Using the prompt and the sketch together is a practical solution for controllability. Existing datasets lack either the prompt or the sketch and are not designed for the creative industry. The main contributions of our work are as follows. a) This is the first dataset that covers the 4 most important areas of creative industry domains and is labeled with prompt and sketch. b) We provide multiple reference images in the test set and fine-grained scores for each reference, which are useful for measurement. c) We apply two state-of-the-art models to our dataset and find some shortcomings, for example that the prompt is weighted more heavily than the sketch.
Submitted 16 November, 2022;
originally announced November 2022.
-
Observation of tungsten impurity suppression with ECRH by an X-ray Crystal Spectrometer on EAST
Authors:
Lin Zichao,
Zhang Hongming,
Wang Fudi,
Bae Chenonho,
Fu Jia,
Shen Yongcai,
Lu Dian,
** Yifei,
He Liang,
Wang Minrui,
Lin Guangle,
Ye Kaixuan,
Wang Shouxin,
Zhao Hailin,
Lyu Bo
Abstract:
Impurities degrade tokamak plasma confinement by causing energy loss, diluting the fuel concentration, and even terminating the discharges in some extreme cases. Previously, the suppression effects of on-axis Electron Cyclotron Resonance Heating (ECRH) on impurity accumulation have been investigated on EAST by extreme ultraviolet (EUV) spectroscopy. However, it is difficult to quantify the changes in the tungsten (W) impurity profile since the W line emissions in the EUV range cannot be easily resolved. X-ray Crystal Spectroscopy (XCS), which is used to provide the ion temperature and the rotation velocity by measuring line emissions in the soft X-ray range, can also be used to study the behavior of W impurity emissions. To begin with, in-situ absolute intensity calibration for the Tangential XCS (TXCS) is conducted by analyzing the measurements of the bremsstrahlung radiation intensity. After obtaining the calibration coefficient, W44+ ion density profiles are evaluated by Abel inversion using the spectral line of W XLV (3.9095 Å). Thus, a direct observation of W44+ impurity concentration suppressed by ECRH is accomplished. The obtained W density profiles can be used in the future to analyze W transport in combination with impurity transport codes.
Submitted 31 October, 2022;
originally announced October 2022.
-
On De Finetti's control under Poisson observations: optimality of a double barrier strategy in a Markov additive model
Authors:
Lijun Bo,
Wenyuan Wang,
Kaixin Yan
Abstract:
In this paper we consider De Finetti's optimal dividend and capital injection problem under a Markov additive model. We assume that the surplus process before dividends and capital injections follows a spectrally positive Markov additive process. Dividend payments are made only at the jump times of an independent Poisson process. Capital must be injected whenever needed to keep the surplus process non-negative and avoid bankruptcy. Our purpose is to characterize the optimal periodic dividend and capital injection strategy that maximizes the expected total discounted dividends minus the total discounted costs of capital injection. To this end, we first consider an auxiliary optimal periodic dividend and capital injection problem with final payoff under a single spectrally positive Lévy process and conjecture that the optimal strategy is a double barrier strategy. Using the fluctuation theory and excursion-theoretical approach of the spectrally positive Lévy process and the Hamilton-Jacobi-Bellman inequality approach of control theory, we are able to verify the conjecture that some double barrier periodic dividend and capital injection strategy solves the auxiliary problem. With the results for the auxiliary control problem and a fixed point argument for recursive iterations induced by the dynamic programming principle, the optimality of a regime-modulated double barrier periodic dividend and capital injection strategy is proved for our target control problem.
Submitted 26 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Issues in Implementing Regression Calibration Analyses
Authors:
Lillian Boe,
Pamela A. Shaw,
Douglas Midthune,
Paul Gustafson,
Victor Kipnis,
Eunyoung Park,
Daniela Sotres-Alvarez,
Laurence Freedman
Abstract:
Regression calibration is a popular approach for correcting biases in estimated regression parameters when exposure variables are measured with error. This approach involves building a calibration equation to estimate the value of the unknown true exposure given the error-prone measurement and other confounding covariates. The estimated, or calibrated, exposure is then substituted for the true exposure in the health outcome regression model. When used properly, regression calibration can greatly reduce the bias induced by exposure measurement error. Here, we first provide an overview of the statistical framework for regression calibration, specifically discussing how a special type of error, called Berkson error, arises in the estimated exposure. We then present practical issues to consider when applying regression calibration, including: (1) how to develop the calibration equation and which covariates to include; (2) valid ways to calculate standard errors (SE) of estimated regression coefficients; and (3) problems arising if one of the covariates in the calibration model is a mediator of the relationship between the exposure and outcome. Throughout the paper, we provide illustrative examples using data from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) and simulations. We conclude with recommendations for how to perform regression calibration.
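The two steps described above (fit a calibration equation, then substitute the calibrated exposure into the outcome model) can be sketched on simulated data. This is an illustrative toy under stated assumptions, not the paper's analysis: it assumes classical additive measurement error, no additional covariates, a linear outcome model, and a validation subsample in which the true exposure is observed. All function names and parameter values are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_calibration_demo(n=20000, n_calib=2000, beta=1.0, seed=0):
    """Return (naive slope, calibrated slope) for a simulated study."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]            # true exposure
    w = [xi + rng.gauss(0, 1) for xi in x]             # error-prone measurement
    y = [beta * xi + rng.gauss(0, 1) for xi in x]      # outcome

    # Naive analysis: regress the outcome on the error-prone measurement.
    _, naive = fit_line(w, y)

    # Step 1: calibration equation E[X | W], fit on the validation subsample
    # (assumed here to be the first n_calib subjects, where x is observed).
    a0, a1 = fit_line(w[:n_calib], x[:n_calib])

    # Step 2: substitute the calibrated exposure into the outcome model.
    x_hat = [a0 + a1 * wi for wi in w]
    _, calibrated = fit_line(x_hat, y)
    return naive, calibrated


if __name__ == "__main__":
    naive, calibrated = regression_calibration_demo()
    print(f"naive slope: {naive:.3f}, calibrated slope: {calibrated:.3f}")
```

With equal exposure and error variances, the naive slope is attenuated toward roughly half the true value, while the calibrated slope lands near it. The sketch reports no standard errors; the abstract notes that valid standard errors need separate care because the calibration step is itself estimated.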
Submitted 25 September, 2022;
originally announced September 2022.
-
Practical considerations for sandwich variance estimation in two-stage regression settings
Authors:
Lillian A. Boe,
Thomas Lumley,
Pamela A. Shaw
Abstract:
We present a practical approach for computing the sandwich variance estimator in two-stage regression model settings. As a motivating example for two-stage regression, we consider regression calibration, a popular approach for addressing covariate measurement error. The sandwich variance approach has rarely been applied in regression calibration, even though it requires less computation time than popular resampling approaches for variance estimation, specifically the bootstrap. This is likely because it requires specialized statistical coding. In practice, a simple bootstrap approach with Wald confidence intervals is often applied, but this approach can yield confidence intervals that do not achieve the nominal coverage level. We first outline the steps needed to compute the sandwich variance estimator. We then develop a convenient method of computation in R for sandwich variance estimation, which leverages standard regression model outputs and existing R functions and can be applied in the case of a simple random sample or complex survey design. We use a simulation study to compare the performance of the sandwich to a resampling variance approach for both data settings. Finally, we further compare these two variance estimation approaches for data examples from the Women's Health Initiative (WHI) and Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
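For readers unfamiliar with the sandwich idea, a minimal single-stage sketch may help. It is deliberately simpler than the two-stage estimator the abstract describes, and it is written in Python rather than the R workflow the authors develop: it computes the basic heteroskedasticity-robust (HC0) sandwich standard error for the slope of a simple linear regression and compares it with the usual model-based standard error. The simulated data and function names are hypothetical.

```python
import math
import random


def slope_with_standard_errors(xs, ys):
    """Return (slope, model-based SE, HC0 sandwich SE) for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (y - my) for d, y in zip(dx, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]

    # Model-based variance: assumes a constant error variance.
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    model_var = sigma2 / sxx

    # Sandwich variance: "bread" 1/sxx on both sides of the "meat",
    # the sum of squared score contributions (x_i - mean) * residual_i.
    meat = sum((d * e) ** 2 for d, e in zip(dx, residuals))
    sandwich_var = meat / (sxx * sxx)
    return slope, math.sqrt(model_var), math.sqrt(sandwich_var)


def simulate_heteroskedastic(n=5000, seed=1):
    """Simulated data whose error spread grows with |x| (an illustrative assumption)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2.0 * x + abs(x) * rng.gauss(0, 1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    slope, model_se, sandwich_se = slope_with_standard_errors(*simulate_heteroskedastic())
    print(f"slope={slope:.3f} model SE={model_se:.4f} sandwich SE={sandwich_se:.4f}")
```

In this simulation the model-based standard error understates the uncertainty because the constant-variance assumption fails, while the sandwich form does not rely on that assumption. The two-stage setting adds a further term for the uncertainty of the first-stage fit, which this sketch omits.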
Submitted 20 September, 2022;
originally announced September 2022.
-
A mean field game approach to equilibrium consumption under external habit formation
Authors:
Lijun Bo,
Shihua Wang,
Xiang Yu
Abstract:
This paper studies the equilibrium consumption under external habit formation in a large population of agents. We first formulate problems under two types of conventional habit formation preferences, namely linear and multiplicative external habit formation, in a mean field game framework. In a log-normal market model with the asset specialization, we characterize one mean field equilibrium in analytical form in each problem, allowing us to understand some quantitative properties of the equilibrium strategy and conclude some financial implications caused by consumption habits from a mean-field perspective. In each problem with n agents, we construct an approximate Nash equilibrium for the n-player game using the obtained mean field equilibrium when n is sufficiently large. The explicit convergence order in each problem can also be obtained.
Submitted 8 March, 2024; v1 submitted 27 June, 2022;
originally announced June 2022.