-
Boosting Box-supervised Instance Segmentation with Pseudo Depth
Authors:
Xinyi Yu,
Ling Yan,
Pengtao Jiang,
Hao Chen,
Bo Li,
Lin Yuanbo Wu,
Linlin Ou
Abstract:
The realm of Weakly Supervised Instance Segmentation (WSIS) under box supervision has garnered substantial attention, showcasing remarkable advancements in recent years. However, the limitations of box supervision become apparent in its inability to furnish effective information for distinguishing foreground from background within the specified target box. This research addresses this challenge by introducing pseudo-depth maps into the training process of the instance segmentation network, thereby boosting its performance by capturing depth differences between instances. These pseudo-depth maps are generated using a readily available depth predictor and are not necessary during the inference stage. To enable the network to discern depth features when predicting masks, we integrate a depth prediction layer into the mask prediction head. This approach empowers the network to simultaneously predict masks and depth, enhancing its ability to capture nuanced depth-related information during the instance segmentation process. We further utilize the mask generated in the training process as supervision to distinguish the foreground from the background. When selecting the best mask for each box through the Hungarian algorithm, we use depth consistency as one term of the matching cost. The proposed method achieves significant improvements on the Cityscapes and COCO datasets.
Submitted 2 March, 2024;
originally announced March 2024.
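The box-to-mask assignment with a depth-consistency cost term, as described in the abstract above, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cost weights `w_box` and `w_depth`, the definition of `depth_cost` (variance of pseudo-depth values inside a candidate mask), and all function names are hypothetical. The exhaustive search over permutations stands in for the Hungarian algorithm; it finds the same minimum-cost assignment but only scales to a handful of boxes.

```python
from itertools import permutations


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_cost(mask_depths):
    """Hypothetical depth-consistency cost: variance of pseudo-depth inside a mask."""
    if not mask_depths:
        return 0.0
    mean = sum(mask_depths) / len(mask_depths)
    return sum((d - mean) ** 2 for d in mask_depths) / len(mask_depths)


def match(gt_boxes, pred_boxes, pred_mask_depths, w_box=1.0, w_depth=1.0):
    """Return, for each ground-truth box, the index of its assigned prediction."""
    n, m = len(gt_boxes), len(pred_boxes)
    assert n <= m, "need at least one prediction per ground-truth box"
    # cost[i][j]: cost of assigning prediction j to ground-truth box i.
    cost = [
        [w_box * (1.0 - box_iou(g, p)) + w_depth * depth_cost(d)
         for p, d in zip(pred_boxes, pred_mask_depths)]
        for g in gt_boxes
    ]
    # Brute-force minimum-cost assignment (stand-in for the Hungarian algorithm).
    best = min(
        permutations(range(m), n),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(best)
```

With two predictions that overlap a box equally well, the one whose mask covers a more uniform pseudo-depth region wins, which is the intuition the depth-consistency term is meant to capture.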
-
De novo protein design using geometric vector field networks
Authors:
Weian Mao,
Muzhi Zhu,
Zheng Sun,
Shuaike Shen,
Lin Yuanbo Wu,
Hao Chen,
Chunhua Shen
Abstract:
Innovations like protein diffusion have enabled significant progress in de novo protein design, which is a vital topic in life science. These methods typically depend on protein structure encoders to model residue backbone frames, where atoms do not exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context. Thus far, only a few simple encoders, such as IPA, have been proposed for this scenario, exposing the frame modeling as a bottleneck. In this work, we propose the Vector Field Network (VFN), which enables network layers to perform learnable vector computations between coordinates of frame-anchored virtual atoms, thus achieving a higher capability for modeling frames. The vector computation operates in a manner similar to a linear layer, with each input channel receiving 3D virtual atom coordinates instead of scalar values. The multiple feature vectors output by the vector computation are then used to update the residue representations and virtual atom coordinates via attention aggregation. Remarkably, VFN also excels in modeling both frames and atoms, as the real atoms can be treated as the virtual atoms for modeling, positioning VFN as a potential universal encoder. In protein diffusion (frame modeling), VFN exhibits an impressive performance advantage over IPA, excelling in terms of both designability (67.04% vs. 53.58%) and diversity (66.54% vs. 51.98%). In inverse folding (frame and atom modeling), VFN outperforms the previous SoTA model, PiFold (54.7% vs. 51.66%), on sequence recovery rate. We also propose a method of equipping VFN with the ESM model, which surpasses the previous ESM-based SoTA, LM-Design, by a substantial margin (62.67% vs. 55.65%).
Submitted 18 October, 2023;
originally announced October 2023.
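The sentence in the abstract above comparing the vector computation to a linear layer can be illustrated with a small sketch. It is only one reading of that sentence, not the VFN layer itself: the function name `vector_linear` and the bias-free form with scalar weights are assumptions made for illustration.

```python
def vector_linear(vectors, weights):
    """Mix C_in 3D vectors into C_out 3D vectors using scalar weights.

    vectors: list of C_in (x, y, z) tuples, e.g. virtual atom coordinates
    weights: C_out rows, each holding C_in scalar weights
    """
    out = []
    for row in weights:
        assert len(row) == len(vectors), "one weight per input channel"
        # Each output channel is a weighted sum of whole input vectors.
        out.append(tuple(sum(w * v[k] for w, v in zip(row, vectors))
                         for k in range(3)))
    return out
```

Because each weight scales a whole vector and there is no bias term, rotating every input vector rotates every output vector in the same way, which is a convenient property when the inputs are coordinates.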
-
CTVIS: Consistent Training for Online Video Instance Segmentation
Authors:
Kaining Ying,
Qing Zhong,
Weian Mao,
Zhenhua Wang,
Hao Chen,
Lin Yuanbo Wu,
Yifan Liu,
Chengxiang Fan,
Yunzhi Zhuge,
Chunhua Shen
Abstract:
The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS). Instance embedding learning is directly supervised by the contrastive loss computed upon the contrastive items (CIs), which are sets of anchor/positive/negative embeddings. Recent online VIS methods leverage CIs sourced from one reference frame only, which we argue is insufficient for learning highly discriminative embeddings. Intuitively, a possible strategy to enhance CIs is replicating the inference phase during training. To this end, we propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which is devoted to aligning the training and inference pipelines in terms of building CIs. Specifically, CTVIS constructs CIs by referring to the momentum-averaged embedding and memory bank storage mechanisms used at inference, and by adding noise to the relevant embeddings. Such an extension allows a reliable comparison between embeddings of current instances and the stable representations of historical instances, thereby conferring an advantage in modeling VIS challenges such as occlusion, re-identification, and deformation. Empirically, CTVIS outstrips the SOTA VIS models by up to +5.0 points on three VIS benchmarks, including YTVIS19 (55.1% AP), YTVIS21 (50.1% AP) and OVIS (35.5% AP). Furthermore, we find that pseudo-videos transformed from images can train robust models surpassing fully-supervised ones.
Submitted 24 July, 2023;
originally announced July 2023.
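The momentum-averaged embedding and memory bank mentioned in the abstract above could look roughly like the sketch below. It is an illustrative assumption rather than the CTVIS code: the class name `MemoryBank`, the momentum value, the update rule, and the use of cosine similarity for association are all hypothetical choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class MemoryBank:
    """Keeps one momentum-averaged embedding per tracked instance id."""

    def __init__(self, momentum=0.75):
        self.momentum = momentum  # assumed value, not taken from the paper
        self.bank = {}

    def update(self, instance_id, embedding):
        """Blend a new embedding into the stored one for this instance."""
        old = self.bank.get(instance_id)
        if old is None:
            self.bank[instance_id] = list(embedding)
        else:
            m = self.momentum
            self.bank[instance_id] = [m * o + (1.0 - m) * e
                                      for o, e in zip(old, embedding)]

    def associate(self, embedding):
        """Return the stored id most similar to the embedding, or None if empty."""
        if not self.bank:
            return None
        return max(self.bank, key=lambda k: cosine(self.bank[k], embedding))
```

Comparing a current embedding against the averaged history, rather than against a single reference frame, is what makes the stored representation stable under brief occlusion or deformation.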
-
Exotic spin-dependent interactions through unparticle exchange
Authors:
L. Y. Wu,
K. Y. Zhang,
H. Yan
Abstract:
The potential discovery of unparticles could have far-reaching implications for particle physics and cosmology. For over a decade, high-energy physicists have extensively studied the effects of unparticles. In this study, we derive six types of nonrelativistic potentials between fermions induced by unparticle exchange in coordinate space. We consider all possible combinations of scalar, pseudo-scalar, vector, and axial-vector couplings to explore the full range of possibilities. Previous studies have only examined scalar-scalar (SS), pseudoscalar-pseudoscalar (PP), vector-vector (VV), and axial-axial-vector (AA) type interactions, which are all parity even. We propose SP and VA interactions to extend our understanding of unparticle physics, noting that parity conservation is not always guaranteed in modern physics. We explore the possibilities of detecting unparticles through the long-range interactions they may mediate with ordinary matter. Dedicated experiments using precision measurement methods can be employed to search for such interactions. We discuss the properties of these potentials and estimate constraints on several coupling constants based on existing experimental data. Our findings indicate that the coupling between vector unparticles and fermions is constrained by up to 9 orders of magnitude more tightly than the previous limits.
Submitted 9 June, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Using the Sun and the Moon as Source masses and the Earth's Rotation as a Modulation to Search for Exotic Spin-Dependent Interactions at Astronomical Distances
Authors:
L. Y. Wu,
K. Y. Zhang,
M. Peng,
J. Gong,
H. Yan
Abstract:
Exotic spin-dependent interactions mediated by new light particles may lead to solutions to several important questions in modern physics. Such interactions involving a scalar coupling $g_S^N$ at one vertex and a pseudo-scalar coupling $g_P^n$ at the polarized neutron vertex can be induced by the exchange of spin-0 bosons, while those involving a vector/axial-vector coupling $g_V^N$/$g_A^N$ at one vertex and an axial-vector coupling $g_A^n$ at the polarized neutron vertex can be induced by the exchange of spin-1 bosons. If such new interactions exist, the Sun and the Moon can induce sidereal variations of effective fields along the direction perpendicular to the Earth's rotation axis.
We derive new experimental upper limits on such exotic spin-dependent interactions at astronomical interaction ranges by analyzing existing data from laboratory measurements of Lorentz and CPT violation. We set the most stringent experimental limits on $g_S^Ng_P^n$ for interaction ranges from $\sim 2\times 10^{10}$ m to $\sim 10^{14}$ m. Previously, the best limit on $g_S^Ng_P^n$ at this range was from astrophysics. To the best of our knowledge, this is the first time laboratory limits surpass the astrophysical ones for the scalar-pseudoscalar type interaction. We report new constraints on vector-axial-vector and axial-axial-vector type interactions at astronomical scales. The new limits on the vector-axial-vector interaction are improved by as much as $\sim$12 orders of magnitude.
We also apply the analysis to the Hari-Dass interactions and obtain corresponding new constraints. We discuss the possibility of using the beam method, based on the same idea, to search further for interactions involving other particles, such as electrons and muons.
Submitted 15 June, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Asymmetric Cross-Scale Alignment for Text-Based Person Search
Authors:
Zhong Ji,
Junhua Hu,
Deyin Liu,
Lin Yuanbo Wu,
Ye Zhao
Abstract:
Text-based person search (TBPS), which aims to retrieve pedestrian images with high semantic relevance to a given text description, is of significant importance in intelligent surveillance. This retrieval task is characterized by both modal heterogeneity and fine-grained matching. To implement this task, one needs to extract multi-scale features from both image and text domains, and then perform the cross-modal alignment. However, most existing approaches only consider alignment confined to individual scales, e.g., an image-sentence or a region-phrase scale. Such a strategy presumes the alignment during feature extraction, while overlooking cross-scale alignment, e.g., image-phrase. In this paper, we present a transformer-based model to extract multi-scale representations, and perform Asymmetric Cross-Scale Alignment (ACSA) to precisely align the two modalities. Specifically, ACSA consists of a global-level alignment module and an asymmetric cross-attention module, where the former aligns an image and texts on a global scale, and the latter applies the cross-attention mechanism to dynamically align the cross-modal entities in region/image-phrase scales. Extensive experiments on two benchmark datasets, CUHK-PEDES and RSTPReid, demonstrate the effectiveness of our approach. Code is available at https://github.com/mul-hjh/ACSA.
Submitted 26 November, 2022;
originally announced December 2022.
-
T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up
Authors:
Deyin Liu,
Lin Yuanbo Wu,
Bo Li,
Zongyuan Ge
Abstract:
In this paper, we present an end-to-end approach to generate high-resolution person images conditioned on texts only. State-of-the-art text-to-image generation models are mainly designed for center-object generation, e.g., flowers and birds. Unlike center-placed objects with similar shapes and orientation, person image generation is a more challenging task, for which we observe the following: 1) the generated images for the same person should exhibit visual details with identity-consistency, e.g., identity-related textures/clothes/shoes across the images, and 2) those images should be discriminative so as to be robust against the inter-person variations caused by visual ambiguities. To address the above challenges, we develop an effective generative model to produce person images with two novel mechanisms. In particular, our first mechanism (called T-Person-GAN-ID) is to integrate the one-stream generator with an identity-preserving network such that the representations of generated data are regularized in their feature space to ensure the identity-consistency. The second mechanism (called T-Person-GAN-ID-MM) is based on the manifold mix-up to produce mixed images via the linear interpolation across generated images from different manifold identities, and we further enforce such interpolated images to be linearly classified in the feature space. This amounts to learning a linear classification boundary that can perfectly separate images from two identities. Our proposed method is empirically validated to achieve a remarkable improvement in text-to-person image generation. Our architecture is orthogonal to StackGAN++ and focuses on person image generation; together they enrich the spectrum of GANs for the image generation task. Code is available at https://github.com/linwu-github/Person-Image-Generation.git.
Submitted 2 July, 2023; v1 submitted 18 August, 2022;
originally announced August 2022.
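The linear interpolation behind the manifold mix-up mechanism in the abstract above can be sketched in a few lines. This is a generic mix-up illustration under assumptions, not the T-Person-GAN-ID-MM code: the function names, the flat feature-vector representation, and the soft-label construction are hypothetical.

```python
def mix_up(feat_a, feat_b, lam):
    """Linearly interpolate two feature vectors with coefficient lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


def mixed_label(num_ids, id_a, id_b, lam):
    """Soft identity label whose weights match the interpolation coefficient."""
    label = [0.0] * num_ids
    label[id_a] += lam
    label[id_b] += 1.0 - lam
    return label
```

Training a linear classifier to predict the soft label for the interpolated feature is one way to encourage the linear separability between identities that the abstract describes.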
-
Longitudinal boost-invariance of charge balance function in hadron-hadron and nucleus-nucleus collisions
Authors:
Na Li,
Zhiming Li,
Yuanfang Wu
Abstract:
Using Monte Carlo generators of the PYTHIA model for hadron-hadron collisions and a multi-phase transport (AMPT) model for nucleus-nucleus collisions, the longitudinal boost-invariance of the charge balance function and its transverse momentum dependence are carefully studied. The results show that the charge balance function is boost-invariant in both p+p and Au+Au collisions in these two models, consistent with experimental data. The balance function properly scaled by the width of the pseudorapidity window is independent of the position or the size of the window and corresponds to the balance function of the whole pseudorapidity range. This longitudinal property of the balance function also holds for particles in small transverse momentum ranges in the PYTHIA and the AMPT default models, but is violated in the AMPT model with string melting. The physical origin of the results is discussed.
Submitted 9 October, 2009;
originally announced October 2009.
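For orientation, the charge balance function and the window-width scaling referred to in the abstract above are commonly written as below. The abstract itself does not state these formulas, so the notation is an assumption taken from the form widely used in the balance-function literature: $N_{ab}(\delta\eta)$ counts pairs of charges $a$ and $b$ separated by pseudorapidity difference $\delta\eta$ inside a window of width $\eta_w$, and $N_\pm$ are the numbers of positive and negative particles in that window.

```latex
% Charge balance function in a pseudorapidity window of width \eta_w
B(\delta\eta \mid \eta_w) = \frac{1}{2}\left[
    \frac{N_{+-}(\delta\eta) - N_{++}(\delta\eta)}{N_{+}}
  + \frac{N_{-+}(\delta\eta) - N_{--}(\delta\eta)}{N_{-}}
\right]

% Scaled balance function; boost-invariance means B_s does not depend on
% the position or the size of the window
B_s(\delta\eta) = \frac{B(\delta\eta \mid \eta_w)}{1 - \delta\eta / \eta_w}
```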