Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Authors: Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang

Abstract: We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description without training on the triplet datasets. Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space. However, they focus on the global visual representation, ignoring the representation of detailed attributes, e.g., color, object number and layout. To address this challenge, we propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs). KEDs implicitly models the attributes of the reference images by incorporating a database. The database enriches the pseudo-word tokens by providing relevant images and captions, emphasizing shared attribute information in various aspects. In this way, KEDs recognizes the reference image from diverse perspectives. Moreover, KEDs adopts an extra stream that aligns pseudo-word tokens with textual concepts, leveraging pseudo-triplets mined from image-text pairs. The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space. Extensive experiments on widely used benchmarks, i.e., ImageNet-R, COCO object, Fashion-IQ and CIRR, show that KEDs outperforms previous zero-shot composed image retrieval methods.

Submitted 24 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2310.18049 [pdf, other]

Text Augmented Spatial-aware Zero-shot Referring Image Segmentation

Authors: Yucheng Suo, Linchao Zhu, Yi Yang

Abstract: In this paper, we study a challenging task of zero-shot referring image segmentation. This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations. Previous research takes advantage of pre-trained cross-modal models, e.g., CLIP, to align instance-level masks with referring expressions. Yet, CLIP only considers the global-level alignment of image-text pairs, neglecting fine-grained matching between the referring sentence and local image regions. To address this challenge, we introduce a Text Augmented Spatial-aware (TAS) zero-shot referring image segmentation framework that is training-free and robust to various visual encoders. TAS incorporates a mask proposal network for instance-level mask extraction, a text-augmented visual-text matching score for mining the image-text correlation, and a spatial rectifier for mask post-processing. Notably, the text-augmented visual-text matching score leverages a $P$-score and an $N$-score in addition to the typical visual-text matching score. The $P$-score is utilized to close the visual-text domain gap through a surrogate captioning model, where the score is computed between the surrogate model-generated texts and the referring expression. The $N$-score considers the fine-grained alignment of region-text pairs via negative phrase mining, encouraging the masked image to be repelled from the mined distracting phrases. Extensive experiments are conducted on various datasets, including RefCOCO, RefCOCO+, and RefCOCOg. The proposed method clearly outperforms state-of-the-art zero-shot referring image segmentation methods.

Submitted 27 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP2023

arXiv:2212.05500 [pdf, other]

doi 10.1109/TIFS.2023.3340098

Security Defense of Large Scale Networks Under False Data Injection Attacks: An Attack Detection Scheduling Approach

Authors: Yuhan Suo, Senchun Chai, Runqi Chai, Zhong-Hua Pang, Yuanqing Xia, Guo-** Liu

Abstract: In large-scale networks, communication links between nodes are easily injected with false data by adversaries. This paper proposes a novel security defense strategy from the perspective of attack detection scheduling to ensure the security of the network. Based on the proposed strategy, each sensor can directly exclude suspicious sensors from its neighboring set. First, the problem of selecting suspicious sensors is formulated as a combinatorial optimization problem, which is non-deterministic polynomial-time hard (NP-hard). To solve this problem, the original function is transformed into a submodular function. Then, we propose an attack detection scheduling algorithm based on the sequential submodular optimization theory, which incorporates the expert problem to better utilize historical information to guide the sensor selection task at the current moment. For different attack strategies, theoretical results show that the average optimization rate of the proposed algorithm has a lower bound, and the error expectation is bounded. In addition, under two kinds of insecurity conditions, the proposed algorithm can guarantee the security of the entire network from the perspective of the augmented estimation error. Finally, the effectiveness of the developed method is verified by numerical simulation and a practical experiment.

Submitted 17 December, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

Comments: 14 pages, 13 figures

arXiv:2207.03714 [pdf, other]

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Authors: Yucheng Suo, Zhedong Zheng, Xiaohan Wang, Bang Zhang, Yi Yang

Abstract: Sign language is the window for differently-abled people to express their feelings as well as emotions. However, it remains challenging for people to learn sign language in a short time. To address this real-world challenge, in this work, we study the motion transfer system, which can transfer the user photo to the sign language video of specific words. In particular, the appearance content of the output video comes from the provided user image, while the motion of the video is extracted from the specified tutorial video. We observe two primary limitations in adopting the state-of-the-art motion transfer methods to sign language generation: (1) Existing motion transfer works ignore the prior geometrical knowledge of the human body. (2) The previous image animation methods only take image pairs as input in the training stage, which cannot fully exploit the temporal information within videos. In an attempt to address the above-mentioned limitations, we propose Structure-aware Temporal Consistency Network (STCNet) to jointly optimize the prior structure of the human body with the temporal consistency for sign language video generation. There are two main contributions in this paper. (1) We harness a fine-grained skeleton detector to provide prior knowledge of the body keypoints. In this way, we ensure the keypoint movement stays in a valid range and make the model more explainable and robust. (2) We introduce two cycle-consistency losses, i.e., short-term cycle loss and long-term cycle loss, which are conducted to assure the continuity of the generated video. We optimize the two losses and keypoint detector network in an end-to-end manner.

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2107.00985 [pdf]

doi 10.1364/OE.439732

Wavelength-switchable ultra-narrow linewidth fiber laser enabled by a figure-8 compound-ring-cavity filter and a polarization-managed four-channel filter

Authors: Ting Feng, Da Wei, Wenwen Bi, Weiwei Sun, Shengbao Wu, Meili Jiang, Feng** Yan, Yu** Suo, X. Steve Yao

Abstract: We propose and demonstrate a high-performance four-wavelength erbium-doped fiber laser (EDFL), enabled by a figure-8 compound-ring-cavity (F8-CRC) filter for single-longitudinal-mode (SLM) selection and a polarization-managed four-channel filter (PM-FCF) for defining four lasing wavelengths. We introduce a novel methodology utilizing a signal-flow graph combined with Mason's rule to analyze a CRC filter in general and apply it to obtain the important design parameters for the F8-CRC filter used in this paper. By combining the functions of the F8-CRC filter and the PM-FCF filter assisted by the enhanced polarization hole-burning and polarization dependent loss, we achieve the EDFL with fifteen lasing states, including four single-, six dual-, four tri- and one quad-wavelength lasing operations. In particular, all the four single-wavelength operations are in stable SLM oscillation, typically with a linewidth of $<600$ Hz, a RIN of $\leq -154.58$ dB/Hz at $\geq 3$ MHz and an output power fluctuation of $\leq \pm 3.45\%$. In addition, all the six dual-wavelength operations have very similar performances, with the performance parameters close to those of the single-wavelength lasing operations. Finally, we achieve the wavelength spacing tuning of the dual-wavelength operations for the photonic generation of tunable microwave signals, and successfully obtain a signal at 23.10 GHz as a demonstration.

Submitted 9 September, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: 21 pages, 14 figures

Journal ref: Optics Express, 2021

arXiv:2105.14341 [pdf, ps, other]

Distribution dependent SDEs driven by fractional Brownian motions

Authors: Xiliang Fan, Xing Huang, Yongqiang Suo, Chenggui Yuan

Abstract: In this paper we study a class of distribution dependent stochastic differential equations driven by fractional Brownian motions with Hurst parameter $H\in(1/2,1)$. We prove the well-posedness of this type of equation, and then establish a general result on the Bismut formula for the Lions derivative by using Malliavin calculus. As applications, we provide the Bismut formulas of this kind for both non-degenerate and degenerate cases, and obtain the estimates of the Lions derivative and the total variation distance between the laws of two solutions.

Submitted 29 May, 2021; originally announced May 2021.

Comments: 42 pages

MSC Class: 60H10; 60G22

arXiv:2103.01323 [pdf, ps, other]

Estimate of Heat Kernel for Euler-Maruyama Scheme of SDEs Driven by α-Stable Noise and Applications

Authors: Xing Huang, Yongqiang Suo, Chenggui Yuan

Abstract: In this paper, the discrete parameter expansion is adopted to investigate the estimation of the heat kernel for the Euler-Maruyama scheme of SDEs driven by α-stable noise, which implies Krylov's estimate and Khasminskii's estimate. As an application, the convergence rate of the Euler-Maruyama scheme for a class of multidimensional SDEs with singular drift (with the aid of Zvonkin's transformation) is obtained.

Submitted 30 July, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: 24 pages

arXiv:2007.14652 [pdf, ps, other]

TCI for SDEs with irregular drifts

Authors: Yongqiang Suo, Chenggui Yuan, Shao-Qin Zhang

Abstract: We obtain $T_2(C)$ for stochastic differential equations with Dini continuous drift and $T_1(C)$ for stochastic differential equations with singular coefficients.

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2005.04631 [pdf, ps, other]

Weak convergence of Euler scheme for SDEs with singular drift

Authors: Yongqiang Suo, Chenggui Yuan, Shao-Qin Zhang

Abstract: In this paper, we investigate the weak convergence rate of Euler-Maruyama's approximation for stochastic differential equations with irregular drifts. Explicit weak convergence rates are presented if drifts satisfy an integrability condition including discontinuous functions which can be non-piecewise continuous or in fractional Sobolev space.

Submitted 10 May, 2020; originally announced May 2020.

Comments: 12 pages

MSC Class: 60H10; 34K26; 65C30

arXiv:1910.04418 [pdf, ps, other]

CLT and MDP for McKean-Vlasov SDEs

Authors: Yongqiang Suo, Chenggui Yuan

Abstract: Under a Lipschitz condition on distribution dependent coefficients, the central limit theorem and the moderate deviation principle are obtained for solutions of McKean-Vlasov type stochastic differential equations, which extends the corresponding results for classical stochastic differential equations to the distribution dependent setting.

Submitted 11 November, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: 18 pages

arXiv:1907.02293 [pdf, ps, other]

Weak convergence of path-dependent SDEs driven by fractional Brownian motion with irregular coefficients

Authors: Yongqiang Suo, Chenggui Yuan, Shaoqin Zhang

Abstract: In this paper, by using Girsanov's transformation and the property of the corresponding reference stochastic differential equations, we investigate weak existence and uniqueness of solutions and weak convergence of Euler-Maruyama scheme to stochastic functional differential equations with Hölder continuous drift driven by fractional Brownian motion with Hurst index $H\in (1/2,1)$.

Submitted 4 July, 2019; originally announced July 2019.

Comments: 26 pages

arXiv:1903.06441 [pdf, ps, other]

Large deviations for neutral stochastic functional differential equations

Authors: Yongqiang Suo, Chenggui Yuan

Abstract: In this paper, under a one-sided Lipschitz condition on the drift coefficient, we adopt (via the contraction principle) an exponential approximation argument to investigate large deviations for neutral stochastic functional differential equations.

Submitted 15 March, 2019; originally announced March 2019.

Comments: 17 pages

arXiv:1806.11003 [pdf, ps, other]

Moderate deviation and central limit theorem for SDDEs with polynomial growth

Authors: Yongqiang Suo, ** Tao, Wei Zhang

Abstract: In this paper, employing the weak convergence method, based on a variational representation for expected values of positive functionals of a Brownian motion, we investigate moderate deviation for a class of stochastic differential delay equations with small noises, where the coefficients are allowed to have highly nonlinear growth with respect to the variables. Moreover, we obtain the central limit theorem for stochastic differential delay equations whose coefficients have polynomial growth with respect to the delay variables.

Submitted 28 June, 2018; originally announced June 2018.

arXiv:1509.03088 [pdf, ps, other]

On $Q$-Tensors

Authors: Zheng-Hai Huang, Yun-Yang Suo, Jie Wang

Abstract: One of the central problems in the theory of linear complementarity problems (LCPs) is to study the class of $Q$-matrices since it characterizes the solvability of LCP. Recently, the concept of $Q$-matrix has been extended to the case of tensors, called $Q$-tensor, which characterizes the solvability of the corresponding tensor complementarity problem -- a generalization of LCP; and some basic results related to $Q$-tensors have been obtained in the literature. In this paper, we extend two famous results related to $Q$-matrices to the tensor space, i.e., we show that within the class of strong $P_0$-tensors or nonnegative tensors, four classes of tensors, i.e., $R_0$-tensors, $R$-tensors, $ER$-tensors and $Q$-tensors, are all equivalent. We also construct several examples to show that three famous results related to $Q$-matrices cannot be extended to the tensor space, one of which gives a negative answer to a question raised recently by Song and Qi.

Submitted 10 September, 2015; originally announced September 2015.

arXiv:1501.07867 [pdf, other]

Multi-task Image Classification via Collaborative, Hierarchical Spike-and-Slab Priors

Authors: Hojjat Seyed Mousavi, Umamahesh Srinivas, Vishal Monga, Yuanming Suo, Minh Dao, Trac D. Tran

Abstract: Promising results have been achieved in image classification problems by exploiting the discriminative power of sparse representations for classification (SRC). Recently, it has been shown that the use of class-specific spike-and-slab priors in conjunction with the class-specific dictionaries from SRC is particularly effective in low training scenarios. As a logical extension, we build on this framework for multitask scenarios, wherein multiple representations of the same physical phenomena are available. We experimentally demonstrate the benefits of mining joint information from different camera views for multi-view face recognition.

Submitted 30 January, 2015; originally announced January 2015.

Comments: Accepted to International Conference on Image Processing (ICIP) 2014

arXiv:1406.1943 [pdf, other]

Structured Dictionary Learning for Classification

Authors: Yuanming Suo, Minh Dao, Umamahesh Srinivas, Vishal Monga, Trac D. Tran

Abstract: Sparsity driven signal processing has gained tremendous popularity in the last decade. At its core, the assumption is that the signal of interest is sparse with respect to either a fixed transformation or a signal dependent dictionary. To better capture the data characteristics, various dictionary learning methods have been proposed for both reconstruction and classification tasks. For classification particularly, most approaches proposed so far have focused on designing explicit constraints on the sparse code to improve classification accuracy while simply adopting $l_0$-norm or $l_1$-norm for sparsity regularization. Motivated by the success of structured sparsity in the area of Compressed Sensing, we propose a structured dictionary learning framework (StructDL) that incorporates the structure information on both group and task levels in the learning process. Its benefits are two-fold: (i) the label consistency between dictionary atoms and training data is implicitly enforced; and (ii) the classification performance is more robust in the cases of a small dictionary size or limited training data than other techniques. Using the subspace model, we derive the conditions for StructDL to guarantee the performance and show theoretically that StructDL is superior to $l_0$-norm or $l_1$-norm regularized dictionary learning for classification. Extensive experiments have been performed on both synthetic simulations and real world applications, such as face recognition and object classification, to demonstrate the validity of the proposed DL framework.

Submitted 7 June, 2014; originally announced June 2014.

Showing 1–16 of 16 results for author: Suo, Y