Search | arXiv e-print repository

VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models

Authors: Haowen Hou, Peigen Zeng, Fei Ma, Fei Richard Yu

Abstract: Visual Language Models (VLMs) have rapidly progressed with the recent success of large language models. However, there have been few attempts to incorporate efficient linear Recurrent Neural Network (RNN) architectures into VLMs. In this study, we introduce VisualRWKV, the first application of a linear RNN model to multimodal learning tasks, leveraging the pre-trained RWKV language model. We propose a data-dependent recurrence and sandwich prompts to enhance our modeling capabilities, along with a 2D image scanning mechanism to enrich the processing of visual sequences. Extensive experiments demonstrate that VisualRWKV achieves competitive performance compared to Transformer-based models like LLaVA-1.5 on various benchmarks. To facilitate further research and analysis, we have made the checkpoints and the associated code publicly accessible at the following GitHub repository: https://github.com/howard-hou/VisualRWKV.
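The abstract mentions a 2D image scanning mechanism but does not list the scan orders. As a minimal sketch, assuming the image is split into an h × w grid of patch tokens and that "scanning" means serializing that grid in several directions (the four directions below are our assumption, not necessarily VisualRWKV's design), the token orders could be generated like this:

```python
from typing import Dict, List


def scan_orders(h: int, w: int) -> Dict[str, List[int]]:
    """Patch-index sequences for an h x w grid, where patch (r, c) has index r * w + c.

    The set of directions is an illustrative assumption, not the paper's exact design.
    """
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return {
        "row": row_major,               # left-to-right, top-to-bottom
        "row_reversed": row_major[::-1],
        "col": col_major,               # top-to-bottom, left-to-right
        "col_reversed": col_major[::-1],
    }
```

A recurrent model that reads patches in a single order only sees context from patches earlier in that order; feeding reversed and column-major sequences as well gives each patch context from the other sides of the image.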

Submitted 19 June, 2024; originally announced June 2024.

Comments: 18 pages, 14 tables, 6 figures

arXiv:2406.13150 [pdf]

MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction

Authors: Jiaqi Cui, Xinyi Zeng, Pinxian Zeng, Bo Liu, Xi Wu, Jiliu Zhou, Yan Wang

Abstract: Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g., patients' clinical tabular data, resulting in compromised reconstruction with limited diagnostic utility. Moreover, they often overlook the semantic consistency between real SPET and reconstructed images, leading to distorted semantic contexts. To tackle these problems, we propose a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from multi-modal inputs, including LPET images and clinical tabular data. Specifically, our MCAD incorporates a Multi-modal conditional Encoder (Mc-Encoder) to extract multi-modal features, followed by a conditional diffusion process to blend noise with multi-modal features and gradually map blended features to the target SPET images. To balance multi-modal inputs, the Mc-Encoder embeds Optimal Multi-modal Transport co-Attention (OMTA) to narrow the heterogeneity gap between image and tabular data while capturing their interactions, providing sufficient guidance for reconstruction.
In addition, to mitigate semantic distortions, we introduce the Multi-Modal Masked Text Reconstruction (M3TRec), which leverages semantic knowledge extracted from denoised PET images to restore the masked clinical tabular data, thereby compelling the network to maintain accurate semantics during reconstruction. To expedite the diffusion process, we further introduce an adversarial diffusive network with a reduced number of diffusion steps. Experiments show that our method achieves state-of-the-art performance both qualitatively and quantitatively.
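The abstract does not say how OMTA computes its transport plan. A common way to obtain an optimal-transport alignment between two token sets (for example, image tokens and tabular tokens) is entropic regularization solved with Sinkhorn iterations. The sketch below uses that standard technique as a stand-in; the cost matrix, `eps`, and iteration count are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: List[float], b: List[float],
             eps: float = 0.1, iters: int = 200) -> Matrix:
    """Entropic OT plan P whose row sums approach a and column sums approach b.

    Generic Sinkhorn-Knopp scaling, used here only as an illustrative stand-in for OMTA.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: low transport cost -> large kernel value.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows to match marginal a, then columns to match marginal b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

In a co-attention layer, each row of the plan, renormalized to sum to 1, could play the role of softmax attention weights when pooling tabular features for an image token; the marginal constraints stop a few tokens from absorbing all of the attention.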

Submitted 18 June, 2024; originally announced June 2024.

Comments: Early accepted by MICCAI 2024

arXiv:2406.12131 [pdf, other]

Gram2Vec: An Interpretable Document Vectorizer

Authors: Peter Zeng, Eric Sclafani, Owen Rambow

Abstract: We present Gram2Vec, a grammatical style embedding algorithm that embeds documents into a higher-dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In our demo, we present a way to visualize a mapping of authors to documents based on their Gram2Vec vectors and highlight the ability to drop or add features to view which authors make certain linguistic choices. Next, we use authorship attribution as an application to show how Gram2Vec can explain why a document is attributed to a certain author, using cosine similarities between the Gram2Vec feature vectors to calculate the distances between candidate documents and a query document.
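The vectorization and attribution steps described in this abstract can be sketched as follows. This is a minimal illustration, not the Gram2Vec library: it assumes each document has already been reduced to a list of grammatical feature labels (for example, POS tags from a tagger that is not shown), and the feature inventory `FEATURES` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

FEATURES = ["NOUN", "VERB", "ADJ", "ADV", "PRON"]  # hypothetical feature inventory


def vectorize(tags: List[str]) -> List[float]:
    """Normalized relative frequency of each tracked feature in one document."""
    counts = Counter(tags)
    total = sum(counts[f] for f in FEATURES) or 1  # avoid division by zero
    return [counts[f] / total for f in FEATURES]


def cosine(x: List[float], y: List[float]) -> float:
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def attribute(query: List[str], candidates: Dict[str, List[str]]) -> str:
    """Return the candidate author whose document is most similar to the query."""
    q = vectorize(query)
    return max(candidates, key=lambda author: cosine(q, vectorize(candidates[author])))
```

Because every dimension is a named feature, the per-feature products inside `cosine` show which grammatical choices drive an attribution, and dropping a feature amounts to removing it from `FEATURES`.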

Submitted 17 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures

arXiv:2406.04307 [pdf, other]

High-precision and low-depth eigenstate property estimation: theory and resource estimation

Authors: Jinzhao Sun, Pei Zeng, Tom Gur, M. S. Kim

Abstract: Estimating the eigenstate properties of quantum many-body systems is a long-standing, challenging problem for both classical and quantum computing. For the task of eigenstate preparation, quantum signal processing (QSP) has established near-optimal query complexity $O(\Delta^{-1} \log(\varepsilon^{-1}))$ by querying the block encoding of the Hamiltonian $H$, where $\Delta$ is the energy gap and $\varepsilon$ is the target precision. However, QSP is challenging for both near-term noisy quantum computers and early fault-tolerant quantum computers (FTQC), which are limited by the number of logical qubits and circuit depth. To date, early FTQC algorithms have focused on querying the perfect time evolution $e^{-iHt}$. It remains uncertain whether early FTQC algorithms can maintain good asymptotic scaling at the gate level. Moreover, when considering qubit connectivity, the circuit depth of existing FTQC algorithms may scale suboptimally with system size. Here, we present a full-stack design of a random sampling algorithm for estimating the eigenenergy and the observable expectations on the eigenstates, which can achieve high precision and good system size scaling. The gate complexity has a logarithmic dependence on precision, $O(\log^{1+o(1)}(1/\varepsilon))$, for generic Hamiltonians, which cannot be achieved by methods that use Trotterisation to realise $e^{-iHt}$, as in QETU. For $n$-qubit lattice Hamiltonians, our method achieves near-optimal system size dependence with the gate complexity $O(n^{1+o(1)})$.
When restricting the qubit connectivity to a linear nearest-neighbour architecture, the method shows advantages in circuit depth, with $O(n^{o(1)})$ for lattice models and $O(n^{2+o(1)})$ for electronic structure problems. We compare the resource requirements (CNOT gates, T gates and qubit numbers) of phase estimation, QSP, and QETU in lattice and molecular problems.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 48 pages, 7 figures, and 4 tables

arXiv:2405.12710 [pdf, other]

Text-Video Retrieval with Global-Local Semantic Consistent Learning

Authors: Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen

Abstract: Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent shared semantics across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. Furthermore, an Inter-Consistency Loss (ICL) is devised to accomplish the concept alignment between the visual query and corresponding textual query, and an Intra-Diversity Loss (IDL) is developed to repulse the distribution within visual (textual) queries to generate more discriminative concepts. Extensive experiments on five widely used benchmarks (i.e., MSR-VTT, MSVD, DiDeMo, LSMDC, and ActivityNet) substantiate the superior effectiveness and efficiency of the proposed method. Remarkably, our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost. Code is available at: https://github.com/zchoi/GLSCL.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 9 pages

arXiv:2405.11299 [pdf, other]

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving

Authors: Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan

Abstract: We survey the large language model (LLM) serving area to understand the intricate dynamics between cost-efficiency and accuracy, which are magnified by the growing need for longer contextual understanding when deploying models at a massive scale. Our findings reveal that works in this space optimize along three distinct but conflicting goals: improving serving context length (C), improving serving accuracy (A), and improving serving performance (P). Drawing inspiration from the CAP theorem in databases, we propose a CAP principle for LLM serving, which suggests that any optimization can improve at most two of these three goals simultaneously. Our survey categorizes existing works within this framework. We find the definition and continuity of user-perceived measurement metrics are crucial in determining whether a goal has been met, akin to prior CAP databases in the wild. We recognize the CAP principle for LLM serving as a guiding principle, rather than a formal theorem, to inform designers of the inherent and dynamic trade-offs in serving models. As serving accuracy and performance have been extensively studied, this survey focuses on works that extend serving context length and address the resulting challenges.

Submitted 26 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2403.07284 [pdf, other]

SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection

Authors: Hongcheng Zhang, Liu Liang, Pengxin Zeng, Xiao Song, Zhe Wang

Abstract: Sparse 3D detectors have received significant attention since the query-based paradigm offers low latency without explicit dense BEV feature construction. However, these detectors achieve worse performance than their dense counterparts. In this paper, we find the key to bridging the performance gap is to enhance the awareness of rich representations in two modalities. Here, we present a high-performance fully sparse detector for end-to-end multi-modality 3D object detection. The detector, termed SparseLIF, contains three key designs: (1) Perspective-Aware Query Generation (PAQG) to generate high-quality 3D queries with perspective priors, (2) RoI-Aware Sampling (RIAS) to further refine prior queries by sampling RoI features from each modality, and (3) Uncertainty-Aware Fusion (UAF) to precisely quantify the uncertainty of each sensor modality and adaptively conduct final multi-modality fusion, thus achieving great robustness against sensor noise. By the time of submission (2024/03/08), SparseLIF achieves state-of-the-art performance on the nuScenes dataset, ranking 1st on both validation set and test benchmark, outperforming all state-of-the-art 3D object detectors by a notable margin. The source code will be released upon acceptance.
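The abstract does not give UAF's formula. A textbook way to quantify the uncertainty of each sensor and fuse adaptively is inverse-variance weighting, sketched below purely as a stand-in. It assumes each modality supplies an estimate of the same quantity (say, a box coordinate) together with a predicted variance.

```python
from typing import List, Tuple


def inverse_variance_fuse(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (value, variance) pairs from several sensors.

    Generic inverse-variance weighting, not SparseLIF's UAF: lower-variance
    (more certain) modalities get proportionally more weight.
    Returns the fused value and its variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * val for w, (val, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total
```

For example, a LiDAR estimate (2.0, 0.1) and a camera estimate (3.0, 0.9) fuse to roughly (2.1, 0.09). If one sensor degrades and its predicted variance grows, its weight shrinks automatically, which is the kind of robustness to sensor noise the abstract targets.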

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.02451 [pdf, other]

Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground

Authors: Adil Soubki, John Murzaku, Arash Yousefi Jordehi, Peter Zeng, Magdalena Markowska, Seyed Abolghasem Mirroshandel, Owen Rambow

Abstract: Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.

Submitted 5 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Journal ref: ACL 2024 Findings

arXiv:2402.00376 [pdf]

doi 10.1109/ICASSP48485.2024.10446360

Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction

Authors: Jiaqi Cui, Yan Wang, Lu Wen, Pinxian Zeng, Xi Wu, Jiliu Zhou, Dinggang Shen

Abstract: To obtain high-quality Positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images. However, these methods heavily rely on voxel-based representations, which fall short of adequately accounting for the precise structure and fine-grained context, leading to compromised reconstruction. In this paper, we propose a 3D point-based context clusters GAN, namely PCC-GAN, to reconstruct high-quality SPET images from LPET. Specifically, inspired by the geometric representation power of points, we resort to a point-based representation to enhance the explicit expression of the image structure, thus facilitating the reconstruction with finer details. Moreover, a context clustering strategy is applied to explore the contextual relationships among points, which mitigates the ambiguities of small structures in the reconstructed images. Experiments on both clinical and phantom datasets demonstrate that our PCC-GAN outperforms the state-of-the-art reconstruction methods qualitatively and quantitatively. Code is available at https://github.com/gluucose/PCCGAN.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.12478 [pdf, other]

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

Authors: Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

Abstract: The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to strictly unknown domains and categories during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text retrieval. However, applying them directly to UCDR may not be sufficient to handle both domain shift (i.e., adapting to unfamiliar domains) and semantic shift (i.e., transferring to unknown categories). To this end, we propose Prompting-to-Simulate (ProS), the first method to apply prompt tuning for UCDR. ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP), which can drive models to produce generalized features for UCDR. Concretely, in the Prompt Units Learning stage, we introduce two Prompt Units to individually capture domain and semantic knowledge in a mask-and-align way. Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP. Extensive experiments conducted on three benchmark datasets show that our method achieves new state-of-the-art performance without bringing excessive parameters. Our method is publicly available at https://github.com/fangkaipeng/ProS.

Submitted 29 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2310.20578 [pdf, other]

Fault-Tolerant Operation of Bosonic Qubits with Discrete-Variable Ancillae

Authors: Qian Xu, Pei Zeng, Daohong Xu, Liang Jiang

Abstract: Fault-tolerant quantum computation with bosonic qubits often necessitates the use of noisy discrete-variable ancillae. In this work, we establish a comprehensive and practical fault-tolerance framework for such a hybrid system and synthesize it with fault-tolerant protocols by combining bosonic quantum error correction (QEC) and advanced quantum control techniques. We introduce essential building blocks of error-corrected gadgets by leveraging ancilla-assisted bosonic operations using a generalized variant of path-independent quantum control (GPI). Using these building blocks, we construct a universal set of error-corrected gadgets that tolerate a single photon loss and an arbitrary ancilla fault for four-legged cat qubits. Notably, our construction only requires dispersive coupling between bosonic modes and ancillae, as well as beam-splitter coupling between bosonic modes, both of which have been experimentally demonstrated with high coupling strength and accuracy. Moreover, each error-corrected bosonic qubit consists of only a single bosonic mode and a three-level ancilla, featuring the hardware efficiency of bosonic QEC in the full fault-tolerant setting. We numerically demonstrate the feasibility of our schemes using current experimental parameters in the circuit-QED platform. Finally, we present a hardware-efficient architecture for fault-tolerant quantum computing by concatenating the four-legged cat qubits with an outer qubit code utilizing only beam-splitter couplings.
Our estimates suggest that the overall noise threshold can be reached using existing hardware. The developed fault-tolerant schemes are not limited to four-legged cat qubits and can be adapted for other rotation-symmetric codes, offering a promising avenue toward scalable and robust quantum computation with bosonic qubits.
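The gadgets above rely on the rotation symmetry of four-legged cat codes. A quick numerical check of the standard definitions shows why a single photon loss is detectable: logical |0⟩ has support only on Fock states with n ≡ 0 (mod 4), logical |1⟩ only on n ≡ 2 (mod 4), and losing one photon moves the support to n ≡ 3 or 1 (mod 4). The coherent amplitude α = 2 and the Fock cutoff of 20 are arbitrary choices for illustration, and this sketch says nothing about the paper's GPI control scheme.

```python
import cmath
import math
from typing import List, Set


def coherent(alpha: complex, cutoff: int) -> List[complex]:
    """Fock amplitudes of the coherent state |alpha>, truncated at `cutoff` photons."""
    pref = cmath.exp(-abs(alpha) ** 2 / 2)
    return [pref * alpha ** n / math.sqrt(math.factorial(n)) for n in range(cutoff)]


def four_legged_cat(alpha: complex, sign: int, cutoff: int = 20) -> List[complex]:
    """Normalized |a> + |-a> + sign * (|ia> + |-ia>); sign=+1 is logical 0, sign=-1 is logical 1."""
    legs = [coherent(alpha * p, cutoff) for p in (1, -1, 1j, -1j)]
    amps = [legs[0][n] + legs[1][n] + sign * (legs[2][n] + legs[3][n]) for n in range(cutoff)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in amps))
    return [x / norm for x in amps]


def lose_photon(amps: List[complex]) -> List[complex]:
    """Apply the annihilation operator, a|n> = sqrt(n)|n-1> (result left unnormalized)."""
    return [math.sqrt(n + 1) * amps[n + 1] for n in range(len(amps) - 1)]


def support_mod4(amps: List[complex], tol: float = 1e-12) -> Set[int]:
    """Residues n mod 4 of the Fock states with non-negligible amplitude."""
    return {n % 4 for n, x in enumerate(amps) if abs(x) > tol}
```

Because the no-loss and one-loss states occupy disjoint residues, measuring the photon number mod 4 (for example, through a dispersively coupled ancilla) reveals a single loss without distinguishing logical 0 from logical 1 by a code-space measurement.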

Submitted 31 October, 2023; originally announced October 2023.

Comments: 23 pages, 10 figures. Comments are welcome

arXiv:2310.16428 [pdf, ps, other]

Similarity-driven and Task-driven Models for Diversity of Opinion in Crowdsourcing Markets

Authors: Chen Jason Zhang, Yunrui Liu, Pengcheng Zeng, Ting Wu, Lei Chen, Pan Hui, Fei Hao

Abstract: The recent boom in crowdsourcing has opened up a new avenue for utilizing human intelligence in the realm of data analysis. This innovative approach provides a powerful means for connecting online workers to tasks that cannot effectively be done solely by machines or conducted by professional experts due to cost constraints. Within the field of social science, four elements are required to construct a sound crowd: Diversity of Opinion, Independence, Decentralization and Aggregation. However, while the other three components have already been investigated and implemented in existing crowdsourcing platforms, 'Diversity of Opinion' has not been functionally enabled yet. From a computational point of view, constructing a wise crowd necessitates quantitatively modeling and taking diversity into account. There are usually two paradigms in a crowdsourcing marketplace for worker selection: building a crowd to wait for tasks to come and selecting workers for a given task. We propose similarity-driven and task-driven models for both paradigms. Also, we develop efficient and effective algorithms for recruiting a limited number of workers with optimal diversity in both models. To validate our solutions, we conduct extensive experiments using both synthetic and real datasets.
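The abstract does not describe the recruiting algorithms themselves. As a generic stand-in (not the authors' similarity-driven or task-driven models), a greedy max-sum-dispersion heuristic picks k workers with high pairwise dissimilarity. The sketch assumes a symmetric worker-similarity matrix with entries in [0, 1], for example derived from past answer agreement; that input is hypothetical.

```python
from typing import List


def greedy_diverse_crowd(sim: List[List[float]], k: int) -> List[int]:
    """Greedily choose k workers, maximizing summed pairwise dissimilarity (1 - sim).

    A generic dispersion heuristic used for illustration; it carries no optimality claim.
    """
    n = len(sim)
    if k <= 0 or n == 0:
        return []
    # Seed with the most dissimilar pair (or the single worker if n == 1).
    i, j = min(((p, q) for p in range(n) for q in range(p + 1, n)),
               key=lambda pair: sim[pair[0]][pair[1]], default=(0, None))
    chosen = [i] if j is None else [i, j]
    while len(chosen) < min(k, n):
        rest = [w for w in range(n) if w not in chosen]
        # Add the worker who is, in total, least similar to everyone already chosen.
        best = max(rest, key=lambda w: sum(1.0 - sim[w][c] for c in chosen))
        chosen.append(best)
    return chosen[:k]
```

The same loop works for both paradigms in the abstract: for "select workers for a given task", `sim` would simply be restricted to workers eligible for that task.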

Submitted 28 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 37 pages, 11 figures

arXiv:2309.03789 [pdf, other]

Pilot-reference-free continuous-variable quantum key distribution with efficient decoy-state analysis

Authors: Anran Jin, Xingjian Zhang, Liang Jiang, Richard V. Penty, Pei Zeng

Abstract: Continuous-variable quantum key distribution (CV QKD) using optical coherent detectors is practically favorable due to its low implementation cost, flexibility of wavelength division multiplexing, and compatibility with standard coherent communication technologies. However, the security analysis and parameter estimation of CV QKD are complicated due to the infinite-dimensional latent Hilbert space. Also, the transmission of strong reference pulses undermines the security and complicates the experiments. In this work, we tackle these two problems by presenting a time-bin-encoding CV protocol with a simple phase-error-based security analysis valid under general coherent attacks. With the key encoded into the relative intensity between two optical modes, the need for global references is removed. Furthermore, phase randomization can be introduced to decouple the security analysis of different photon-number components. We can hence tag the photon number for each round, effectively estimate the associated privacy using a carefully designed coherent-detection method, and independently extract encryption keys from each component. Simulations show that the protocol using multi-photon components increases the key rate by two orders of magnitude compared to the one using only the single-photon component. Meanwhile, the protocol with four-intensity decoy analysis is sufficient to yield tight parameter estimation with a short-distance key-rate performance comparable to the best Bennett-Brassard-1984 implementation.
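Phase randomization turns a coherent state of mean photon number μ into a Poisson mixture of photon-number states, which is what allows each photon-number component to be analysed separately. The sketch below illustrates the idea with the standard vacuum + weak decoy lower bound on the single-photon yield Y1 from discrete-variable BB84 decoy-state analysis. It is a familiar stand-in, not the paper's four-intensity CV estimator, and the channel model used to generate gains (transmittance `eta`, background yield `y0`) is an assumption.

```python
import math


def poisson(n: int, mu: float) -> float:
    """Probability that a phase-randomized coherent state with mean mu contains n photons."""
    return math.exp(-mu) * mu ** n / math.factorial(n)


def gain(mu: float, eta: float, y0: float) -> float:
    """Q_mu = sum_n Y_n P(n|mu) for the assumed yields Y_n = 1 - (1 - y0)(1 - eta)^n."""
    return 1.0 - (1.0 - y0) * math.exp(-eta * mu)


def y1_lower_bound(mu: float, nu: float, q_mu: float, q_nu: float, y0: float) -> float:
    """Vacuum + weak-decoy lower bound on the single-photon yield; requires 0 < nu < mu."""
    return mu / (mu * nu - nu ** 2) * (
        q_nu * math.exp(nu)
        - q_mu * math.exp(mu) * nu ** 2 / mu ** 2
        - (mu ** 2 - nu ** 2) / mu ** 2 * y0
    )
```

The bound uses only the measured gains Q_μ and Q_ν and the vacuum yield Y0, so the single-photon contribution can be certified without trusting the channel model; the `gain` function here merely produces plausible numbers for a test. With η = 0.1, Y0 = 1e-5, μ = 0.5, and ν = 0.1, the bound comes out a few percent below the true Y1 of about 0.1.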

Submitted 7 September, 2023; originally announced September 2023.

Comments: 27 pages, 6 figures, 6 tables. Comments are welcome

arXiv:2308.05365 [pdf]

TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms

Authors: Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen

Abstract: To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, various methods have been proposed for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) sinograms directly. However, current methods often neglect boundaries during sinogram-to-image reconstruction, resulting in high-frequency distortion in the frequency domain and diminished or fuzzy edges in the reconstructed images. Furthermore, the commonly used convolutional architectures lack the ability to model long-range non-local interactions, potentially leading to inaccurate representations of global structures. To alleviate these problems, we propose a transformer-based model that unites triple domains of sinogram, image, and frequency for direct PET reconstruction, namely TriDo-Former. Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms. Different from the vanilla transformer that splits an image into 2D patches, based specifically on the PET imaging mechanism, our SE-Former divides the sinogram into 1D projection view angles to maintain its inner structure while denoising, preventing the noise in the sinogram from propagating into the image domain. Moreover, to mitigate high-frequency distortion and improve reconstruction details, we integrate global frequency parsers (GFPs) into SSR-Former.
The GFP serves as a learnable frequency filter that globally adjusts the frequency components in the frequency domain, forcing the network to restore high-frequency details resembling real SPET images. Validations on a clinical dataset demonstrate that our TriDo-Former outperforms the state-of-the-art methods qualitatively and quantitatively.
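The GFP is described as a learnable filter applied in the frequency domain. The core operation (transform, scale each frequency bin by a weight, transform back) can be sketched in one dimension with a plain DFT. In a network the weights would be learned parameters and the transform would likely be a 2D FFT over feature maps; both are assumptions here, and the function names are hypothetical.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: List[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def global_filter(x: List[float], weights: List[complex]) -> List[float]:
    """Scale every DFT bin of x by its own weight, then return to the signal domain.

    Taking the real part is exact only for real or conjugate-symmetric weights.
    """
    X = dft([complex(v) for v in x])
    return [v.real for v in idft([w * c for w, c in zip(weights, X)])]
```

Every output sample depends on every input sample, which is what makes the filter "global"; raising the weights on high-frequency bins sharpens edges, while keeping only the zero-frequency bin returns the signal's mean.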

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.04802 [pdf, other]

Generalized Unbiased Scene Graph Generation

Authors: Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen

Abstract: Existing Unbiased Scene Graph Generation (USGG) methods only focus on addressing the predicate-level imbalance, in which high-frequency classes dominate predictions of rare ones, while overlooking the concept-level imbalance. In fact, even if predicates themselves are balanced, there is still a significant concept imbalance within them due to the long-tailed distribution of contexts (i.e., subject-object combinations). This concept-level imbalance poses a more pervasive and challenging issue compared to the predicate-level imbalance since subject-object pairs are inherently complex in combinations. Hence, we introduce a novel research problem: Generalized Unbiased Scene Graph Generation (G-USGG), which takes into account both predicate-level and concept-level imbalance. To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare/uncommon/common concepts. MCL first quantifies the concept-level imbalance across predicates in terms of different amounts of concepts, represented as multiple concept-prototypes within the same class. It then effectively learns concept-prototypes by applying the Concept Regularization (CR) technique. Furthermore, to achieve balanced learning over different concepts, we introduce the Balanced Prototypical Memory (BPM), which guides SGG models to generate balanced representations for concept-prototypes.
Extensive experiments demonstrate the remarkable efficacy of our model-agnostic strategy in enhancing the performance of benchmark models on both VG-SGG and OI-SGG datasets, leading to new state-of-the-art achievements in two key aspects: predicate-level unbiased relation recognition and concept-level compositional generalizability.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2306.17496 [pdf, other]

Performance Analysis for Polar Codes under Successive Cancellation List Decoding with Fixed List Size

Authors: **nan Piao, Dong Li, Xueting Yu, Zhibo Li, Ming Yang, **di Liu, Peng Zeng

Abstract: In this paper, we first show that the block error event of polar codes under successive cancellation list (SCL) decoding is composed of a path loss (PL) error event and a path selection (PS) error event, where the PL error event is that the correct codeword is lost during SCL decoding and the PS error event is that the correct codeword is retained in the decoded list but not selected as the decoded codeword. Then, we simplify the PL error event by assuming that the all-zero codeword is transmitted, and derive a lower bound on its probability via the joint probability density of the log-likelihood ratios of the information bits. Meanwhile, the union bound calculated from the minimum weight distribution is used to evaluate the probability of the PS error event. Based on this performance analysis, we design a greedy bit-swapping (BS) algorithm to construct polar codes by gradually swapping information bits and frozen bits to reduce the performance lower bound of SCL decoding. The simulation results show that the BLER performance of SCL decoding is close to the lower bound in the medium to high signal-to-noise ratio region, and that the BS algorithm can optimize the lower bound to improve the BLER performance of SCL decoding.

Submitted 6 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.
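The PS-error estimate above rests on a union bound over low-weight codewords. The sketch below assumes BPSK over an AWGN channel, enumerates the codewords of a tiny polar code by brute force to obtain its minimum weight d_min and the number A_dmin of minimum-weight codewords, and evaluates the leading union-bound term A_dmin * Q(sqrt(2 * R * d_min * Eb/N0)). The information set, the helper names, and the brute-force enumeration are illustrative assumptions; the paper's exact bound may be formulated differently.

```python
import itertools
import math

import numpy as np

def polar_generator(n):
    """F^{(x)n} over GF(2) with the Arikan kernel F = [[1, 0], [1, 1]]."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    return G

def min_weight_and_count(G, info_set):
    """Brute-force (d_min, A_dmin) of the code spanned by the rows in info_set."""
    rows = G[sorted(info_set)]
    weights = []
    for msg in itertools.product([0, 1], repeat=len(rows)):
        if any(msg):
            codeword = np.array(msg) @ rows % 2
            weights.append(int(codeword.sum()))
    d_min = min(weights)
    return d_min, weights.count(d_min)

def q_func(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound_term(d_min, a_dmin, rate, ebn0_db):
    """Leading union-bound term for BPSK over AWGN."""
    ebn0 = 10 ** (ebn0_db / 10)
    return a_dmin * q_func(math.sqrt(2 * rate * d_min * ebn0))

G = polar_generator(3)        # N = 8
info_set = {3, 5, 6, 7}       # K = 4, an illustrative information set
d_min, a_dmin = min_weight_and_count(G, info_set)
print(d_min, a_dmin)          # 4 14 for this [8, 4] code
for snr_db in (2.0, 4.0):
    print(snr_db, union_bound_term(d_min, a_dmin, len(info_set) / 8, snr_db))
```

Brute force is only feasible for toy lengths; it is used here because it makes the meaning of "minimum weight distribution" explicit without relying on any specialised enumeration method.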

arXiv:2305.12743 [pdf, other]

doi 10.1109/TPAMI.2023.3332967

Semantic Invariant Multi-view Clustering with Fully Incomplete Information

Authors: Pengxin Zeng, Mouxing Yang, Yiding Lu, Changqing Zhang, Peng Hu, Xi Peng

Abstract: Robust multi-view learning with incomplete information has received significant attention due to issues such as incomplete correspondences and incomplete instances that commonly affect real-world multi-view applications. Existing approaches heavily rely on paired samples to realign or impute defective ones, but such preconditions cannot always be satisfied in practice due to the complexity of data collection and transmission. To address this problem, we present a novel framework called SeMantic Invariance LEarning (SMILE) for multi-view clustering with incomplete information that does not require any paired samples. Specifically, we discover the existence of an invariant semantic distribution across different views, which enables SMILE to alleviate the cross-view discrepancy and learn consensus semantics without requiring any paired samples. The resulting consensus semantics remain unaffected by cross-view distribution shifts, making them useful for realigning/imputing defective instances and forming clusters. We demonstrate the effectiveness of SMILE through extensive comparison experiments with 13 state-of-the-art baselines on five benchmarks. Our approach improves the clustering accuracy on NoisyMNIST from 19.3%/23.2% to 82.7%/69.0% when the correspondences/instances are fully incomplete. The code is available at https://pengxi.me.

Submitted 21 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.07481 [pdf, other]

Extended ADMM for general penalized quantile regression with linear constraints in big data

Authors: Yongxin Liu, Peng Zeng

Abstract: Quantile regression (QR) can be used to describe the comprehensive relationship between a response and predictors. Prior domain knowledge and assumptions in applications are usually formulated as constraints on parameters to improve estimation efficiency. This paper develops methods based on multi-block ADMM to fit general penalized QR with linear constraints on the regression coefficients. Different formulations to handle the linear constraints and the general penalty are explored and compared. The most efficient one has explicit expressions for each parameter and avoids the nested-loop iterations of some existing algorithms. Additionally, a parallel ADMM algorithm is developed for big data stored in a distributed fashion. The stopping criterion and convergence of the algorithm are established. Extensive numerical experiments and a real data example demonstrate the computational efficiency of the proposed algorithms. The details of theoretical proofs and different algorithm variations are presented in the Appendix.

Submitted 12 May, 2023; originally announced May 2023.
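The building block behind such ADMM methods is a splitting in which the residual r = y - X beta is treated as a separate variable. The following is a minimal two-block scaled ADMM for plain quantile regression (no penalty, no linear constraints), written as a simplified sketch rather than the paper's multi-block algorithm: the beta-step is a least-squares solve and the r-step is the closed-form proximal operator of the check loss rho_tau. The function names, the penalty parameter `rho`, and the fixed iteration count are assumptions for illustration.

```python
import numpy as np

def prox_check(v, tau, rho):
    """Elementwise proximal operator of (1 / rho) * rho_tau, the check loss."""
    upper, lower = tau / rho, -(1.0 - tau) / rho
    return np.where(v > upper, v - upper, np.where(v < lower, v - lower, 0.0))

def admm_quantile_regression(X, y, tau=0.5, rho=1.0, n_iter=5000):
    """Scaled ADMM for min_beta sum_i rho_tau(y_i - x_i^T beta),
    using the splitting r = y - X beta (no penalty, no linear constraints)."""
    X_pinv = np.linalg.pinv(X)          # reused least-squares solver
    beta = np.zeros(X.shape[1])
    r = y.astype(float).copy()
    u = np.zeros_like(r)                # scaled dual variable
    for _ in range(n_iter):
        beta = X_pinv @ (y - r + u)                   # beta-step: least squares
        r = prox_check(y - X @ beta + u, tau, rho)    # r-step: check-loss prox
        u = u + (y - X @ beta - r)                    # dual update on y - X beta - r = 0
    return beta

# Intercept-only median regression: the estimate should approach the sample median, 3.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
X = np.ones((5, 1))
print(admm_quantile_regression(X, y, tau=0.5))
```

Each update has an explicit form, which mirrors the abstract's point about avoiding nested inner loops; handling a general penalty and linear constraints would add further blocks to the splitting.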

arXiv:2304.08915 [pdf, other]

Differentiable Genetic Programming for High-dimensional Symbolic Regression

Authors: Peng Zeng, Xiaotian Song, Andrew Lensen, Yuwei Ou, Yanan Sun, Mengjie Zhang, Jiancheng Lv

Abstract: Symbolic regression (SR) is the process of discovering hidden relationships from data in the form of mathematical expressions, and is considered an effective way to achieve interpretable machine learning (ML). Genetic programming (GP) has been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing the trees. In this paper, we propose a differentiable approach named DGP to construct GP trees for high-dimensional SR for the first time. Specifically, a new data structure called the differentiable symbolic tree is proposed to relax the discrete structure to be continuous, so that a gradient-based optimizer can be used for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by this relaxation and obtain valid symbolic expressions. Furthermore, a diversification mechanism is introduced to help the optimizer escape from local optima toward globally better solutions. With these designs, the proposed DGP method can efficiently search for GP trees with higher performance, and is thus capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted various experiments against state-of-the-art methods based on both GP and deep neural networks.
The experimental results reveal that DGP can outperform these peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. In addition, on synthetic SR problems, the proposed DGP method also achieves the best recovery rate even at different noise levels. We believe this work can help make SR a powerful alternative for interpretable ML on a broader range of real-world problems.

Submitted 18 April, 2023; originally announced April 2023.
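The continuous relaxation described above resembles differentiable architecture search: a tree node holds a softmax-weighted mixture over candidate operators, so the operator choice receives gradients, and a discrete expression is recovered afterwards. The sketch below fits a single relaxed node by gradient descent and then discretizes it with a naive argmax. The operator set, the learning rate, the helper names, and the argmax rule are hypothetical stand-ins for the paper's differentiable symbolic tree and sampling method.

```python
import numpy as np

# Hypothetical candidate operators for one binary node of a symbolic tree.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def node_outputs(a, b):
    """Outputs of every candidate operator, stacked as (n_ops, n_samples)."""
    return np.stack([op(a, b) for op in OPS.values()])

def relaxed_node(logits, a, b):
    """Continuous node: softmax(logits)-weighted sum of all operator outputs."""
    return softmax(logits) @ node_outputs(a, b)

def grad_logits(logits, a, b, target):
    """Gradient of 0.5 * ||relaxed_node - target||^2 with respect to the logits."""
    w = softmax(logits)
    outs = node_outputs(a, b)
    out = w @ outs
    # Softmax Jacobian: d out / d logit_k = w_k * (outs_k - out)
    return w * ((outs - out) @ (out - target))

def discretize(logits):
    """Keep only the highest-weight operator to obtain a discrete expression."""
    return list(OPS)[int(np.argmax(logits))]

# Fit one node to data generated by a * b with plain gradient descent.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
target = a * b
logits = np.zeros(len(OPS))
for _ in range(300):
    logits -= 0.1 * grad_logits(logits, a, b, target)
print(discretize(logits), relaxed_node(logits, a, b))
```

The argmax step is the simplest way to turn the mixture back into a valid expression; the gap between the mixture's output and the chosen operator's output is the relaxation discrepancy that the abstract's sampling method is meant to address.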

arXiv:2304.08339 [pdf, other]

Development of Nb-GaAs based superconductor semiconductor hybrid platform by combining in-situ dc magnetron sputtering and molecular beam epitaxy

Authors: Clemens Todt, Sjoerd Telkamp, Filip Krizek, Christian Reichl, Mihai Gabureac, Rüdiger Schott, Erik Cheah, Peng Zeng, Thomas Weber, Arnold Müller, Christof Vockenhuber, Mohsen Bahrami Panah, Werner Wegscheider

Abstract: We present Nb thin films deposited in-situ on GaAs by combining molecular beam epitaxy and magnetron sputtering within an ultra-high vacuum cluster. Nb films deposited at varying power, and a reference film from a commercial system, are compared. The results show clear variation between the in-situ and ex-situ deposition, which we relate to differences in magnetron sputtering conditions and chamber geometry. The Nb films have critical temperatures of around $9\,\textrm{K}$ and critical perpendicular magnetic fields of up to $B_{c2} = 1.4\,\textrm{T}$ at $4.2\,\textrm{K}$. From STEM images of the GaAs-Nb interface, we find that an amorphous interlayer forms between the GaAs and the Nb for both the ex-situ and in-situ deposited material.

Submitted 18 April, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: 12 pages paper, 9 pages supplementary, 6 figures paper, 7 figures supplementary

arXiv:2303.13019 [pdf, other]

Construction Methods Based Minimum Weight Distribution for Polar Codes with Successive Cancellation List Decoding

Authors: **nan Piao, Dong Li, **di Liu, Xueting Yu, Zhibo Li, Ming Yang, Peng Zeng

Abstract: In this paper, we focus on construction methods based on the minimum weight distribution (MWD) for polar codes to improve their performance under successive cancellation list (SCL) decoding. We first propose an ordered and nested reliability sequence, namely the MWD sequence, to improve the ML performance of polar codes and enable fast construction without the original channel information. In the MWD sequence, the synthetic channels are sorted by the partial MWD, which is used to evaluate the influence of an information bit on the MWD, and we prove that the MWD sequence is the optimum sequence under ML decoding. Then, since the list size of SCL decoding is limited, we introduce an entropy constraint to establish a relationship between the list size and the ML performance, and propose a heuristic, greedy construction method named the bit grouping reorder based MWD (BGR-MWD) algorithm. In the algorithm, we divide the synthetic channels into groups by the partial MWD and greedily reorder the synthetic channels in some groups until the entropy constraint is satisfied. The simulation results show that the MWD sequence is suitable for constructing polar codes with short code lengths. Meanwhile, the BGR-MWD algorithm has superior performance over traditional construction methods for long code lengths.

Submitted 22 March, 2023; originally announced March 2023.
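A standard fact that underlies MWD-based reasoning: with the Arikan kernel, row i of F^{(x)n} has Hamming weight 2^wt(i), where wt(i) is the number of ones in the binary expansion of i, and the minimum distance of a polar code with information set A equals the smallest such row weight over i in A. The sketch below (an illustration, not the MWD sequence or the BGR-MWD algorithm) computes this value and cross-checks it against brute-force enumeration for a length-16 code; the information sets and helper names are arbitrary examples.

```python
import itertools

import numpy as np

def polar_generator(n):
    """F^{(x)n} over GF(2) with the Arikan kernel F = [[1, 0], [1, 1]]."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    return G

def d_min_formula(info_set):
    """Minimum distance from row weights 2**wt(i), with wt(i) = popcount(i)."""
    return min(2 ** bin(i).count("1") for i in info_set)

def d_min_brute(G, info_set):
    """Minimum nonzero codeword weight by enumerating all messages."""
    rows = G[sorted(info_set)]
    best = G.shape[1]
    for msg in itertools.product([0, 1], repeat=len(rows)):
        if any(msg):
            best = min(best, int((np.array(msg) @ rows % 2).sum()))
    return best

G = polar_generator(4)  # N = 16
for A in ({7, 11, 13, 14, 15}, {3, 7, 11, 13, 14, 15}):
    print(sorted(A), d_min_formula(A), d_min_brute(G, A))
```

Adding index 3 (binary 0011, weight 2) to the first set lowers d_min from 8 to 4, which is why the ordering of synthetic channels by their weight contribution matters for ML performance.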

arXiv:2303.04296 [pdf, ps, other]

Event-Triggered Active Disturbance Rejection Control for Uncertain Random Nonlinear Systems

Authors: Ze-Hao Wu, Feiqi Deng, Pengyu Zeng, Hua-Cheng Zhou, Hongyi Li

Abstract: In this paper, event-triggered active disturbance rejection control (ADRC) is addressed for the first time for a class of uncertain random nonlinear systems driven by bounded noise and colored noise. An event-triggered extended state observer (ESO) and ADRC controller are designed, with two respective event-triggering mechanisms that have a fixed positive lower bound on the inter-execution times. The random total disturbance, representing the coupling of nonlinear unmodeled dynamics, external deterministic disturbance, bounded noise, and colored noise, is estimated in real time by the event-triggered ESO and compensated in the event-triggered feedback loop. Both mean-square and almost-sure practical convergence of the closed-loop systems are shown by rigorous theoretical analysis. Finally, numerical simulations are implemented to validate the proposed control scheme and theoretical results.

Submitted 8 May, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.02395
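An event-triggering rule with a fixed positive lower bound on the inter-execution times can be sketched as follows: an update is transmitted only when the signal has drifted from the last transmitted value by more than a threshold and at least a minimum dwell time has passed since the previous event. The threshold, the dwell time, the sampled sine signal, and the helper `event_times` are hypothetical; the paper's triggering conditions for the ESO and controller are not reproduced here.

```python
import math

def event_times(signal, dt, threshold, min_interval):
    """Return the times at which an event (update) is triggered.

    signal: samples taken every dt seconds.
    An event fires when |x(t) - x(last event)| > threshold, but never sooner
    than min_interval after the previous event, so inter-execution times
    stay bounded away from zero.
    """
    events = [0.0]
    last_value, last_time = signal[0], 0.0
    for k, x in enumerate(signal[1:], start=1):
        t = k * dt
        if t - last_time >= min_interval and abs(x - last_value) > threshold:
            events.append(t)
            last_value, last_time = x, t
    return events

dt = 0.001
signal = [math.sin(2 * math.pi * k * dt) for k in range(2001)]  # 2 s of a 1 Hz sine
ev = event_times(signal, dt, threshold=0.1, min_interval=0.05)
gaps = [b - a for a, b in zip(ev, ev[1:])]
print(len(ev), min(gaps))
```

The dwell-time condition is what rules out arbitrarily fast (Zeno-like) triggering; without it, a steep signal combined with a small threshold could request updates at every sample.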

arXiv:2301.06795 [pdf, other]

doi 10.1103/PhysRevMaterials.7.073403

Control over epitaxy and the role of the InAs/Al interface in hybrid two-dimensional electron gas systems

Authors: E. Cheah, D. Z. Haxell, R. Schott, P. Zeng, E. Paysen, S. C. ten Kate, M. Coraiola, M. Landstetter, A. B. Zadeh, A. Trampert, M. Sousa, H. Riel, F. Nichele, W. Wegscheider, F. Krizek

Abstract: In-situ synthesised semiconductor/superconductor hybrid structures have become an important material platform in condensed matter physics. Their development has enabled a plethora of novel quantum transport experiments focusing on Andreev and Majorana physics. The combination of InAs and Al has become the workhorse material and has been successfully implemented in the form of one-dimensional structures and two-dimensional electron gases. In contrast to the well-developed semiconductor parts of the hybrid materials, the direct effect of the crystal nanotexture of Al films on electron transport still remains unclear. This is mainly due to the complex epitaxial relation between Al and the semiconductor. We present a study of Al films on shallow InAs two-dimensional electron gas systems grown by molecular beam epitaxy, with a focus on control of the Al crystal structure. We identify the dominant grain types present in our Al films and show that the formation of grain boundaries can be significantly reduced by controlled roughening of the epitaxial interface. Finally, we demonstrate that the implemented roughening does not negatively impact either the electron mobility of the two-dimensional electron gas or the basic superconducting properties of the proximitized system.

Submitted 17 January, 2023; originally announced January 2023.

Comments: 12 pages, 7 figures and supplementary material

Journal ref: Physical Review Materials 7, 073403 (2023)

arXiv:2212.04566 [pdf, other]

Simple and high-precision Hamiltonian simulation by compensating Trotter error with linear combination of unitary operations

Authors: Pei Zeng, **zhao Sun, Liang Jiang, Qi Zhao

Abstract: Trotter and linear-combination-of-unitary (LCU) are two popular Hamiltonian simulation methods. We propose Hamiltonian simulation algorithms that use LCU to compensate Trotter error and thus enjoy the advantages of both. By adding a few gates after the Kth-order Trotter formula, we realize a better time scaling than the 2Kth-order Trotter formula. Our first algorithm exponentially improves the accuracy scaling of the Kth-order Trotter formula. In the second algorithm, we consider the detailed structure of Hamiltonians and construct LCU for Trotter errors with commutator scaling. Consequently, for lattice Hamiltonians, the algorithm enjoys almost linear system-size dependence and quadratically improves the accuracy of the Kth-order Trotter formula.

Submitted 8 December, 2022; originally announced December 2022.

Comments: 74 pages, 15 figures. Comments are welcome
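To make concrete the Trotter error that these algorithms compensate, the sketch below compares first- and second-order Trotter products with the exact evolution for the toy Hamiltonian H = X + Z built from Pauli matrices. It does not implement the LCU compensation itself; the Hamiltonian, the evolution time, the step counts, and the helper names are arbitrary choices. Matrix exponentials of Hermitian matrices are computed through an eigendecomposition so that the example needs only NumPy.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(H, t):
    """exp(-i H t) for Hermitian H via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.exp(-1j * vals * t)) @ vecs.conj().T

def trotter1(A, B, t, r):
    """First-order product (e^{-iA t/r} e^{-iB t/r})^r."""
    step = evolve(A, t / r) @ evolve(B, t / r)
    return np.linalg.matrix_power(step, r)

def trotter2(A, B, t, r):
    """Second-order (symmetric) product (e^{-iA t/2r} e^{-iB t/r} e^{-iA t/2r})^r."""
    half = evolve(A, t / (2 * r))
    step = half @ evolve(B, t / r) @ half
    return np.linalg.matrix_power(step, r)

def error(U, V):
    """Spectral-norm distance between two unitaries."""
    return np.linalg.norm(U - V, ord=2)

t = 1.0
exact = evolve(X + Z, t)
for r in (10, 100):
    print(r, error(trotter1(X, Z, t, r), exact), error(trotter2(X, Z, t, r), exact))
```

The first-order error falls roughly tenfold when r grows tenfold, while the symmetric product is already much more accurate at r = 10; reducing this residual without simply increasing r or the formula order is the role the abstract assigns to the LCU correction.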

arXiv:2212.01209 [pdf, other]

FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting

Authors: Maowei Jiang, Pengyu Zeng, Kai Wang, Huan Liu, Wenbo Chen, Haoran Liu

Abstract: Time series forecasting is a long-standing challenge because real-world information arises in various scenarios (e.g., energy, weather, traffic, economics, earthquake warning). However, the forecasts of some mainstream models deviate dramatically from the ground truth. We believe this is because these models lack the ability to capture the frequency information that is abundant in real-world datasets. At present, the mainstream frequency information extraction methods are based on the Fourier transform (FT). However, the use of FT is problematic due to the Gibbs phenomenon: if the values at the two ends of a sequence differ significantly, oscillatory approximations are observed around both ends and high-frequency noise is introduced. Therefore, we propose a novel frequency enhanced channel attention that adaptively models frequency interdependencies between channels based on the Discrete Cosine Transform, which intrinsically avoids the high-frequency noise caused by the problematic periodicity assumed by the Fourier transform, i.e., the Gibbs phenomenon. We show that this network generalizes extremely effectively across six real-world datasets and achieves state-of-the-art performance, and we further demonstrate that the frequency enhanced channel attention mechanism module can be flexibly applied to different networks.
This module can improve the prediction ability of existing mainstream networks, reducing MSE by 35.99% on LSTM, 10.01% on Reformer, 8.71% on Informer, 8.29% on Autoformer, 8.06% on Transformer, etc., at a slight computational cost and with just a few lines of code. Our code and data are available at https://github.com/Zero-coder/FECAM.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 11 pages, 10 figures, conference. arXiv admin note: text overlap with arXiv:2205.14415 by other authors
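A DCT-based channel-attention block in the spirit of the description above can be sketched as follows: each channel is projected onto an orthonormal DCT-II basis, a per-channel frequency descriptor feeds a small two-layer gating network, and the resulting sigmoid weights rescale the channels. The descriptor (sum of absolute low-frequency coefficients), the layer sizes, the random weights, and the function names are hypothetical; this is not the authors' FECAM implementation from the linked repository.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis: row k is a_k * cos(pi * (t + 0.5) * k / n)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.cos(np.pi * (t + 0.5) * k / n)
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dct_channel_attention(x, w1, w2, n_freq=8):
    """x: (channels, length). Returns rescaled channels and attention weights."""
    coeffs = x @ dct2_matrix(x.shape[1]).T          # per-channel DCT coefficients
    desc = np.abs(coeffs[:, :n_freq]).sum(axis=1)   # hypothetical frequency descriptor
    hidden = np.maximum(0.0, w1 @ desc)             # ReLU
    weights = sigmoid(w2 @ hidden)                  # one weight in (0, 1) per channel
    return x * weights[:, None], weights

rng = np.random.default_rng(0)
channels, length, hidden_dim = 4, 96, 2
x = rng.standard_normal((channels, length))
w1 = rng.standard_normal((hidden_dim, channels)) * 0.1
w2 = rng.standard_normal((channels, hidden_dim)) * 0.1
y, weights = dct_channel_attention(x, w1, w2)
print(y.shape, weights)
```

Because the DCT-II implicitly assumes an even (mirrored) extension of the sequence rather than a periodic one, a mismatch between the first and last samples does not introduce the artificial jump that produces Gibbs oscillations in DFT-based features.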

arXiv:2211.14017 [pdf, other]

Learnable Blur Kernel for Single-Image Defocus Deblurring in the Wild

Authors: Jucai Zhai, Pengcheng Zeng, Chihao Ma, Yong Zhao, Jie Chen

Abstract: Recent research has shown that dual-pixel sensors enable great progress in defocus map estimation and image defocus deblurring. However, extracting real-time dual-pixel views is troublesome and complicates algorithm deployment. Moreover, the deblurred images generated by defocus deblurring networks lack high-frequency details, which is unsatisfactory for human perception. To overcome these issues, we propose a novel defocus deblurring method that uses the guidance of the defocus map to implement image deblurring. The proposed method consists of a learnable blur kernel to estimate the defocus map, which is an unsupervised method, and, for the first time, a single-image defocus deblurring generative adversarial network (DefocusGAN). The proposed network can learn the deblurring of different regions and recover realistic details. We propose a defocus adversarial loss to guide this training process. Competitive experimental results confirm that with a learnable blur kernel, the generated defocus map can achieve results comparable to supervised methods. In the single-image defocus deblurring task, the proposed method achieves state-of-the-art results, with especially significant improvements in perceptual quality: PSNR reaches 25.56 dB and LPIPS reaches 0.111.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 9 pages, 7 figures

arXiv:2211.10541 [pdf, ps, other]

Phase transition and higher order analysis of $L_q$ regularization under dependence

Authors: Hanwen Huang, Peng Zeng, Qinglong Yang

Abstract: We study the problem of estimating a $k$-sparse signal $\boldsymbol{\beta}_0\in\mathbf{R}^p$ from a set of noisy observations $\mathbf{y}\in\mathbf{R}^n$ under the model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{w}$, where $\mathbf{X}\in\mathbf{R}^{n\times p}$ is the measurement matrix whose rows are drawn from the distribution $N(0,\boldsymbol{\Sigma})$. We consider the class of $L_q$-regularized least squares (LQLS) given by the formulation $\hat{\boldsymbol{\beta}}(\lambda,q)=\operatorname{argmin}_{\boldsymbol{\beta}\in\mathbf{R}^p}\frac{1}{2}\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2^2+\lambda\|\boldsymbol{\beta}\|_q^q$, where $\|\cdot\|_q$ $(0\le q\le 2)$ denotes the $L_q$-norm. In the setting $p,n,k\rightarrow\infty$ with fixed $k/p=\varepsilon$ and $n/p=\delta$, we derive the asymptotic risk of $\hat{\boldsymbol{\beta}}(\lambda,q)$ for an arbitrary covariance matrix $\boldsymbol{\Sigma}$, which generalizes the existing results for the standard Gaussian design, i.e., $X_{ij}\overset{\text{i.i.d.}}{\sim}N(0,1)$. We perform a higher-order analysis for LQLS in the small-error regime, in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of $\boldsymbol{\Sigma}$ in the cases $0\le q< 1$ and $1< q\le 2$, which indicates that the correlations among predictors only affect the phase transition curve in the case $q=1$, a.k.a. the LASSO.
To study the influence of the covariance structure of $\boldsymbol{\Sigma}$ on the performance of LQLS in the cases $0\le q< 1$ and $1<q\le 2$, we derive explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.

Submitted 1 December, 2022; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: 35 pages, 11 figures
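For q = 1 the LQLS objective above is the LASSO, which can be solved by proximal gradient descent (ISTA) with the soft-thresholding operator. The sketch below is a standard ISTA loop included only to make the estimator concrete; the step size 1/L with L = ||X||_2^2 is the usual choice, the data are made up, and the helper names are illustrative. The non-convex 0 <= q < 1 cases and the 1 < q <= 2 cases analysed in the paper require different proximal maps and are not covered.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by ISTA."""
    L = np.linalg.norm(X, ord=2) ** 2      # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b

# With an orthonormal design the solution is soft_threshold(X^T y, lam).
X = np.eye(3)
y = np.array([3.0, -0.5, 1.0])
print(lasso_ista(X, y, lam=1.0))
```

At a solution, every coordinate of X^T (y - X b) has magnitude at most lam, with equality on the support of b; checking this optimality condition is a convenient way to confirm convergence on random designs.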

arXiv:2211.09469 [pdf, other]

Visual Commonsense-aware Representation Network for Video Captioning

Authors: Pengpeng Zeng, Haonan Zhang, Lianli Gao, Xiangpeng Li, ** Qian, Heng Tao Shen

Abstract: Generating consecutive descriptions for videos, i.e., video captioning, requires taking full advantage of visual representations during the generation process. Existing video captioning methods focus on exploring spatial-temporal representations and their relationships to produce inferences. However, such methods only exploit the superficial associations contained in the video itself, without considering the intrinsic visual commonsense knowledge that exists across a video dataset, which may hinder their ability to reason about accurate descriptions. To address this problem, we propose a simple yet effective method, called Visual Commonsense-aware Representation Network (VCRN), for video captioning. Specifically, we construct a Video Dictionary, a plug-and-play component obtained by clustering all video features from the entire dataset into multiple cluster centers without additional annotation. Each center implicitly represents a visual commonsense concept in the video domain, which is utilized in our proposed Visual Concept Selection (VCS) to obtain a video-related concept feature. Next, a Conceptual Integration Generation (CIG) module is proposed to enhance caption generation. Extensive experiments on three public video captioning benchmarks, MSVD, MSR-VTT, and VATEX, demonstrate that our method reaches state-of-the-art performance, indicating its effectiveness.
In addition, our approach is integrated into an existing video question answering method and improves its performance, further showing the generalization of our method. Source code has been released at https://github.com/zchoi/VCRN.

Submitted 17 November, 2022; originally announced November 2022.
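The Video Dictionary and concept-selection steps can be pictured with a small clustering sketch: features from the whole dataset are clustered with k-means, the centers serve as dictionary entries, and a video's concept feature is an attention-weighted combination of the centers. The toy 2-D features, the deterministic farthest-point initialization, the dot-product softmax selection, the temperature, and the function names are all hypothetical simplifications rather than the VCRN modules.

```python
import numpy as np

def kmeans(features, k, n_iter=50):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [features[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((features - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def select_concept(video_feature, dictionary, temperature=1.0):
    """Softmax attention over dictionary entries, returning one concept feature."""
    scores = dictionary @ video_feature / temperature
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ dictionary, w

feats = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
dictionary = kmeans(feats, k=2)       # unsupervised: no annotations are used
concept, weights = select_concept(np.array([9.0, 10.0]), dictionary, temperature=10.0)
print(dictionary, weights)
```

The clustering needs no labels, which matches the abstract's description of the dictionary as obtained "without additional annotation"; the selection step then lets each video borrow dataset-level structure through the centers it is closest to.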

arXiv:2211.09460 [pdf, other]

Progressive Tree-Structured Prototype Network for End-to-End Image Captioning

Authors: Pengpeng Zeng, **kuan Zhu, **gkuan Song, Lianli Gao

Abstract: Studies of image captioning are shifting towards a fully end-to-end paradigm that leverages powerful visual pre-trained models and transformer-based generation architectures for more flexible model training and faster inference. State-of-the-art approaches simply extract isolated concepts or attributes to assist description generation. However, such approaches do not consider the hierarchical semantic structure in the textual domain, which leads to an unpredictable mapping between visual representations and concept words. To this end, we propose a novel Progressive Tree-Structured prototype Network (dubbed PTSN), which is the first attempt to narrow down the scope of prediction words with appropriate semantics by modeling the hierarchical textual semantics. Specifically, we design a novel embedding method called tree-structured prototype, producing a set of hierarchical representative embeddings which capture the hierarchical semantic structure in textual space. To incorporate such tree-structured prototypes into visual cognition, we also propose a progressive aggregation module to exploit semantic relationships within the image and prototypes. By applying our PTSN to the end-to-end captioning framework, extensive experiments on the MSCOCO dataset show that our method achieves new state-of-the-art performance, with CIDEr scores of 144.2 (single model) and 146.5 (ensemble of 4 models) on the Karpathy split and 141.4 (c5) and 143.9 (c40) on the official online test server.
Trained models and source code have been released at: https://github.com/NovaMind-Z/PTSN.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2209.12396 [pdf, other]

Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric

Authors: Pengxin Zeng, Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Xi Peng

Abstract: Fair clustering aims to divide data into distinct clusters while preventing sensitive attributes (e.g., gender, race, RNA sequencing technique) from dominating the clustering. Although many works have recently achieved great success, most of them are heuristic, and a unified theory for algorithm design is lacking. In this work, we fill this gap by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. In brief, by maximizing and minimizing mutual information, FCMI is designed to achieve four characteristics highly desired in deep fair clustering, i.e., compact, balanced, and fair clusters, as well as informative features. Besides the contributions to theory and algorithm, another contribution of this work is a novel fair clustering metric built upon information theory. Unlike existing evaluation metrics, our metric measures clustering quality and fairness as a whole rather than separately. To verify the effectiveness of the proposed FCMI, we conduct experiments on six benchmarks, including a single-cell RNA-seq atlas, comparing against 11 state-of-the-art methods in terms of five metrics. The code is available at https://pengxi.me.

Submitted 20 April, 2023; v1 submitted 25 September, 2022; originally announced September 2022.
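The fairness side of the objective above amounts to making cluster assignments carry little information about the sensitive attribute. A plug-in estimate of the mutual information I(C; G) between hard cluster labels C and group labels G, computed from their contingency table, illustrates the quantity. FCMI itself works with soft assignments inside a deep network and also maximizes other mutual-information terms, which this sketch does not reproduce; the helper name is illustrative.

```python
import numpy as np

def mutual_information(clusters, groups):
    """Plug-in estimate of I(C; G) in nats from two label sequences."""
    c_vals, c_idx = np.unique(clusters, return_inverse=True)
    g_vals, g_idx = np.unique(groups, return_inverse=True)
    joint = np.zeros((len(c_vals), len(g_vals)))
    np.add.at(joint, (c_idx, g_idx), 1.0)     # contingency counts
    joint /= joint.sum()
    pc = joint.sum(axis=1, keepdims=True)     # marginal of C
    pg = joint.sum(axis=0, keepdims=True)     # marginal of G
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pc @ pg)[nz])))

fair = mutual_information([0, 0, 1, 1], ["a", "b", "a", "b"])    # groups spread evenly
unfair = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])  # clusters mirror groups
print(fair, unfair)   # 0.0 and about 0.693 (= log 2)
```

A perfectly fair partition drives this quantity to zero, while clusters that simply reproduce the sensitive groups reach its maximum of log 2 for two balanced groups.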

arXiv:2208.05649 [pdf, other]

doi 10.1103/PhysRevLett.130.030801

Experimental mode-pairing measurement-device-independent quantum key distribution without global phase-locking

Authors: Hao-Tao Zhu, Yizhi Huang, Hui Liu, Pei Zeng, Mi Zou, Yunqi Dai, Shibiao Tang, Hao Li, Lixing You, Zhen Wang, Yu-Ao Chen, Xiongfeng Ma, Teng-Yun Chen, Jian-Wei Pan

Abstract: In the past two decades, quantum key distribution networks based on telecom fibers have been implemented on metropolitan and intercity scales. One of the bottlenecks lies in the exponential decay of the key rate with respect to the transmission distance. Recently proposed schemes mainly focus on achieving longer distances by creating a long-arm single-photon interferometer between the two communicating parties. Despite their advantageous performance over long distances, the requirement of phase-locking between two independent lasers is technically challenging. By adopting the recently proposed mode-pairing idea, we realize high-performance quantum key distribution without global phase-locking. Using two independent off-the-shelf lasers, we show a quadratic key-rate improvement over conventional measurement-device-independent schemes in the regime of metropolitan and intercity distances. For longer distances, we also boost the key-rate performance by three orders of magnitude over 304 km of commercial fiber and 407 km of ultra-low-loss fiber. We expect this ready-to-implement high-performance scheme to be widely used in future intercity quantum communication networks.

Submitted 9 February, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

Comments: 19 pages, 9 figures, 7 tables

Journal ref: Phys. Rev. Lett. 130, 030801 (2023)

arXiv:2207.07913 [pdf, other]

Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

Authors: Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen

Abstract: Current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem to generate unbiased scene graphs. However, most de-biasing methods overemphasize the tail predicates and underestimate head ones throughout training, thereby wrecking the representation ability of head predicate features. Furthermore, these impaired features from head predicates harm the learning of tail predicates. In fact, the inference of tail predicates heavily depends on the general patterns learned from head ones, e.g., "standing on" depends on "on". Thus, these de-biasing SGG methods can neither achieve excellent performance on tail predicates nor satisfactory behavior on head ones. To address this issue, we propose a Dual-branch Hybrid Learning network (DHL) to take care of both head and tail predicates for SGG, including a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). Specifically, the CLB is responsible for learning expert and robust features of head predicates, while the FLB is expected to predict informative tail predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule (BCS) to make the two branches work well together. Experiments show that our approach achieves new state-of-the-art performance on the VG and GQA datasets and strikes a trade-off between the performance of tail predicates and head ones. Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning and Sentence-to-Graph Retrieval) further verify the generalization and practicability of our method.

Submitted 16 July, 2022; originally announced July 2022.

arXiv:2207.04602 [pdf, other]

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

Authors: Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song

Abstract: The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach. As general SGG models tend to predict head predicates and re-balancing strategies prefer tail categories, none of them can appropriately handle hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) method which aims at differentiating hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to figure out hard-to-distinguish predicates, which adaptively explores predicate correlations in keeping with the model's dynamic learning pace. In practice, PL-A is initialized from the SGG dataset and refined by exploring the model's predictions on the current mini-batch. Utilizing PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize the model's discriminating process with fine-grained supervision concerning the model's dynamic learning status, ensuring a balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate the practicability of our method.

Submitted 10 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2204.02597

arXiv:2206.11653 [pdf, other]

Learning To Generate Scene Graph from Head to Tail

Authors: Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Lianli Gao

Abstract: Scene Graph Generation (SGG) represents objects and their interactions with a graph structure. Recently, many works have been devoted to solving the imbalance problem in SGG. However, by underestimating the head predicates throughout training, they wreck the features of head predicates that provide general features for tail ones. Besides, assigning excessive attention to the tail predicates leads to semantic deviation. Based on this, we propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT), containing a Curriculum Re-weight Mechanism (CRM) and a Semantic Context Module (SCM). CRM first learns head/easy samples to obtain robust features of head predicates and then gradually focuses on tail/hard ones. SCM is proposed to relieve semantic deviation by ensuring semantic consistency between the generated scene graph and the ground truth in global and local representations. Experiments show that SGG-HT significantly alleviates the bias problem and achieves state-of-the-art performance on Visual Genome.

Submitted 23 June, 2022; originally announced June 2022.
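
The head-to-tail idea behind CRM (learn head/easy samples first, then gradually focus on tail/hard ones) can be illustrated with a generic curriculum over class weights. The abstract does not give CRM's actual formula, so the sketch below is a hypothetical stand-in: the function name, the linear schedule, and the inverse-frequency weighting are all assumptions, not the authors' method.

```python
def curriculum_class_weights(class_counts, epoch, total_epochs):
    """Hypothetical head-to-tail class weights (not the paper's CRM).

    The exponent on inverse-frequency weighting grows linearly from 0
    (uniform weights, so head classes dominate through sample count) to 1
    (full inverse-frequency weights, so tail classes are emphasized).
    Weights are normalized to mean 1 so the overall loss scale is stable.
    """
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    raw = [(1.0 / n) ** progress for n in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Illustrative predicate counts: a head class such as "on" and a tail class
# such as "standing on" (made-up numbers, not Visual Genome statistics).
counts = [10000, 2000, 50]
for epoch in (0, 5, 10):
    print(epoch, [round(w, 3) for w in curriculum_class_weights(counts, epoch, 10)])
```

A linear schedule is the simplest choice; a real curriculum might instead tie `progress` to validation metrics or per-sample difficulty.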

arXiv:2206.09302 [pdf, other]

Delay-aware Multiple Access Design for Intelligent Reflecting Surface Aided Uplink Transmission

Authors: Piao Zeng, Guangji Chen, Qingqing Wu, Deli Qiao, Abbas Jamalipour

Abstract: In this paper, we develop a hybrid multiple access (MA) protocol for an intelligent reflecting surface (IRS) aided uplink transmission network that incorporates the IRS-aided time-division MA (I-TDMA) protocol and the IRS-aided non-orthogonal MA (I-NOMA) protocol as special cases. Two typical communication scenarios, namely the transmit-power-limited case and the transmit-energy-limited case, are considered, where the devices' rearranged order, time and power allocation, as well as dynamic IRS beamforming patterns over time, are jointly optimized to minimize the sum transmission delay. To shed light on the superiority of the proposed IRS-aided hybrid MA (I-HMA) protocol over conventional protocols, the conditions under which I-HMA outperforms I-TDMA and I-NOMA are revealed by characterizing their corresponding optimal solutions. Then, a computationally efficient algorithm is proposed to obtain a high-quality solution to the corresponding optimization problems. Simulation results validate our theoretical findings, demonstrate the superiority of the proposed design, and draw some useful insights. Specifically, it is found that the proposed protocol can significantly reduce the sum transmission delay by combining the additional gain of dynamic IRS beamforming with the high spectral efficiency of NOMA, which reveals that integrating IRS into the proposed HMA protocol is an effective solution for delay-aware optimization. Furthermore, the proposed design reduces the time consumption not only from the system-centric view, but also from the device-centric view.

Submitted 26 June, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: Submitted to TWC

arXiv:2206.01923 [pdf, other]

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

Authors: Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Abstract: Recently, attention-based Visual Question Answering (VQA) has achieved great success by utilizing the question to selectively target different visual areas that are related to the answer. Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight. This conflicts with the attention mechanism because CNN features are naturally spatial and channel-wise. Also, visual attention models are usually conducted at the pixel level, which may cause region discontinuity problems. In this paper, we propose a Cubic Visual Attention (CVA) model that applies a novel channel and spatial attention on object regions to improve the VQA task. Specifically, instead of attending to pixels, we first take advantage of object proposal networks to generate a set of object candidates and extract their associated conv features. Then, we utilize the question to guide channel attention and spatial attention calculation based on the conv-layer feature map. Finally, the attended visual features and the question are combined to infer the answer. We assess the performance of our proposed CVA on three public image QA datasets, including COCO-QA, VQA and Visual7W. Experimental results show that our proposed method significantly outperforms state-of-the-art methods.

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2206.01017 [pdf, other]

Structured Two-stream Attention Network for Video Question Answering

Authors: Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen

Abstract: To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA. Compared with image QA, which focuses primarily on understanding the associations between image region-level details and corresponding questions, video QA requires a model to jointly reason across both spatial and long-range temporal structures of a video as well as text to provide an accurate answer. In this paper, we specifically tackle the problem of video QA by proposing a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given video. First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features. Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of background video, and focuses on the relevant text. Finally, the structured two-stream fusion component incorporates different segments of query- and video-aware context representations and infers the answers. Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 on the Action, Trans., FrameQA and Count tasks. It also outperforms the best competitor (i.e., with two representations) on the Action, Trans. and FrameQA tasks by 4.1%, 4.7%, and 5.1%.

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2205.09523 [pdf, other]

scICML: Information-theoretic Co-clustering-based Multi-view Learning for the Integrative Analysis of Single-cell Multi-omics data

Authors: Pengcheng Zeng, Zhixiang Lin

Abstract: Modern high-throughput sequencing technologies have enabled us to profile multiple molecular modalities from the same single cell, providing unprecedented opportunities to assay cellular heterogeneity from multiple biological layers. However, the datasets generated from these technologies tend to have high levels of noise and are highly sparse, bringing challenges to data analysis. In this paper, we develop a novel information-theoretic co-clustering-based multi-view learning (scICML) method for multi-omics single-cell data integration. scICML utilizes co-clusterings to aggregate similar features for each view of data and uncover the common clustering pattern for cells. In addition, scICML automatically matches the clusters of the linked features across different data types to account for the biological dependency structure across different types of genomic features. Our experiments on four real-world datasets demonstrate that scICML improves the overall clustering performance and provides biological insights into the data analysis of peripheral blood mononuclear cells.

Submitted 19 May, 2022; originally announced May 2022.

Comments: 11 pages; 1 figure
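
The "information-theoretic co-clustering" that scICML builds on is classically formulated (Dhillon, Mallela and Modha) as minimizing the loss in mutual information between rows and columns when both are clustered: I(X;Y) - I(X̂;Ŷ). The abstract does not spell out scICML's multi-view objective, so the sketch below shows only this single-view loss on a toy joint distribution; the function names are hypothetical.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as a nested list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


def cluster_joint(joint, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Aggregate p(x, y) into p(x_hat, y_hat) by summing within co-clusters."""
    agg = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += pxy
    return agg


def information_loss(joint, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Co-clustering objective I(X;Y) - I(X_hat;Y_hat), to be minimized."""
    clustered = cluster_joint(joint, row_labels, col_labels, n_row_clusters, n_col_clusters)
    return mutual_information(joint) - mutual_information(clustered)


# Toy joint over 4 cells x 4 features with two clean blocks.
block = [[1/8, 1/8, 0, 0], [1/8, 1/8, 0, 0], [0, 0, 1/8, 1/8], [0, 0, 1/8, 1/8]]
print(information_loss(block, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2))  # matches the blocks
print(information_loss(block, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2))  # mixes the blocks
```

A co-clustering that respects the block structure loses no information, while one that mixes the blocks loses the full bit, which is why minimizing this loss recovers shared cell and feature groups.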

arXiv:2205.09307 [pdf, other]

Support-set based Multi-modal Representation Enhancement for Video Captioning

Authors: Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Abstract: Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes. Existing methods follow a typical one-to-one mapping, which concentrates on a limited sample space while ignoring the intrinsic semantic associations between samples, resulting in rigid and uninformative expressions. To address this issue, we propose a novel and flexible framework, namely the Support-set based Multi-modal Representation Enhancement (SMRE) model, to mine rich information in a semantic subspace shared between samples. Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantically related visual elements. During this process, we design a Semantic Space Transformation (SST) module to constrain relative distance and administrate multi-modal interactions in a self-supervised way. Extensive experiments on the MSVD and MSR-VTT datasets demonstrate that our SMRE achieves state-of-the-art performance.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2203.10320 [pdf, other]

doi 10.1364/PRJ.473970

Scalable fast benchmarking for individual quantum gates with local twirling

Authors: Yihong Zhang, Wenjun Yu, Pei Zeng, Guoding Liu, Xiongfeng Ma

Abstract: With the development of controllable quantum systems, fast and practical characterization of multi-qubit gates is essential for building high-fidelity quantum computing devices. The usual way to fulfill this requirement via randomized benchmarking asks for the complicated implementation of numerous multi-qubit twirling gates. How to efficiently and reliably estimate the fidelity of a quantum process remains an open problem. In this work, we propose a character-cycle benchmarking protocol and a character-average benchmarking protocol that use only local twirling gates to estimate the process fidelity of an individual multi-qubit operation. Our protocols can characterize a large class of quantum gates, including and beyond the Clifford group, via the local gauge transformation, which forms a universal gate set for quantum computing. We numerically demonstrate our protocols for a non-Clifford gate, controlled-$(TX)$, and a Clifford gate, the five-qubit quantum error-correcting encoding circuit. The numerical results show that our protocols can efficiently and reliably characterize the gate process fidelities. Compared with cross-entropy benchmarking, the simulation results show that character-average benchmarking achieves a three-orders-of-magnitude improvement in sampling complexity.

Submitted 9 February, 2023; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: 30 pages, 7 figures

Journal ref: Photonics Research Vol. 11, Issue 1, pp. 81-99 (2023)
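
Benchmarking protocols of the randomized-benchmarking family estimate fidelity by fitting an exponential decay $f(m) = A p^m$ of the measured signal versus sequence length $m$ and converting the decay parameter $p$ into a process fidelity. The sketch below shows only that generic post-processing step under simplifying assumptions (no constant offset in the decay, a depolarizing noise model with $F = (1 + (d^2 - 1)p)/d^2$ for $d = 2^n$); the character weighting specific to the protocols above is not reproduced, and the function names are hypothetical.

```python
import math


def fit_exponential_decay(lengths, survival):
    """Fit f(m) = A * p**m by ordinary least squares on log f(m); returns (A, p).

    Assumes every survival value is positive and the decay has no offset term.
    """
    xs = list(lengths)
    ys = [math.log(f) for f in survival]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


def process_fidelity_from_decay(p, num_qubits):
    """Process fidelity of a depolarizing channel with parameter p, dimension d = 2**num_qubits."""
    d2 = (2 ** num_qubits) ** 2
    return (1 + (d2 - 1) * p) / d2


# Noise-free synthetic data with A = 0.9 and p = 0.98 (illustrative values).
lengths = [1, 2, 4, 8, 16, 32]
data = [0.9 * 0.98 ** m for m in lengths]
A, p = fit_exponential_decay(lengths, data)
print(A, p, process_fidelity_from_decay(p, 1))
```

With real, noisy data a nonlinear fit including an offset term is usually more robust than the log-linear shortcut used here.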

arXiv:2202.05992 [pdf]

doi 10.1002/lpor.202200219

Soliton Microcombs in Integrated Chalcogenide Microresonators

Authors: Di Xia, Zelin Yang, **yang Zeng, Bin Zhang, Jiayue Wu, Zifu Wang, Jiaxin Zhao, Mingqi Gao, Yufei Huang, Jianteng Huang, Liyang Luo, Dong Liu, Shuixian Yang, Hairun Guo, Zhaohui Li

Abstract: Photonic integrated microcombs have enabled advanced applications in optical communication, microwave synthesis, and optical metrology, which in nature unveil an optical dissipative soliton pattern under cavity-enhanced nonlinear processes. The most decisive factor for microcombs lies in the photonic material platform, where materials with high nonlinearity that are capable of high-quality chip integration are in high demand. In this work, we present a home-developed chalcogenide glass, Ge25Sb10S65 (GeSbS), for nonlinear photonic integration and dissipative soliton microcomb generation. Compared with current integrated nonlinear platforms, GeSbS features wider transparency from the visible to the 11 μm region, stronger nonlinearity, and a lower thermo-refractive coefficient, and is CMOS compatible in fabrication. On this platform, we achieve chip-integrated optical microresonators with a quality (Q) factor above 2 × 10^6, and carry out lithographically controlled dispersion engineering. In particular, we demonstrate that both a bright soliton-based microcomb and a dark-pulsed comb are generated in a single microresonator, in its separate fundamental polarized mode families under different dispersion regimes. The overall pumping power is on the ten-milliwatt level, determined by both the high Q-factor and the high material nonlinearity of the microresonator. Our results may contribute to the field of nonlinear photonics with an alternative material platform for highly compact and high-intensity nonlinear interactions, while on the application side they contribute to the development of soliton microcombs at low operating power, which is potentially required for monolithically integrated optical frequency combs.

Submitted 12 February, 2022; originally announced February 2022.

Comments: 22 pages, 5 figures

Journal ref: Laser & Photonics Reviews 16 202200219 (2022)

arXiv:2201.11924 [pdf, other]

Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

Authors: Xiaoshuai Zhang, Rui Chen, Ang Li, Fanbo Xiang, Yuzhe Qin, Jiayuan Gu, Zhan Ling, Minghua Liu, Peiyu Zeng, Songfang Han, Zhiao Huang, Tongzhou Mu, **g Xu, Hao Su

Abstract: In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academic and industry communities. Inspired by the underlying mechanism of the sensors, we designed a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline is able to generate, in real time, depth maps with material-dependent error patterns similar to those of a real depth sensor. We conduct real experiments to show that perception algorithms and reinforcement learning policies trained in our simulation platform could transfer well to real-world test cases without any fine-tuning. Furthermore, due to the high degree of realism of this simulation, our depth sensor simulator can be used as a convenient testbed to evaluate algorithm performance in the real world, which will largely reduce the human effort in developing robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to promote research in the vision and robotics communities.

Submitted 5 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: The paper will appear in the IEEE Transactions on Robotics. 20 pages, 14 figures, 10 tables
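
The final depth-estimation stage of such a pipeline rests on standard stereo triangulation: for a rectified pair with focal length $f$ (pixels) and baseline $B$, a disparity $d$ maps to depth $z = fB/d$, and a disparity error $\delta d$ produces a depth error of roughly $z^2 \delta d / (fB)$. The sketch below illustrates only this textbook relation; it is not SAPIEN's API, and the function names and numbers are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth z = f * B / d for a rectified stereo pair; None when there is no valid match."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty |dz| ~= z^2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Illustrative sensor: 500 px focal length, 5 cm baseline, 25 px disparity.
z = depth_from_disparity(25, 500, 0.05)
print(z, depth_error(z, 500, 0.05, 0.5))
```

The quadratic growth of depth error with distance is one reason a simulator needs realistic IR noise: small matching errors in the rendered IR images translate into large depth errors on far surfaces.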

arXiv:2201.04300 [pdf, other]

doi 10.1038/s41467-022-31534-7

Quantum key distribution surpassing the repeaterless rate-transmittance bound without global phase locking

Authors: Pei Zeng, Hongyi Zhou, Weijie Wu, Xiongfeng Ma

Abstract: Quantum key distribution, the establishment of information-theoretically secure keys based on quantum physics, is mainly limited by its practical performance, which is characterised by the dependence of the key rate on the channel transmittance, $R(\eta)$. Recently, schemes based on single-photon interference have been proposed to improve the key rate to $R=O(\sqrt{\eta})$ by overcoming the point-to-point secret key capacity bound with interferometers. Unfortunately, all of these schemes require challenging global phase locking to realise a stable long-arm single-photon interferometer with a precision of approximately 100 nm over fibres that are hundreds of kilometres long. Aiming to address this problem, we propose a mode-pairing measurement-device-independent quantum key distribution scheme in which the encoded key bits and bases are determined during data post-processing. Using conventional second-order interference, this scheme can achieve a key rate of $R=O(\sqrt{\eta})$ without global phase locking when the local phase fluctuation is mild. We expect this high-performance scheme to be ready to implement with off-the-shelf optical devices.

Submitted 30 January, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: 56 pages, 24 figures. Comments are welcome

Journal ref: Nat Commun 13, 3903 (2022)
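
To make the $R=O(\sqrt{\eta})$ scaling concrete, the sketch below converts fiber length to transmittance and compares $\sqrt{\eta}$ with the repeaterless point-to-point bound $-\log_2(1-\eta)$. It assumes a fiber attenuation of 0.2 dB/km (a typical value for standard telecom fiber, not taken from the papers) and drops all constant prefactors, so only the scaling is meaningful, not absolute key rates or the crossover distance; the function names are hypothetical.

```python
import math

# Assumed attenuation of standard telecom fiber in dB/km (illustrative, not from the paper).
ALPHA_DB_PER_KM = 0.2


def transmittance(length_km, alpha_db_per_km=ALPHA_DB_PER_KM):
    """Channel transmittance eta = 10^(-alpha * L / 10)."""
    return 10 ** (-alpha_db_per_km * length_km / 10)


def repeaterless_bound(eta):
    """Point-to-point secret-key capacity -log2(1 - eta), in bits per channel use."""
    return -math.log1p(-eta) / math.log(2)  # log1p keeps precision for tiny eta


def sqrt_scaling(eta):
    """O(sqrt(eta)) key-rate scaling with its constant prefactor set to 1 (illustration only)."""
    return math.sqrt(eta)


for length in (50, 100, 200, 300, 400):
    eta = transmittance(length)
    print(f"{length:4d} km  eta={eta:.1e}  "
          f"bound={repeaterless_bound(eta):.2e}  sqrt(eta)={sqrt_scaling(eta):.2e}")
```

Taking the square root halves the loss exponent: $\sqrt{\eta}$ at distance $L$ equals $\eta$ at distance $L/2$, which is the sense of the "quadratic key-rate improvement" reported in the experimental entry above.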

arXiv:2111.13855 [pdf, other]

Quantum Complementarity Approach to Device-Independent Security

Authors: Xingjian Zhang, Pei Zeng, Tian Ye, Hoi-Kwong Lo, Xiongfeng Ma

Abstract: Complementarity is an essential feature of quantum mechanics. The preparation of an eigenstate of one observable implies complete randomness in its complementary observable. In quantum cryptography, complementarity allows us to formulate security analyses in terms of phase-error correction. However, in the device-independent regime that offers security without device characterization, the concept becomes much subtler. Security proofs of device-independent quantum cryptography tasks are often complex and quite different from those of their more standard device-dependent cousins. The existing proofs pose huge challenges to experiments, among which large data-size requirement is a crux. Here, we show the complementarity security origin of the device-independent tasks. By linking complementarity with quantum nonlocality, we recast the device-independent scheme into a quantum error correction protocol. Going beyond the identical-and-independent-distribution case, we consider the most general attack. We generalize the sample entropy in classical Shannon theory for the finite-size analysis. Our method exhibits good finite-size performance and brings the device-independent scheme to a more practical regime. Applying it to the data in a recent ion-trap-based device-independent quantum key distribution experiment, one could reduce the requirement on data size to less than a third. Furthermore, the complementarity approach can be naturally extended to advantage key distillation to ease experiments by tolerating higher loss and lower transmittance.

Submitted 11 October, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

Comments: 57 pages, 21 figures, 4 tables; In this version, we have (1) added security statements for general device-independent tasks; (2) updated the finite-size analysis with Kato's inequality; (3) presented more numerical simulation results, including a detailed presentation of analysing the reported data from the recent ion-trap DIQKD experiment; (4) fixed a few typos

arXiv:2111.11600 [pdf, other]

Throughput Maximization for Active Intelligent Reflecting Surface Aided Wireless Powered Communications

Authors: Piao Zeng, Deli Qiao, Qingqing Wu, Yuan Wu

Abstract: This paper considers an active intelligent reflecting surface (IRS)-aided wireless powered communication network (WPCN), where devices first harvest energy and then transmit information to a hybrid access point (HAP). Unlike existing works on passive IRS-aided WPCNs, this is the first work that introduces the active IRS in WPCNs. To guarantee fairness, the problem is formulated as an amplifying-power-limited weighted sum throughput (WST) maximization problem, which is solved by alternately applying the successive convex approximation technique and fractional programming. To balance performance and complexity, three beamforming setups are considered at the active IRS, namely user-adaptive IRS beamforming, uplink-adaptive IRS beamforming, and static IRS beamforming. Numerical results demonstrate the significant superiority of employing active IRS in WPCNs and the benefits of dynamic IRS beamforming. Specifically, it is found that, compared to the passive IRS, the active IRS not only improves the WST greatly, but is also more energy-efficient and can significantly extend the transmission coverage. Moreover, unlike the symmetric deployment strategy of passive IRS, it is preferable to deploy the active IRS near the devices.

Submitted 11 January, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Submitted to Wireless Communications Letters

arXiv:2111.08442 [pdf, other]

doi 10.1088/1572-9494/ac679a

Bootstrapping Calabi-Yau Quantum Mechanics

Authors: Bao-ning Du, Min-xin Huang, Pei-xuan Zeng

Abstract: Recently, a novel bootstrap method for numerical calculations in matrix models and quantum mechanical systems has been proposed. We apply the method to certain quantum mechanical systems derived from some well-known local toric Calabi-Yau geometries, where the exact quantization conditions have been conjecturally related to topological string theory. We find that the bootstrap method provides a promising alternative for precision numerical calculation of the energy eigenvalues. An improvement in our approach is to use a larger set of two-dimensional operators instead of one-dimensional ones. We also apply our improved bootstrap methods to some non-relativistic models in the recent literature and demonstrate better numerical accuracy.

Submitted 16 November, 2021; originally announced November 2021.

Comments: 21 pages, 21 figures

Report number: USTC-ICTS/PCFT-21-43
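
The bootstrap idea referred to above combines two ingredients: recursion relations among moments $\langle x^k \rangle$ that hold in any energy eigenstate, and positivity of the moment (Hankel) matrix $M_{ij} = \langle x^{i+j} \rangle$. The Calabi-Yau operators of this paper are not reproduced here; as an assumption-labeled stand-in, the sketch applies the same logic to the harmonic oscillator $H = p^2 + x^2$ (with $[x,p]=i$, exact eigenvalues $1, 3, 5, \dots$), using the standard recursion $4tE\langle x^{t-1}\rangle + t(t-1)(t-2)\langle x^{t-3}\rangle - 4(t+1)\langle x^{t+1}\rangle = 0$ derived from $\langle [H, O] \rangle = 0$ and $\langle HO \rangle = E\langle O \rangle$. Function names are hypothetical.

```python
import math


def moments(energy, max_power):
    """<x^k> for k = 0..max_power in an eigenstate of H = p^2 + x^2 with energy E.

    Uses 4tE<x^{t-1}> + t(t-1)(t-2)<x^{t-3}> - 4(t+1)<x^{t+1}> = 0,
    with <x^0> = 1 and odd moments zero by parity.
    """
    m = [0.0] * (max_power + 1)
    m[0] = 1.0
    for t in range(1, max_power, 2):  # odd t determines the even moment <x^{t+1}>
        lower = m[t - 3] if t >= 3 else 0.0
        m[t + 1] = (4 * t * energy * m[t - 1] + t * (t - 1) * (t - 2) * lower) / (4 * (t + 1))
    return m


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; fail as soon as a pivot is non-positive."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


def energy_allowed(energy, size):
    """Bootstrap test: the size x size Hankel matrix <x^{i+j}> must be positive definite."""
    m = moments(energy, 2 * (size - 1))
    hankel = [[m[i + j] for j in range(size)] for i in range(size)]
    return is_positive_definite(hankel)


print(energy_allowed(1.0, 5), energy_allowed(-1.0, 5))
```

Scanning `energy` over a grid while increasing `size` should shrink the allowed set toward the exact eigenvalues; the paper's refinement of using two-dimensional operator sets enlarges this positivity matrix further.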

arXiv:2109.15304 [pdf, other]

Universal quantum algorithmic cooling on a quantum computer

Authors: Pei Zeng, **zhao Sun, Xiao Yuan

Abstract: Quantum cooling, a deterministic process that drives any state to the lowest eigenstate, has been widely used, from studying ground-state properties in chemistry and condensed matter physics to general optimization problems. However, the cooling procedure is generally non-unitary, hence its realization on a quantum computer either requires deep circuits or assumes specific input states with variational circuits. Here, we propose universal quantum cooling algorithms that overcome these limitations. By utilizing a dual phase representation of decaying functions, we show how to universally and deterministically realize a general cooling procedure with shallow quantum circuits. We demonstrate its applications in cooling an arbitrary input state with known ground state energy, corresponding to satisfactory, linear algebra tasks, and quantum state compiling tasks, and in preparing unknown eigenvalues and eigenstates, corresponding to quantum many-body problems. Compared to quantum phase estimation, our method uses only one ancillary qubit and much shallower circuits, showing an exponential improvement in circuit complexity with respect to the final state infidelity. We numerically benchmark the algorithms for the $8$-qubit Heisenberg model and verify their feasibility for accurately finding eigenenergies and obtaining eigenstate measurements. Our work paves the way for efficient and universal quantum algorithmic cooling with near-term as well as universal fault-tolerant quantum devices.

Submitted 2 June, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: 35 pages, 7 figures. Comments are welcome
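
The cooling idea above can be illustrated with one standard Fourier identity, exp(-x^2/2) = (2π)^(-1/2) ∫ exp(-t^2/2) exp(itx) dt, which writes a decaying Gaussian filter as a weighted mixture of pure phases exp(itx). With x = √β (H - E0), each phase becomes a unitary time evolution of H. The sketch below assumes this Gaussian filter as the decaying function; it is not necessarily the paper's dual phase representation, and the helper names, the toy 8-level Hamiltonian with spectrum 0, 1, ..., 7, and the quadrature settings are hypothetical. Everything is evaluated classically with NumPy, not on a quantum circuit.

```python
import numpy as np

def gaussian_filter_exact(H, E0, beta):
    """Cooling operator exp(-beta * (H - E0)^2 / 2), computed by diagonalising H."""
    evals, evecs = np.linalg.eigh(H)
    weights = np.exp(-beta * (evals - E0) ** 2 / 2)
    # V diag(weights) V^dagger: scale column j of V by weights[j]
    return (evecs * weights) @ evecs.conj().T

def gaussian_filter_from_phases(H, E0, beta, t_max=8.0, n_points=401):
    """Same operator as a weighted sum of unitaries exp(i t sqrt(beta) (H - E0)).

    Discretises exp(-x^2/2) = (2 pi)^(-1/2) * integral exp(-t^2/2) exp(i t x) dt
    with the trapezoidal rule on [-t_max, t_max].
    """
    evals, evecs = np.linalg.eigh(H)
    ts = np.linspace(-t_max, t_max, n_points)
    dt = ts[1] - ts[0]
    trap = np.full(n_points, dt)
    trap[0] = trap[-1] = dt / 2
    coeffs = trap * np.exp(-ts ** 2 / 2) / np.sqrt(2 * np.pi)
    op = np.zeros(H.shape, dtype=complex)
    for t, c in zip(ts, coeffs):
        # Each term is a unitary: a pure phase on every eigenvector of H
        phases = np.exp(1j * t * np.sqrt(beta) * (evals - E0))
        op += c * (evecs * phases) @ evecs.conj().T
    return op

# Hypothetical toy Hamiltonian with known spectrum 0, 1, ..., 7 in a random basis
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
H = Q @ np.diag(np.arange(8.0)) @ Q.T
E0, beta = 0.0, 20.0  # ground-state energy assumed known, as in the abstract

F_exact = gaussian_filter_exact(H, E0, beta)
F_phase = gaussian_filter_from_phases(H, E0, beta)

# Start from an equal superposition of all eigenstates, then cool and renormalise
psi = Q @ (np.ones(8) / np.sqrt(8))
cooled = F_phase @ psi
cooled /= np.linalg.norm(cooled)
fidelity = abs(Q[:, 0] @ cooled) ** 2
print(np.max(np.abs(F_exact - F_phase)), fidelity)
```

Because the filter multiplies each excited-state amplitude by exp(-β(E - E0)^2/2), a spectral gap of 1 with β = 20 leaves the cooled state almost entirely in the ground state; smaller gaps would need a larger β and, in turn, a finer t grid.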

arXiv:2109.09936

doi 10.1080/01621459.2021.1918554

A Model-free Variable Screening Method Based on Leverage Score

Authors: Wenxuan Zhong, Yiwen Liu, Peng Zeng

Abstract: With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful or meaningful information is often extracted from these data through statistical learning or model fitting. In massive datasets, both the sample size and the number of predictors can be large, in which case conventional methods face computational challenges. Recently, an innovative and effective sampling scheme based on leverage scores computed via singular value decomposition has been proposed to select rows of a design matrix as a surrogate of the full data in linear regression. Analogously, variable screening can be viewed as selecting columns of the design matrix. However, effective variable selection along this line of thinking remains elusive. In this article, we bridge this gap by proposing a weighted leverage variable screening method that utilizes both the left and right singular vectors of the design matrix. We show theoretically and empirically that the predictors selected using our method can consistently include the true predictors not only for linear models but also for complicated general index models. Extensive simulation studies show that the weighted leverage screening method is highly computationally efficient and effective. We also demonstrate its success in identifying carcinoma-related genes using spatial transcriptome data.

Submitted 20 September, 2021; originally announced September 2021.

Comments: Journal of the American Statistical Association, published online: 21 Jun 2021
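
The leverage-score machinery mentioned above can be sketched with plain rank-k column leverage scores: after a thin SVD X = U S V^T, predictor j (column j of X) receives the squared norm of row j of the top-k right singular vectors. This is only the unweighted ingredient; the paper's weighted score, which also uses the left singular vectors, is not reproduced here, and the function names, the choice of k, and the synthetic design matrix are hypothetical.

```python
import numpy as np

def column_leverage_scores(X, k):
    """Rank-k column leverage scores of an n x p design matrix X.

    Score j is the squared norm of row j of V_k, where V_k holds the top-k
    right singular vectors of X. The scores lie in [0, 1] and sum to k.
    """
    # full_matrices=False gives the thin SVD: Vt has shape (min(n, p), p)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T  # p x k matrix with orthonormal columns
    return np.sum(Vk ** 2, axis=1)

def screen_by_leverage(X, k, d):
    """Indices of the d predictors (columns) with the largest leverage scores."""
    scores = column_leverage_scores(X, k)
    return np.argsort(scores)[::-1][:d]

# Hypothetical wide design: far more predictors than samples
rng = np.random.default_rng(1)
n, p = 200, 1000
X = rng.normal(size=(n, p))
scores = column_leverage_scores(X, k=10)
selected = screen_by_leverage(X, k=10, d=20)
print(selected)
```

Because V_k has orthonormal columns, the scores lie in [0, 1] and sum to k, so each score reads as the share of the top-k singular subspace carried by that predictor. These plain scores ignore the response; they only illustrate the SVD step the abstract builds on.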

arXiv:2109.07241

doi 10.1103/PhysRevApplied.16.034017

Reference-frame-independent design of phase-matching quantum key distribution

Authors: Anran **, Pei Zeng, Richard V. Penty, Xiongfeng Ma

Abstract: The recently proposed phase-matching quantum key distribution offers a means to overcome the linear key rate-transmittance bound. Since the key information is encoded onto the phases of coherent states, misalignment between the two remote reference frames yields errors and significantly degrades the key generation rate relative to the ideal case. In this work, we propose a reference-frame-independent design of phase-matching quantum key distribution by introducing a high-dimensional key encoding space. With encoded phases spanning the unit circle, the error statistics at an arbitrary fixed phase reference difference can be recovered and treated separately, from which the misalignment angle can be identified. By naturally extending the binary encoding symmetry and complementarity to high dimensions, we present a security proof of this high-dimensional phase-matching quantum key distribution and demonstrate through simulation that a 17-dimensional protocol is completely immune to any degree of fixed misalignment and robust to slow phase fluctuations. We expect the high-dimensional protocol to be a practical reference-frame-independent design for general phase-encoding schemes where high-dimensional encoding is relatively easy to implement.

Submitted 15 September, 2021; originally announced September 2021.

Comments: 20 pages, 8 figures

Journal ref: Phys. Rev. Applied 16, 034017 (2021)

arXiv:2109.03233

Contrastive Learning with Temporal Correlated Medical Images: A Case Study using Lung Segmentation in Chest X-Rays

Authors: Dewen Zeng, John N. Kheir, Peng Zeng, Yiyu Shi

Abstract: Contrastive learning has proven to be a promising technique for image-level representation learning from unlabeled data. Many existing works have demonstrated improved results by applying contrastive learning to classification and object detection tasks on either natural or medical images. However, its application to medical image segmentation tasks has been limited. In this work, we use lung segmentation in chest X-rays as a case study and propose a contrastive learning framework with temporal correlated medical images, named CL-TCI, to learn superior encoders for initializing the segmentation network. We adapt CL-TCI from two state-of-the-art contrastive learning methods, MoCo and SimCLR. Experimental results on three chest X-ray datasets show that under two different segmentation backbones, U-Net and DeepLab-V3, CL-TCI can outperform all baselines that do not incorporate any temporal correlation in both the semi-supervised learning setting and the transfer learning setting with limited annotation. This suggests that information among temporally correlated medical images can indeed improve contrastive learning performance. Between the two variants of CL-TCI, the one adapted from MoCo outperforms the one adapted from SimCLR in most settings, indicating that more contrastive samples can benefit the learning process and help the network learn high-quality representations.

Submitted 16 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

Comments: 7 pages, submitted to ICCAD'21 special session
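
MoCo and SimCLR both optimise an InfoNCE-style contrastive loss. The sketch below shows that loss in NumPy under an assumption of ours, not stated in the abstract: a chest X-ray and a later X-ray of the same patient form a positive pair. The other images in the batch act as negatives, and an optional queue supplies the extra negatives that MoCo maintains. All names and the synthetic embeddings are hypothetical, and no encoder or segmentation network is included.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(anchors, positives, queue=None, temperature=0.1):
    """Mean InfoNCE loss; row i of `positives` is the positive for row i of `anchors`.

    Other rows of `positives` serve as in-batch negatives (SimCLR-style);
    rows of `queue`, if given, are extra negatives (MoCo-style memory queue).
    """
    a = l2_normalize(anchors)
    logits = a @ l2_normalize(positives).T  # (N, N) cosine similarities
    if queue is not None:
        logits = np.concatenate([logits, a @ l2_normalize(queue).T], axis=1)
    logits = logits / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(a.shape[0])
    # Cross-entropy with the matching positive (the diagonal) as the target class
    return float(-log_probs[idx, idx].mean())

# Hypothetical embeddings: 8 patients, a baseline scan and a slightly different follow-up
rng = np.random.default_rng(2)
scan_t0 = rng.normal(size=(8, 128))
scan_t1 = scan_t0 + 0.05 * rng.normal(size=(8, 128))
unrelated = rng.normal(size=(8, 128))
queue = rng.normal(size=(64, 128))

loss_temporal = info_nce(scan_t0, scan_t1)
loss_temporal_queue = info_nce(scan_t0, scan_t1, queue=queue)
loss_unrelated = info_nce(scan_t0, unrelated)
print(loss_temporal, loss_temporal_queue, loss_unrelated)
```

The optional queue is where the sketch touches the abstract's observation about MoCo: it lets each anchor be contrasted against many more negatives than the batch alone provides.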

Showing 1–50 of 83 results for author: Zeng, P