-
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models
Authors:
Haowen Hou,
Peigen Zeng,
Fei Ma,
Fei Richard Yu
Abstract:
Visual Language Models (VLMs) have rapidly progressed with the recent success of large language models. However, there have been few attempts to incorporate efficient linear Recurrent Neural Network (RNN) architectures into VLMs. In this study, we introduce VisualRWKV, the first application of a linear RNN model to multimodal learning tasks, leveraging the pre-trained RWKV language model. We propose a data-dependent recurrence and sandwich prompts to enhance our modeling capabilities, along with a 2D image scanning mechanism to enrich the processing of visual sequences. Extensive experiments demonstrate that VisualRWKV achieves competitive performance compared to Transformer-based models like LLaVA-1.5 on various benchmarks. To facilitate further research and analysis, we have made the checkpoints and the associated code publicly accessible at the following GitHub repository: https://github.com/howard-hou/VisualRWKV.
Submitted 19 June, 2024;
originally announced June 2024.
-
MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction
Authors:
Jiaqi Cui,
Xinyi Zeng,
Pinxian Zeng,
Bo Liu,
Xi Wu,
Jiliu Zhou,
Yan Wang
Abstract:
Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g., patients' clinical tabular data, resulting in compromised reconstruction with limited diagnostic utility. Moreover, they often overlook the semantic consistency between real SPET and reconstructed images, leading to distorted semantic contexts. To tackle these problems, we propose a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from multi-modal inputs, including LPET images and clinical tabular data. Specifically, our MCAD incorporates a Multi-modal conditional Encoder (Mc-Encoder) to extract multi-modal features, followed by a conditional diffusion process to blend noise with multi-modal features and gradually map blended features to the target SPET images. To balance multi-modal inputs, the Mc-Encoder embeds Optimal Multi-modal Transport co-Attention (OMTA) to narrow the heterogeneity gap between image and tabular data while capturing their interactions, providing sufficient guidance for reconstruction. In addition, to mitigate semantic distortions, we introduce the Multi-Modal Masked Text Reconstruction (M3TRec), which leverages semantic knowledge extracted from denoised PET images to restore the masked clinical tabular data, thereby compelling the network to maintain accurate semantics during reconstruction. To expedite the diffusion process, we further introduce an adversarial diffusive network with a reduced number of diffusion steps. Experiments show that our method achieves state-of-the-art performance both qualitatively and quantitatively.
Submitted 18 June, 2024;
originally announced June 2024.
-
Gram2Vec: An Interpretable Document Vectorizer
Authors:
Peter Zeng,
Eric Sclafani,
Owen Rambow
Abstract:
We present Gram2Vec, a grammatical style embedding algorithm that embeds documents into a higher-dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In our demo, we present a way to visualize a mapping of authors to documents based on their Gram2Vec vectors and highlight the ability to drop or add features to view which authors make certain linguistic choices. Next, we use authorship attribution as an application to show how Gram2Vec can explain why a document is attributed to a certain author, using cosine similarities between the Gram2Vec feature vectors to calculate the distances between candidate documents and a query document.
Submitted 17 June, 2024;
originally announced June 2024.
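To make the idea in the Gram2Vec abstract concrete, here is a minimal sketch of a frequency-based style vector compared by cosine similarity. It is not the authors' implementation: the feature lists (`FUNCTION_WORDS`, `PUNCTUATION`), the helper names `style_vector` and `cosine`, and the choice of normalizers are all assumptions made for illustration, and real grammatical features would be far richer.

```python
import math
from collections import Counter
from typing import Dict

# Hypothetical, tiny feature set standing in for real grammatical features.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")
PUNCTUATION = (",", ";", "!", "?")


def style_vector(text: str) -> Dict[str, float]:
    """Map a document to named, normalized relative frequencies."""
    tokens = [t.strip(".,;!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty text
    counts = Counter(tokens)
    # Word features are normalized by the number of tokens.
    vec = {f"word:{w}": counts[w] / n_tokens for w in FUNCTION_WORDS}
    # Punctuation features are normalized by the number of characters.
    n_chars = max(len(text), 1)
    for p in PUNCTUATION:
        vec[f"punct:{p}"] = text.count(p) / n_chars
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two vectors that share the same keys."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no features present: treat as dissimilar
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = style_vector("The end of the road, and the start of the path.")
    candidate = style_vector("To be in the room; to be of the room!")
    print(cosine(query, candidate))
```

Because every dimension is a named feature, a high or low similarity can be traced back to specific entries such as `word:the`, which is the kind of interpretability the abstract contrasts with neural embeddings.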
-
High-precision and low-depth eigenstate property estimation: theory and resource estimation
Authors:
Jinzhao Sun,
Pei Zeng,
Tom Gur,
M. S. Kim
Abstract:
Estimating the eigenstate properties of quantum many-body systems is a long-standing, challenging problem for both classical and quantum computing. For the task of eigenstate preparation, quantum signal processing (QSP) has established near-optimal query complexity $O(Δ^{-1} \log(ε^{-1}))$ by querying the block encoding of the Hamiltonian $H$, where $Δ$ is the energy gap and $ε$ is the target precision. However, QSP is challenging for both near-term noisy quantum computers and early fault-tolerant quantum computers (FTQC), which are limited by the number of logical qubits and circuit depth. To date, early FTQC algorithms have focused on querying the perfect time evolution $e^{-iHt}$. It remains uncertain whether early FTQC algorithms can maintain good asymptotic scaling at the gate level. Moreover, when considering qubit connectivity, the circuit depth of existing FTQC algorithms may scale suboptimally with system size. Here, we present a full-stack design of a random sampling algorithm for estimating the eigenenergy and the observable expectations on the eigenstates, which can achieve high precision and good system size scaling. The gate complexity has a logarithmic dependence on precision, $O(\log^{1+o(1)}(1/ε))$, for generic Hamiltonians, which cannot be achieved by methods that use Trotterisation to realise $e^{-iHt}$, as in QETU. For $n$-qubit lattice Hamiltonians, our method achieves near-optimal system size dependence with the gate complexity $O(n^{1+o(1)})$. When restricting the qubit connectivity to a linear nearest-neighbour architecture, the method shows advantages in circuit depth, with $O(n^{o(1)})$ for lattice models and $O(n^{2+o(1)})$ for electronic structure problems. We compare the resource requirements (CNOT gates, T gates and qubit numbers) by phase estimation, QSP, and QETU, in lattice and molecular problems.
Submitted 6 June, 2024;
originally announced June 2024.
-
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Authors:
Haonan Zhang,
Pengpeng Zeng,
Lianli Gao,
Jingkuan Song,
Yihang Duan,
Xinyu Lyu,
Hengtao Shen
Abstract:
Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent shared semantics across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. Furthermore, an Inter-Consistency Loss (ICL) is devised to accomplish the concept alignment between the visual query and corresponding textual query, and an Intra-Diversity Loss (IDL) is developed to repulse the distribution within visual (textual) queries to generate more discriminative concepts. Extensive experiments on five widely used benchmarks (i.e., MSR-VTT, MSVD, DiDeMo, LSMDC, and ActivityNet) substantiate the superior effectiveness and efficiency of the proposed method. Remarkably, our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost. Code is available at: https://github.com/zchoi/GLSCL.
Submitted 21 May, 2024;
originally announced May 2024.
-
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Authors:
Pai Zeng,
Zhenyu Ning,
Jieru Zhao,
Weihao Cui,
Mengwei Xu,
Liwei Guo,
Xusheng Chen,
Yizhou Shan
Abstract:
We survey the large language model (LLM) serving area to understand the intricate dynamics between cost-efficiency and accuracy, which is magnified by the growing need for longer contextual understanding when deploying models at a massive scale. Our findings reveal that works in this space optimize along three distinct but conflicting goals: improving serving context length (C), improving serving accuracy (A), and improving serving performance (P). Drawing inspiration from the CAP theorem in databases, we propose a CAP principle for LLM serving, which suggests that any optimization can improve at most two of these three goals simultaneously. Our survey categorizes existing works within this framework. We find the definition and continuity of user-perceived measurement metrics are crucial in determining whether a goal has been met, akin to prior CAP databases in the wild. We recognize the CAP principle for LLM serving as a guiding principle, rather than a formal theorem, to inform designers of the inherent and dynamic trade-offs in serving models. As serving accuracy and performance have been extensively studied, this survey focuses on works that extend serving context length and address the resulting challenges.
Submitted 26 May, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
Authors:
Hongcheng Zhang,
Liu Liang,
Pengxin Zeng,
Xiao Song,
Zhe Wang
Abstract:
Sparse 3D detectors have received significant attention since the query-based paradigm embraces low latency without explicit dense BEV feature construction. However, these detectors achieve worse performance than their dense counterparts. In this paper, we find the key to bridging the performance gap is to enhance the awareness of rich representations in two modalities. Here, we present a high-performance fully sparse detector for end-to-end multi-modality 3D object detection. The detector, termed SparseLIF, contains three key designs, which are (1) Perspective-Aware Query Generation (PAQG) to generate high-quality 3D queries with perspective priors, (2) RoI-Aware Sampling (RIAS) to further refine prior queries by sampling RoI features from each modality, (3) Uncertainty-Aware Fusion (UAF) to precisely quantify the uncertainty of each sensor modality and adaptively conduct final multi-modality fusion, thus achieving great robustness against sensor noises. By the time of submission (2024/03/08), SparseLIF achieves state-of-the-art performance on the nuScenes dataset, ranking 1st on both validation set and test benchmark, outperforming all state-of-the-art 3D object detectors by a notable margin. The source code will be released upon acceptance.
Submitted 11 March, 2024;
originally announced March 2024.
-
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground
Authors:
Adil Soubki,
John Murzaku,
Arash Yousefi Jordehi,
Peter Zeng,
Magdalena Markowska,
Seyed Abolghasem Mirroshandel,
Owen Rambow
Abstract:
Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.
Submitted 5 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction
Authors:
Jiaqi Cui,
Yan Wang,
Lu Wen,
Pinxian Zeng,
Xi Wu,
Jiliu Zhou,
Dinggang Shen
Abstract:
To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images. However, these methods heavily rely on voxel-based representations, which fall short of adequately accounting for the precise structure and fine-grained context, leading to compromised reconstruction. In this paper, we propose a 3D point-based context clusters GAN, namely PCC-GAN, to reconstruct high-quality SPET images from LPET. Specifically, inspired by the geometric representation power of points, we resort to a point-based representation to enhance the explicit expression of the image structure, thus facilitating the reconstruction with finer details. Moreover, a context clustering strategy is applied to explore the contextual relationships among points, which mitigates the ambiguities of small structures in the reconstructed images. Experiments on both clinical and phantom datasets demonstrate that our PCC-GAN outperforms the state-of-the-art reconstruction methods qualitatively and quantitatively. Code is available at https://github.com/gluucose/PCCGAN.
Submitted 1 February, 2024;
originally announced February 2024.
-
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Authors:
Kaipeng Fang,
Jingkuan Song,
Lianli Gao,
Pengpeng Zeng,
Zhi-Qi Cheng,
Xiyao Li,
Heng Tao Shen
Abstract:
The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to strictly unknown domains and categories during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text retrieval. However, applying them directly to UCDR may not be sufficient to handle both domain shift (i.e., adapting to unfamiliar domains) and semantic shift (i.e., transferring to unknown categories). To this end, we propose Prompting-to-Simulate (ProS), the first method to apply prompt tuning for UCDR. ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP) which can impact models to produce generalized features for UCDR. Concretely, in the Prompt Units Learning stage, we introduce two Prompt Units to individually capture domain and semantic knowledge in a mask-and-align way. Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP. Extensive experiments conducted on three benchmark datasets show that our method achieves new state-of-the-art performance without bringing excessive parameters. Our method is publicly available at https://github.com/fangkaipeng/ProS.
Submitted 29 February, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Fault-Tolerant Operation of Bosonic Qubits with Discrete-Variable Ancillae
Authors:
Qian Xu,
Pei Zeng,
Daohong Xu,
Liang Jiang
Abstract:
Fault-tolerant quantum computation with bosonic qubits often necessitates the use of noisy discrete-variable ancillae. In this work, we establish a comprehensive and practical fault-tolerance framework for such a hybrid system and synthesize it with fault-tolerant protocols by combining bosonic quantum error correction (QEC) and advanced quantum control techniques. We introduce essential building blocks of error-corrected gadgets by leveraging ancilla-assisted bosonic operations using a generalized variant of path-independent quantum control (GPI). Using these building blocks, we construct a universal set of error-corrected gadgets that tolerate a single photon loss and an arbitrary ancilla fault for four-legged cat qubits. Notably, our construction only requires dispersive coupling between bosonic modes and ancillae, as well as beam-splitter coupling between bosonic modes, both of which have been experimentally demonstrated with strong strengths and high accuracy. Moreover, each error-corrected bosonic qubit is only comprised of a single bosonic mode and a three-level ancilla, featuring the hardware efficiency of bosonic QEC in the full fault-tolerant setting. We numerically demonstrate the feasibility of our schemes using current experimental parameters in the circuit-QED platform. Finally, we present a hardware-efficient architecture for fault-tolerant quantum computing by concatenating the four-legged cat qubits with an outer qubit code utilizing only beam-splitter couplings. Our estimates suggest that the overall noise threshold can be reached using existing hardware. These developed fault-tolerant schemes extend beyond their applicability to four-legged cat qubits and can be adapted for other rotation-symmetrical codes, offering a promising avenue toward scalable and robust quantum computation with bosonic qubits.
Submitted 31 October, 2023;
originally announced October 2023.
-
Similarity-driven and Task-driven Models for Diversity of Opinion in Crowdsourcing Markets
Authors:
Chen Jason Zhang,
Yunrui Liu,
Pengcheng Zeng,
Ting Wu,
Lei Chen,
Pan Hui,
Fei Hao
Abstract:
The recent boom in crowdsourcing has opened up a new avenue for utilizing human intelligence in the realm of data analysis. This innovative approach provides a powerful means for connecting online workers to tasks that cannot effectively be done solely by machines or conducted by professional experts due to cost constraints. Within the field of social science, four elements are required to construct a sound crowd - Diversity of Opinion, Independence, Decentralization and Aggregation. However, while the other three components have already been investigated and implemented in existing crowdsourcing platforms, 'Diversity of Opinion' has not been functionally enabled yet. From a computational point of view, constructing a wise crowd necessitates quantitatively modeling and taking diversity into account. There are usually two paradigms in a crowdsourcing marketplace for worker selection: building a crowd to wait for tasks to come and selecting workers for a given task. We propose similarity-driven and task-driven models for both paradigms. Also, we develop efficient and effective algorithms for recruiting a limited number of workers with optimal diversity in both models. To validate our solutions, we conduct extensive experiments using both synthetic datasets and real data sets.
Submitted 28 February, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Pilot-reference-free continuous-variable quantum key distribution with efficient decoy-state analysis
Authors:
Anran Jin,
Xingjian Zhang,
Liang Jiang,
Richard V. Penty,
Pei Zeng
Abstract:
Continuous-variable quantum key distribution (CV QKD) using optical coherent detectors is practically favorable due to its low implementation cost, flexibility of wavelength division multiplexing, and compatibility with standard coherent communication technologies. However, the security analysis and parameter estimation of CV QKD are complicated due to the infinite-dimensional latent Hilbert space. Also, the transmission of strong reference pulses undermines the security and complicates the experiments. In this work, we tackle these two problems by presenting a time-bin-encoding CV protocol with a simple phase-error-based security analysis valid under general coherent attacks. With the key encoded into the relative intensity between two optical modes, the need for global references is removed. Furthermore, phase randomization can be introduced to decouple the security analysis of different photon-number components. We can hence tag the photon number for each round, effectively estimate the associated privacy using a carefully designed coherent-detection method, and independently extract encryption keys from each component. Simulations manifest that the protocol using multi-photon components increases the key rate by two orders of magnitude compared to the one using only the single-photon component. Meanwhile, the protocol with four-intensity decoy analysis is sufficient to yield tight parameter estimation with a short-distance key-rate performance comparable to the best Bennett-Brassard-1984 implementation.
Submitted 7 September, 2023;
originally announced September 2023.
-
TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms
Authors:
Jiaqi Cui,
Pinxian Zeng,
Xinyi Zeng,
Peng Wang,
Xi Wu,
Jiliu Zhou,
Yan Wang,
Dinggang Shen
Abstract:
To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, various methods have been proposed for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) sinograms directly. However, current methods often neglect boundaries during sinogram-to-image reconstruction, resulting in high-frequency distortion in the frequency domain and diminished or fuzzy edges in the reconstructed images. Furthermore, the convolutional architectures, which are commonly used, lack the ability to model long-range non-local interactions, potentially leading to inaccurate representations of global structures. To alleviate these problems, we propose a transformer-based model that unites triple domains of sinogram, image, and frequency for direct PET reconstruction, namely TriDo-Former. Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms. Different from the vanilla transformer that splits an image into 2D patches, based specifically on the PET imaging mechanism, our SE-Former divides the sinogram into 1D projection view angles to maintain its inner structure while denoising, preventing the noise in the sinogram from propagating into the image domain. Moreover, to mitigate high-frequency distortion and improve reconstruction details, we integrate global frequency parsers (GFPs) into SSR-Former. The GFP serves as a learnable frequency filter that globally adjusts the frequency components in the frequency domain, forcing the network to restore high-frequency details resembling real SPET images. Validations on a clinical dataset demonstrate that our TriDo-Former outperforms the state-of-the-art methods qualitatively and quantitatively.
Submitted 10 August, 2023;
originally announced August 2023.
-
Generalized Unbiased Scene Graph Generation
Authors:
Xinyu Lyu,
Lianli Gao,
Junlin Xie,
Pengpeng Zeng,
Yulu Tian,
Jie Shao,
Heng Tao Shen
Abstract:
Existing Unbiased Scene Graph Generation (USGG) methods only focus on addressing the predicate-level imbalance in which high-frequency classes dominate predictions of rare ones, while overlooking the concept-level imbalance. Actually, even if predicates themselves are balanced, there is still a significant concept-imbalance within them due to the long-tailed distribution of contexts (i.e., subject-object combinations). This concept-level imbalance poses a more pervasive and challenging issue compared to the predicate-level imbalance since subject-object pairs are inherently complex in combinations. Hence, we introduce a novel research problem: Generalized Unbiased Scene Graph Generation (G-USGG), which takes into account both predicate-level and concept-level imbalance. To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare/uncommon/common concepts. MCL first quantifies the concept-level imbalance across predicates in terms of different amounts of concepts, represented as multiple concept-prototypes within the same class. It then effectively learns concept-prototypes by applying the Concept Regularization (CR) technique. Furthermore, to achieve balanced learning over different concepts, we introduce the Balanced Prototypical Memory (BPM), which guides SGG models to generate balanced representations for concept-prototypes. Extensive experiments demonstrate the remarkable efficacy of our model-agnostic strategy in enhancing the performance of benchmark models on both VG-SGG and OI-SGG datasets, leading to new state-of-the-art achievements in two key aspects: predicate-level unbiased relation recognition and concept-level compositional generalizability.
Submitted 9 August, 2023;
originally announced August 2023.
-
Performance Analysis for Polar Codes under Successive Cancellation List Decoding with Fixed List Size
Authors:
Jinnan Piao,
Dong Li,
Xueting Yu,
Zhibo Li,
Ming Yang,
Jindi Liu,
Peng Zeng
Abstract:
In this paper, we first indicate that the block error event of polar codes under successive cancellation list (SCL) decoding is composed of path loss (PL) error event and path selection (PS) error event, where the PL error event is that correct codeword is lost during the SCL decoding and the PS error event is that correct codeword is reserved in the decoded list but not selected as the decoded co…
▽ More
In this paper, we first indicate that the block error event of polar codes under successive cancellation list (SCL) decoding is composed of path loss (PL) error event and path selection (PS) error event, where the PL error event is that correct codeword is lost during the SCL decoding and the PS error event is that correct codeword is reserved in the decoded list but not selected as the decoded codeword. Then, we simplify the PL error event by assuming the all-zero codeword is transmitted and derive the probability lower bound via the joint probability density of the log-likelihood ratios of information bits. Meanwhile, the union bound calculated by the minimum weight distribution is used to evaluate the probability of the PS error event. With the performance analysis, we design a greedy bit-swap** (BS) algorithm to construct polar codes by gradually swap** information bit and frozen bit to reduce the performance lower bound of SCL decoding. The simulation results show that the BLER performance of SCL decoding is close to the lower bound in the medium to high signal-to-noise ratio region and we can optimize the lower bound to improve the BLER performance of SCL decoding by the BS algorithm.
Submitted 6 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Semantic Invariant Multi-view Clustering with Fully Incomplete Information
Authors:
Pengxin Zeng,
Mouxing Yang,
Yiding Lu,
Changqing Zhang,
Peng Hu,
Xi Peng
Abstract:
Robust multi-view learning with incomplete information has received significant attention due to issues such as incomplete correspondences and incomplete instances that commonly affect real-world multi-view applications. Existing approaches rely heavily on paired samples to realign or impute defective ones, but such preconditions cannot always be satisfied in practice due to the complexity of data collection and transmission. To address this problem, we present a novel framework called SeMantic Invariance LEarning (SMILE) for multi-view clustering with incomplete information that does not require any paired samples. Specifically, we discover the existence of an invariant semantic distribution across different views, which enables SMILE to alleviate the cross-view discrepancy and learn consensus semantics without requiring any paired samples. The resulting consensus semantics remain unaffected by cross-view distribution shifts, making them useful for realigning/imputing defective instances and forming clusters. We demonstrate the effectiveness of SMILE through extensive comparison experiments with 13 state-of-the-art baselines on five benchmarks. Our approach improves the clustering accuracy of NoisyMNIST from 19.3\%/23.2\% to 82.7\%/69.0\% when the correspondences/instances are fully incomplete. The code is available at https://pengxi.me.
Submitted 21 December, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Extended ADMM for general penalized quantile regression with linear constraints in big data
Authors:
Yongxin Liu,
Peng Zeng
Abstract:
Quantile regression (QR) can be used to describe the comprehensive relationship between a response and predictors. Prior domain knowledge and assumptions in applications are usually formulated as constraints on parameters to improve the estimation efficiency. This paper develops methods based on multi-block ADMM to fit general penalized QR with linear constraints on regression coefficients. Different formulations to handle the linear constraints and general penalty are explored and compared. The most efficient one has explicit expressions for each parameter and avoids the nested-loop iterations in some existing algorithms. Additionally, a parallel ADMM algorithm for big data is developed for the case when data are stored in a distributed fashion. The stopping criterion and convergence of the algorithm are established. Extensive numerical experiments and a real data example demonstrate the computational efficiency of the proposed algorithms. The details of theoretical proofs and different algorithm variations are presented in the Appendix.
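For readers unfamiliar with the loss that penalized QR minimizes, the following is a minimal sketch of the standard check (pinball) loss and a penalized objective built on it. It does not implement the ADMM algorithms of the paper and ignores the linear constraints; the function names and the choice of an L1 penalty are assumptions made only for illustration.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def penalized_qr_objective(X, y, beta, tau, lam):
    """Average check loss plus an L1 penalty.

    The L1 penalty is an illustrative assumption; the abstract speaks of a
    general penalty and does not fix its form.
    """
    n = len(y)
    loss = 0.0
    for row, target in zip(X, y):
        fitted = sum(x_j * b_j for x_j, b_j in zip(row, beta))
        loss += check_loss(target - fitted, tau)
    penalty = lam * sum(abs(b_j) for b_j in beta)
    return loss / n + penalty
```

With tau = 0.5 the check loss is half the absolute error, so the objective reduces to a penalized median regression; other values of tau weight positive and negative residuals asymmetrically.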
Submitted 12 May, 2023;
originally announced May 2023.
-
Differentiable Genetic Programming for High-dimensional Symbolic Regression
Authors:
Peng Zeng,
Xiaotian Song,
Andrew Lensen,
Yuwei Ou,
Yanan Sun,
Mengjie Zhang,
Jiancheng Lv
Abstract:
Symbolic regression (SR) is the process of discovering hidden relationships in data in the form of mathematical expressions, and is considered an effective route to interpretable machine learning (ML). Genetic programming (GP) has been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing the trees. In this paper, we propose a differentiable approach named DGP to construct GP trees for high-dimensional SR for the first time. Specifically, a new data structure called the differentiable symbolic tree is proposed to relax the discrete structure into a continuous one, so that a gradient-based optimizer can be used for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by the above relaxation and obtain valid symbolic expressions. Furthermore, a diversification mechanism is introduced to help the optimizer escape from local optima and reach globally better solutions. With these designs, the proposed DGP method can efficiently search for GP trees with higher performance, and is thus capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted various experiments against state-of-the-art methods based on both GP and deep neural networks. The experimental results reveal that DGP can outperform these peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. In addition, on the synthetic SR problems, the proposed DGP method also achieves the best recovery rate, even at different noise levels. We believe this work can help SR become a powerful alternative for interpretable ML across a broader range of real-world problems.
Submitted 18 April, 2023;
originally announced April 2023.
-
Development of Nb-GaAs based superconductor semiconductor hybrid platform by combining in-situ dc magnetron sputtering and molecular beam epitaxy
Authors:
Clemens Todt,
Sjoerd Telkamp,
Filip Krizek,
Christian Reichl,
Mihai Gabureac,
Rüdiger Schott,
Erik Cheah,
Peng Zeng,
Thomas Weber,
Arnold Müller,
Christof Vockenhuber,
Mohsen Bahrami Panah,
Werner Wegscheider
Abstract:
We present Nb thin films deposited in-situ on GaAs by combining molecular beam epitaxy and magnetron sputtering within an ultra-high vacuum cluster. Nb films deposited at varying power, and a reference film from a commercial system, are compared. The results show clear variation between the in-situ and ex-situ deposition, which we relate to differences in magnetron sputtering conditions and chamber geometry. The Nb films have critical temperatures of around $9 \textrm{K}$ and critical perpendicular magnetic fields of up to $B_{c2} = 1.4 \textrm{T}$ at $4.2 \textrm{K}$. From STEM images of the GaAs-Nb interface we find the formation of an amorphous interlayer between the GaAs and the Nb for both the ex-situ and in-situ deposited material.
Submitted 18 April, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
-
Construction Methods Based Minimum Weight Distribution for Polar Codes with Successive Cancellation List Decoding
Authors:
Jinnan Piao,
Dong Li,
Jindi Liu,
Xueting Yu,
Zhibo Li,
Ming Yang,
Peng Zeng
Abstract:
In this paper, we focus on construction methods based on the minimum weight distribution (MWD) for polar codes to improve the performance with successive cancellation list (SCL) decoding. We first propose an ordered and nested reliability sequence, namely the MWD sequence, to improve the ML performance of polar codes and enable fast construction without the original channel information. In the MWD sequence, the synthetic channels are sorted by the partial MWD, which is used to evaluate the influence of an information bit on the MWD, and we prove that the MWD sequence is the optimum sequence under ML decoding. Then, since the list size of SCL decoding is limited, we introduce an entropy constraint to establish a relationship between the list size and the ML performance and propose a heuristic and greedy construction method named the bit grouping reorder based MWD (BGR-MWD) algorithm. In the algorithm, we divide the synthetic channels into groups by the partial MWD and greedily reorder the synthetic channels in some groups until the entropy constraint is satisfied. The simulation results show that the MWD sequence is suitable for constructing polar codes with short code length. Meanwhile, the BGR-MWD algorithm has superior performance over the traditional construction methods for long code length.
Submitted 22 March, 2023;
originally announced March 2023.
-
Event-Triggered Active Disturbance Rejection Control for Uncertain Random Nonlinear Systems
Authors:
Ze-Hao Wu,
Feiqi Deng,
Pengyu Zeng,
Hua-Cheng Zhou,
Hongyi Li
Abstract:
In this paper, event-triggered active disturbance rejection control (ADRC) is first addressed for a class of uncertain random nonlinear systems driven by bounded noise and colored noise. The event-triggered extended state observer (ESO) and ADRC controller are designed, where two respective event-triggering mechanisms with a fixed positive lower bound for the inter-execution times are proposed. The random total disturbance representing the coupling of nonlinear unmodeled dynamics, external deterministic disturbance, bounded noise, and colored noise is estimated in real time by the event-triggered ESO and compensated in the event-triggered feedback loop. Both the mean square and almost surely practical convergence of the closed-loop systems are shown with rigorous theoretical analysis. Finally, some numerical simulations are implemented to validate the proposed control scheme and theoretical results.
Submitted 8 May, 2024; v1 submitted 7 March, 2023;
originally announced March 2023.
-
Control over epitaxy and the role of the InAs/Al interface in hybrid two-dimensional electron gas systems
Authors:
E. Cheah,
D. Z. Haxell,
R. Schott,
P. Zeng,
E. Paysen,
S. C. ten Kate,
M. Coraiola,
M. Landstetter,
A. B. Zadeh,
A. Trampert,
M. Sousa,
H. Riel,
F. Nichele,
W. Wegscheider,
F. Krizek
Abstract:
In-situ synthesised semiconductor/superconductor hybrid structures became an important material platform in condensed matter physics. Their development enabled a plethora of novel quantum transport experiments with focus on Andreev and Majorana physics. The combination of InAs and Al has become the workhorse material and has been successfully implemented in the form of one-dimensional structures and two-dimensional electron gases. In contrast to the well-developed semiconductor parts of the hybrid materials, the direct effect of the crystal nanotexture of Al films on the electron transport still remains unclear. This is mainly due to the complex epitaxial relation between Al and the semiconductor. We present a study of Al films on shallow InAs two-dimensional electron gas systems grown by molecular beam epitaxy, with focus on control of the Al crystal structure. We identify the dominant grain types present in our Al films and show that the formation of grain boundaries can be significantly reduced by controlled roughening of the epitaxial interface. Finally, we demonstrate that the implemented roughening does not negatively impact either the electron mobility of the two-dimensional electron gas or the basic superconducting properties of the proximitized system.
Submitted 17 January, 2023;
originally announced January 2023.
-
Simple and high-precision Hamiltonian simulation by compensating Trotter error with linear combination of unitary operations
Authors:
Pei Zeng,
Jinzhao Sun,
Liang Jiang,
Qi Zhao
Abstract:
Trotter and linear-combination-of-unitary (LCU) are two popular Hamiltonian simulation methods. We propose Hamiltonian simulation algorithms that use LCU to compensate the Trotter error and enjoy the advantages of both. By adding a few gates after the Kth-order Trotter formula, we realize a better time scaling than the 2Kth-order Trotter formula. Our first algorithm exponentially improves the accuracy scaling of the Kth-order Trotter formula. In the second algorithm, we consider the detailed structure of Hamiltonians and construct LCU for Trotter errors with commutator scaling. Consequently, for lattice Hamiltonians, the algorithm enjoys almost linear system-size dependence and quadratically improves the accuracy of the Kth-order Trotter formula.
Submitted 8 December, 2022;
originally announced December 2022.
-
FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting
Authors:
Maowei Jiang,
Pengyu Zeng,
Kai Wang,
Huan Liu,
Wenbo Chen,
Haoran Liu
Abstract:
Time series forecasting is a long-standing challenge because real-world information arises in various scenarios (e.g., energy, weather, traffic, economics, earthquake warning). However, the forecasts of some mainstream forecasting models deviate dramatically from the ground truth. We believe this is because these models lack the ability to capture the frequency information that is abundant in real-world datasets. At present, the mainstream frequency information extraction methods are based on the Fourier transform (FT). However, the use of FT is problematic due to the Gibbs phenomenon: if the values at the two ends of a sequence differ significantly, oscillatory approximations are observed around both ends and high-frequency noise is introduced. Therefore, we propose a novel frequency enhanced channel attention that adaptively models frequency interdependencies between channels based on the Discrete Cosine Transform, which intrinsically avoids the high-frequency noise caused by problematic periodicity during the Fourier transform, i.e., the Gibbs phenomenon. We show that this network generalizes extremely effectively across six real-world datasets and achieves state-of-the-art performance. We further demonstrate that the frequency enhanced channel attention mechanism module can be flexibly applied to different networks. This module can improve the prediction ability of existing mainstream networks, reducing MSE by 35.99% on LSTM, 10.01% on Reformer, 8.71% on Informer, 8.29% on Autoformer, 8.06% on Transformer, etc., at a slight computational cost, with just a few lines of code. Our code and data are available at https://github.com/Zero-coder/FECAM.
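To make the Discrete Cosine Transform mentioned above concrete, here is a minimal sketch of an unnormalized DCT-II applied independently to each channel of a multivariate series. It is not the FECAM attention module; the function names and the list-of-lists data layout are assumptions for illustration only.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(signal)
    coefficients = []
    for k in range(n_samples):
        total = 0.0
        for n, value in enumerate(signal):
            total += value * math.cos(math.pi / n_samples * (n + 0.5) * k)
        coefficients.append(total)
    return coefficients


def channelwise_dct(series):
    """Apply the DCT-II to every channel; `series` is a list of channels."""
    return [dct2(channel) for channel in series]
```

A constant channel maps to a single nonzero coefficient at k = 0, which is the sense in which the cosine basis represents smooth, non-periodic signals compactly.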
Submitted 2 December, 2022;
originally announced December 2022.
-
Learnable Blur Kernel for Single-Image Defocus Deblurring in the Wild
Authors:
Jucai Zhai,
Pengcheng Zeng,
Chihao Ma,
Yong Zhao,
Jie Chen
Abstract:
Recent research showed that the dual-pixel sensor has made great progress in defocus map estimation and image defocus deblurring. However, extracting real-time dual-pixel views is troublesome and complex in algorithm deployment. Moreover, the deblurred image generated by the defocus deblurring network lacks high-frequency details, which is unsatisfactory in human perception. To overcome this issue, we propose a novel defocus deblurring method that uses the guidance of the defocus map to implement image deblurring. The proposed method consists of a learnable blur kernel to estimate the defocus map, which is an unsupervised method, and a single-image defocus deblurring generative adversarial network (DefocusGAN) for the first time. The proposed network can learn the deblurring of different regions and recover realistic details. We propose a defocus adversarial loss to guide this training process. Competitive experimental results confirm that with a learnable blur kernel, the generated defocus map can achieve results comparable to supervised methods. In the single-image defocus deblurring task, the proposed method achieves state-of-the-art results, especially significant improvements in perceptual quality, where PSNR reaches 25.56 dB and LPIPS reaches 0.111.
Submitted 25 November, 2022;
originally announced November 2022.
-
Phase transition and higher order analysis of $L_q$ regularization under dependence
Authors:
Hanwen Huang,
Peng Zeng,
Qinglong Yang
Abstract:
We study the problem of estimating a $k$-sparse signal ${\mbox{$β$}}_0\in{\bf R}^p$ from a set of noisy observations ${\bf y}\in{\bf R}^n$ under the model ${\bf y}={\bf X}{\mbox{$β$}}+{\bf w}$, where ${\bf X}\in{\bf R}^{n\times p}$ is the measurement matrix the row of which is drawn from distribution $N(0,{\mbox{$Σ$}})$. We consider the class of $L_q$-regularized least squares (LQLS) given by the formulation $\hat{\mbox{$β$}}(λ,q)=\text{argmin}_{\mbox{$β$}\in{\bf R}^p}\frac{1}{2}\|{\bf y}-{\bf X}{\mbox{$β$}}\|^2_2+λ\|{\mbox{$β$}}\|_q^q$, where $\|\cdot\|_q$ $(0\le q\le 2)$ denotes the $L_q$-norm. In the setting $p,n,k\rightarrow\infty$ with fixed $k/p=ε$ and $n/p=δ$, we derive the asymptotic risk of $\hat{\mbox{$β$}}(λ,q)$ for arbitrary covariance matrix ${\mbox{$Σ$}}$ which generalizes the existing results for standard Gaussian design, i.e. $X_{ij}\overset{i.i.d}{\sim}N(0,1)$. We perform a higher-order analysis for LQLS in the small-error regime in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of ${\mbox{$Σ$}}$ in the cases $0\le q< 1$ and $1< q\le 2$ which indicates that the correlations among predictors only affect the phase transition curve in the case $q=1$ a.k.a. LASSO. To study the influence of the covariance structure of ${\mbox{$Σ$}}$ on the performance of LQLS in the cases $0\le q< 1$ and $1<q\le 2$, we derive the explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.
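The LQLS formulation in the abstract can be evaluated directly. The sketch below computes the objective $\frac{1}{2}\|{\bf y}-{\bf X}β\|_2^2+λ\|β\|_q^q$ for a given coefficient vector; it only evaluates the objective and does not solve the minimization or reproduce the asymptotic analysis. The function name is hypothetical, and treating $q=0$ as a count of nonzero coefficients is an assumed convention that the abstract does not spell out.

```python
def lqls_objective(X, y, beta, lam, q):
    """Evaluate 0.5 * ||y - X beta||_2^2 + lam * ||beta||_q^q for 0 <= q <= 2."""
    if not 0.0 <= q <= 2.0:
        raise ValueError("q must lie in [0, 2]")
    residual_sq = 0.0
    for row, target in zip(X, y):
        fitted = sum(x_j * b_j for x_j, b_j in zip(row, beta))
        residual_sq += (target - fitted) ** 2
    if q == 0:
        # Assumed convention: ||beta||_0^0 counts the nonzero coefficients.
        penalty = sum(1 for b_j in beta if b_j != 0)
    else:
        penalty = sum(abs(b_j) ** q for b_j in beta)
    return 0.5 * residual_sq + lam * penalty
```

Setting q = 1 gives the LASSO objective and q = 2 gives ridge regression with this scaling of the penalty.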
Submitted 1 December, 2022; v1 submitted 18 November, 2022;
originally announced November 2022.
-
Visual Commonsense-aware Representation Network for Video Captioning
Authors:
Pengpeng Zeng,
Haonan Zhang,
Lianli Gao,
Xiangpeng Li,
Jin Qian,
Heng Tao Shen
Abstract:
Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation throughout the generation process. Existing video captioning methods focus on exploring spatial-temporal representations and their relationships to produce inferences. However, such methods only exploit the superficial associations contained in the video itself without considering the intrinsic visual commonsense knowledge that exists in a video dataset, which may hinder their ability to reason toward accurate descriptions. To address this problem, we propose a simple yet effective method, called Visual Commonsense-aware Representation Network (VCRN), for video captioning. Specifically, we construct a Video Dictionary, a plug-and-play component, obtained by clustering all video features from the whole dataset into multiple cluster centers without additional annotation. Each center implicitly represents a visual commonsense concept in the video domain, which is utilized in our proposed Visual Concept Selection (VCS) to obtain a video-related concept feature. Next, a Conceptual Integration Generation (CIG) is proposed to enhance the caption generation. Extensive experiments on three public video captioning benchmarks, MSVD, MSR-VTT, and VATEX, demonstrate that our method reaches state-of-the-art performance, indicating its effectiveness. In addition, our approach is integrated into an existing video question answering method and improves its performance, further showing the generalization of our method. Source code has been released at https://github.com/zchoi/VCRN.
Submitted 17 November, 2022;
originally announced November 2022.
-
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Authors:
Pengpeng Zeng,
Jinkuan Zhu,
Jingkuan Song,
Lianli Gao
Abstract:
Studies of image captioning are shifting towards a fully end-to-end paradigm by leveraging powerful visual pre-trained models and transformer-based generation architectures for more flexible model training and faster inference speed. State-of-the-art approaches simply extract isolated concepts or attributes to assist description generation. However, such approaches do not consider the hierarchical semantic structure in the textual domain, which leads to an unpredictable mapping between visual representations and concept words. To this end, we propose a novel Progressive Tree-Structured prototype Network (dubbed PTSN), which is the first attempt to narrow down the scope of prediction words with appropriate semantics by modeling the hierarchical textual semantics. Specifically, we design a novel embedding method called tree-structured prototype, producing a set of hierarchical representative embeddings which capture the hierarchical semantic structure in textual space. To bring such tree-structured prototypes into visual cognition, we also propose a progressive aggregation module to exploit semantic relationships within the image and prototypes. By applying our PTSN to the end-to-end captioning framework, extensive experiments conducted on the MSCOCO dataset show that our method achieves a new state-of-the-art performance with 144.2% (single model) and 146.5% (ensemble of 4 models) CIDEr scores on the `Karpathy' split and 141.4% (c5) and 143.9% (c40) CIDEr scores on the official online test server. Trained models and source code have been released at: https://github.com/NovaMind-Z/PTSN.
Submitted 17 November, 2022;
originally announced November 2022.
-
Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric
Authors:
Pengxin Zeng,
Yunfan Li,
Peng Hu,
Dezhong Peng,
Jiancheng Lv,
Xi Peng
Abstract:
Fair clustering aims to divide data into distinct clusters while preventing sensitive attributes (\textit{e.g.}, gender, race, RNA sequencing technique) from dominating the clustering. Although a number of works have been conducted and achieved huge success recently, most of them are heuristic, and a unified theory for algorithm design is lacking. In this work, we fill this gap by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. In brief, through maximizing and minimizing mutual information, FCMI is designed to achieve four characteristics highly expected of deep fair clustering, \textit{i.e.}, compact, balanced, and fair clusters, as well as informative features. Besides the contributions to theory and algorithm, another contribution of this work is a novel fair clustering metric that is also built upon information theory. Unlike existing evaluation metrics, our metric measures the clustering quality and fairness as a whole rather than separately. To verify the effectiveness of the proposed FCMI, we conduct experiments on six benchmarks, including a single-cell RNA-seq atlas, comparing with 11 state-of-the-art methods in terms of five metrics. The code is available at \url{https://pengxi.me}.
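As a rough illustration of the kind of quantity involved, the sketch below estimates the empirical mutual information between two discrete labelings from their joint counts. Applying it to cluster assignments and a sensitive attribute is an assumption made here for illustration; the abstract does not specify which mutual information terms FCMI maximizes or minimizes, and this is not the FCMI objective or the proposed metric.

```python
import math
from collections import Counter


def mutual_information(clusters, groups):
    """Empirical mutual information (in nats) between two label sequences."""
    if len(clusters) != len(groups) or not clusters:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(clusters)
    joint_counts = Counter(zip(clusters, groups))
    cluster_counts = Counter(clusters)
    group_counts = Counter(groups)
    mi = 0.0
    for (c, g), count in joint_counts.items():
        p_joint = count / n
        # Ratio p(c, g) / (p(c) * p(g)) written with raw counts.
        ratio = count * n / (cluster_counts[c] * group_counts[g])
        mi += p_joint * math.log(ratio)
    return mi
```

A value near zero means the cluster labels carry almost no information about the group labels, while two balanced clusters that coincide exactly with two groups give log 2.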
Submitted 20 April, 2023; v1 submitted 25 September, 2022;
originally announced September 2022.
-
Experimental mode-pairing measurement-device-independent quantum key distribution without global phase-locking
Authors:
Hao-Tao Zhu,
Yizhi Huang,
Hui Liu,
Pei Zeng,
Mi Zou,
Yunqi Dai,
Shibiao Tang,
Hao Li,
Lixing You,
Zhen Wang,
Yu-Ao Chen,
Xiongfeng Ma,
Teng-Yun Chen,
Jian-Wei Pan
Abstract:
In the past two decades, quantum key distribution networks based on telecom fibers have been implemented on metropolitan and intercity scales. One of the bottlenecks lies in the exponential decay of the key rate with respect to the transmission distance. Recently proposed schemes mainly focus on achieving longer distances by creating a long-arm single-photon interferometer over two communication parties. Despite their advantageous performance over long communication distances, the requirement of phase-locking between two independent lasers is technically challenging. By adopting the recently-proposed mode-pairing idea, we realize high-performance quantum key distribution without global phase-locking. Using two independent off-the-shelf lasers, we show a quadratic key-rate improvement over the conventional measurement-device-independent schemes in the regime of metropolitan and intercity distances. For longer distances, we also boost the key rate performance by three orders of magnitude via 304 km commercial fiber and 407 km ultra-low-loss fiber. We expect this ready-to-implement high-performance scheme to be widely used in future intercity quantum communication networks.
Submitted 9 February, 2023; v1 submitted 11 August, 2022;
originally announced August 2022.
-
Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation
Authors:
Chaofan Zheng,
Lianli Gao,
Xinyu Lyu,
Pengpeng Zeng,
Abdulmotaleb El Saddik,
Heng Tao Shen
Abstract:
The current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem for generating unbiased scene graphs. However, most de-biasing methods overemphasize the tail predicates and underestimate head ones throughout training, thereby wrecking the representation ability of head predicate features. Furthermore, these impaired features from head predicates harm the learning of tail predicates. In fact, the inference of tail predicates heavily depends on the general patterns learned from head ones, e.g., "standing on" depends on "on". Thus, these de-biasing SGG methods can neither achieve excellent performance on tail predicates nor satisfying behaviors on head ones. To address this issue, we propose a Dual-branch Hybrid Learning network (DHL) to take care of both head predicates and tail ones for SGG, including a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). Specifically, the CLB is responsible for learning expertise and robust features of head predicates, while the FLB is expected to predict informative tail predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule (BCS) to make the two branches work well together. Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones. Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning and Sentence-to-Graph Retrieval) further verify the generalization and practicability of our method.
Submitted 16 July, 2022;
originally announced July 2022.
-
Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
Authors:
Xinyu Lyu,
Lianli Gao,
Pengpeng Zeng,
Heng Tao Shen,
Jingkuan Song
Abstract:
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach. As general SGG models tend to predict head predicates and re-balancing strategies prefer tail categories, none of them can appropriately handle hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to figure out hard-to-distinguish predicates, which adaptively explores predicate correlations in keeping with model's dynamic learning pace. Practically, PL-A is initialized from SGG dataset, and gets refined by exploring model's predictions of current mini-batch. Utilizing PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize model's discriminating process with fine-grained supervision concerning model's dynamic learning status, ensuring balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate practicability of our method.
Submitted 10 July, 2022;
originally announced July 2022.
-
Learning To Generate Scene Graph from Head to Tail
Authors:
Chaofan Zheng,
Xinyu Lyu,
Yuyu Guo,
Pengpeng Zeng,
Jingkuan Song,
Lianli Gao
Abstract:
Scene Graph Generation (SGG) represents objects and their interactions with a graph structure. Recently, many works are devoted to solving the imbalanced problem in SGG. However, underestimating the head predicates in the whole training process, they wreck the features of head predicates that provide general features for tail ones. Besides, assigning excessive attention to the tail predicates leads to semantic deviation. Based on this, we propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT), containing Curriculum Re-weight Mechanism (CRM) and Semantic Context Module (SCM). CRM learns head/easy samples first for robust features of head predicates and then gradually focuses on tail/hard ones. SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations. Experiments show that SGG-HT significantly alleviates the biased problem and achieves state-of-the-art performances on Visual Genome.
Submitted 23 June, 2022;
originally announced June 2022.
-
Delay-aware Multiple Access Design for Intelligent Reflecting Surface Aided Uplink Transmission
Authors:
Piao Zeng,
Guangji Chen,
Qingqing Wu,
Deli Qiao,
Abbas Jamalipour
Abstract:
In this paper, we develop a hybrid multiple access (MA) protocol for an intelligent reflecting surface (IRS) aided uplink transmission network by incorporating the IRS-aided time-division MA (I-TDMA) protocol and the IRS-aided non-orthogonal MA (I-NOMA) protocol as special cases. Two typical communication scenarios, namely the transmit power limited case and the transmit energy limited case are considered, where the device's rearranged order, time and power allocation, as well as dynamic IRS beamforming patterns over time are jointly optimized to minimize the sum transmission delay. To shed light on the superiority of the proposed IRS-aided hybrid MA (I-HMA) protocol over conventional protocols, the conditions under which I-HMA outperforms I-TDMA and I-NOMA are revealed by characterizing their corresponding optimal solution. Then, a computationally efficient algorithm is proposed to obtain the high-quality solution to the corresponding optimization problems. Simulation results validate our theoretical findings, demonstrate the superiority of the proposed design, and draw some useful insights. Specifically, it is found that the proposed protocol can significantly reduce the sum transmission delay by combining the additional gain of dynamic IRS beamforming with the high spectral efficiency of NOMA, which thus reveals that integrating IRS into the proposed HMA protocol is an effective solution for delay-aware optimization. Furthermore, it reveals that the proposed design reduces the time consumption not only from the system-centric view, but also from the device-centric view.
Submitted 26 June, 2023; v1 submitted 18 June, 2022;
originally announced June 2022.
-
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Authors:
Jingkuan Song,
Pengpeng Zeng,
Lianli Gao,
Heng Tao Shen
Abstract:
Recently, attention-based Visual Question Answering (VQA) has achieved great success by utilizing the question to selectively target different visual areas that are related to the answer. Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight. This conflicts with the attention mechanism because CNN features are naturally spatial and channel-wise. Also, visual attention is usually computed at the pixel level, which may cause region discontinuity problems. In this paper, we propose a Cubic Visual Attention (CVA) model by successfully applying a novel channel and spatial attention on object regions to improve the VQA task. Specifically, instead of attending to pixels, we first take advantage of the object proposal networks to generate a set of object candidates and extract their associated conv features. Then, we utilize the question to guide channel attention and spatial attention calculation based on the conv-layer feature map. Finally, the attended visual features and the question are combined to infer the answer. We assess the performance of our proposed CVA on three public image QA datasets, including COCO-QA, VQA and Visual7W. Experimental results show that our proposed method significantly outperforms the state-of-the-arts.
Submitted 4 June, 2022;
originally announced June 2022.
-
Structured Two-stream Attention Network for Video Question Answering
Authors:
Lianli Gao,
Pengpeng Zeng,
Jingkuan Song,
Yuan-Fang Li,
Wu Liu,
Tao Mei,
Heng Tao Shen
Abstract:
To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA. Compared with image QA that focuses primarily on understanding the associations between image region-level details and corresponding questions, video QA requires a model to jointly reason across both spatial and long-range temporal structures of a video as well as text to provide an accurate answer. In this paper, we specifically tackle the problem of video QA by proposing a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given video. First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features. Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of background video and focuses on the relevant text. Finally, the structured two-stream fusion component incorporates different segments of query and video aware context representation and infers the answers. Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 for Action, Trans., FrameQA and Count tasks. It also outperforms the best competitor (i.e., with two representations) on the Action, Trans., FrameQA tasks by 4.1%, 4.7%, and 5.1%.
Submitted 2 June, 2022;
originally announced June 2022.
-
scICML: Information-theoretic Co-clustering-based Multi-view Learning for the Integrative Analysis of Single-cell Multi-omics data
Authors:
Pengcheng Zeng,
Zhixiang Lin
Abstract:
Modern high-throughput sequencing technologies have enabled us to profile multiple molecular modalities from the same single cell, providing unprecedented opportunities to assay cellular heterogeneity from multiple biological layers. However, the datasets generated from these technologies tend to have a high level of noise and are highly sparse, bringing challenges to data analysis. In this paper, we develop a novel information-theoretic co-clustering-based multi-view learning (scICML) method for multi-omics single-cell data integration. scICML utilizes co-clusterings to aggregate similar features for each view of data and uncover the common clustering pattern for cells. In addition, scICML automatically matches the clusters of the linked features across different data types for considering the biological dependency structure across different types of genomic features. Our experiments on four real-world datasets demonstrate that scICML improves the overall clustering performance and provides biological insights into the data analysis of peripheral blood mononuclear cells.
Submitted 19 May, 2022;
originally announced May 2022.
-
Support-set based Multi-modal Representation Enhancement for Video Captioning
Authors:
Xiaoya Chen,
Jingkuan Song,
Pengpeng Zeng,
Lianli Gao,
Heng Tao Shen
Abstract:
Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes. Existing methods follow a typical one-to-one mapping, which concentrates on a limited sample space while ignoring the intrinsic semantic associations between samples, resulting in rigid and uninformative expressions. To address this issue, we propose a novel and flexible framework, namely Support-set based Multi-modal Representation Enhancement (SMRE) model, to mine rich information in a semantic subspace shared between samples. Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantic-related visual elements. During this process, we design a Semantic Space Transformation (SST) module to constrain relative distance and administrate multi-modal interactions in a self-supervised way. Extensive experiments on MSVD and MSR-VTT datasets demonstrate that our SMRE achieves state-of-the-art performance.
Submitted 18 May, 2022;
originally announced May 2022.
-
Scalable fast benchmarking for individual quantum gates with local twirling
Authors:
Yihong Zhang,
Wenjun Yu,
Pei Zeng,
Guoding Liu,
Xiongfeng Ma
Abstract:
With the development of controllable quantum systems, fast and practical characterization for multi-qubit gates is essential for building high-fidelity quantum computing devices. The usual way to fulfill this requirement via randomized benchmarking asks for the complicated implementation of numerous multi-qubit twirling gates. How to efficiently and reliably estimate the fidelity of a quantum process remains an open problem. In this work, we propose a character-cycle benchmarking protocol and a character-average benchmarking protocol only using local twirling gates to estimate the process fidelity of an individual multi-qubit operation. Our protocols can characterize a large class of quantum gates including and beyond the Clifford group via the local gauge transformation, which forms a universal gate set for quantum computing. We numerically demonstrate our protocols for a non-Clifford gate -- controlled-$(TX)$ and a Clifford gate -- five-qubit quantum error-correcting encoding circuit. The numerical results show that our protocols can efficiently and reliably characterize the gate process fidelities. Compared with the cross-entropy benchmarking, the simulation results show that the character-average benchmarking achieves three orders of magnitude improvements in terms of sampling complexity.
Submitted 9 February, 2023; v1 submitted 19 March, 2022;
originally announced March 2022.
-
Soliton Microcombs in Integrated Chalcogenide Microresonators
Authors:
Di Xia,
Zelin Yang,
Pingyang Zeng,
Bin Zhang,
Jiayue Wu,
Zifu Wang,
Jiaxin Zhao,
Mingqi Gao,
Yufei Huang,
Jianteng Huang,
Liyang Luo,
Dong Liu,
Shuixian Yang,
Hairun Guo,
Zhaohui Li
Abstract:
Photonic integrated microcombs have enabled advanced applications in optical communication, microwave synthesis, and optical metrology, which in nature unveil an optical dissipative soliton pattern under cavity-enhanced nonlinear processes. The most decisive factor of microcombs lies in the photonic material platforms, where materials with high nonlinearity and in capacity of high-quality chip integration are highly demanded. In this work, we present a home-developed chalcogenide glasses-Ge25Sb10S65 (GeSbS) for the nonlinear photonic integration and for the dissipative soliton microcomb generation. Compared with the current integrated nonlinear platforms, the GeSbS features wider transparency from the visible to 11 μm region, stronger nonlinearity, and lower thermo-refractive coefficient, and is CMOS compatible in fabrication. In this platform, we achieve chip-integrated optical microresonators with a quality (Q) factor above 2 x 10^6, and carry out lithographically controlled dispersion engineering. In particular, we demonstrate that both a bright soliton-based microcomb and a dark-pulsed comb are generated in a single microresonator, in its separated fundamental polarized mode families under different dispersion regimes. The overall pumping power is on the ten-milliwatt level, determined by both the high Q-factor and the high material nonlinearity of the microresonator. Our results may contribute to the field of nonlinear photonics with an alternative material platform for highly compact and high-intensity nonlinear interactions, while on the application aspect, contribute to the development of soliton microcombs at low operation power, which is potentially required for monolithically integrated optical frequency combs.
Submitted 12 February, 2022;
originally announced February 2022.
-
Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation
Authors:
Xiaoshuai Zhang,
Rui Chen,
Ang Li,
Fanbo Xiang,
Yuzhe Qin,
Jiayuan Gu,
Zhan Ling,
Minghua Liu,
Peiyu Zeng,
Songfang Han,
Zhiao Huang,
Tongzhou Mu,
Jing Xu,
Hao Su
Abstract:
In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academic and industry communities. Inspired by the underlying mechanism of the sensors, we designed a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline is able to generate depth maps with material-dependent error patterns similar to a real depth sensor in real time. We conduct real experiments to show that perception algorithms and reinforcement learning policies trained in our simulation platform could transfer well to the real-world test cases without any fine-tuning. Furthermore, due to the high degree of realism of this simulation, our depth sensor simulator can be used as a convenient testbed to evaluate the algorithm performance in the real world, which will largely reduce the human effort in developing robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to promote the research of vision and robotics communities.
Submitted 5 January, 2023; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Quantum key distribution surpassing the repeaterless rate-transmittance bound without global phase locking
Authors:
Pei Zeng,
Hongyi Zhou,
Weijie Wu,
Xiongfeng Ma
Abstract:
Quantum key distribution -- the establishment of information-theoretically secure keys based on quantum physics -- is mainly limited by its practical performance, which is characterised by the dependence of the key rate on the channel transmittance $R(η)$. Recently, schemes based on single-photon interference have been proposed to improve the key rate to $R=O(\sqrt{η})$ by overcoming the point-to-point secret key capacity bound with interferometers. Unfortunately, all of these schemes require challenging global phase locking to realise a stable long-arm single-photon interferometer with a precision of approximately 100 nm over fibres that are hundreds of kilometres long. Aiming to address this problem, we propose a mode-pairing measurement-device-independent quantum key distribution scheme in which the encoded key bits and bases are determined during data post-processing. Using conventional second-order interference, this scheme can achieve a key rate of $R=O(\sqrt{η})$ without global phase locking when the local phase fluctuation is mild. We expect this high-performance scheme to be ready-to-implement with off-the-shelf optical devices.
Submitted 30 January, 2022; v1 submitted 12 January, 2022;
originally announced January 2022.
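To give a feel for why the square-root scaling in the abstract above matters, the following minimal sketch compares how a rate proportional to η and a rate proportional to √η fall off with fibre length. It is only an illustration of the scaling laws: the attenuation of 0.2 dB/km is an assumed typical value that the abstract does not state, all constant prefactors are dropped, and the function names are hypothetical, not taken from the paper.

```python
import math

# Assumed typical fibre attenuation; this value is NOT given in the abstract.
FIBER_LOSS_DB_PER_KM = 0.2


def transmittance(distance_km: float,
                  loss_db_per_km: float = FIBER_LOSS_DB_PER_KM) -> float:
    """Channel transmittance eta = 10^(-total_loss_dB / 10)."""
    return 10 ** (-loss_db_per_km * distance_km / 10)


def linear_scaling(eta: float) -> float:
    """Rate proportional to eta (linear bound scaling, prefactor omitted)."""
    return eta


def sqrt_scaling(eta: float) -> float:
    """Rate proportional to sqrt(eta) (prefactor omitted)."""
    return math.sqrt(eta)


if __name__ == "__main__":
    # Print both scalings side by side for a few illustrative distances.
    for distance_km in (100, 300, 500):
        eta = transmittance(distance_km)
        print(f"{distance_km} km: eta={eta:.1e}, "
              f"linear~{linear_scaling(eta):.1e}, "
              f"sqrt~{sqrt_scaling(eta):.1e}")
```

Under these assumptions the gap between the two scalings widens as the distance grows, because halving the exponent of the loss matters more the larger that loss is. Actual key rates depend on prefactors, detector parameters, and error rates that this sketch ignores.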
-
Quantum Complementarity Approach to Device-Independent Security
Authors:
Xingjian Zhang,
Pei Zeng,
Tian Ye,
Hoi-Kwong Lo,
Xiongfeng Ma
Abstract:
Complementarity is an essential feature of quantum mechanics. The preparation of an eigenstate of one observable implies complete randomness in its complementary observable. In quantum cryptography, complementarity allows us to formulate security analyses in terms of phase-error correction. However, in the device-independent regime that offers security without device characterization, the concept becomes much subtler. Security proofs of device-independent quantum cryptography tasks are often complex and quite different from those of their more standard device-dependent cousins. The existing proofs pose huge challenges to experiments, among which large data-size requirement is a crux. Here, we show the complementarity security origin of the device-independent tasks. By linking complementarity with quantum nonlocality, we recast the device-independent scheme into a quantum error correction protocol. Going beyond the identical-and-independent-distribution case, we consider the most general attack. We generalize the sample entropy in classical Shannon theory for the finite-size analysis. Our method exhibits good finite-size performance and brings the device-independent scheme to a more practical regime. Applying it to the data in a recent ion-trap-based device-independent quantum key distribution experiment, one could reduce the requirement on data size to less than a third. Furthermore, the complementarity approach can be naturally extended to advantage key distillation to ease experiments by tolerating higher loss and lower transmittance.
Submitted 11 October, 2022; v1 submitted 27 November, 2021;
originally announced November 2021.
-
Throughput Maximization for Active Intelligent Reflecting Surface Aided Wireless Powered Communications
Authors:
Piao Zeng,
Deli Qiao,
Qingqing Wu,
Yuan Wu
Abstract:
This paper considers an active intelligent reflecting surface (IRS)-aided wireless powered communication network (WPCN), where devices first harvest energy and then transmit information to a hybrid access point (HAP). Different from the existing works on passive IRS-aided WPCNs, this is the first work that introduces the active IRS in WPCNs. To guarantee fairness, the problem is formulated as an amplifying power-limited weighted sum throughput (WST) maximization problem, which is solved by successive convex approximation technique and fractional programming alternatively. To balance the performance and complexity tradeoff, three beamforming setups are considered at the active IRS, namely user-adaptive IRS beamforming, uplink-adaptive IRS beamforming, and static IRS beamforming. Numerical results demonstrate the significant superiority of employing active IRS in WPCNs and the benefits of dynamic IRS beamforming. Specifically, it is found that compared to the passive IRS, the active IRS not only improves the WST greatly, but also is more energy-efficient and can significantly extend the transmission coverage. Moreover, different from the symmetric deployment strategy of passive IRS, it is more preferable to deploy the active IRS near the devices.
Submitted 11 January, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
-
Bootstrapping Calabi-Yau Quantum Mechanics
Authors:
Bao-ning Du,
Min-xin Huang,
Pei-xuan Zeng
Abstract:
Recently, a novel bootstrap method for numerical calculations in matrix models and quantum mechanical systems is proposed. We apply the method to certain quantum mechanical systems derived from some well-known local toric Calabi-Yau geometries, where the exact quantization conditions have been conjecturally related to topological string theory. We find that the bootstrap method provides a promising alternative for the precision numerical calculations of the energy eigenvalues. An improvement in our approach is to use a larger set of two-dimensional operators instead of one-dimensional ones. We also apply our improved bootstrap methods to some non-relativistic models in the recent literature and demonstrate better numerical accuracies.
Submitted 16 November, 2021;
originally announced November 2021.
-
Universal quantum algorithmic cooling on a quantum computer
Authors:
Pei Zeng,
Jinzhao Sun,
Xiao Yuan
Abstract:
Quantum cooling, a deterministic process that drives any state to the lowest eigenstate, has been widely used from studying ground state properties of chemistry and condensed matter quantum physics, to general optimization problems. However, the cooling procedure is generally non-unitary, hence its realization on a quantum computer either requires deep circuits or assumes specific input states with variational circuits. Here, we propose universal quantum cooling algorithms that overcome these limitations. By utilizing a dual phase representation of decaying functions, we show how to universally and deterministically realize a general cooling procedure with shallow quantum circuits. We demonstrate its applications in cooling an arbitrary input state with known ground state energy, corresponding to satisfactory, linear algebra tasks, and quantum state compiling tasks, and preparing unknown eigenvalues and eigenstates, corresponding to quantum many-body problems. Compared to quantum phase estimation, our method uses only one ancillary qubit and much shallower circuits, showing exponential improvement of the circuit complexity with respect to the final state infidelity. We numerically benchmark the algorithms for the $8$-qubit Heisenberg model and verify its feasibility for accurately finding eigenenergies and obtaining eigenstate measurements. Our work paves the way for efficient and universal quantum algorithmic cooling with near-term as well as universal fault-tolerant quantum devices.
Submitted 2 June, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
A Model-free Variable Screening Method Based on Leverage Score
Authors:
Wenxuan Zhong,
Yiwen Liu,
Peng Zeng
Abstract:
With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful or meaningful information is extracted from these data often through statistical learning or model fitting. In massive datasets, both sample size and number of predictors can be large, in which case conventional methods face computational challenges. Recently, an innovative and effective sampling scheme based on leverage scores via singular value decompositions has been proposed to select rows of a design matrix as a surrogate of the full data in linear regression. Analogously, variable screening can be viewed as selecting rows of the design matrix. However, effective variable selection along this line of thinking remains elusive. In this article, we bridge this gap to propose a weighted leverage variable screening method by utilizing both the left and right singular vectors of the design matrix. We show theoretically and empirically that the predictors selected using our method can consistently include true predictors not only for linear models but also for complicated general index models. Extensive simulation studies show that the weighted leverage screening method is highly computationally efficient and effective. We also demonstrate its success in identifying carcinoma related genes using spatial transcriptome data.
Submitted 20 September, 2021;
originally announced September 2021.
-
Reference-frame-independent design of phase-matching quantum key distribution
Authors:
Anran Jin,
Pei Zeng,
Richard V. Penty,
Xiongfeng Ma
Abstract:
The recently proposed phase-matching quantum key distribution offers a means to overcome the linear key rate-transmittance bound. Since the key information is encoded onto the phases of coherent states, misalignment between the two remote reference frames yields errors and significantly degrades the key generation rate relative to the ideal case. In this work, we propose a reference-frame-independent design of phase-matching quantum key distribution by introducing a high-dimensional key encoding space. With encoded phases spanning the unit circle, the error statistics at an arbitrary fixed phase reference difference can be recovered and treated separately, from which the misalignment angle can be identified. By naturally extending the binary encoding symmetry and complementarity to high dimensions, we present a security proof of this high-dimensional phase-matching quantum key distribution and demonstrate with simulation that a 17-dimensional protocol is completely immune to any degree of fixed misalignment and robust to slow phase fluctuations. We expect the high-dimensional protocol to be a practical reference-frame-independent design for general phase-encoding schemes where high-dimensional encoding is relatively easy to implement.
Submitted 15 September, 2021;
originally announced September 2021.
-
Contrastive Learning with Temporal Correlated Medical Images: A Case Study using Lung Segmentation in Chest X-Rays
Authors:
Dewen Zeng,
John N. Kheir,
Peng Zeng,
Yiyu Shi
Abstract:
Contrastive learning has proven to be a promising technique for image-level representation learning from unlabeled data. Many existing works have demonstrated improved results by applying contrastive learning to classification and object detection tasks for either natural images or medical images. However, its application to medical image segmentation tasks has been limited. In this work, we use lung segmentation in chest X-rays as a case study and propose a contrastive learning framework with temporally correlated medical images, named CL-TCI, to learn superior encoders for initializing the segmentation network. We adapt CL-TCI from two state-of-the-art contrastive learning methods, MoCo and SimCLR. Experimental results on three chest X-ray datasets show that, under two different segmentation backbones, U-Net and Deeplab-V3, CL-TCI can outperform all baselines that do not incorporate any temporal correlation in both the semi-supervised learning setting and the transfer learning setting with limited annotation. This suggests that information among temporally correlated medical images can indeed improve contrastive learning performance. Between the two variations of CL-TCI, CL-TCI adapted from MoCo outperforms CL-TCI adapted from SimCLR in most settings, indicating that more contrastive samples can benefit the learning process and help the network learn high-quality representations.
Submitted 16 September, 2021; v1 submitted 6 September, 2021;
originally announced September 2021.