Search | arXiv e-print repository

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Authors: Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng

Abstract: Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from scratch in a large-scale setting still suffers from data-hungry and instability problems. Motivated by this limit, we investigate building MoE models from existing dense large language models. Specifically, based on the well-known LLaMA-2 7B model, we obtain an MoE model by: (1) Expert Construction, which partitions the parameters of original Feed-Forward Networks (FFNs) into multiple experts; (2) Continual Pre-training, which further trains the transformed MoE model and additional gate networks. In this paper, we comprehensively explore different methods for expert construction and various data sampling strategies for continual pre-training. After these stages, our LLaMA-MoE models could maintain language abilities and route the input tokens to specific experts with part of the parameters activated. Empirically, by training 200B tokens, LLaMA-MoE-3.5B models significantly outperform dense models that contain similar activation parameters. The source codes and models are available at https://github.com/pjlab-sys4nlp/llama-moe .

Submitted 24 June, 2024; originally announced June 2024.
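
As an illustration of the Expert Construction step described in the abstract above, the following minimal Python sketch splits the intermediate neurons of one SwiGLU feed-forward block into disjoint experts and checks that the experts together reproduce the dense output. The random equal-sized split, the toy dimensions, the router matrix `w_router`, and every function name are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math
import random


def silu(z):
    """SiLU activation used inside SwiGLU feed-forward blocks."""
    return z / (1.0 + math.exp(-z))


def ffn(x, w_gate, w_up, w_down, neurons):
    """SwiGLU FFN output computed from a subset of intermediate neurons."""
    out = [0.0] * len(x)
    for j in neurons:
        gate = sum(w * xi for w, xi in zip(w_gate[j], x))
        up = sum(w * xi for w, xi in zip(w_up[j], x))
        hidden = silu(gate) * up
        for i in range(len(out)):
            out[i] += w_down[j][i] * hidden
    return out


def split_neurons(num_neurons, num_experts, rng):
    """Randomly partition neuron indices into equal-sized, disjoint experts."""
    order = list(range(num_neurons))
    rng.shuffle(order)
    size = num_neurons // num_experts
    return [sorted(order[e * size:(e + 1) * size]) for e in range(num_experts)]


def top_k_experts(x, w_router, k):
    """Indices of the k experts with the largest router logits (hypothetical gate)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in w_router]
    return sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]


def random_matrix(rows, cols, rng):
    """Toy weight matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_ff, num_experts = 4, 8, 4          # toy sizes, not LLaMA-2 7B sizes
w_gate = random_matrix(d_ff, d_model, rng)    # one row per intermediate neuron
w_up = random_matrix(d_ff, d_model, rng)
w_down = random_matrix(d_ff, d_model, rng)    # row j = output contribution of neuron j
w_router = random_matrix(num_experts, d_model, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]

experts = split_neurons(d_ff, num_experts, rng)
dense = ffn(x, w_gate, w_up, w_down, range(d_ff))
expert_outputs = [ffn(x, w_gate, w_up, w_down, e) for e in experts]
expert_sum = [sum(vals) for vals in zip(*expert_outputs)]

# Sparse forward pass: only the two highest-scoring experts are evaluated.
active = top_k_experts(x, w_router, k=2)
sparse = [sum(expert_outputs[e][i] for e in active) for i in range(d_model)]
```

Because every intermediate neuron contributes to the output independently, the experts sum back to the dense result up to floating-point rounding. Activating only two of them generally changes the output, which is presumably part of what the continual pre-training stage has to compensate for.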

arXiv:2406.11256 [pdf, other]

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Authors: Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng

Abstract: Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales. However, previous methods simply merge all training tasks (e.g. creative writing, coding, and mathematics) and apply fixed sampling weights, without considering the importance of different tasks as the model training state changes. In this way, the most helpful data cannot be effectively distinguished, leading to suboptimal model performance. To reduce the potential redundancies of datasets, we make the first attempt and propose a novel dynamic data mixture for MoE instruction tuning. Specifically, inspired by MoE's token routing preference, we build dataset-level representations and then capture the subtle differences among datasets. Finally, we propose to dynamically adjust the sampling weight of datasets by their inter-redundancies, thus maximizing global performance under a limited training budget. The experimental results on two MoE models demonstrate the effectiveness of our approach on both downstream knowledge & reasoning tasks and open-ended queries. Code and models are available at https://github.com/Spico197/MoE-SFT .

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.05597 [pdf, other]

doi 10.1103/PhysRevA.109.063508

Optimal control of linear Gaussian quantum systems via quantum learning control

Authors: Yu-Hong Liu, Yexiong Zeng, Qing-Shou Tan, Daoyi Dong, Franco Nori, Jie-Qiao Liao

Abstract: Efficiently controlling linear Gaussian quantum (LGQ) systems is a significant task in both the study of fundamental quantum theory and the development of modern quantum technology. Here, we propose a general quantum-learning-control method for optimally controlling LGQ systems based on the gradient-descent algorithm. Our approach flexibly designs the loss function for diverse tasks by utilizing first- and second-order moments that completely describe the quantum state of LGQ systems. We demonstrate both deep optomechanical cooling and large optomechanical entanglement using this approach. Our approach enables the fast and deep ground-state cooling of a mechanical resonator within a short time, surpassing the limitations of sideband cooling in the continuous-wave driven strong-coupling regime. Furthermore, optomechanical entanglement could be generated remarkably fast and surpass several times the corresponding steady-state entanglement, even when the thermal phonon occupation reaches one hundred. This work will not only broaden the application of quantum learning control, but also open an avenue for optimal control of LGQ systems.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 14 pages, 7 figures

Journal ref: Phys. Rev. A 109, 063508 (2024)

arXiv:2406.02500 [pdf, other]

Demystifying the Compression of Mixture-of-Experts Through a Unified Framework

Authors: Shwai He, Daize Dong, Liang Ding, Ang Li

Abstract: Scaling large language models has revolutionized the performance across diverse domains, yet the continual growth in model size poses significant challenges for real-world deployment. The Mixture of Experts (MoE) approach addresses this by dynamically selecting and activating only a subset of experts, significantly reducing computational costs while maintaining high performance. However, MoE introduces potential redundancy (e.g., parameters) and extra costs (e.g., communication overhead). Despite numerous compression techniques developed for mitigating the redundancy in dense models, the compression of MoE remains under-explored. We first bridge this gap with a cutting-edge unified framework that not only seamlessly integrates mainstream compression methods but also helps systematically understand MoE compression. This framework approaches compression from two perspectives: Expert Slimming, which compresses individual experts, and Expert Trimming, which removes structured modules. Within this framework, we explore the optimization space unexplored by existing methods, and further introduce aggressive Expert Trimming techniques, i.e., Layer Drop and Block Drop, to eliminate redundancy at larger scales. Based on these insights, we present a comprehensive recipe to guide practitioners in compressing MoE effectively. Extensive experimental results demonstrate the effectiveness of the compression methods under our framework and the proposed recipe, achieving a 6.05x speedup and only 20.0GB memory usage while maintaining over 92% of performance on Mixtral-8x7B. Code is released at https://github.com/DaizeDong/Unified-MoE-Compression .

Submitted 24 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 20 pages, 15 figures, 5 tables

arXiv:2405.17870 [pdf, other]

Full-Stack Allreduce on Multi-Rail Networks

Authors: Enda Yu, Dezun Dong, Xiangke Liao

Abstract: The high communication costs impede scalability in distributed systems. Multimodal models like Sora exacerbate this issue by requiring more resources than current networks can support. However, existing network architectures fail to address this gap. In this paper, we provide full-stack support for allreduce on multi-rail networks, aiming to overcome the scalability limitations of large-scale networks by facilitating collaborative data transfer across various networks. To achieve this, we propose the Nezha system, which integrates TCP, in-network computing protocol SHARP, and RDMA-based protocol GLEX. To maximize data transfer rates, Nezha incorporates a load balancing data allocation scheme based on cost feedback and combines exception handling to achieve reliable data transmission. Our experiments on a six-node cluster demonstrate that Nezha significantly enhances allreduce performance by 58% to 87% in homogeneous dual-rail configurations and offers considerable acceleration in heterogeneous settings, contingent on the performance variance among networks.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Submitted to SC'2024

arXiv:2405.06948 [pdf, other]

Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

Authors: Shengyuan Liu, Bo Wang, Ye Ma, Te Yang, Xipeng Cao, Quan Chen, Han Li, Di Dong, Peng Jiang

Abstract: Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. When generating compositional subjects, they often encounter problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance to intervene in the generative process during inference time. This approach strengthens the attention map, allowing for precise attribute binding and feature injection for each subject. Notably, our method exhibits exceptional zero-shot generation ability, especially in the challenging task of compositional generation. Furthermore, we propose a novel metric GroundingScore to evaluate subject alignment thoroughly. The obtained quantitative results serve as compelling evidence showcasing the effectiveness of our proposed method. The code will be released soon.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 26 pages, 13 figures

arXiv:2404.17005 [pdf, ps, other]

Uncommon linear systems of two equations

Authors: Dingding Dong, Anqi Li, Yufei Zhao

Abstract: A system of linear equations $L$ is common over $\mathbb{F}_p$ if, as $n\to\infty$, any 2-coloring of $\mathbb{F}_p^n$ gives asymptotically at least as many monochromatic solutions to $L$ as a random 2-coloring. The notion of common linear systems is analogous to that of common graphs, i.e., graphs whose monochromatic density in 2-edge-coloring of cliques is asymptotically minimized by the random coloring. Saad and Wolf initiated a systematic study on identifying common linear systems, built upon the earlier work of Cameron-Cilleruelo-Serra. When $L$ is a single equation, Fox-Pham-Zhao gave a complete characterization of common linear equations. When $L$ consists of two equations, Kamčev-Liebenau-Morrison showed that irredundant $2\times 4$ linear systems are always uncommon. In this work, (1) we determine commonness of all $2\times 5$ linear systems up to a small number of cases, and (2) we show that all $2\times k$ linear systems with $k$ even and girth (minimum number of nonzero coefficients of a nonzero equation spanned by the system) $k-1$ are uncommon, answering a question of Kamčev-Liebenau-Morrison.

Submitted 21 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 59 pages, 1 figure

arXiv:2404.13391 [pdf, other]

Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

Authors: Jianyu Xu, Qiuzhuang Sun, Yang Yang, Huadong Mo, Daoyi Dong

Abstract: The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the stochastic nature of bushfire spread, we develop a model to capture such dynamics based on Moore's neighborhood model. Under a periodic inspection scheme that reveals the in-situ bushfire status, we propose an online optimization modeling framework that sequentially plans the power flows in the electricity network. Our framework assumes that the spread of bushfires is non-stationary over time, and the spread and containment probabilities are unknown. To meet these challenges, we develop a contextual online learning algorithm that treats the in-situ geographical information of the bushfire as a 'spatial context'. The online learning algorithm learns the unknown probabilities sequentially based on the observed data and then makes the OPF decision accordingly. The sequential OPF decisions aim to minimize the regret function, which is defined as the cumulative loss against the clairvoyant strategy that knows the true model parameters. We provide a theoretical guarantee of our algorithm by deriving a bound on the regret function, which outperforms the regret bound achieved by other benchmark algorithms. Our model assumptions are verified by the real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2403.19251 [pdf, other]

Arbitrary State Transition of Open Qubit System Based on Switching Control

Authors: Guangpu Wu, Shibei Xue, Shan Ma, Sen Kuang, Daoyi Dong, Ian R. Petersen

Abstract: We present a switching control strategy based on Lyapunov control for arbitrary state transitions in open qubit systems. With coherent vector representation, we propose a switching control strategy, which can prevent the state of the qubit from entering invariant sets and singular value sets, effectively driving the system ultimately to a sufficiently small neighborhood of target states. In comparison to existing works, this control strategy relaxes the strict constraints on system models imposed by special target states. Furthermore, we identify conditions under which the open qubit system achieves finite-time stability (FTS) and finite-time contractive stability (FTCS), respectively. This represents a critical improvement in quantum state transitions, especially considering the asymptotic stability of arbitrary target states is unattainable in open quantum systems. The effectiveness of our proposed method is convincingly demonstrated through its application in a qubit system affected by various types of decoherence, including amplitude, dephasing and polarization decoherence.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 12 pages, 7 figures

arXiv:2403.15750 [pdf, other]

iDAT: inverse Distillation Adapter-Tuning

Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Daize Dong, Suncheng Xiang, Ting Liu, Yuzhuo Fu

Abstract: The Adapter-Tuning (AT) method involves freezing a pre-trained model and introducing trainable adapter modules to acquire downstream knowledge, thereby calibrating the model for better adaptation to downstream tasks. This paper proposes a distillation framework for the AT method instead of crafting a carefully designed adapter module, which aims to improve fine-tuning performance. For the first time, we explore the possibility of combining the AT method with knowledge distillation. Via statistical analysis, we observe significant differences in the knowledge acquisition between adapter modules of different models. Leveraging these differences, we propose a simple yet effective framework called inverse Distillation Adapter-Tuning (iDAT). Specifically, we designate the smaller model as the teacher and the larger model as the student. The two are jointly trained, and online knowledge distillation is applied to inject knowledge from a different perspective into the student model and significantly enhance the fine-tuning performance on downstream tasks. Extensive experiments on the VTAB-1K benchmark with 19 image classification tasks demonstrate the effectiveness of iDAT. The results show that using an existing AT method within our iDAT framework can further yield a 2.66% performance gain, with only an additional 0.07M trainable parameters. Our approach compares favorably with state-of-the-art methods without bells and whistles. Our code is available at https://github.com/JCruan519/iDAT.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 10 pages, 9 figures, 13 tables. This paper has been accepted by ICME 2024

arXiv:2403.09195 [pdf, other]

SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration

Authors: Yanfei Song, Bangzheng Pu, Peng Wang, Hongxu Jiang, Dong Dong, Yongxiang Cao, Yiqing Shen

Abstract: The Segment Anything Model (SAM) has garnered significant attention in segmentation tasks due to its zero-shot generalization ability. However, a broader application of SAMs to real-world practice has been restricted by their low inference speed and high computational memory demands, which mainly stem from the attention mechanism. Existing work has concentrated on optimizing the encoder, yet has not adequately addressed the inefficiency of the attention mechanism itself, even when distilled to a smaller model, which thus leaves space for further improvement. In response, we introduce SAM-Lightening, a variant of SAM that features a re-engineered attention mechanism, termed Dilated Flash Attention. It not only facilitates higher parallelism, enhancing processing efficiency, but also retains compatibility with the existing FlashAttention. Correspondingly, we propose a progressive distillation to enable an efficient knowledge transfer from the vanilla SAM without costly training from scratch. Experiments on COCO and LVIS reveal that SAM-Lightening significantly outperforms the state-of-the-art methods in both run-time efficiency and segmentation accuracy. Specifically, it can achieve an inference speed of 7 milliseconds (ms) per image, for images of size 1024*1024 pixels, which is 30.1 times faster than the vanilla SAM and 2.1 times faster than the state-of-the-art. Moreover, it takes only 244MB memory, which is 3.5% of the vanilla SAM. The code and weights are available at https://anonymous.4open.science/r/SAM-LIGHTENING-BC25/.

Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.00966 [pdf, other]

Generalized Eulerian Numbers and Directed Friends-and-seats Graphs

Authors: David Dong

Abstract: Let $A(n,m)$ denote the Eulerian numbers, which count the number of permutations on $[n]$ with exactly $m$ descents, or, due to the Foata transform, the number of permutations on $[n]$ with exactly $m$ excedances. Friends-and-seats graphs, also known as friends-and-strangers graphs, are a seemingly unrelated recent construction in graph theory. In this paper, we introduce directed friends-and-seats graphs and establish a connection between these graphs and a generalization of the Eulerian numbers. We use this connection to reprove and extend a Worpitzky-like identity on generalized Eulerian numbers.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 22 pages, 5 figures

MSC Class: 05A05 (Primary) 05C20; 05C31; 05C38 (Secondary)
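
The classical facts this abstract starts from can be checked numerically. The sketch below counts permutations by descents and by excedances, compares both with the standard recurrence $A(n,m) = (m+1)A(n-1,m) + (n-m)A(n-1,m-1)$, and verifies the classical Worpitzky identity $x^n = \sum_{m} A(n,m)\binom{x+m}{n}$ for small values. It covers only the ordinary Eulerian numbers, not the generalization or the friends-and-seats graphs studied in the paper, and all function names are illustrative.

```python
from functools import lru_cache
from itertools import permutations
from math import comb


def descents(p):
    """Number of positions i with p[i] > p[i+1]."""
    return sum(1 for i in range(len(p) - 1) if p[i] > p[i + 1])


def excedances(p):
    """Number of 1-based positions i with p(i) > i."""
    return sum(1 for i, value in enumerate(p, start=1) if value > i)


def count_by_statistic(n, statistic):
    """counts[m] = number of permutations of [n] whose statistic equals m."""
    counts = [0] * n
    for p in permutations(range(1, n + 1)):
        counts[statistic(p)] += 1
    return counts


@lru_cache(maxsize=None)
def eulerian(n, m):
    """Eulerian number A(n, m) via the standard two-term recurrence."""
    if m < 0 or (n > 0 and m >= n):
        return 0
    if n == 0:
        return 1 if m == 0 else 0
    return (m + 1) * eulerian(n - 1, m) + (n - m) * eulerian(n - 1, m - 1)


def worpitzky_rhs(n, x):
    """Right-hand side of the classical Worpitzky identity for x**n."""
    return sum(eulerian(n, m) * comb(x + m, n) for m in range(n))


row4_descents = count_by_statistic(4, descents)
row4_excedances = count_by_statistic(4, excedances)
row4_recurrence = [eulerian(4, m) for m in range(4)]
```

The brute-force counts are only practical for small $n$, since they enumerate all $n!$ permutations; the recurrence has no such limit.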

arXiv:2402.08952 [pdf, other]

A two-stage solution to quantum process tomography: error analysis and optimal design

Authors: Shuixin Xiao, Yuanlong Wang, Jun Zhang, Daoyi Dong, Gary J. Mooney, Ian R. Petersen, Hidehiro Yonezawa

Abstract: Quantum process tomography is a critical task for characterizing the dynamics of quantum systems and achieving precise quantum control. In this paper, we propose a two-stage solution for both trace-preserving and non-trace-preserving quantum process tomography. Utilizing a tensor structure, our algorithm exhibits a computational complexity of $O(MLd^2)$, where $d$ is the dimension of the quantum system and $M$ and $L$ represent the numbers of different input states and measurement operators, respectively. We establish an analytical error upper bound and then design the optimal input states and the optimal measurement operators, which are both based on minimizing the error upper bound and maximizing the robustness characterized by the condition number. Numerical examples and testing on IBM quantum devices are presented to demonstrate the performance and efficiency of our algorithm.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 41 pages, 7 figures

arXiv:2402.07396 [pdf, other]

Robust Quantum Control via a Model Predictive Control Strategy

Authors: Yunyan Lee, Ian R. Petersen, Daoyi Dong

Abstract: This article presents a robust control strategy using Time-Optimal Model Predictive Control (TOMPC) for a two-level quantum system subject to bounded uncertainties. In this method, the control field is optimized over a finite horizon using a nominal quantum system as the reference and then the optimal control for the first time interval is applied and a projective measurement is implemented on the uncertain system. The new control field for the next time interval will be iteratively optimized based on the measurement result. We present theoretical results to guarantee the stability of the TOMPC algorithm. We also characterize the robustness and the convergence rate of the TOMPC strategy for the control of two-level systems. Numerical simulations further demonstrate that, in the presence of uncertainties, our quantum TOMPC algorithm enhances robustness and steers the state to the desired state with high fidelity. This work contributes to the progress of Model Predictive Control in quantum control and explores its potential in practical applications of quantum technology.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 22 pages, 3 figures

arXiv:2402.02464 [pdf, other]

A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer

Authors: Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li

Abstract: Can we model Non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The Non-Euclidean property has posed a long-term challenge in graph modeling. Despite recent efforts by graph neural networks and graph transformers to encode graphs as Euclidean vectors, recovering the original graph from vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms Non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. We pretrain GraphsGPT on 100M molecules and yield some interesting findings: (1) The pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on 8/9 graph classification and regression tasks. (2) The pretrained GraphGPT serves as a strong graph generator, demonstrated by its strong ability to perform both few-shot and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges. (4) The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph domain tasks, excelling in both representation and generation. Code is available at https://github.com/A4Bio/GraphsGPT .

Submitted 29 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02376 [pdf, other]

Variational Quantum AdaBoost with Supervised Learning Guarantee

Authors: Yabo Wang, Xin Wang, Bo Qi, Daoyi Dong

Abstract: Although variational quantum algorithms based on parameterized quantum circuits promise to achieve quantum advantages, in the noisy intermediate-scale quantum (NISQ) era, their capabilities are greatly constrained due to the limited number of qubits and depth of quantum circuits. Therefore, we may view these variational quantum algorithms as weak learners in supervised learning. Ensemble methods are a general technique in machine learning for combining weak learners to construct a more accurate one. In this paper, we theoretically prove and numerically verify a learning guarantee for variational quantum adaptive boosting (AdaBoost). To be specific, we theoretically depict how the prediction error of variational quantum AdaBoost on binary classification decreases with the increase of the number of boosting rounds and sample size. By employing quantum convolutional neural networks, we further demonstrate that variational quantum AdaBoost can not only achieve much higher accuracy in prediction, but also help mitigate the impact of noise. Our work indicates that in the current NISQ era, introducing appropriate ensemble methods is particularly valuable in improving the performance of quantum machine learning algorithms.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 5 figures
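
For readers unfamiliar with the boosting scheme this abstract builds on, here is a minimal sketch of classical AdaBoost for binary labels in {+1, -1}. One-dimensional threshold stumps stand in for the weak learners; the paper uses variational quantum classifiers instead, which are not modelled here. The toy dataset and all function names are assumptions chosen for illustration.

```python
import math


def stump_predict(x, threshold, sign):
    """Weak learner: predict `sign` above the threshold and `-sign` otherwise."""
    return sign if x > threshold else -sign


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best stump."""
    points = sorted(set(xs))
    thresholds = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for sign in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, sign) != y)
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best


def adaboost(xs, ys, rounds):
    """Train an ensemble of (alpha, threshold, sign) triples."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, sign = best_stump(xs, ys, weights)
        error = min(max(error, 1e-12), 1.0 - 1e-12)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1


def training_error(ensemble, xs, ys):
    """Fraction of training samples the ensemble misclassifies."""
    return sum(predict(ensemble, x) != y for x, y in zip(xs, ys)) / len(xs)


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
error_after_1 = training_error(adaboost(xs, ys, 1), xs, ys)
error_after_3 = training_error(adaboost(xs, ys, 3), xs, ys)
```

On this toy dataset no single stump separates the classes, while three boosted stumps fit the training set exactly. This illustrates how combining weak learners can lower the error; it says nothing about the generalization guarantee proved in the paper.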

arXiv:2401.17526 [pdf, other]

Power Characterization of Noisy Quantum Kernels

Authors: Yabo Wang, Bo Qi, Xin Wang, Tongliang Liu, Daoyi Dong

Abstract: Quantum kernel methods have been widely recognized as one of the promising quantum machine learning algorithms that have the potential to achieve quantum advantages. In this paper, we theoretically characterize the power of noisy quantum kernels and demonstrate that under global depolarization noise, for different input data the predictions of the optimal hypothesis inferred by the noisy quantum kernel approximately concentrate towards some fixed value. In particular, we depict the convergence rate in terms of the strength of quantum noise, the size of training samples, the number of qubits, the number of layers affected by quantum noises, as well as the number of measurement shots. Our results show that noise may leave quantum kernel methods with only poor prediction capability, even when the generalization error is small. Thus, we provide a crucial warning about employing noisy quantum kernel methods for quantum computation, and the theoretical results can also serve as guidelines when developing practical quantum kernel algorithms for achieving quantum advantages.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 3 figures

arXiv:2401.16639 [pdf, ps, other]

Structure of tight (k,0)-stable graphs

Authors: Dingding Dong, Sammy Luo

Abstract: We say that a graph $G$ is $(k,\ell)$-stable if removing $k$ vertices from it reduces its independence number by at most $\ell$. We say that $G$ is tight $(k,\ell)$-stable if it is $(k,\ell)$-stable and its independence number equals $\lfloor \frac{n-k+1}{2} \rfloor + \ell$, the maximum possible, where $n$ is the number of vertices of $G$. Answering a question of Dong and Wu, we show that every tight $(2,0)$-stable graph with an odd number of vertices must be an odd cycle. Moreover, we show that for all $k\geq 3$, every tight $(k,0)$-stable graph has at most $k+6$ vertices.

Submitted 6 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 7 pages
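
The definition in the abstract above can be checked by brute force on very small graphs. The following is a minimal sketch, not code from the paper: it reads "removing $k$ vertices" as "removing any $k$ vertices", and the helper names (`independence_number`, `is_stable`, `is_tight_stable`, `cycle`) are hypothetical. It only illustrates that small odd cycles satisfy the tight $(2,0)$-stable definition; it does not verify the theorem.

```python
from itertools import combinations


def independence_number(vertices, edges):
    """Size of a largest independent set, by exhaustive search (tiny graphs only)."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            # A subset is independent if no pair inside it is an edge.
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return size
    return 0


def is_stable(vertices, edges, k, ell):
    """True if deleting ANY k vertices lowers the independence number by at most ell."""
    vertices = list(vertices)
    alpha = independence_number(vertices, edges)
    for removed in combinations(vertices, k):
        kept = [v for v in vertices if v not in removed]
        kept_edges = [(u, v) for (u, v) in edges if u in kept and v in kept]
        if alpha - independence_number(kept, kept_edges) > ell:
            return False
    return True


def is_tight_stable(vertices, edges, k, ell):
    """Stable, and the independence number equals floor((n - k + 1) / 2) + ell."""
    vertices = list(vertices)
    n = len(vertices)
    alpha = independence_number(vertices, edges)
    return is_stable(vertices, edges, k, ell) and alpha == (n - k + 1) // 2 + ell


def cycle(n):
    """Vertex list and edge list of the cycle on n vertices."""
    return list(range(n)), [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    for n in (5, 6, 7):
        vs, es = cycle(n)
        print(n, is_tight_stable(vs, es, k=2, ell=0))
```

The search is exponential in the number of vertices, so it is only suitable for hand-sized examples such as the cycles used here.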

arXiv:2401.11724 [pdf, other]

Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification

Authors: Chun Liu, Longwei Yang, Dongmei Dong, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang

Abstract: Few-shot hyperspectral image classification aims to identify the class of each pixel in an image by marking only a few of these pixels. To obtain the spatial-spectral joint features of each pixel, fixed-size patches centered on each pixel are often used for classification. However, observing the classification results of existing methods, we found that boundary patches, which correspond to pixels located at the boundaries of objects in the hyperspectral images, are hard to classify. These boundary patches are mixed with multi-class spectral information. Inspired by this, we propose to augment the prototype network with TransMix for few-shot hyperspectral image classification (APNT). While taking the prototype network as the backbone, it adopts the transformer as the feature extractor to learn the pixel-to-pixel relation and pay different attention to different pixels. At the same time, instead of directly using the patches cut from the hyperspectral images for training, it randomly mixes up two patches to imitate the boundary patches and uses the synthetic patches to train the model, with the aim of enlarging the number of hard training samples and enhancing their diversity. Following the data augmentation technique TransMix, the attention returned by the transformer is also used to mix up the labels of two patches to generate better labels for synthetic patches. Compared with existing methods, the proposed method has demonstrated state-of-the-art performance and better robustness for few-shot hyperspectral image classification in our experiments.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.03513 [pdf, other]

Real-time parameter estimation for two-qubit systems based on hybrid control

Authors: Yue Tian, Xiujuan Lu, Sen Kuang, Daoyi Dong

Abstract: In this paper, we consider the real-time parameter estimation problem for a ZZ-coupled system composed of two qubits in the presence of spontaneous emission. To enhance the estimation precision of the coupling coefficient, we first propose two different control schemes, where the first one is feedback control based on quantum-jump detection, and the second one is hybrid control combining Markovian feedback and Hamiltonian control. The simulation results show that compared with free evolution, both control schemes can improve parameter precision and extend system coherence time. Next, on the basis of the two control schemes, we propose a practical single-parameter quantum recovery protocol based on Bayesian estimation theory. In this protocol, by employing batch-style adaptive measurement rules, parameter recovery is conducted to verify the effectiveness of both control schemes.

Submitted 7 January, 2024; originally announced January 2024.

Comments: 13 pages, 14 figures

arXiv:2401.02708 [pdf, other]

TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis

Authors: Liwen Zhang, Lianzhen Zhong, Fan Yang, Di Dong, Hui Hui, Jie Tian

Abstract: A core challenge in survival analysis is to model the distribution of censored time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have shown that ranking and maximum likelihood estimation (MLE) loss functions are widely used for survival analysis. However, ranking loss only focuses on the ranking of survival time and does not consider the potential effect of samples on exact survival time values. Furthermore, the MLE is unbounded and easily subject to outliers (e.g., censored data), which may cause poor modeling performance. To handle the complexities of the learning process and exploit valuable survival time values, we propose a time-adaptive coordinate loss function, TripleSurv, to achieve adaptive adjustments by introducing the differences in survival time between sample pairs into the ranking, which can encourage the model to quantitatively rank the relative risk of pairs, ultimately enhancing the accuracy of predictions. Most importantly, TripleSurv is proficient in quantifying the relative risk between samples by ranking the ordering of pairs, and considers the time interval as a trade-off to calibrate the robustness of the model over the sample distribution. Our TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset. The results show that our method outperforms the state-of-the-art methods and exhibits good model performance and robustness in modeling various sophisticated data distributions with different censor rates. Our code will be available upon acceptance.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 9 pages, 6 figures

arXiv:2401.01571 [pdf, other]

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

Authors: Xiaoheng Xie, Gang Fan, Xiaojun Lin, Ang Zhou, Shijie Li, Xun** Zheng, Yinan Liang, Yu Zhang, Na Yu, Haokun Li, Xinyu Chen, Yingzhuang Chen, Yi Zhen, Dejun Dong, Xian** Fu, **zhou Su, Fuxiong Pan, Pengshuai Luo, Youzheng Feng, Ruoxiang Hu, **g Fan, **guo Zhou, Xiao Xiao, Peng Di

Abstract: In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data computation task, supporting scans of over 10 billion lines of code daily and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types specifically for Code Change, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, utilizing a unique two-tiered schema, COREF, to convert source code into data facts. Through Godel, a distinctive language, CodeFuse-Query enables formulation of complex tasks as logical expressions, harnessing Datalog's declarative prowess. This paper provides empirical evidence of CodeFuse-Query's transformative approach, demonstrating its robustness, scalability, and efficiency. We also highlight its real-world impact and diverse applications, emphasizing its potential to reshape the landscape of static code analysis in the context of large-scale software development. Furthermore, in the spirit of collaboration and advancing the field, our project is open-sourced and the repository is available for public access.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2311.07766 [pdf, other]

Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain

Authors: Dota Tianai Dong, Mariya Toneva

Abstract: Integrating information from multiple modalities is arguably one of the essential prerequisites for grounding artificial intelligence systems with an understanding of the real world. Recent advances in video transformers that jointly learn from vision, text, and sound over time have made some progress toward this goal, but the degree to which these models integrate information from modalities still remains unclear. In this work, we present a promising approach for probing a pre-trained multimodal video transformer model by leveraging neuroscientific evidence of multimodal information processing in the brain. Using brain recordings of participants watching a popular TV show, we analyze the effects of multi-modal connections and interactions in a pre-trained multi-modal video transformer on the alignment with uni- and multi-modal brain regions. We find evidence that vision enhances masked prediction performance during language processing, providing support that cross-modal representations in models can benefit individual modalities. However, we don't find evidence of brain-relevant information captured by the joint multi-modal transformer representations beyond that captured by all of the individual modalities. We finally show that the brain alignment of the pre-trained joint representation can be improved by fine-tuning using a task that requires vision-language inferences. Overall, our results paint an optimistic picture of the ability of multi-modal transformers to integrate vision and language in partially brain-relevant ways but also show that improving the brain alignment of these models may require new approaches.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.01120 [pdf, other]

EHA: Entanglement-variational Hardware-efficient Ansatz for Eigensolvers

Authors: Xin Wang, Bo Qi, Yabo Wang, Daoyi Dong

Abstract: Variational quantum eigensolvers (VQEs) are one of the most important and effective applications of quantum computing, especially in the current noisy intermediate-scale quantum (NISQ) era. There are mainly two ways for VQEs: problem-agnostic and problem-specific. For problem-agnostic methods, they often suffer from trainability issues. For problem-specific methods, their performance usually relies upon choices of initial reference states which are often hard to determine. In this paper, we propose an Entanglement-variational Hardware-efficient Ansatz (EHA), and numerically compare it with some widely used ansatzes by solving benchmark problems in quantum many-body systems and quantum chemistry. Our EHA is problem-agnostic and hardware-efficient, especially suitable for NISQ devices and having potential for wide applications. EHA can achieve a higher level of accuracy in finding ground states and their energies in most cases even compared with problem-specific methods. The performance of EHA is robust to choices of initial states and parameters initialization and it has the ability to quickly adjust the entanglement to the required amount, which is also the fundamental reason for its superiority.

Submitted 15 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 18 pages, 23 figures

arXiv:2310.20421 [pdf, other]

Two-stage solution for ancilla-assisted quantum process tomography: error analysis and optimal design

Authors: Shuixin Xiao, Yuanlong Wang, Daoyi Dong, Jun Zhang

Abstract: Quantum process tomography (QPT) is a fundamental task to characterize the dynamics of quantum systems. In contrast to standard QPT, the ancilla-assisted process tomography (AAPT) framework introduces an extra ancilla system such that only a single input state is needed. In this paper, we extend the two-stage solution, a method originally designed for standard QPT, to perform AAPT. Our algorithm has $O(Md_A^2d_B^2)$ computational complexity, where $M$ is the type number of the measurement operators, $d_A$ is the dimension of the quantum system of interest, and $d_B$ is the dimension of the ancilla system. Then we establish an error upper bound and further discuss the optimal design of the input state in AAPT. A numerical example on a phase damping process demonstrates the effectiveness of the optimal design and illustrates the theoretical error analysis.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 6 pages, 3 figures

arXiv:2310.15204 [pdf]

Mid-Long Term Daily Electricity Consumption Forecasting Based on Piecewise Linear Regression and Dilated Causal CNN

Authors: Zhou Lan, Ben Liu, Yi Feng, Danhuang Dong, Peng Zhang

Abstract: Daily electricity consumption forecasting is a classical problem. Existing forecasting algorithms tend to have decreased accuracy on special dates like holidays. This study decomposes the daily electricity consumption series into three components: trend, seasonal, and residual, and constructs a two-stage prediction method using piecewise linear regression as a filter and Dilated Causal CNN as a predictor. The specific steps involve setting breakpoints on the time axis and fitting the piecewise linear regression model with one-hot encoded information such as month, weekday, and holidays. For the challenging prediction of the Spring Festival, distance is introduced as a variable using a third-degree polynomial form in the model. The residual sequence obtained in the previous step is modeled using Dilated Causal CNN, and the final prediction of daily electricity consumption is the sum of the two-stage predictions. Experimental results demonstrate that this method achieves higher accuracy compared to existing approaches.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Key words: Daily electricity consumption forecasting; time series decomposition; piecewise linear regression; Dilated Causal CNN

arXiv:2310.05062 [pdf, ps, other]

Local to Global: A Distributed Quantum Approximate Optimization Algorithm for Pseudo-Boolean Optimization Problems

Authors: Bo Yue, Shibei Xue, Yu Pan, Min Jiang, Daoyi Dong

Abstract: With the rapid advancement of quantum computing, the Quantum Approximate Optimization Algorithm (QAOA) is considered a promising candidate to demonstrate quantum supremacy, which exponentially solves a class of Quadratic Unconstrained Binary Optimization (QUBO) problems. However, limited qubit availability and restricted coherence time make it challenging for QAOA to solve large-scale pseudo-Boolean problems on currently available Noisy Intermediate-Scale Quantum (NISQ) devices. In this paper, we propose a distributed QAOA which can solve a general pseudo-Boolean problem by converting it to a simplified Ising model. Unlike existing distributed QAOAs, which assume that local solutions are part of a global one (often not the case), we introduce community detection using the Louvain algorithm to partition the graph, where subgraphs are further compressed by community representation and merged into a higher-level subgraph. Recursively and backwards, local solutions of lower-level subgraphs are updated by heuristics from solutions of higher-level subgraphs. Compared with existing methods, our algorithm incorporates global heuristics into local solutions, such that it is proven to achieve a higher approximation ratio and outperforms existing methods across different graph configurations. Also, ablation studies validate the effectiveness of each component in our method.

Submitted 9 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: 12 pages, 6 figures

arXiv:2310.03791 [pdf, other]

VLASS tidal disruption events with optical flares I: the sample and a comparison to optically-selected TDEs

Authors: Jean J. Somalwar, Vikram Ravi, Dillon Z. Dong, Erica Hammerstein, Gregg Hallinan, Casey Law, Jessie Miller, Steven T. Myers, Yuhan Yao, Richard Dekany, Matthew Graham, Steven L. Groom, Josiah Purdum, Avery Wold

Abstract: In this work, we use the Jansky VLA Sky Survey (VLASS) to compile the first sample of six radio-selected tidal disruption events (TDEs) with transient optical counterparts. While we still lack the statistics to do detailed population studies of radio-selected TDEs, we use these events to suggest trends in host galaxy and optical light curve properties that may correlate with the presence of radio emission, and hence can inform optically-selected TDE radio follow-up campaigns. We find that radio-selected TDEs tend to have faint and cool optical flares, as well as host galaxies with low SMBH masses. Our radio-selected TDEs also tend to have more energetic, larger radio emitting regions than radio-detected, optically-selected TDEs. We consider possible explanations for these trends, including by invoking super-Eddington accretion and enhanced circumnuclear media. Finally, we constrain the radio-emitting TDE rate to be $\gtrsim 10$ Gpc$^{-3}$ yr$^{-1}$.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 26 pages, 5 tables, 11 figures, submitted to ApJ

arXiv:2310.00518 [pdf, other]

Learning Informative Latent Representation for Quantum State Tomography

Authors: Hailan Ma, Zhenhong Sun, Daoyi Dong, Dong Gong

Abstract: Quantum state tomography (QST) is the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) through a series of different measurements. These measurements are performed on a number of identical copies of the quantum system, with outcomes gathered as frequencies. QST aims to recover the density matrix and the corresponding properties of the quantum state from the measured frequencies. Although an informationally complete set of measurements can specify quantum state accurately in an ideal scenario with a large number of identical copies, both measurements and identical copies are restricted and imperfect in practical scenarios, making QST highly ill-posed. The conventional QST methods usually assume adequate or accurate measured frequencies or rely on manually designed regularizers to handle the ill-posed reconstruction problem, suffering from limited applications in realistic scenarios. Recent advances in deep neural networks (DNNs) led to the emergence of deep learning (DL) in QST. However, existing DL-based QST approaches often employ generic DNN models that are not optimized for imperfect conditions of QST. In this paper, we propose a transformer-based autoencoder architecture tailored for QST with imperfect measurement data. Our method leverages a transformer-based encoder to extract an informative latent representation (ILR) from imperfect measurement data and employs a decoder to predict the quantum states based on the ILR. We anticipate that the high-dimensional ILR will capture more comprehensive information about quantum states. To achieve this, we conduct pre-training of the encoder using a pretext task that involves reconstructing high-quality frequencies from measured frequencies. Extensive simulations and experiments demonstrate the remarkable ability of the ILR in dealing with imperfect measurement data in QST.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.15582 [pdf, other]

Quantum autoencoders using mixed reference states

Authors: Hailan Ma, Gary J. Mooney, Ian R. Petersen, Lloyd C. L. Hollenberg, Daoyi Dong

Abstract: One of the fundamental tasks in information theory is the compression of information. To achieve this in the quantum domain, quantum autoencoders that aim to compress quantum states to low-dimensional ones have been proposed. When taking a pure state as the reference state, there exists an upper bound for the encoding fidelity. This bound limits the compression rate for high-rank states that have high entropy. To overcome the entropy inconsistency between the initial states and the reconstructed states, we allow the reference state to be a mixed state. A new cost function that combines the encoding fidelity and the quantum mutual information is proposed for compressing general input states. In particular, we consider the reference states to be a mixture of maximally mixed states and pure states. To achieve efficient compression for different states, two strategies for setting the ratio of mixedness (in the mixture of maximally mixed states and pure states) are provided based on prior knowledge about quantum states or observations obtained from the training process. Numerical results on thermal states of the transverse-field Ising model, Werner states, and maximally mixed states blended with pure states illustrate the effectiveness of the proposed method. In addition, quantum autoencoders using mixed reference states are experimentally implemented on IBM Quantum devices to compress and reconstruct thermal states and Werner states.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.02879 [pdf]

Quantum coherence and interference of a single moiré exciton in nano-fabricated twisted semiconductor heterobilayers

Authors: Haonan Wang, Heejun Kim, Duanfei Dong, Keisuke Shinokita, Kenji Watanabe, Takashi Taniguchi, Kazunari Matsuda

Abstract: The moiré potential acts as periodic quantum confinement for optically generated excitons, generating a spatially ordered zero-dimensional quantum system. However, the broad emission spectrum arising from inhomogeneity among moiré potentials hinders the exploration of the intrinsic properties of moiré excitons. In this study, we have demonstrated a new method to realize the optical observation of quantum coherence and interference of a single moiré exciton in a twisted semiconducting heterobilayer beyond the diffraction limit of light. A significant single and sharp photoluminescence peak from a single moiré exciton has been demonstrated after nano-fabrication. We present the longer duration of quantum coherence of a single moiré exciton, which reaches beyond 10 ps, and the accelerated decoherence process with elevating temperature and excitation power density. Moreover, the quantum interference has revealed the coupling between moiré excitons in different moiré potential minima. The observed quantum coherence and interference of moiré excitons will facilitate potential applications toward quantum technologies based on moiré quantum systems.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 42 pages, 4 figures

arXiv:2308.12823 [pdf, other]

Uncovering a Massive z~7.7 Galaxy Hosting a Heavily Obscured Radio-Loud QSO Candidate in COSMOS-Web

Authors: Erini Lambrides, Marco Chiaberge, Arianna Long, Daizhong Liu, Hollis B. Akins, Andrew F. Ptak, Irham Taufik Andika, Alessandro Capetti, Caitlin M. Casey, Jaclyn B. Champagne, Katherine Chworowsky, Tracy E. Clarke, Olivia R. Cooper, Xuheng Ding, Dillon Z. Dong, Andreas L. Faisst, Jordan Y. Forman, Maximilien Franco, Steven Gillman, Ghassem Gozaliasl, Kirsten R. Hall, Santosh Harish, Christopher C. Hayward, Michaela Hirschmann, Taylor A. Hutchison , et al. (25 additional authors not shown)

Abstract: In this letter, we report the discovery of the highest redshift, heavily obscured, radio-loud AGN candidate selected using JWST NIRCam/MIRI, mid-IR, sub-mm, and radio imaging in the COSMOS-Web field. Using multi-frequency radio observations and mid-IR photometry, we identify a powerful, radio-loud (RL), growing supermassive black hole (SMBH) with significant spectral steepening of the radio SED ($f_{1.28\,\mathrm{GHz}} \sim 2$ mJy, $q_{24\,\mu\mathrm{m}} = -1.1$, $\alpha_{1.28-3\,\mathrm{GHz}} = -1.2$, $\Delta\alpha = -0.4$). In conjunction with ALMA, deep ground-based observations, ancillary space-based data, and the unprecedented resolution and sensitivity of JWST, we find no evidence of AGN contribution to the UV/optical/NIR data and thus infer heavy amounts of obscuration (N$_{\mathrm{H}} > 10^{23}$ cm$^{-2}$). Using the wealth of deep UV to sub-mm photometric data, we report a singular solution photo-z of $z_\mathrm{phot}$ = 7.7$^{+0.4}_{-0.3}$ and estimate an extremely massive host-galaxy ($\log M_{\star} = 11.4 -12\,\mathrm{M}_{\odot}$) hosting a powerful, growing SMBH (L$_{\mathrm{Bol}} = 4-12 \times 10^{46}$ erg s$^{-1}$). This source represents the furthest known obscured RL AGN candidate, and its level of obscuration aligns with the most representative but observationally scarce population of AGN at these epochs.

Submitted 15 December, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Accepted to ApJL

arXiv:2308.07622 [pdf, other]

EMID: An Emotional Aligned Dataset in Audio-Visual Modality

Authors: Jialing Zou, Jiahao Mei, Guangze Ye, Tianyu Huai, Qiwei Shen, Daoguo Dong

Abstract: In this paper, we propose Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for the emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that primarily focus on semantic correlations or roughly divided emotional relations, EMID emphasizes the significance of emotional consistency between music and images using an advanced 13-dimension emotional model. By incorporating emotional alignment into the dataset, it aims to establish pairs that closely align with human perceptual understanding, thereby raising the performance of auditory-visual cross-modal tasks. We also design a supplemental module named EMI-Adapter to optimize existing cross-modal alignment methods. To validate the effectiveness of the EMID, we conduct a psychological experiment, which has demonstrated that considering the emotional relationship between the two modalities effectively improves the accuracy of matching in abstract perspective. This research lays the foundation for future cross-modal research in domains such as psychotherapy and contributes to advancing the understanding and utilization of emotions in cross-modal alignment. The EMID dataset is available at https://github.com/ecnu-aigc/EMID.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.14599 [pdf, ps, other]

doi 10.1016/j.automatica.2020.109174

Two-step feedback preparation of entanglement for qubit systems with time delay

Authors: Yanan Liu, Daoyi Dong, Sen Kuang, Ian R. Petersen, Hidehiro Yonezawa

Abstract: Quantum entanglement plays a fundamental role in quantum computation and quantum communication. Feedback control has been widely used in stochastic quantum systems to generate given entangled states since it has good robustness, where the time required to compute filter states and conduct filter-based control usually cannot be ignored in many practical applications. This paper designs two control strategies based on the Lyapunov method to prepare a class of entangled states for qubit systems with a constant delay time. The first is a bang-bang-like control strategy, which has a simple form that switches between a constant value and zero, and whose stability is proved. The second is switching Lyapunov control, where a constant delay time is introduced in the filter-based feedback control law to compensate for the computation time. Numerical results on a two-qubit system illustrate the effectiveness of the two proposed control strategies.

Submitted 26 July, 2023; originally announced July 2023.

Journal ref: Automatica 125 (2021): 109174

arXiv:2307.14583 [pdf, ps, other]

doi 10.1016/j.automatica.2022.110236

Fault-tolerant $H^\infty$ control for optical parametric oscillators with pumping fluctuations

Authors: Yanan Liu, Daoyi Dong, Ian R. Petersen, Hidehiro Yonezawa

Abstract: Optical Parametric Oscillators (OPOs) have wide applications in quantum optics for generating squeezed states and developing advanced technologies. When the phase or/and the amplitude of the pumping field for an OPO have fluctuations due to fault signals, time-varying uncertainties will be introduced in the dynamic parameters of the system. In this paper, we investigate how to design a fault-tolerant $H^\infty$ controller for an OPO with a disturbance input and time-varying uncertainties, which can achieve the required $H^\infty$ performance of the quantum system. We apply robust $H^\infty$ control theory to a quantum system, and design a passive controller and an active controller based on the solutions to two Riccati equations. The passive controller has a simple structure and is easy to be implemented by using only passive optical components, while the active quantum controller may achieve improved performance. The control performance of the proposed two controllers and one controller that was designed without consideration of system uncertainties is compared by numerical simulations in a specific OPO, and the results show that the designed controllers work effectively for fluctuations in both the phase and amplitude of the pumping field.

Submitted 26 July, 2023; originally announced July 2023.

Journal ref: Automatica 140 (2022): 110236

arXiv:2306.11836 [pdf, ps, other]

Generalized Eulerian Numbers

Authors: David Dong

Abstract: Let $A(n,m)$ denote the Eulerian numbers, which count the number of permutations on $[n]$ with exactly $m$ descents. It is well known that $A(n,m)$ also counts the number of permutations on $[n]$ with exactly $m$ excedances. In this report, we define numbers of the form $A(n,m,k)$, which count the number of permutations on $[n]$ with exactly $m$ descents and the last element $k$. We then show bijections between this definition and various other analogs for $r$-excedances and $r$-descents. We also prove a variation of Worpitzky's identity on $A(n,m,k)$ using a combinatorial argument mentioned in a paper by Spivey in 2021.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 15 pages, 0 figures

MSC Class: 05A05

arXiv:2306.11784 [pdf, other]

NANCY: Next-generation All-sky Near-infrared Community surveY

Authors: Jiwon Jesse Han, Arjun Dey, Adrian M. Price-Whelan, Joan Najita, Edward F. Schlafly, Andrew Saydjari, Risa H. Wechsler, Ana Bonaca, David J Schlegel, Charlie Conroy, Anand Raichoor, Alex Drlica-Wagner, Juna A. Kollmeier, Sergey E. Koposov, Gurtina Besla, Hans-Walter Rix, Alyssa Goodman, Douglas Finkbeiner, Abhijeet Anand, Matthew Ashby, Benedict Bahr-Kalus, Rachel Beaton, Jayashree Behera, Eric F. Bell, Eric C Bellm , et al. (184 additional authors not shown)

Abstract: The Nancy Grace Roman Space Telescope is capable of delivering an unprecedented all-sky, high-spatial resolution, multi-epoch infrared map to the astronomical community. This opportunity arises in the midst of numerous ground- and space-based surveys that will provide extensive spectroscopy and imaging together covering the entire sky (such as Rubin/LSST, Euclid, UNIONS, SPHEREx, DESI, SDSS-V, GALAH, 4MOST, WEAVE, MOONS, PFS, UVEX, NEO Surveyor, etc.). Roman can uniquely provide uniform high-spatial-resolution (~0.1 arcsec) imaging over the entire sky, vastly expanding the science reach and precision of all of these near-term and future surveys. This imaging will not only enhance other surveys, but also facilitate completely new science. By imaging the full sky over two epochs, Roman can measure the proper motions for stars across the entire Milky Way, probing 100 times fainter than Gaia out to the very edge of the Galaxy. Here, we propose NANCY: a completely public, all-sky survey that will create a high-value legacy dataset benefiting innumerable ongoing and forthcoming studies of the universe. NANCY is a pure expression of Roman's potential: it images the entire sky, at high spatial resolution, in a broad infrared bandpass that collects as many photons as possible. The majority of all ongoing astronomical surveys would benefit from incorporating observations of NANCY into their analyses, whether these surveys focus on nearby stars, the Milky Way, near-field cosmology, or the broader universe.

Submitted 20 June, 2023; originally announced June 2023.

Comments: Submitted to the call for white papers for the Roman Core Community Survey (June 16th, 2023), and to the Bulletin of the AAS

arXiv:2306.03387 [pdf, other]

ColdNAS: Search to Modulate for User Cold-Start Recommendation

Authors: Shiguang Wu, Yaqing Wang, Qinghe Jing, Daxiang Dong, Dejing Dou, Quanming Yao

Abstract: Making personalized recommendation for cold-start users, who only have a few interaction histories, is a challenging problem in recommendation systems. Recent works leverage hypernetworks to directly map user interaction histories to user-specific parameters, which are then used to modulate predictor by feature-wise linear modulation function. These works obtain the state-of-the-art performance. However, the physical meaning of scaling and shifting in recommendation data is unclear. Instead of using a fixed modulation function and deciding modulation position by expertise, we propose a modulation framework called ColdNAS for user cold-start problem, where we look for proper modulation structure, including function and position, via neural architecture search. We design a search space which covers broad models and theoretically prove that this search space can be transformed to a much smaller space, enabling an efficient and robust one-shot search algorithm. Extensive experimental results on benchmark datasets show that ColdNAS consistently performs the best. We observe that different modulation functions lead to the best performance on different datasets, which validates the necessity of designing a searching-based method.

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.13850 [pdf, other]

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

Authors: Xiangnan Chen, Qian Xiao, Juncheng Li, Duo Dong, Jun Lin, Xiaozhong Liu, Siliang Tang

Abstract: Visual Relation Extraction (VRE) is a powerful means of discovering relationships between entities within visually-rich documents. Existing methods often focus on manipulating entity features to find pairwise relations, yet neglect the more fundamental structural information that links disparate entity pairs together. The absence of global structure information may make the model struggle to learn long-range relations and easily predict conflicted results. To alleviate such limitations, we propose a GlObal Structure knowledge-guided relation Extraction (GOSE) framework. GOSE initiates by generating preliminary relation predictions on entity pairs extracted from a scanned image of the document. Subsequently, global structural knowledge is captured from the preceding iterative predictions, which are then incorporated into the representations of the entities. This "generate-capture-incorporate" cycle is repeated multiple times, allowing entity representations and global structure knowledge to be mutually reinforced. Extensive experiments validate that GOSE not only outperforms existing methods in the standard fine-tuning setting but also reveals superior cross-lingual learning capabilities; indeed, even yields stronger data-efficient performance in the low-resource setting. The code for GOSE will be available at https://github.com/chenxn2020/GOSE.

Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by EMNLP 2023 (Findings)

arXiv:2305.11643 [pdf, other]

Time Optimal Ergodic Search

Authors: Dayi Dong, Henry Berger, Ian Abraham

Abstract: Robots with the ability to balance time against the thoroughness of search have the potential to provide time-critical assistance in applications such as search and rescue. Current advances in ergodic coverage-based search methods have enabled robots to completely explore and search an area in a fixed amount of time. However, optimizing time against the quality of autonomous ergodic search has yet to be demonstrated. In this paper, we investigate solutions to the time-optimal ergodic search problem for fast and adaptive robotic search and exploration. We pose the problem as a minimum time problem with an ergodic inequality constraint whose upper bound regulates and balances the granularity of search against time. Solutions to the problem are presented analytically using Pontryagin's conditions of optimality and demonstrated numerically through a direct transcription optimization approach. We show the efficacy of the approach in generating time-optimal ergodic search trajectories in simulation and with drone experiments in a cluttered environment. Obstacle avoidance is shown to be readily integrated into our formulation, and we perform ablation studies that investigate parameter dependence on optimized time and trajectory sensitivity for search.

Submitted 19 May, 2023; originally announced May 2023.

Comments: 13 pages, 8 figures, Robotics: Science and Systems

arXiv:2305.05433 [pdf, other]

Tomography of Quantum States from Structured Measurements via quantum-aware transformer

Authors: Hailan Ma, Zhenhong Sun, Daoyi Dong, Chunlin Chen, Herschel Rabitz

Abstract: Quantum state tomography (QST) is the process of reconstructing the state of a quantum system (mathematically described as a density matrix) through a series of different measurements, which can be solved by learning a parameterized function to translate experimentally measured statistics into physical density matrices. However, the specific structure of quantum measurements for characterizing a quantum state has been neglected in previous work. In this paper, we explore the similarity between highly structured sentences in natural language and intrinsically structured measurements in QST. To fully leverage the intrinsic quantum characteristics involved in QST, we design a quantum-aware transformer (QAT) model to capture the complex relationship between measured frequencies and density matrices. In particular, we query quantum operators in the architecture to facilitate informative representations of quantum data and integrate the Bures distance into the loss function to evaluate quantum state fidelity, thereby enabling the reconstruction of quantum states from measured data with high fidelity. Extensive simulations and experiments (on IBM quantum computers) demonstrate the superiority of the QAT in reconstructing quantum states with favorable robustness against experimental noise.

Submitted 17 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2304.11384 [pdf, other]

Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning

Authors: Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, Xiangke Liao

Abstract: Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that given a code snippet, they can only generate one comment while developers usually need to know information from diverse perspectives such as what is the functionality of this code snippet and how to use it. To tackle this limitation, this study empirically investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents. Our intuition is based on the facts that (1) the code and its pairwise comment are used during the pre-training process of LLMs to build the semantic connection between the natural language and programming language, and (2) comments in the real-world projects, which are collected for the pre-training, usually contain different developers' intents. We thus postulate that the LLMs can already understand the code from different perspectives after the pre-training. Indeed, experiments on two large-scale datasets demonstrate the rationale of our insights: by adopting the in-context learning paradigm and giving adequate prompts to the LLM (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach on generating comments with multiple intents. Results also show that customized strategies for constructing the prompts and post-processing strategies for reranking the results can both boost the LLM's performances, which shed light on future research directions for using LLMs to achieve comment generation.

Submitted 14 June, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

Comments: Accepted by the 46th International Conference on Software Engineering (ICSE 2024)

arXiv:2303.16575 [pdf, other]

doi 10.1103/PhysRevResearch.6.023216

Exponential sensitivity revival of noisy non-Hermitian quantum sensing with two-photon drives

Authors: Liying Bao, Bo Qi, Franco Nori, Daoyi Dong

Abstract: Unique properties of multimode non-Hermitian lattice dynamics can be utilized to construct exponentially sensitive sensors. However, the impact of noise remains unclear, which may severely degrade their sensitivity. We analytically characterize and highlight the impact of loss and gain on the sensitivity revival and stability of non-Hermitian sensors. Defying the general belief that the superiority of quantum sensing will vanish in the presence of loss, we find that by proactively tuning the loss, the exponential sensitivity can be surprisingly regained when the sensing dynamics is stable. Furthermore, we prove that gain is crucial to fully revive the ideally exponential sensitivity and to ensure the stability of non-Hermitian sensing by making a balanced loss and gain. Our paper opens a way to significantly enhance the sensitivity by proactively tuning the loss and gain, which may promote future quantum sensing and quantum engineering.

Submitted 28 May, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: 18 pages, 3 figures

Journal ref: Physical Review RESEARCH 6, 023216 (2024)

arXiv:2302.14312 [pdf, other]

Auxiliary Task-based Deep Reinforcement Learning for Quantum Control

Authors: Shumin Zhou, Hailan Ma, Sen Kuang, Daoyi Dong

Abstract: Due to its property of not requiring prior knowledge of the environment, reinforcement learning has significant potential for quantum control problems. In this work, we investigate the effectiveness of continuous control policies based on deep deterministic policy gradient. To solve the sparse reward signal in quantum learning control problems, we propose an auxiliary task-based deep reinforcement learning (AT-DRL) for quantum control. In particular, we first design a guided reward function based on the fidelity of quantum states that enables incremental fidelity improvement. Then, we introduce the concept of an auxiliary task whose network shares parameters with the main network to predict the reward provided by the environment (called the main task). The auxiliary task learns synchronously with the main task, allowing one to select the most relevant features of the environment, thus aiding the agent in comprehending how to achieve the desired state. The numerical simulations demonstrate that the proposed AT-DRL can provide a solution to the sparse reward in quantum systems, and has great potential in designing control pulses that achieve efficient quantum state preparation.

Submitted 28 February, 2023; originally announced February 2023.

Comments: 13 pages, 11 figures

arXiv:2302.06858 [pdf, other]

Trainability Enhancement of Parameterized Quantum Circuits via Reduced-Domain Parameter Initialization

Authors: Yabo Wang, Bo Qi, Chris Ferrie, Daoyi Dong

Abstract: Parameterized quantum circuits (PQCs) have been widely used as a machine learning model to explore the potential of achieving quantum advantages for various tasks. However, the training of PQCs is notoriously challenging owing to the phenomenon of plateaus and/or the existence of (exponentially) many spurious local minima. In this work, we propose an efficient parameter initialization strategy with theoretical guarantees. We prove that if the initial domain of each parameter is reduced inversely proportional to the square root of circuit depth, then the magnitude of the cost gradient decays at most polynomially as a function of the depth. Our theoretical results are verified by numerical simulations of variational quantum eigensolver tasks. Moreover, we demonstrate that the reduced-domain initialization strategy can protect specific quantum neural networks from exponentially many spurious local minima. Our results highlight the significance of an appropriate parameter initialization strategy and can be used to enhance the trainability of PQCs in variational quantum algorithms.

Submitted 1 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 27 pages, 5 figures

arXiv:2302.02371 [pdf, other]

Model-free Quantum Gate Design and Calibration using Deep Reinforcement Learning

Authors: Omar Shindi, Qi Yu, Parth Girdhar, Daoyi Dong

Abstract: High-fidelity quantum gate design is important for various quantum technologies, such as quantum computation and quantum communication. Numerous control policies for quantum gate design have been proposed given a dynamical model of the quantum system of interest. However, a quantum system is often highly sensitive to noise, and obtaining an accurate model of it can be difficult for many practical applications. Thus, a control policy based on a quantum system model may be impractical for quantum gate design. Also, quantum measurements collapse quantum states, which makes it challenging to obtain information through measurements during the control process. In this paper, we propose a novel training framework using deep reinforcement learning for model-free quantum control. The proposed framework relies only on the measurement at the end of the control process and offers the ability to find the optimal control policy without access to quantum systems during the learning process. The effectiveness of the proposed technique is numerically demonstrated for model-free quantum gate design and quantum gate calibration using off-policy reinforcement learning algorithms.

Submitted 7 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: 12 pages, 17 figures, accepted for publication in the IEEE Transactions on Artificial Intelligence, in press

arXiv:2302.00282 [pdf, other]

Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices

Authors: Zhang Runhua, Jiang Hongxu, Tian Fangzheng, Geng Jinkun, Li Xiaobin, Ma Yuhang, Zhu Chenhui, Dong Dong, Li Xin, Wang Haojie

Abstract: Edge computing has been emerging as a popular scenario for model inference. However, the inference performance on edge devices (e.g., Multi-Core DSP, FPGA, etc.) suffers from inefficiency due to the lack of highly optimized inference frameworks. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration to edge-based inference. Besides, the operator-centric framework incurs significant costs for continuous development and maintenance. In this paper, we propose Xenos, which can automatically conduct dataflow-centric optimization of the computation graph and accelerate inference in two dimensions. Vertically, Xenos develops an operator linking technique to improve data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique to enable higher parallelism across multiple DSP units. Our evaluation proves the effectiveness of vertical and horizontal dataflow optimization, which reduce the inference time by 21.2\%--84.9\% and 17.9\%--96.2\%, respectively. Besides, Xenos also outperforms the widely-used TVM by 3.22$\times$--17.92$\times$. Moreover, we extend Xenos to a distributed solution, which we call d-Xenos. d-Xenos employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68$\times$--3.78$\times$ compared with the single device.

Submitted 1 February, 2023; originally announced February 2023.

Comments: The preliminary version is accepted by the 28th International Conference on Database Systems for Advanced Applications (DASFAA-2023)

arXiv:2301.11012 [pdf]

Dynamics of moire trion and its valley polarization in microfabricated WSe2/MoSe2 heterobilayer

Authors: Heejun Kim, Duanfei Dong, Yuki Okamura, Keisuke Shinokita, Kenji Watanabe, Takashi Taniguchi, Kazunari Matsuda

Abstract: The moire potential, induced by stacking two monolayer semiconductors with slightly different lattice mismatches, acts as periodic quantum confinement for optically generated excitons, resulting in spatially ordered zero-dimensional quantum systems. However, there are limitations to exploring intrinsic optical properties of moire excitons due to ensemble averaged and broadened emissions from many peaks caused by the inhomogeneity of the moire potential. In this study, we proposed a microfabrication technique based on focused Ga+ ion beams, which enables us to control the number of peaks originating from the moire potential and thus explore unknown moire optical characteristics of WSe2/MoSe2 heterobilayers. By taking advantage of this approach, we reveal emissions from a single moire exciton and charged moire exciton (trion) under electrostatic doping conditions. We show the momentum dark moire trion state above the bright trion state with a splitting energy of approximately 4 meV and clarify that the dynamics are determined by the initial trion population in the bright state. Furthermore, the degree of negative circularly polarized emissions and their valley dynamics of moire trions are dominated by a very long valley relaxation process lasting ~700 ns. Our findings on microfabricated heterobilayers could be viewed as an extension of our groundbreaking efforts in the field of quantum optics application using moire superlattices.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 41 pages, 4 figures

arXiv:2212.11649 [pdf, ps, other]

doi 10.1103/PhysRevLett.130.043604

Quantum Coherent Control of a Single Molecular-Polariton Rotation

Authors: Li-Bao Fan, Chuan-Cun Shu, Daoyi Dong, Jun He, Niels E. Henriksen, Franco Nori

Abstract: We present a combined analytical and numerical study for coherent terahertz control of a single molecular polariton, formed by strongly coupling two rotational states of a molecule with a single-mode cavity. Compared to the bare molecules driven by a single terahertz pulse, the presence of a cavity strongly modifies the post-pulse orientation of the polariton, making it difficult to obtain its maximal degree of orientation. To solve this challenging problem toward achieving complete quantum coherent control, we derive an analytical solution of a pulse-driven quantum Jaynes-Cummings model by expanding the wave function into entangled states and constructing an effective Hamiltonian. We utilize it to design a composite terahertz pulse and obtain the maximum degree of orientation of the polariton by exploiting photon blockade effects. This work offers a new strategy to study rotational dynamics in the strong-coupling regime and provides a method for complete quantum coherent control of a single molecular polariton. It, therefore, has direct applications in polariton chemistry and molecular polaritonics for exploring novel quantum optical phenomena.

Submitted 22 December, 2022; originally announced December 2022.

Comments: 16 pages, 5 figures, accepted by Physical Review Letters on 19 December, 2022

arXiv:2212.09326 [pdf, other]

doi 10.1103/PhysRevA.107.052403

Complementary relations of entanglement, coherence, steering and Bell nonlocality inequality violation in three-qubit states

Authors: Dong-Dong Dong, Xue-Ke Song, Xiao-Gang Fan, Liu Ye, Dong Wang

Abstract: We put forward complementary relations of entanglement, coherence, steering inequality violation, and Bell nonlocality for arbitrary three-qubit states. We show that two families of genuinely entangled three-qubit pure states with single parameter exist, and they exhibit maximum coherence and steering inequality violation for a fixed amount of negativity, respectively. It is found that the negativity is exactly equal to the geometric mean of bipartite concurrences for the three-qubit pure states, although the negativity is always less than or equal to the latter for three-qubit mixed states. Moreover, the complementary relation between negativity and first-order coherence for tripartite entanglement states is established. Furthermore, we investigate the close relation between the negativity and the maximum steering inequality violation. In addition, the complementary relation between negativity and the maximum Bell-inequality violation for arbitrary three-qubit states is obtained. The results provide reliable evidence of fundamental connections among entanglement, coherence, steering inequality violation, and Bell nonlocality.

Submitted 28 April, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: 11 pages, 5 figures

Showing 1–50 of 218 results for author: Dong, D