Search | arXiv e-print repository

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

Authors: Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu

Abstract: Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Since the inputs and outputs of SKG tasks are heterogeneous, they have been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation by proposing the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format, aiming to promote systematic SKG research rather than research confined to a single task, domain, or dataset. We use UnifiedSKG to benchmark T5 models of different sizes and show that T5, with simple modifications when necessary, achieves state-of-the-art performance on almost all of the 21 tasks. We further demonstrate that multi-task prefix-tuning improves performance on most tasks, substantially improving overall performance. UnifiedSKG also facilitates the investigation of zero-shot and few-shot learning, and we show that T0, GPT-3, and Codex struggle in zero-shot and few-shot learning for SKG. We also use UnifiedSKG to conduct a series of controlled experiments on structured knowledge encoding variants across SKG tasks. UnifiedSKG is easily extensible to more tasks, and it is open-sourced at https://github.com/hkunlp/unifiedskg.
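To make the text-to-text idea concrete, the Python sketch below flattens a user request and a small table into a single input string that a seq2seq model such as T5 could consume. The `linearize_table` helper and its separator scheme are hypothetical illustrations, not the exact serialization used by UnifiedSKG; the repository linked above is the authoritative source for that format.

```python
def linearize_table(request, header, rows, col_sep=" | ", row_prefix="row {i} : "):
    """Flatten a user request plus a table into one text-to-text input string.

    Hypothetical format (assumption, not UnifiedSKG's exact one):
    "<request> ; col : h1 | h2 row 1 : v11 | v12 row 2 : ..."
    """
    parts = [request.strip(), "; col : " + col_sep.join(header)]
    for i, row in enumerate(rows, start=1):
        cells = col_sep.join(str(v) for v in row)
        parts.append(row_prefix.format(i=i) + cells)
    return " ".join(parts)


if __name__ == "__main__":
    text = linearize_table(
        "which city hosted in 2008?",
        ["year", "city"],
        [[2004, "Athens"], [2008, "Beijing"]],
    )
    print(text)
```

Because the target side (an SQL query, a logical form, or an answer) is also just a string, the same encoder-decoder can in principle be trained on many such tasks at once.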

Submitted 18 October, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

Comments: EMNLP 2022

arXiv:2201.05726

Categories of quantum liquids III

Authors: Liang Kong, Hao Zheng

Abstract: We continue our study of the categories of quantum liquids started in a previous work. We combine local quantum symmetries with topological skeletons into a single mathematical theory of topological nets and defect nets. In particular, we introduce the notion of a topological net, which is motivated by and generalizes that of a conformal net, and the notion of a defect net, which generalizes that of a defect between conformal nets. We give explicit examples of both. Moreover, we construct the category of topological $n$-nets with $k$-morphisms defined by defect $n$-nets of codimension $k$, and show that the category of $n$D quantum liquids can be extracted from it and computed explicitly via the condensation theory of topological nets.

Submitted 14 January, 2022; originally announced January 2022.

Comments: 28 pages

arXiv:2112.09784

Functional Linear Regression for Partially Observed Functional Data

Authors: Yafei Wang, Tingyu Lai, Bei Jiang, Linglong Kong, Zhongzhan Zhang

Abstract: For the functional linear regression model, many methods have been proposed and studied to estimate the slope function when the functional predictor is observed over the entire domain. However, functional linear regression with partially observed trajectories has received less attention. In this paper, to fill this gap in the literature, we consider the scenario where each functional predictor may be observed only on part of the domain. Depending on whether measurement error is present in the functional predictors, two methods are developed: one is based on linear functionals of the observed part of the trajectory, and the other uses conditional principal component scores. We establish the asymptotic properties of the two proposed methods. Finite-sample simulations are conducted to verify their performance. Diffusion tensor imaging (DTI) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study are analyzed.

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2112.08985

Effective Rate of RIS-aided Networks with Location and Phase Estimation Uncertainty

Authors: Long Kong, Steven Kisseleff, Symeon Chatzinotas, Björn Ottersten, Melike Erol-Kantarci

Abstract: Reconfigurable Intelligent Surfaces (RIS) are planar structures connected to electronic circuitry that can be employed to steer electromagnetic signals in a controlled manner. Through this, the signal quality and the effective data rate can be substantially improved. While the benefits of RIS-assisted wireless communications have been investigated for various scenarios, some aspects of network design, such as coverage and optimal RIS placement, often require complex optimization and numerical simulations, since the achievable effective rate is difficult to predict. This problem becomes even harder in the presence of phase estimation errors or location uncertainty, which can lead to substantial performance degradation if neglected. Considering randomly distributed receivers within a ring-shaped RIS-assisted wireless network, this paper investigates the effective rate while taking the above-mentioned impairments into account. Furthermore, exact closed-form expressions for the effective rate are derived in terms of Meijer's $G$-function, which (i) reveal that location and phase estimation uncertainty should be carefully considered when deploying RIS in wireless networks, and (ii) facilitate future network design and performance prediction.
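The effective rate is commonly defined as R = -(1/A) log2 E[(1 + γ)^(-A)], where γ is the instantaneous SNR and A is a delay-constraint exponent. The Python sketch below estimates R by Monte Carlo for a toy RIS link in which N unit-gain reflections are summed with random phase-estimation errors. The channel model, the von Mises error distribution, and every parameter value are illustrative assumptions; this is neither the paper's system model nor its Meijer G-function result.

```python
import numpy as np


def effective_rate(snr_samples, A):
    """Monte Carlo effective rate in bits/s/Hz: -(1/A) * log2(E[(1 + snr)^(-A)])."""
    snr = np.asarray(snr_samples, dtype=float)
    return -np.log2(np.mean((1.0 + snr) ** (-A))) / A


def toy_ris_snr(n_elements, rho, kappa, n_trials, rng):
    """Toy model (assumption): SNR = rho * |sum_i exp(j * phi_i)|^2.

    phi_i are residual phase-estimation errors drawn from a von Mises
    distribution with concentration kappa; kappa = inf means perfect phases.
    """
    if np.isinf(kappa):
        phases = np.zeros((n_trials, n_elements))
    else:
        phases = rng.vonmises(0.0, kappa, size=(n_trials, n_elements))
    combined = np.abs(np.exp(1j * phases).sum(axis=1)) ** 2
    return rho * combined


rng = np.random.default_rng(0)
perfect = effective_rate(toy_ris_snr(16, 0.01, np.inf, 10_000, rng), A=2.0)
noisy = effective_rate(toy_ris_snr(16, 0.01, 2.0, 10_000, rng), A=2.0)
print(f"perfect phases: {perfect:.3f} b/s/Hz, with phase errors: {noisy:.3f} b/s/Hz")
```

With perfect phases the SNR is deterministic, so R collapses to log2(1 + ρN²); any phase error makes the coherent sum smaller and lowers R, which is the qualitative effect the abstract warns about.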

Submitted 17 December, 2021; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 5 pages, 6 figures, conference

arXiv:2112.08787

AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models

Authors: Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang

Abstract: While pre-trained language model (PLM) fine-tuning has achieved strong performance in many NLP tasks, the fine-tuning stage can still demand substantial labeled data. Recent works have resorted to active fine-tuning to improve the label efficiency of PLM fine-tuning, but none of them has investigated the potential of unlabeled data. We propose AcTune, a new framework that leverages unlabeled data to improve the label efficiency of active PLM fine-tuning. AcTune switches between data annotation and model self-training based on uncertainty: it selects high-uncertainty unlabeled samples for active annotation and low-uncertainty ones for model self-training. Under this framework, we design (1) a region-aware sampling strategy that reduces redundancy when actively querying for annotations and (2) a momentum-based memory bank that dynamically aggregates the model's pseudo labels to suppress label noise in self-training. Experiments on 6 text classification datasets show that AcTune outperforms the strongest active learning and self-training baselines and improves the label efficiency of PLM fine-tuning by 56.2% on average. Our implementation will be available at https://github.com/yueyu1030/actune.
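The core switching rule (high uncertainty goes to annotators, low uncertainty goes to self-training) can be sketched in a few lines of Python. Using predictive entropy as the uncertainty score and argmax pseudo-labels are assumptions made for illustration; the region-aware sampling and the momentum-based memory bank described above are deliberately omitted, and `split_by_uncertainty` is a hypothetical helper.

```python
import math


def entropy(probs):
    """Predictive entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_by_uncertainty(pool_probs, n_annotate, n_self_train):
    """Split an unlabeled pool by model uncertainty (illustrative only).

    Returns (indices to send for annotation, [(index, pseudo_label), ...]):
    the most uncertain samples go to annotators, and the least uncertain
    ones are pseudo-labeled with the model's argmax prediction.
    """
    order = sorted(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))
    to_annotate = order[::-1][:n_annotate]  # highest entropy first
    confident = [i for i in order[:n_self_train] if i not in to_annotate]
    pseudo = [(i, max(range(len(pool_probs[i])), key=pool_probs[i].__getitem__))
              for i in confident]
    return to_annotate, pseudo


if __name__ == "__main__":
    pool = [[0.5, 0.5], [0.99, 0.01], [0.6, 0.4], [0.05, 0.95]]
    print(split_by_uncertainty(pool, n_annotate=1, n_self_train=2))
```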

Submitted 3 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: NAACL 2022 Main Conference (Code: https://github.com/yueyu1030/actune)

Journal ref: NAACL 2022

arXiv:2112.07874

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Authors: Jakob Prange, Nathan Schneider, Lingpeng Kong

Abstract: We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling. With an ensemble setup consisting of a pretrained Transformer and ground-truth graphs from one of 7 different formalisms, we find that, overall, semantic constituency structures are the most useful for language modeling performance, outpacing syntactic constituency structures as well as syntactic and semantic dependency structures. Further, effects vary greatly depending on part-of-speech class. In sum, our findings point to promising tendencies in neuro-symbolic language modeling and invite future research quantifying the design choices made by different formalisms.

Submitted 10 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted to NAACL 2022 (slight typesetting divergences from the NAACL camera-ready version due to TeX Live 2020/2021 mismatches)

arXiv:2112.05368

Sample Average Approximation for Stochastic Optimization with Dependent Data: Performance Guarantees and Tractability

Authors: Yafei Wang, Bo Pan, Wei Tu, Peng Liu, Bei Jiang, Chao Gao, Wei Lu, Shangling Jui, Linglong Kong

Abstract: Sample average approximation (SAA), a popular method for tractably solving stochastic optimization problems, enjoys strong asymptotic performance guarantees in settings with independent training samples. However, these guarantees are not known to hold generally with dependent samples, such as in online learning with time series data or distributed computing with Markovian training samples. In this paper, we show that SAA remains tractable when the distribution of unknown parameters is only observable through dependent instances, and that it still enjoys asymptotic consistency and finite-sample guarantees. Specifically, we provide a rigorous probability error analysis to derive $1 - β$ confidence bounds for the out-of-sample performance of SAA estimators and show that these estimators are asymptotically consistent. Using monotone operator theory, we then study the performance of a class of stochastic first-order algorithms trained on a dependent source of data. We show that the approximation error of these algorithms is bounded and concentrates around zero, and we establish deviation bounds for the iterates when the underlying stochastic process is $φ$-mixing. The algorithms presented can handle numerically inconvenient loss functions, such as the sum of a smooth and a non-smooth function, or non-smooth functions with constraints.
To illustrate the usefulness of our results, we present several stochastic versions of popular algorithms, such as stochastic proximal gradient descent (S-PGD) and stochastic relaxed Peaceman–Rachford splitting (S-rPRS), together with numerical experiments.
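SAA itself is simple: replace the expectation in the objective by an empirical mean over the observed samples and minimize that. The Python sketch below does this over a finite decision grid, feeding it AR(1) samples as a stand-in for dependent data. The absolute loss, the grid, and the AR(1) parameters are assumptions for illustration; the sketch does not reproduce the paper's first-order algorithms or its confidence bounds.

```python
import random


def saa_minimize(loss, samples, candidates):
    """Sample average approximation: pick the candidate decision that
    minimizes the empirical mean loss over the observed samples."""
    def empirical_risk(x):
        return sum(loss(x, xi) for xi in samples) / len(samples)
    return min(candidates, key=empirical_risk)


def ar1_samples(n, phi=0.8, mu=10.0, sigma=1.0, seed=0):
    """Dependent data (assumed AR(1)): xi_t = mu + phi * (xi_{t-1} - mu) + noise."""
    rng = random.Random(seed)
    xs, x = [], mu
    for _ in range(n):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        xs.append(x)
    return xs


if __name__ == "__main__":
    data = ar1_samples(500)
    grid = [i / 10 for i in range(0, 201)]
    x_hat = saa_minimize(lambda x, xi: abs(x - xi), data, grid)
    print("SAA decision under absolute loss:", x_hat)
```

Under absolute loss the SAA minimizer is a sample median, and under squared loss it is the sample mean; the paper's contribution is showing when such estimates stay consistent even though consecutive samples are correlated.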

Submitted 10 December, 2021; originally announced December 2021.

arXiv:2112.05194

Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving

Authors: Lei Ding, Dengdeng Yu, **han Xie, Wenxing Guo, Shenggang Hu, Meichen Liu, Linglong Kong, Hongsheng Dai, Yanchun Bao, Bei Jiang

Abstract: With the widening deployment of natural language processing (NLP) in daily life, social biases inherited by NLP models have become more severe and problematic. Previous studies have shown that word embeddings trained on human-generated corpora have strong gender biases that can produce discriminatory results in downstream tasks. Previous debiasing methods focus mainly on modeling bias and only implicitly consider semantic information, while completely overlooking the complex underlying causal structure among bias and semantic components. To address these issues, we propose a novel methodology that leverages a causal inference framework to effectively remove gender bias. The proposed method allows us to construct and analyze the complex causal mechanisms facilitating gender information flow while retaining oracle semantic information within word embeddings. Our comprehensive experiments show that the proposed method achieves state-of-the-art results in gender-debiasing tasks. In addition, our method yields better performance in word similarity evaluation and various extrinsic downstream NLP tasks.

Submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI 2022

arXiv:2112.02295

doi 10.3390/universe7120472

Fermi-LAT Observation of PSR B1259-63 during Its 2021 Periastron Passage

Authors: Zhi Chang, Shu Zhang, Yu-Peng Chen, Long Ji, Ling-Da Kong, Peng-Ju Wang

Abstract: PSR B1259-63 is a $γ$-ray binary system in which the compact object is a pulsar. The system has an orbital period of 1236.7 days and shows peculiar $γ$-ray flares (in the 100 MeV to 300 GeV band) after periastron. We analyzed the Fermi-LAT observations of PSR B1259-63 during its latest periastron passage, as well as its previous three periastron passages. The bright GeV flares started about 60 days after the periastron epoch in 2021. This delay is larger than that around the 2017 periastron and much larger than those of earlier periastrons. The delay of the GeV flux peak time in each periastron passage is apparent in our results. We discussed the possible origin of this delay and, based on the observed previous delays, predicted the GeV flux peak time for the next periastron passage.

Submitted 8 December, 2021; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: accepted for publication in Universe

Journal ref: Universe 2021, 7(12), 472

arXiv:2112.01498

doi 10.1103/PRXQuantum.3.020314

Near-optimal covariant quantum error-correcting codes from random unitaries with symmetries

Authors: Linghang Kong, Zi-Wen Liu

Abstract: Quantum error correction and symmetries play central roles in quantum information science and physics. It is known that quantum error-correcting codes that obey (are covariant with respect to) continuous symmetries in a certain sense cannot correct erasure errors perfectly (a well-known result in this regard being the Eastin-Knill theorem in the context of fault-tolerant quantum computing), in contrast to the case without symmetry constraints. Furthermore, several quantitative fundamental limits on the accuracy of such covariant codes for approximate quantum error correction are known. Here, we consider the quantum error correction capability of uniformly random covariant codes. In particular, we analytically study the most essential cases of $U(1)$ and $SU(d)$ symmetries, and show that for both symmetry groups the error of the covariant codes generated by Haar-random symmetric unitaries, i.e., unitaries that commute with the group actions, typically scales as $O(n^{-1})$ in terms of both the average- and worst-case purified distances against erasure noise, saturating the fundamental limits to leading order. We note that the results hold for symmetric variants of unitary 2-designs, and comment on the convergence problem of symmetric random circuits.
Our results not only indicate (potentially efficient) randomized constructions of optimal $U(1)$- and $SU(d)$-covariant codes, but also reveal fundamental properties of random symmetric unitaries, which yield important solvable models of complex quantum systems (including black holes and many-body spin systems) that have attracted great recent interest in quantum gravity and condensed matter physics. We expect our construction and analysis to find broad relevance in both physics and quantum computing. △ Less
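For intuition about "Haar-random symmetric unitaries", the Python sketch below builds one for the $U(1)$ action generated by the total $Z$ operator on $n$ qubits: such a unitary must be block-diagonal in the Hamming-weight (charge) sectors, so we draw an independent Haar unitary for each block. Taking the $U(1)$ charge to be the Hamming weight is the standard choice for this action, but it is an assumption here; the code-encoding step and the purified-distance analysis from the paper are not reproduced.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Haar-random dim x dim unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # fix column phases so the distribution is Haar


def u1_symmetric_unitary(n_qubits, rng):
    """Block-diagonal unitary with one independent Haar block per
    Hamming-weight (U(1) charge) sector of n_qubits qubits."""
    dim = 2 ** n_qubits
    weights = np.array([bin(b).count("1") for b in range(dim)])
    u = np.zeros((dim, dim), dtype=complex)
    for w in range(n_qubits + 1):
        idx = np.flatnonzero(weights == w)
        u[np.ix_(idx, idx)] = haar_unitary(len(idx), rng)
    return u, weights


rng = np.random.default_rng(1)
u, weights = u1_symmetric_unitary(4, rng)
theta = 0.37
# Group action exp(i * theta * sum_k Z_k): the Z-sum eigenvalue is n - 2w for weight w.
g = np.diag(np.exp(1j * theta * (4 - 2 * weights)))
print(np.allclose(u @ g, g @ u), np.allclose(u.conj().T @ u, np.eye(16)))
```

Since the group action is a scalar on each charge sector, any block-diagonal unitary commutes with it, and Haar measure on the symmetric unitaries factorizes into independent Haar measures on the blocks.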

Submitted 11 April, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: 25 pages. Close to the published version. Features both U(1) and SU(d). Supersedes 2102.11835

Journal ref: PRX Quantum 3, 020314 (2022)

arXiv:2111.15242

ConDA: Unsupervised Domain Adaptation for LiDAR Segmentation via Regularized Domain Concatenation

Authors: Lingdong Kong, Niamul Quader, Venice Erin Liong

Abstract: Transferring knowledge learned from the labeled source domain to the unlabeled target domain for unsupervised domain adaptation (UDA) is essential to the scalable deployment of autonomous driving systems. State-of-the-art methods in UDA often employ a key idea: utilizing joint supervision signals from both source and target domains for self-training. In this work, we improve and extend this aspect. We present ConDA, a concatenation-based domain adaptation framework for LiDAR segmentation that: 1) constructs an intermediate domain consisting of fine-grained interchange signals from both source and target domains without destabilizing the semantic coherency of objects and background around the ego-vehicle; and 2) utilizes the intermediate domain for self-training. To improve network training on the source domain and self-training on the intermediate domain, we propose an anti-aliasing regularizer and an entropy aggregator to reduce the negative effects caused by aliasing artifacts and noisy pseudo labels. Through extensive studies, we demonstrate that ConDA significantly outperforms prior art in mitigating domain gaps.

Submitted 6 April, 2023; v1 submitted 30 November, 2021; originally announced November 2021.

Comments: 8 pages, 6 figures, 4 tables; ICRA 2023

arXiv:2111.10357

A framework for randomized benchmarking over compact groups

Authors: Linghang Kong

Abstract: Characterization of experimental systems is an essential step in developing and improving quantum hardware. Randomized Benchmarking (RB), a collection of protocols developed over the past decade, provides an efficient way to measure error rates in quantum systems. In a recent paper (arXiv:2010.07974), a general framework for RB was proposed, which encompassed most of the known RB protocols and overcame the limitation on error models in previous works. However, even this general framework has a restriction: it can only be applied to a finite group of gates. This does not meet the needs of experiments, in particular the demand for benchmarking non-Clifford gates and continuous gate sets on quantum devices. In this work, we generalize the RB framework to continuous groups of gates and show that, as long as the noise level is reasonably small, the output can be approximated as a linear combination of matrix exponential decays. As an application, we numerically study the fully randomized benchmarking protocol (i.e., RB with the entire unitary group as the gate set) enabled by our proof. This provides a unified way to estimate the gate fidelity for any quantum gate in an experiment.
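In the familiar finite-group (Clifford) setting, the averaged survival probability decays as a single exponential, F(m) = A p^m + B, in the sequence length m; the work above generalizes this to linear combinations of matrix exponential decays. The Python sketch fits only the single-exponential special case, scanning p on a grid and solving for A and B by linear least squares. The synthetic data, the grid, and the Clifford-RB conversion r = (d - 1)(1 - p)/d are assumptions for illustration, not the paper's protocol.

```python
import numpy as np


def fit_rb_decay(lengths, survival, p_grid=None):
    """Fit F(m) = A * p**m + B by scanning p and solving for (A, B) by least squares."""
    m = np.asarray(lengths, dtype=float)
    y = np.asarray(survival, dtype=float)
    if p_grid is None:
        p_grid = np.arange(5000, 10000) / 10000.0  # candidate decay rates in [0.5, 0.9999]
    best = None
    for p in p_grid:
        design = np.column_stack([p ** m, np.ones_like(m)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, p, coef[0], coef[1])
    _, p, a, b = best
    return p, a, b


def average_error_rate(p, d):
    """Clifford-RB style conversion from decay rate p to average error rate r."""
    return (d - 1) * (1 - p) / d


lengths = np.array([1, 2, 4, 8, 16, 32, 64, 128])
data = 0.5 * 0.98 ** lengths + 0.5  # synthetic, noise-free single-qubit decay
p, a, b = fit_rb_decay(lengths, data)
print(round(p, 4), round(average_error_rate(p, 2), 4))
```

Scanning p keeps the remaining problem linear in A and B, which avoids the local-minimum issues of a fully nonlinear fit at the cost of grid resolution.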

Submitted 19 November, 2021; originally announced November 2021.

arXiv:2111.06012

Kronecker Factorization for Preventing Catastrophic Forgetting in Large-scale Medical Entity Linking

Authors: Denis Jered McInerney, Luyang Kong, Kristjan Arumae, Byron Wallace, Parminder Bhatia

Abstract: Multi-task learning is useful in NLP because it is often practically desirable to have a single model that works across a range of tasks. In the medical domain, sequential training on tasks may sometimes be the only way to train models, either because access to the original (potentially sensitive) data is no longer available, or simply owing to the computational costs inherent to joint retraining. A major issue inherent to sequential learning, however, is catastrophic forgetting, i.e., a substantial drop in accuracy on prior tasks when a model is updated for a new task. Elastic Weight Consolidation is a recently proposed method to address this issue, but scaling this approach to the modern large models used in practice requires making strong independence assumptions about model parameters, limiting its effectiveness. In this work, we apply Kronecker Factorization, a recent approach that relaxes independence assumptions, to prevent catastrophic forgetting in convolutional and Transformer-based neural networks at scale. We show the effectiveness of this technique on the important and illustrative task of medical entity linking across three datasets, demonstrating that the technique can be used to make efficient updates to existing methods as new medical data becomes available. On average, the proposed method reduces catastrophic forgetting by 51% when using a BERT-based model, compared to a 27% reduction using standard Elastic Weight Consolidation, while maintaining space complexity proportional to the number of model parameters.
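The contrast between diagonal EWC and a Kronecker-factored penalty can be checked numerically. The Python sketch assumes the common K-FAC convention for a weight matrix W of shape (out, in): the Fisher block is approximated as A ⊗ G, with A the input-activation second moment and G the output-gradient second moment, so the quadratic penalty can be computed as tr(ΔWᵀ G ΔW A) without ever forming the full Fisher. The factor estimates, λ, and the random test matrices are assumptions; the paper's exact formulation may differ.

```python
import numpy as np


def ewc_penalty_diag(delta, fisher_diag, lam=1.0):
    """Standard EWC: (lam / 2) * sum_i F_ii * delta_i^2 (diagonal Fisher)."""
    return 0.5 * lam * float(np.sum(fisher_diag * delta ** 2))


def ewc_penalty_kron(delta_w, a_cov, g_cov, lam=1.0):
    """Kronecker-factored penalty for one weight matrix W (out x in), F ~= A (x) G.

    a_cov: (in x in) input-activation second moment; g_cov: (out x out)
    output-gradient second moment (assumed K-FAC convention). Uses
    vec(dW)^T (A (x) G) vec(dW) = tr(dW^T G dW A) for symmetric A.
    """
    return 0.5 * lam * float(np.trace(delta_w.T @ g_cov @ delta_w @ a_cov))


rng = np.random.default_rng(0)
out_dim, in_dim = 3, 4
x = rng.standard_normal((in_dim, in_dim)); a_cov = x @ x.T
y = rng.standard_normal((out_dim, out_dim)); g_cov = y @ y.T
delta_w = rng.standard_normal((out_dim, in_dim))

# Brute-force check against the explicit Kronecker product (column-major vec).
vec = delta_w.flatten(order="F")
full = 0.5 * float(vec @ np.kron(a_cov, g_cov) @ vec)
print(np.isclose(full, ewc_penalty_kron(delta_w, a_cov, g_cov)))
```

Storing A and G costs in² + out² numbers per layer instead of (in·out)² for a dense Fisher block, and when both factors are diagonal the penalty reduces to an ordinary diagonal EWC penalty with weights G_oo·A_ii.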

Submitted 10 November, 2021; originally announced November 2021.

arXiv:2111.03786

Tunable vortex Majorana modes controlled by strain in homogeneous LiFeAs

Authors: Wenyao Liu, Quanxin Hu, Xiancheng Wang, Yigui Zhong, Fazhi Yang, Lingyuan Kong, Lu Cao, Geng Li, Kozo Okazaki, Takeshi Kondo, Changqing **, Fuchun Zhang, **peng Xu, Hong-Jun Gao, Hong Ding

Abstract: The iron-based superconductors (FeSCs) have recently emerged as a promising single-material Majorana platform by hosting isolated Majorana zero modes (MZMs) at relatively high temperatures. To further verify their Majorana nature and move toward building topological qubits, it is highly desirable to achieve tunability of MZMs on homogeneous FeSCs. Here, with an in-situ strain device, we controllably create MZMs on the homogeneous surface of the stoichiometric superconductor LiFeAs by inducing a topological phase transition. The evolution of discrete energy modes inside a strained vortex is found to match exactly the predicted topological vortex case, proving the Majorana nature of the emerging vortex zero modes. Such tunability of MZMs in a homogeneous superconductor is an important step toward their application in topological quantum computation.

Submitted 19 November, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

arXiv:2110.15040

When Liquid Meets Frequency-Selective Rasorber: Wideband and Switchable 3-D Frequency-Selective Rasorber

Authors: Xiangkun Kong, Xuemeng Wang, Xin **, Weihao Lin, Lingqi Kong, Shunliu Jiang, Lei Xing

Abstract: In this paper, a switchable 3-D frequency-selective rasorber (FSR) with wide absorption bands, requiring neither lumped components nor commercial magnetic absorbers, is presented and investigated. The absorption path is constructed by embedding a hybrid liquid microwave absorber (MA) inside a parallel plate waveguide (PPW) to create an extra-wide absorption band. A water-based reflection layer is placed behind the FSR to reconfigure it from an FSR into a band-notched absorber (BNA) by controlling the presence or absence of water. The liquid-based absorber is first analyzed with a multimode dielectric resonant circuit, and the fundamental operating principle of the FSR is demonstrated with the help of an equivalent circuit model (ECM). A design example is provided, fabricated, and measured. In FSR mode, it exhibits a passband at 5.10 GHz with a transmission bandwidth of 18.5% for insertion loss below 3 dB, and a fractional bandwidth of 146.8% with reflectivity below -10 dB. In BNA mode, it has a minimum return loss of 0.72 dB and good absorption bands from 2.5 to 4.6 GHz and from 5.7 to 16.5 GHz. Good agreement among circuit analysis, simulation, and measurement results is obtained. The switchable rasorber can be applied in a shared-aperture antenna system to convert a broadband stealth radome into a BNA.

Submitted 28 October, 2021; originally announced October 2021.

Comments: IEEE Transactions on Electromagnetic Compatibility (2022)

arXiv:2110.08896

Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization

Authors: Ke Sun, Yafei Wang, Yi Liu, Yingnan Zhao, Bo Pan, Shangling Jui, Bei Jiang, Linglong Kong

Abstract: Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms to accelerate convergence and improve the sampling efficiency of deep RL. Despite its empirically observed convergence improvements, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The analysis is rooted in the fixed-point iteration nature of RL. We further propose a stabilization strategy that introduces a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator, allowing both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.
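Anderson mixing accelerates a fixed-point iteration x ← g(x) (in RL, g could be a Bellman or policy-evaluation operator) by extrapolating from a short history of iterates and residuals. The Python sketch implements the generic scheme with a mixing (damping) parameter beta and a small ridge term on the least-squares coefficients as a stand-in for the stabilizing regularizer mentioned above. The ridge form, the defaults, and the toy linear map are assumptions; the MellowMax operator and any RL-specific details are omitted.

```python
import numpy as np


def anderson_mixing(g, x0, m=5, beta=1.0, reg=1e-8, iters=100, tol=1e-10):
    """Anderson mixing for the fixed-point problem x = g(x).

    m: history length; beta: mixing (damping) parameter;
    reg: ridge term added to the least-squares problem for stability (assumption).
    """
    x = np.asarray(x0, dtype=float)
    xs, fs = [], []  # recent iterates and residuals f = g(x) - x
    for _ in range(iters):
        f = g(x) - x
        if np.linalg.norm(f) < tol:
            break
        xs.append(x.copy())
        fs.append(f.copy())
        if len(xs) > m + 1:  # keep at most m differences
            xs.pop(0)
            fs.pop(0)
        if len(xs) == 1:
            x = x + beta * f  # plain (damped) fixed-point step
            continue
        d_x = np.column_stack([xs[i + 1] - xs[i] for i in range(len(xs) - 1)])
        d_f = np.column_stack([fs[i + 1] - fs[i] for i in range(len(fs) - 1)])
        # gamma = argmin ||f - d_f gamma||^2 + reg * ||gamma||^2
        lhs = d_f.T @ d_f + reg * np.eye(d_f.shape[1])
        gamma = np.linalg.solve(lhs, d_f.T @ f)
        x = x + beta * f - (d_x + beta * d_f) @ gamma
    return x


a = np.array([[0.5, 0.1], [0.2, 0.4]])
b = np.array([1.0, 2.0])
x_star = anderson_mixing(lambda x: a @ x + b, np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(np.eye(2) - a, b)))
```

On a linear contraction like this one, the least-squares extrapolation behaves like a Krylov method and reaches the fixed point in a handful of steps, whereas plain iteration converges only at the rate of the map's spectral radius.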

Submitted 20 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

arXiv:2110.06892

TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

Authors: Jiuding Yang, Weidong Guo, Bang Liu, Yakun Yu, Chaoyue Wang, **wen Luo, Linglong Kong, Di Niu, Zhen Wen

Abstract: Although conceptualization has been widely studied in semantics and knowledge representation, it is still challenging to find the most accurate concept phrases to characterize the main idea of a text snippet on fast-growing social media. This is partly because most knowledge bases contain general terms of the world, such as trees and cars, which lack defining power or are not interesting enough to social media app users. Another reason is that the intricacy of natural language allows tense, negation, and grammar to change the logic or emphasis of language, thus conveying completely different meanings. In this paper, we present TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from open-domain social media. The concepts we consider represent the trending interests of online users. Associated with TAG is a concept graph of these fine-grained concepts and entities that provides structural context information. We evaluate a wide range of popular neural text matching models as well as pre-trained language models on TAG, and point out their insufficiency in tagging social media content with the most appropriate concept. We further propose a novel graph-graph matching method that demonstrates superior abstraction and generalization performance by better utilizing both the structural context in the concept graph and the logical interactions between semantic units in the sentence via syntactic dependency parsing.
We open-source both the TAG dataset and the proposed methods to facilitate further research.

Submitted 15 June, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted by ACM SIGKDD 2022

arXiv:2110.06465 [pdf, other]

Breaking the Dilemma of Medical Image-to-image Translation

Authors: Lingke Kong, Chenyu Lian, Detian Huang, Zhenjiang Li, Yanle Hu, Qichao Zhou

Abstract: Supervised Pix2Pix and unsupervised Cycle-consistency are two modes that dominate the field of medical image-to-image translation. However, neither mode is ideal. The Pix2Pix mode has excellent performance, but it requires paired and well pixel-wise aligned images, which may not always be achievable due to respiratory motion or anatomy changes between the times that paired images are acquired. The Cycle-consistency mode is less stringent with training data and works well on unpaired or misaligned images, but its performance may not be optimal. To break the dilemma of the existing modes, we propose a new unsupervised mode called RegGAN for medical image-to-image translation. It is based on the theory of "loss-correction". In RegGAN, the misaligned target images are considered as noisy labels and the generator is trained with an additional registration network to fit the misaligned noise distribution adaptively. The goal is to search for the common optimal solution to both image-to-image translation and registration tasks. We incorporated RegGAN into a few state-of-the-art image-to-image translation methods and demonstrated that RegGAN could be easily combined with these methods to improve their performance. For example, a simple CycleGAN in our mode surpasses the latest NICEGAN despite using fewer network parameters. Based on our results, RegGAN outperformed both Pix2Pix on aligned data and Cycle-consistency on misaligned or unpaired data.
RegGAN is insensitive to noise, which makes it a better choice for a wide range of scenarios, especially for medical image-to-image translation tasks in which well pixel-wise aligned data are not available.

Submitted 10 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

arXiv:2110.04548 [pdf]

doi 10.1103/PhysRevX.12.041030

Imaging the ultrafast coherent control of a skyrmion crystal

Authors: Phoebe Tengdin, Benoit Truc, Alexey Sapozhnik, Lingyao Kong, Nina del Ser, Simone Gargiulo, Ivan Madan, Thomas Schoenenberger, Priya R. Baral, ** Che, Arnaud Magrez, Dirk Grundler, Henrik M. Rønnow, Thomas Lagrange, Jiadong Zang, Achim Rosch, Fabrizio Carbone

Abstract: Exotic magnetic textures emerging from the subtle interplay between thermodynamic and topological fluctuations have attracted intense interest due to their potential applications in spintronic devices. Recent advances in electron microscopy have enabled the imaging of random photo-generated individual skyrmions. However, their deterministic and dynamical manipulation is hampered by the chaotic nature of such fluctuations and the intrinsically irreversible switching between different minima in the magnetic energy landscape. Here, we demonstrate a method to coherently control the rotation of a skyrmion crystal by discrete amounts at speeds much faster than previously observed. By employing circularly polarized femtosecond laser pulses with an energy below the bandgap of the Mott insulator Cu₂OSeO₃, we excite a collective magnon mode via the inverse Faraday effect. This triggers coherent magnetic oscillations that directly control the rotation of a skyrmion crystal imaged by cryo-Lorentz Transmission Electron Microscopy. The manipulation of topological order via ultrafast laser pulses shown here can be used to engineer fast spin-based logical devices.

Submitted 22 July, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

arXiv:2110.03474 [pdf]

Topology-optimized ultra-compact all-optical logic devices on silicon photonic platforms

Authors: Lu He, Furong Zhang, Huizhen Zhang, Ling-Jun Kong, Weixuan Zhang, Xingsheng Xu, Xiangdong Zhang

Abstract: The realization of all-optical integration and optical computing has always been our goal. One of the most significant challenges is to make integrated all-optical logic devices as small as possible. Here, we report the implementation of ultra-compact all-optical logic devices and integrated chips on silicon photonic platforms by topology optimization. The footprint of the fabricated all-optical logic gates with XOR and OR functions is only 1.3 × 1.3 μm² (~0.84λ × 0.84λ), making them the smallest all-optical dielectric logic devices ever verified experimentally in the optical communication range. An ultra-low optical signal loss (-0.96 dB) is also demonstrated experimentally. Furthermore, an integrated chip containing seven major logic gates (AND, OR, NOT, NAND, NOR, XOR, and XNOR) and a half adder is fabricated, with a footprint of only 1.3 × 4.5 μm². Our work opens up a new path towards practical all-optical integration and optical computing.

Submitted 15 January, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: 20 pages, 5 figures, 2 tables

arXiv:2110.03155 [pdf, other]

The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Authors: Ke Sun, Yingnan Zhao, Enze Shi, Yafei Wang, Xiaodong Yan, Bei Jiang, Linglong Kong

Abstract: The theoretical advantages of distributional reinforcement learning (RL) over classical RL remain elusive despite its remarkable empirical performance. Starting from Categorical Distributional RL (CDRL), we attribute the potential superiority of distributional RL to a derived distribution-matching regularization by applying a return density function decomposition technique. This unexplored regularization in the distributional RL context aims to capture additional return distribution information beyond its expectation alone, contributing to an augmented reward signal in the policy optimization. Compared with the entropy regularization in MaxEnt RL, which explicitly optimizes the policy to encourage exploration, the resulting regularization in CDRL implicitly optimizes policies guided by the new reward signal to align with the uncertainty of target return distributions, leading to an uncertainty-aware exploration effect. Finally, extensive experiments substantiate the importance of this uncertainty-aware regularization for the empirical benefits of distributional RL over classical RL.
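Categorical distributional RL represents a return distribution as probabilities on a fixed, evenly spaced support and projects the Bellman-shifted target back onto that support before applying a cross-entropy (KL) loss. The sketch below shows that standard categorical projection step (as popularized by C51) as an illustration of the CDRL setting only; it does not implement the return density decomposition or the regularizer derived in the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def categorical_projection(probs: Sequence[float], reward: float, gamma: float,
                           v_min: float, v_max: float) -> List[float]:
    """Project the distribution of reward + gamma * Z back onto Z's fixed support.

    The support is the evenly spaced grid v_min, v_min + dz, ..., v_max with
    len(probs) atoms. Mass that lands between two atoms is split linearly.
    """
    n = len(probs)
    dz = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for j, p in enumerate(probs):
        # Shift/scale atom j by the Bellman update, then clip into [v_min, v_max].
        tz = min(max(reward + gamma * (v_min + j * dz), v_min), v_max)
        # Fractional grid index of the shifted atom (clipped for float safety).
        b = min(max((tz - v_min) / dz, 0.0), n - 1.0)
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            projected[lo] += p
        else:
            projected[lo] += p * (hi - b)
            projected[hi] += p * (b - lo)
    return projected


def cross_entropy(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """H(target, predicted); equals KL(target || predicted) up to a constant."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Example: uniform mass on atoms -2..2, shifted by a reward of 0.5 with gamma = 1.
target = categorical_projection([0.2] * 5, reward=0.5, gamma=1.0, v_min=-2.0, v_max=2.0)
loss = cross_entropy(target, [0.2] * 5)
```

Here the top atom (2 + 0.5) is clipped to v_max, so the projected target puts extra mass on the last atom.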

Submitted 2 February, 2024; v1 submitted 6 October, 2021; originally announced October 2021.

arXiv:2110.02488 [pdf, other]

ABC: Attention with Bounded-memory Control

Authors: Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith

Abstract: Transformer architectures have achieved state-of-the-art results on a variety of sequence modeling tasks. However, their attention mechanism has quadratic complexity in sequence length, making the computational overhead prohibitive, especially for long sequences. Attention context can be seen as a random-access memory with each token taking a slot. Under this perspective, the memory size grows linearly with the sequence length, and so does the overhead of reading from it. One way to improve the efficiency is to bound the memory size. We show that disparate approaches can be subsumed into one abstraction, attention with bounded-memory control (ABC), and they vary in their organization of the memory. ABC reveals new, unexplored possibilities. First, it connects several efficient attention variants that would otherwise seem apart. Second, this abstraction gives new insights: an established approach (Wang et al., 2020b), previously thought to be inapplicable to causal attention, actually is applicable. Last, we present a new instance of ABC, which draws inspiration from existing ABC approaches but replaces their heuristic memory-organizing functions with a learned, contextualized one. Our experiments on language modeling, machine translation, and masked language model finetuning show that our approach outperforms previous efficient attention models; compared to strong transformer baselines, it significantly improves inference time and space efficiency with no or negligible accuracy loss.
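A minimal sketch of the bounded-memory view: each token t writes its key and value into n memory slots with control weights phi[t], and a query attends over those n slots instead of all T tokens. The control vectors here are hand-written for illustration (an assumption), not a window, a Linformer-style projection, or the paper's learned contextualized control; all names are hypothetical. With one slot per token and one-hot control, it reduces to ordinary softmax attention.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Matrix, values: Matrix) -> List[float]:
    """Scaled dot-product attention of one query over key/value rows."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def write_memory(phi: Matrix, rows: Matrix) -> Matrix:
    """Fold T token rows into n slots: slot[i] = sum_t phi[t][i] * rows[t]."""
    n_slots, dim = len(phi[0]), len(rows[0])
    memory = [[0.0] * dim for _ in range(n_slots)]
    for t, row in enumerate(rows):
        for i in range(n_slots):
            weight = phi[t][i]
            if weight != 0.0:
                for j in range(dim):
                    memory[i][j] += weight * row[j]
    return memory


def abc_attention(query: Sequence[float], keys: Matrix, values: Matrix,
                  phi: Matrix) -> List[float]:
    """Attend over n bounded memory slots rather than all T tokens."""
    return attend(query, write_memory(phi, keys), write_memory(phi, values))


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
query = [0.5, -0.2]
# Hypothetical control: tokens 0-1 write to slot 0, tokens 2-3 to slot 1.
phi = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
bounded = abc_attention(query, keys, values, phi)
```

The query now scores 2 slots instead of 4 tokens; the approaches the abstract unifies differ mainly in how phi is chosen.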

Submitted 1 June, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

arXiv:2110.02453 [pdf, other]

Ripple Attention for Visual Perception with Sub-quadratic Complexity

Authors: Lin Zheng, Huijie Pan, Lingpeng Kong

Abstract: Transformer architectures are now central to sequence modeling tasks. At their heart is the attention mechanism, which enables effective modeling of long-term dependencies in a sequence. Recently, transformers have been successfully applied in the computer vision domain, where 2D images are first segmented into patches and then treated as 1D sequences. Such linearization, however, impairs the notion of spatial locality in images, which carries important visual cues. To bridge the gap, we propose ripple attention, a sub-quadratic attention mechanism for vision transformers. Built upon recent kernel-based efficient attention mechanisms, we design a novel dynamic programming algorithm that weights the contributions of different tokens to a query with respect to their relative spatial distances in the 2D space in linear observed time. Extensive experiments and analyses demonstrate the effectiveness of ripple attention on various visual tasks.

Submitted 15 June, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: 19 pages, 2 figures, ICML 2022 camera ready

arXiv:2109.12425 [pdf, other]

doi 10.1145/3459637.3482360

L$^{2}$NAS: Learning to Optimize Neural Architectures via Continuous-Action Reinforcement Learning

Authors: Keith G. Mills, Fred X. Han, Mohammad Salameh, Seyed Saeed Changiz Rezaei, Linglong Kong, Wei Lu, Shuo Lian, Shangling Jui, Di Niu

Abstract: Neural architecture search (NAS) has achieved remarkable results in deep neural network design. Differentiable architecture search converts the search over discrete architectures into a hyperparameter optimization problem which can be solved by gradient descent. However, questions have been raised regarding the effectiveness and generalizability of gradient methods for solving non-convex architecture hyperparameter optimization problems. In this paper, we propose L$^{2}$NAS, which learns to intelligently optimize and update architecture hyperparameters via an actor neural network based on the distribution of high-performing architectures in the search history. We introduce a quantile-driven training procedure which efficiently trains L$^{2}$NAS in an actor-critic framework via continuous-action reinforcement learning. Experiments show that L$^{2}$NAS achieves state-of-the-art results on the NAS-Bench-201 benchmark as well as the DARTS and Once-for-All MobileNetV3 search spaces. We also show that search policies generated by L$^{2}$NAS are generalizable and transferable across different training datasets with minimal fine-tuning.

Submitted 25 September, 2021; originally announced September 2021.

Comments: Accepted as a Full Research Paper at CIKM 2021; 10 pages, 3 Figures, 5 Tables

arXiv:2109.12270 [pdf]

doi 10.1093/mnras/stab2760

Search for Gamma-Ray Bursts and Gravitational Wave Electromagnetic Counterparts with High Energy X-ray Telescope of Insight-HXMT

Authors: C. Cai, S. L. Xiong, C. K. Li, C. Z. Liu, S. N. Zhang, X. B. Li, L. M. Song, B. Li, S. Xiao, Q. B. Yi, Y. Zhu, Y. G. Zheng, W. Chen, Q. Luo, Y. Huang, X. Y. Song, H. S. Zhao, Y. Zhao, Z. Zhang, Q. C. Bu, X. L. Cao, Z. Chang, L. Chen, T. X. Chen, Y. B. Chen , et al. (74 additional authors not shown)

Abstract: The High Energy X-ray telescope (HE) on board the Hard X-ray Modulation Telescope (Insight-HXMT) can serve as a wide field of view (FOV) gamma-ray monitor with high time resolution (μs) and large effective area (up to thousands of cm$^2$). We developed a pipeline to search for Gamma-Ray Bursts (GRBs), using the traditional signal-to-noise ratio (SNR) method for blind search and the coherent search method for targeted search. By taking into account the location and spectrum of the burst and the detector response, the targeted coherent search is more powerful for unveiling weak and sub-threshold bursts, especially those in temporal coincidence with Gravitational Wave (GW) events. Based on the original method in the literature, we further improved the coherent search to filter out false triggers caused by spikes in light curves, which are commonly seen in gamma-ray instruments (e.g. Fermi/GBM, POLAR). We show that our improved targeted coherent search method could eliminate almost all false triggers caused by spikes. Based on the first two years of Insight-HXMT/HE data, our targeted search recovered 40 GRBs, which were detected by either Swift/BAT or Fermi/GBM but were too weak to be found in our blind search. With this coherent search pipeline, the GRB detection sensitivity of Insight-HXMT/HE is increased to about 1.5 × 10$^{-8}$ erg/cm$^2$ (200 keV–3 MeV).
We also used this targeted coherent method to search Insight-HXMT/HE data for electromagnetic (EM) counterparts of LIGO-Virgo GW events (including the O2 and O3a runs). However, we did not find any significant burst associated with GW events.
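The blind-search half of such a pipeline can be sketched as a simple Poisson SNR trigger on a binned light curve: estimate the background from a preceding window, compute (N - B)/√B for each bin, and flag bins above a threshold. The window length, threshold, and background estimator below are assumptions for illustration, not the pipeline's settings, and the function name is hypothetical. Note that the single-bin spike in the example is flagged; that is exactly the kind of false trigger the improved coherent search is designed to reject, which this sketch does not attempt.

```python
import math
from typing import List, Sequence


def snr_triggers(counts: Sequence[float], bg_window: int = 20,
                 threshold: float = 5.0) -> List[int]:
    """Return indices of bins whose Poisson SNR against a running background exceeds threshold."""
    triggers = []
    for i in range(bg_window, len(counts)):
        # Background: mean count rate over the preceding bg_window bins (assumption).
        background = sum(counts[i - bg_window:i]) / bg_window
        if background <= 0:
            continue
        # Poisson significance of the excess in bin i.
        snr = (counts[i] - background) / math.sqrt(background)
        if snr >= threshold:
            triggers.append(i)
    return triggers


# Flat background of 100 counts/bin with a one-bin spike at bin 50.
light_curve = [100.0] * 60
light_curve[50] = 200.0
found = snr_triggers(light_curve)
```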

Submitted 25 September, 2021; originally announced September 2021.

Comments: 12 pages, 14 figures, 5 tables; accepted for publication in MNRAS

arXiv:2109.09889 [pdf, other]

A Simple Unified Framework for Anomaly Detection in Deep Reinforcement Learning

Authors: Hongming Zhang, Ke Sun, Bo Xu, Linglong Kong, Martin Müller

Abstract: Abnormal states in deep reinforcement learning (RL) are states that are beyond the scope of an RL policy. Such states may lead to sub-optimal and unsafe decision making for the RL system, impeding its deployment in real scenarios. In this paper, we propose a simple yet effective anomaly detection framework for deep RL algorithms that simultaneously considers random, adversarial and out-of-distribution (OOD) state outliers. In particular, we obtain the class-conditional distributions for each action class under the Gaussian assumption, and rely on these distributions to discriminate between inliers and outliers based on Mahalanobis Distance (MD) and Robust Mahalanobis Distance. We conduct extensive experiments on Atari games that verify the effectiveness of our detection strategies. To the best of our knowledge, we present the first detailed study of statistical and adversarial anomaly detection in deep RL algorithms. This simple unified anomaly detection framework paves the way towards deploying safe RL systems in real-world applications.
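The Mahalanobis scoring step can be sketched as follows: fit one Gaussian mean per action class with a shared (tied) covariance over state features, then score a new state by its minimum Mahalanobis distance to any class mean. The tied covariance, the small ridge term, and the use of generic feature vectors (e.g. some network's penultimate-layer activations) are assumptions; the Robust Mahalanobis Distance variant and the choice of a decision threshold are not shown, and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(features: Sequence[Sequence[float]], actions: Sequence[int],
        ridge: float = 1e-6) -> Tuple[Dict[int, Vector], List[Vector]]:
    """Per-action class means plus one covariance shared by all classes."""
    groups = defaultdict(list)
    for f, a in zip(features, actions):
        groups[a].append(f)
    d = len(features[0])
    means = {a: [sum(col) / len(fs) for col in zip(*fs)] for a, fs in groups.items()}
    cov = [[0.0] * d for _ in range(d)]
    for f, a in zip(features, actions):
        diff = [x - mu for x, mu in zip(f, means[a])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / len(features)
    for i in range(d):
        cov[i][i] += ridge  # keeps the covariance invertible
    return means, cov


def anomaly_score(x: Sequence[float], means: Dict[int, Vector], cov: List[Vector]) -> float:
    """Minimum Mahalanobis distance from x to any action-class mean."""
    best = math.inf
    for mu in means.values():
        diff = [a - b for a, b in zip(x, mu)]
        z = solve(cov, diff)
        best = min(best, math.sqrt(sum(p * q for p, q in zip(diff, z))))
    return best


feats = [[0, 1], [0, -1], [1, 0], [-1, 0], [5, 6], [5, 4], [6, 5], [4, 5]]
acts = [0, 0, 0, 0, 1, 1, 1, 1]
means, cov = fit(feats, acts)
inlier_score = anomaly_score([0.0, 0.0], means, cov)
outlier_score = anomaly_score([20.0, -20.0], means, cov)
```

A state is flagged as abnormal when its score exceeds a threshold, which in practice would be calibrated on held-out inlier states.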

Submitted 20 August, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: 19 pages, 21 figures

arXiv:2109.08776 [pdf, other]

Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations

Authors: Ke Sun, Yingnan Zhao, Shangling Jui, Linglong Kong

Abstract: In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even causing training to collapse. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. Firstly, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noises. In the noisy setting with function approximation, we then analyze the vulnerability of the least-squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the KL divergence. The resulting stable gradients during optimization account for the better training robustness of distributional RL against state observation noises. Finally, extensive experiments on a suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart.
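The bounded-gradient intuition can be checked numerically. For a softmax categorical p = softmax(z) and target distribution q, the gradient of KL(q || p) with respect to the logits z is p - q; since both are probability vectors, its Euclidean norm is at most √2 (because ||p - q||₂² ≤ ||p - q||_∞ · ||p - q||₁ ≤ 1 · 2). The least-squares gradient 2(Q - y), by contrast, grows linearly with the target error. This is only an illustration of that intuition with hypothetical helper names, not the paper's analysis of parameter gradients under function approximation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_kl_grad(logits: Sequence[float], target: Sequence[float]) -> List[float]:
    """Gradient of KL(target || softmax(logits)) with respect to the logits."""
    return [p - q for p, q in zip(softmax(logits), target)]


def squared_loss_grad(prediction: float, target: float) -> float:
    """Gradient of (prediction - target) ** 2 with respect to the prediction."""
    return 2.0 * (prediction - target)


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


# As the (noisy) regression target drifts, the least-squares gradient grows with it.
ls_grads = [abs(squared_loss_grad(0.0, noise)) for noise in (1.0, 10.0, 1000.0)]
# The categorical KL gradient stays bounded even for extreme logits.
kl_grads = [l2_norm(categorical_kl_grad(z, [0.0, 1.0, 0.0]))
            for z in ([0.0, 0.0, 0.0], [5.0, -5.0, 0.0], [50.0, -50.0, 0.0])]
```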

Submitted 21 June, 2023; v1 submitted 17 September, 2021; originally announced September 2021.

Comments: Accepted in ECML PKDD 2023. This is the authors' version of the work. The definitive Version of Record will be published in the Proceedings of ECML PKDD 2023

arXiv:2109.07438 [pdf, other]

CAMul: Calibrated and Accurate Multi-view Time-Series Forecasting

Authors: Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodríguez, Chao Zhang, B. Aditya Prakash

Abstract: Probabilistic time-series forecasting enables reliable decision making across many domains. Most forecasting problems have diverse sources of data containing multiple modalities and structures. Leveraging information as well as uncertainty from these data sources for well-calibrated and accurate forecasts is an important and challenging problem. Most previous work on multi-modal learning and forecasting simply aggregates intermediate representations from each data view by summation or concatenation and does not explicitly model uncertainty for each data view. We propose a general probabilistic multi-view forecasting framework, CAMul, that can learn representations and uncertainty from diverse data sources. It integrates the knowledge and uncertainty from each data view in a dynamic, context-specific manner, assigning more importance to useful views to model a well-calibrated forecast distribution. We use CAMul for multiple domains with varied sources and modalities and show that CAMul outperforms other state-of-the-art probabilistic forecasting models by over 25% in accuracy and calibration.

Submitted 25 February, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: 16 pages, 4 figures. Accepted at WWW 2022

arXiv:2109.02088 [pdf, other]

doi 10.1093/mnras/stab2521

State transitions of GX 339-4 during its outburst rising phase

Authors: Q. C. Shui, H. X. Yin, S. Zhang, J. L. Qu, Y. P. Chen, L. D. Kong, P. J. Wang, H. F. Zhang, J. X. Song, B. Ning, Y. F. Wang, Z. Chang, P. Zhang

Abstract: We systematically investigate four outbursts of the black hole system GX 339-4 observed by the Rossi X-ray Timing Explorer (RXTE) in both spectral and timing domains and find that these outbursts share some common properties although they follow different 'q' tracks in the hardness-intensity diagram (HID). While the spectral indices are around 1.5 in the low hard state (LHS) and 2.4 in the soft intermediate state (SIMS) and high soft state (HSS), the spectral parameters of the thermal, non-thermal and reflection components vary significantly in transitions from the LHS to the hard intermediate state (HIMS). The quasi-periodic oscillation (QPO) also shows a peculiar behavior during the state transition between the LHS and HIMS: the drop in RMS of the type-C fundamental QPO is accompanied by the appearance of the second harmonic. Interestingly, the QPO RMS is found to have a similar linear relationship with the non-thermal fraction of emission in different outbursts. These findings provide further clues to our understanding of the outbursts of black hole X-ray binary systems.

Submitted 5 September, 2021; originally announced September 2021.

Comments: 13 pages, 12 figures

arXiv:2109.00660 [pdf]

doi 10.1063/5.0068449

Improve photon number discrimination for a superconducting series nanowire detector by applying a digital matched filter

Authors: Hao Hao, Qing-Yuan Zhao, Ling-Dong Kong, Shi Chen, Hui Wang, Yang-Hui Huang, Jia-Wei Guo, Wan Chao, Hao Liu, Xue-Cou Tu, La-Bao Zhang, Xiao-Qing Jia, Jian Chen, Lin Kang, Cong Li, Te Chen, Gui-Xing Cao, Pei-Heng Wu

Abstract: Photon number resolving (PNR) is an important capability for detectors working in quantum and classical applications. Although a conventional superconducting nanowire single-photon detector (SNSPD) is not a PNR detector, by arranging nanowires in a series array and multiplexing photons over space, such a series PNR-SNSPD can gain quasi-PNR capability. However, the accuracy and maximum resolved photon number are both limited by the signal-to-noise ratio (SNR) of the output pulses. Here, we introduce a matched filter, which is an optimal filter in terms of SNR for SNSPD pulses. Experimentally, compared to conventional readout using a room-temperature amplifier, the normalized spacing between pulse amplitudes from adjacent photon number detections increased by a maximum factor of 2.1 after the matched filter. Combined with a cryogenic amplifier to increase SNR further, this spacing increased by a maximum factor of 5.3. In contrast to a low-pass filter, the matched filter gave better SNRs while maintaining low timing jitter. A minimum timing jitter of 55 ps was obtained experimentally. Our results suggest that the matched filter is a useful tool for improving the performance of the series PNR-SNSPD, and the maximum resolved photon number can be expected to reach 65 or even larger.
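For a known pulse shape in additive white noise, the matched filter is a correlation of the trace with the pulse template (equivalently, convolution with the time-reversed template), which maximizes the output SNR at the pulse arrival time. The sketch below assumes a hypothetical SNSPD-like template (fast rise, slower decay) and a synthetic trace with a small deterministic ripple standing in for noise; it is not the authors' measured pulse shape or readout chain, and colored noise would additionally require whitening.

```python
import math
from typing import List, Sequence


def matched_filter(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Correlate the signal with a unit-energy copy of the pulse template.

    output[k] is the filter response with the template aligned to start at
    sample k (a 'valid' correlation, so len(signal) - len(template) + 1 outputs).
    """
    energy = math.sqrt(sum(h * h for h in template))
    h = [v / energy for v in template]
    m = len(h)
    return [sum(signal[k + i] * h[i] for i in range(m))
            for k in range(len(signal) - m + 1)]


# Hypothetical pulse shape: fast rise, slower decay (illustrative, not measured data).
TEMPLATE = [0.0, 0.5, 1.0, 0.8, 0.6, 0.45, 0.35, 0.25]


def make_trace(offset: int, amplitude: float, length: int = 80) -> List[float]:
    """Place a scaled pulse at `offset` on top of a small deterministic ripple."""
    trace = [0.01 * math.sin(1.7 * n) for n in range(length)]
    for i, v in enumerate(TEMPLATE):
        trace[offset + i] += amplitude * v
    return trace


response = matched_filter(make_trace(offset=30, amplitude=2.0), TEMPLATE)
peak = max(range(len(response)), key=response.__getitem__)
```

The peak value scales with the pulse amplitude, which is why a higher output SNR widens the separation between amplitude levels that correspond to adjacent photon numbers in a series PNR-SNSPD.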

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2108.12850 [pdf, other]

doi 10.7498/aps.69.20200717

Emergent vortex Majorana zero mode in iron-based superconductors

Authors: Lingyuan Kong, Hong Ding

Abstract: The vortex of iron-based superconductors is emerging as a promising platform for Majorana zero modes, owing to a remarkable combination of intrinsic vortex winding, non-trivial band topology, strong electron-electron correlations, high-Tc superconductivity and the simplicity of a single material. It overcomes many difficulties suffered by heterostructure-based Majorana platforms, including small topological gaps, interfacial contamination, and lattice imperfections. Isolated zero-bias peaks have been found in the vortices of several iron-based superconductors. So far, studies from both experimental and theoretical aspects strongly indicate the realization of vortex Majorana zero modes, with a potential to be applied to topological quantum computation. Taking the Fe(Te,Se) superconductor as an example, here we review the original idea and research progress of Majorana zero modes in this new platform. After introducing the identification of the topological band structure and real zero modes in vortices, we systematically summarize the physical behaviors of vortex Majorana zero modes. Firstly, relying on the behavior of the zero-mode wave function and evidence of quasiparticle poisoning, we analyze the mechanism of emergence of vortex Majorana zero modes. Secondly, assisted by some well-established theories, we elaborate on the measurements of the Majorana symmetry and topological nature of vortex Majorana zero modes.
After that, we switch from quantum physics to quantum engineering, and analyze the performance of vortex Majorana zero modes under real circumstances, which may benefit the exploration of practical applications in the future. This review follows the physical properties of vortex Majorana zero modes and especially emphasizes the link between phenomena and mechanisms. It provides a chance to bridge the gap between the well-established theories and the newly discovered iron home of Majoranas.

Submitted 29 August, 2021; originally announced August 2021.

Comments: 41 pages, 17 figures. A review article of Majorana zero modes on iron-based superconducting vortex

Journal ref: Acta Physica Sinica 69, 110301 (2020)

arXiv:2108.12589 [pdf, other]

Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems

Authors: Fei Mi, Wanhao Zhou, Fengyu Cai, Ling**g Kong, Minlie Huang, Boi Faltings

Abstract: As labeling data for the different modules of task-oriented dialog (ToD) systems is expensive, a major challenge is to train these modules with the least amount of labeled data. Recently, large-scale pre-trained language models have shown promising results for few-shot learning in ToD. In this paper, we devise a self-training approach to utilize the abundant unlabeled dialog data to further improve state-of-the-art pre-trained models in few-shot learning scenarios for ToD systems. Specifically, we propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model. Moreover, a new text augmentation technique (GradAug) is proposed to better train the Student by replacing non-crucial tokens using a masked language model. We conduct extensive experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection. Empirical results demonstrate that the proposed self-training approach consistently improves state-of-the-art pre-trained models (BERT, ToD-BERT) when only a small amount of labeled data is available.
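The "label the most confident unlabeled data" step can be sketched as confidence-ranked pseudo-labeling. The sketch assumes confidence is the teacher's maximum class probability and that a fixed top-k is selected per iteration; the paper's exact selection criterion and the GradAug augmentation are not reproduced, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def select_confident(probs: Sequence[Sequence[float]], k: int) -> List[Tuple[int, int]]:
    """Return (example_index, pseudo_label) for the k most confident teacher predictions."""
    scored = []
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax class
        scored.append((p[label], idx, label))
    scored.sort(key=lambda s: s[0], reverse=True)  # most confident first
    return [(idx, label) for _, idx, label in scored[:k]]


# Teacher class probabilities for four unlabeled utterances (illustrative numbers).
teacher_probs = [[0.6, 0.4], [0.05, 0.95], [0.5, 0.5], [0.9, 0.1]]
picked = select_confident(teacher_probs, k=2)
```

In a full loop, the selected examples would be moved from the unlabeled pool into the training set with their pseudo-labels, a Student trained on the enlarged set, and the Student then used as the next Teacher.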

Submitted 28 August, 2021; originally announced August 2021.

Comments: Accepted as Long Paper at "EMNLP, 2021"

arXiv:2108.08835 [pdf, other]

doi 10.1007/JHEP03(2022)022

One dimensional gapped quantum phases and enriched fusion categories

Authors: Liang Kong, Xiao-Gang Wen, Hao Zheng

Abstract: In this work, we use the Ising chain and the Kitaev chain to check the validity of an earlier proposal in arXiv:2011.02859 that enriched fusion (higher) categories provide a unified categorical description of all gapped/gapless quantum liquid phases, including symmetry-breaking phases, topological orders, SPT/SET orders and certain gapless quantum phases. In particular, we show explicitly that, in each gapped phase realized by these two models, the spacetime observables form a fusion category enriched in a braided fusion category. We also study the categorical descriptions of the boundaries of these models. Finally, we provide a classification and categorical descriptions of all 1-dimensional (spatial dimension) gapped quantum phases with a finite onsite symmetry.

Submitted 10 March, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: 27 pages. We add some remarks and references

Journal ref: J. High Energ. Phys. 2022, 22 (2022)

arXiv:2108.06605 [pdf, other]

doi 10.1016/j.cam.2022.114872

Gradient Projection Newton Algorithm for Sparse Collaborative Learning Using Synthetic and Real Datasets of Applications

Authors: Jun Sun, Lingchen Kong, Shenglong Zhou

Abstract: Exploring the relationship among multiple sets of data from the same group enables practitioners to make better decisions in medical science and engineering. In this paper, we propose a sparse collaborative learning (SCL) model, an optimization problem with double-sparsity constraints, to handle problems with two sets of data and a shared response variable. It can deal with classification or regression problems, depending on whether the response variable is discrete, while simultaneously exploring the relationship between the two datasets. To solve SCL, we first present some necessary and sufficient optimality conditions and then design a gradient projection Newton algorithm, which is proven to converge globally to a unique locally optimal solution with at least a quadratic convergence rate. Finally, the reported numerical experiments illustrate the efficiency of the proposed method.

Submitted 13 November, 2022; v1 submitted 14 August, 2021; originally announced August 2021.

Journal ref: Journal of Computational and Applied Mathematics 2022
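A gradient projection method for sparsity-constrained problems alternates a gradient step with a projection onto the set $\{x : \|x\|_0 \le s\}$, which keeps the $s$ largest-magnitude entries. A minimal sketch of that projected-gradient part only, assuming a least-squares loss $\tfrac12\|Ax-b\|^2$ for illustration; the Newton refinement and the exact two-block SCL objective are not reproduced, and all names are hypothetical.

```python
import numpy as np


def project_sparse(x: np.ndarray, s: int) -> np.ndarray:
    """Euclidean projection onto {x : ||x||_0 <= s}: keep the s largest |x_i|."""
    z = np.zeros_like(x)
    keep = np.argsort(-np.abs(x))[:s]
    z[keep] = x[keep]
    return z


def gradient_projection_step(A, b, x, s, step):
    """One projected-gradient step for 0.5*||Ax - b||^2 subject to ||x||_0 <= s."""
    grad = A.T @ (A @ x - b)
    return project_sparse(x - step * grad, s)
```

Under a double-sparsity constraint, one would presumably apply such a projection separately to each block of coefficients, one per dataset; that block structure is an assumption here.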

arXiv:2108.02487 [pdf]

doi 10.1038/s41565-021-00954-9

Magnetic Skyrmion Bundles and Their Current-Driven Dynamics

Authors: Jin Tang, Yaodong Wu, Weiwei Wang, Lingyao Kong, Boyao Lv, Wensen Wei, Jiadong Zang, Mingliang Tian, Haifeng Du

Abstract: Quantization of topological charges determines the various topological spin textures that are expected to play a key role in future spintronic devices. While the magnetic skyrmion with a unit topological charge Q has been extensively studied, spin textures with other integer values of Q have not been well verified so far. Here, we report the real-space imaging, creation, and manipulation of a type of multi-Q three-dimensional skyrmionic texture, in which a circular spin spiral ties together a bundle of skyrmion tubes. We define these objects as skyrmion bundles and show that they take arbitrary integer values of Q, from negative values up to at least 55 in our experiment. These textures behave as quasiparticles in their collective motions driven by electric pulses. Similar to the skyrmion, skyrmion bundles with nonzero Q exhibit the skyrmion Hall effect, with a Hall angle of 62 degrees. Of particular interest, the skyrmion bundle with Q = 0 propagates collinearly with respect to the current flow, without the skyrmion Hall effect. Our results open a new perspective for possible applications of multi-Q magnetic objects in future spintronic devices.

Submitted 5 August, 2021; originally announced August 2021.

Journal ref: Nature Nanotechnology (2021)
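For reference, the topological charge Q discussed above is commonly defined, for a unit magnetization field $\mathbf{m}(x,y)$ on a two-dimensional slice, by the integral below. This is the textbook definition rather than one quoted from the paper; sign conventions differ between authors (some include an overall minus sign), and for a three-dimensional tube texture it would be evaluated on a cross-section perpendicular to the tubes.

```latex
Q = \frac{1}{4\pi}\int \mathbf{m}\cdot\left(\frac{\partial \mathbf{m}}{\partial x}\times\frac{\partial \mathbf{m}}{\partial y}\right)dx\,dy
```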

arXiv:2108.02485 [pdf, other]

doi 10.3847/2041-8213/ac1ad3

Luminosity dependence of the cyclotron line energy in 1A 0535+262 observed by Insight-HXMT during 2020 giant outburst

Authors: L. D. Kong, S. Zhang, L. Ji, P. Reig, V. Doroshenko, A. Santangelo, R. Staubert, S. N. Zhang, R. Soria, Z. Chang, Y. P. Chen, P. J. Wang, L. Tao, J. L. Qu

Abstract: We report on a detailed spectral analysis of the transient X-ray pulsar 1A 0535+262, which underwent the brightest giant outburst ever recorded for this source from November to December 2020, with a peak luminosity of $1.2\times10^{38}\ \mathrm{erg\ s^{-1}}$. Thanks to the unprecedented energy coverage and high-cadence observations provided by Insight-HXMT, we were able to find, for the first time, evidence for a transition of the accretion regime. At high luminosity, above the critical luminosity of $6.7\times10^{37}\ \mathrm{erg\ s^{-1}}$, the cyclotron absorption line energy anti-correlates with luminosity. Below the critical luminosity, a positive correlation is observed. 1A 0535+262 therefore becomes the second source, after V 0332+53, that clearly shows an anti-correlation above the critical luminosity and a transition between correlation and anti-correlation around it. The evolution of both the observed CRSF line energy and the broadband X-ray continuum spectrum throughout the outburst exhibits significant differences between the rising and fading phases: for a similar luminosity, the spectral parameters take different values, which results in hysteresis patterns for several spectral parameters, including the cyclotron line energy. We argue that, similarly to V 0332+53, these changes might be related to a different geometry of the emission region in the rising and declining parts of the outburst, probably due to changes in the accretion disk structure and its interaction with the magnetosphere of the neutron star.

Submitted 6 August, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

arXiv:2108.02093 [pdf, other]

Free Lunch for Co-Saliency Detection: Context Adjustment

Authors: Lingdong Kong, Prakhar Ganesh, Tan Wang, Junhao Liu, Le Zhang, Yao Chen

Abstract: We unveil a long-standing problem in the prevailing co-saliency detection systems: there is indeed an inconsistency between training and testing. Constructing a high-quality co-saliency detection dataset involves time-consuming and labor-intensive pixel-level labeling, which has forced most recent works to rely instead on semantic segmentation or saliency detection datasets for training. However, the lack of proper co-saliency and the absence of multiple foreground objects in these datasets can lead models to learn spurious variations and inherent biases. To tackle this, we introduce the idea of counterfactual training through context adjustment and propose a "cost-free" group-cut-paste (GCP) procedure to leverage off-the-shelf images and synthesize new samples. Following GCP, we collect a novel dataset called Context Adjustment Training (CAT). CAT consists of 33,500 images, four times larger than the current co-saliency detection datasets. All samples are automatically annotated with high-quality mask annotations, object categories, and edge maps. Extensive experiments on recent benchmarks show that CAT can improve various state-of-the-art models by a large margin (5% to 25%). We hope that the scale, diversity, and quality of our dataset can benefit researchers in this area and beyond. Our dataset will be publicly accessible through our project page.

Submitted 30 September, 2021; v1 submitted 4 August, 2021; originally announced August 2021.
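The "cost-free" annotation idea rests on the fact that when an object is pasted using a known mask, its pixel-level label in the new image comes for free. A minimal sketch of a generic mask-based cut-and-paste, assuming NumPy image arrays and a boolean foreground mask that fits inside the destination; this is not the authors' GCP procedure, whose object grouping and placement rules are not described here, and `cut_paste` is a hypothetical name.

```python
import numpy as np


def cut_paste(src, src_mask, dst, top, left):
    """Paste the masked foreground of src into dst at (top, left).

    src: (h, w, c) image; src_mask: (h, w) boolean foreground mask;
    dst: (H, W, c) image, assumed large enough to hold the patch.
    Returns the composited image and the pasted object's mask.
    """
    out = dst.copy()                            # leave the input image untouched
    h, w = src_mask.shape
    region = out[top:top + h, left:left + w]    # basic slice -> view into out
    region[src_mask] = src[src_mask]            # copy only foreground pixels
    new_mask = np.zeros(dst.shape[:2], dtype=bool)
    new_mask[top:top + h, left:left + w] = src_mask  # annotation comes for free
    return out, new_mask
```

Pasting several objects of the same category into different backgrounds would yield a group of images sharing a co-salient object, each with its own mask.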

arXiv:2107.14580 [pdf, other]

Distributed Event- and Self-Triggered Coverage Control with Speed Constrained Unicycle Robots

Authors: Yuni Zhou, Lingxuan Kong, Stefan Sosnowski, Qingchen Liu, Sandra Hirche

Abstract: Voronoi coverage control is an important problem in multi-robot systems: a network of multiple autonomous robots is tasked with optimally covering a large area. This is a common task for fleets of fixed-wing Unmanned Aerial Vehicles (UAVs), which are described in this work by a unicycle model with constant forward-speed constraints. We develop event-based control/communication algorithms to relax the resource requirements on wireless communication and control actuators, an important feature for battery-driven or otherwise energy-constrained systems. To overcome the drawback that the event-triggered algorithm requires continuous measurement of system states, we propose a self-triggered algorithm to estimate the next triggering time. Hardware experiments illustrate the theoretical results.

Submitted 30 July, 2021; originally announced July 2021.
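Voronoi coverage control typically drives each robot toward the centroid of its Voronoi cell, and an event-triggered scheme only communicates when a robot's state has drifted far enough from what its neighbors last received. A minimal sketch of both ingredients on a sampled area, assuming single-integrator-style targets and a simple norm threshold; it ignores the paper's constant-speed unicycle model, and the trigger rule and function names are hypothetical.

```python
import numpy as np


def voronoi_centroids(robots, points, density=None):
    """Assign sample points to the nearest robot; return each cell's weighted centroid.

    robots: (n, 2) positions; points: (m, 2) samples of the area;
    density: optional (m,) importance weights (uniform if None).
    Robots with empty cells keep their current position.
    """
    if density is None:
        density = np.ones(len(points))
    d2 = ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)  # (m, n)
    owner = d2.argmin(axis=1)                  # index of nearest robot per point
    centroids = robots.astype(float)
    for i in range(len(robots)):
        w = density[owner == i]
        if w.sum() > 0:
            centroids[i] = (w[:, None] * points[owner == i]).sum(axis=0) / w.sum()
    return centroids


def should_broadcast(current, last_broadcast, threshold):
    """Hypothetical event trigger: send an update once the state error exceeds threshold."""
    return bool(np.linalg.norm(current - last_broadcast) > threshold)
```

A self-triggered variant would instead predict, at each broadcast, when this condition will next be met, so the state need not be monitored continuously.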

arXiv:2107.03858 [pdf, ps, other]

Categories of quantum liquids II

Authors: Liang Kong, Hao Zheng

Abstract: We continue to develop the theory of separable higher categories, including center functors, higher centralizers, modular extensions and group theoretical higher fusion categories. Moreover, we outline a theory of orthogonal higher categories to treat anti-unitary symmetries. Using these results we derive a systematic classification of gapped quantum liquids and predict many new SPT orders in spacetime dimension $\ge3$.

Submitted 29 November, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 27 pages. Major revision

arXiv:2106.11970 [pdf, other]

Learned Interpretable Residual Extragradient ISTA for Sparse Coding

Authors: Lin Kong, Wei Sun, Fanhua Shang, Yuanyuan Liu, Hongying Liu

Abstract: Recently, the study of learned iterative shrinkage thresholding algorithms (LISTA) has attracted increasing attention. A large number of experiments as well as some theories have demonstrated the high efficiency of LISTA for solving sparse coding problems. However, existing LISTA methods all use serial connections. To address this issue, we propose a novel extragradient-based LISTA (ELISTA), which has a residual structure and theoretical guarantees. In particular, our algorithm can also provide interpretability for ResNet to a certain extent. From a theoretical perspective, we prove that our method attains linear convergence. In practice, extensive empirical results verify the advantages of our method.

Submitted 22 June, 2021; originally announced June 2021.

Comments: Accepted for presentation at the ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
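The extragradient idea behind ELISTA can be seen in its classical, unlearned form: take an ordinary ISTA step to a predictor point, then step from the original point using the gradient evaluated at the predictor. A minimal sketch for the lasso problem $\min_x \tfrac12\|Ax-b\|^2+\lambda\|x\|_1$, assuming a fixed step of $0.5/L$; ELISTA itself learns per-layer parameters, which are not shown, and the names are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau*||x||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def extragradient_ista(A, b, lam, n_iter=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with extragradient proximal steps."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient A^T(Ax - b)
    t = 0.5 / L                     # extragradient needs a step strictly below 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # predictor: a plain ISTA step from x
        y = soft_threshold(x - t * A.T @ (A @ x - b), t * lam)
        # corrector: step from x again, but with the gradient taken at y
        x = soft_threshold(x - t * A.T @ (A @ y - b), t * lam)
    return x


x_hat = extragradient_ista(np.eye(3), np.array([3.0, 0.2, -1.0]), 0.5)
```

With $A$ equal to the identity, the lasso solution is simply the soft-thresholded $b$, here $(2.5, 0, -0.5)$, which makes the example easy to check. With a step of exactly $1/L$ this identity case would stall, which is why the smaller step is used.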

arXiv:2106.03904 [pdf, other]

When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting

Authors: Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodríguez, Chao Zhang, B. Aditya Prakash

Abstract: Accurate and trustworthy epidemic forecasting is an important problem that has an impact on public health planning and disease mitigation. Most existing epidemic forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions. Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations; e.g., it is difficult to specify meaningful priors in Bayesian NNs, while methods like deep ensembling are computationally expensive in practice. In this paper, we fill this important gap. We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP, which directly models the probability density of the forecast value. EPIFNP leverages a dynamic stochastic correlation graph to model the correlations between sequences in a non-parametric way, and designs different stochastic latent variables to capture functional uncertainty from different perspectives. Our extensive experiments in a real-time flu forecasting setting show that EPIFNP significantly outperforms previous state-of-the-art models in both accuracy and calibration metrics, by up to 2.5x in accuracy and 2.4x in calibration. Additionally, due to properties of its generative process, EPIFNP learns the relations between the current season and similar patterns of historical seasons, enabling interpretable forecasts. Beyond epidemic forecasting, EPIFNP can be of independent interest for advancing principled uncertainty quantification in deep sequential models for predictive analytics.

Submitted 15 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2106.01924 [pdf]

Highly Efficient Ultrathin Light Emitting Diodes based on Perovskite Nanocrystals

Authors: Qun Wan, Weilin Zheng, Chen Zoub, Francesco Carulli, Congyang Zhang, Haili Song, Mingming Liu, Qinggang Zhang, Lih Y. Lin, Long Kong, Liang Li, Sergio Brovelli

Abstract: Light-emitting diodes based on perovskite nanocrystals (PNC-LEDs) have gained great interest for next-generation display and lighting technologies, prized for their color purity, high brightness, and luminous efficiency approaching the intrinsic limit imposed by the extraction of electroluminescence from the device structure. Although the time is ripe for the development of effective light-outcoupling strategies to further boost device performance, this technologically relevant aspect of PNC-LEDs still lacks a definitive solution. Here, following theoretical guidelines and without integrating complex photonic structures, we realize stable PNC-LEDs with EQE as high as 29.2% (average EQE = 24.7%), which substantially break the outcoupling limit of common PNC-LEDs and systematically surpass any previous perovskite-based device. Key to such unprecedented performance is channeling the recombination zone in PNC emissive layers as thin as 10 nm, which we achieve by finely balancing the electron and hole transport using CsPbBr3 PNCs resurfaced with a nickel oxide layer. The ultra-thin approach is general and, in principle, applicable to other perovskite nanostructures for fabricating highly efficient, color-tunable transparent LEDs ideal for unobtrusive screens and displays, and it is compatible with the integration of photonic components for further enhanced performance.

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2105.14850 [pdf, other]

Cascaded Head-colliding Attention

Authors: Lin Zheng, Zhiyong Wu, Lingpeng Kong

Abstract: Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks. At the cornerstone of the Transformer architecture is the multi-head attention (MHA) mechanism, which models pairwise interactions between the elements of the sequence. Despite its massive success, the current framework ignores interactions among different heads, so many of the heads are redundant in practice, which greatly wastes the capacity of the model. To improve parameter efficiency, we re-formulate MHA as a latent variable model from a probabilistic perspective. We present cascaded head-colliding attention (CODA), which explicitly models the interactions between attention heads through a hierarchical variational distribution. We conduct extensive experiments and demonstrate that CODA outperforms the transformer baseline by 0.6 perplexity on Wikitext-103 in language modeling and by 0.6 BLEU on WMT14 EN-DE in machine translation, due to its improved parameter efficiency. Our implementation is publicly available at https://github.com/LZhengisme/CODA.

Submitted 31 May, 2021; originally announced May 2021.

Comments: ACL 2021 Camera-ready version
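The abstract's criticism targets standard multi-head attention, in which every head computes its own attention distribution with no interaction with the other heads. A minimal NumPy sketch of that baseline, assuming unbatched input and square projection matrices; it is the vanilla MHA being improved upon, not CODA's hierarchical variational model.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Vanilla MHA: each head attends independently; outputs are concatenated.

    X: (n, d) token representations; W*: (d, d) projections; d % n_heads == 0.
    Returns the (n, d) output and the (n_heads, n, n) attention weights.
    """
    n, d = X.shape
    dh = d // n_heads

    def split(M):
        # (n, d) -> (n_heads, n, dh)
        return M.reshape(n, n_heads, dh).transpose(1, 0, 2)

    Qh, Kh, Vh = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh)   # (n_heads, n, n)
    A = softmax(scores, axis=-1)                        # one distribution per head
    heads = A @ Vh                                      # (n_heads, n, dh)
    concat = heads.transpose(1, 0, 2).reshape(n, d)
    return concat @ Wo, A
```

Nothing in `A` couples one head's weights to another's; that independence is what the paper argues leads to redundant heads.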

arXiv:2105.14462 [pdf, other]

Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation

Authors: Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, Ben Kao

Abstract: A neural multimodal machine translation (MMT) system aims to perform better translation by extending conventional text-only translation models with multimodal information. Many recent studies report improvements when equipping their models with the multimodal module, despite the controversy over whether such improvements indeed come from the multimodal part. We revisit the contribution of multimodal information in MMT by devising two interpretable MMT models. To our surprise, although our models replicate gains similar to those achieved by recently developed multimodal-integrated systems, our models learn to ignore the multimodal information. Upon further investigation, we discover that the improvements achieved by the multimodal models over their text-only counterparts are in fact the result of a regularization effect. We report empirical findings that highlight the importance of MMT models' interpretability, and discuss how our findings will benefit future research.

Submitted 30 May, 2021; originally announced May 2021.

Comments: To appear at ACL 2021 main conference

arXiv:2105.12682 [pdf, other]

Zero-shot Medical Entity Retrieval without Annotation: Learning From Rich Knowledge Graph Semantics

Authors: Luyang Kong, Christopher Winestock, Parminder Bhatia

Abstract: Medical entity retrieval is an integral component for understanding and communicating information across various health systems. Current approaches tend to work well on specific medical domains but generalize poorly to unseen sub-specialties. This is of increasing concern during a public health crisis, as new medical conditions and drug treatments come to light frequently. Zero-shot retrieval is challenging due to the high degree of ambiguity and variability in medical corpora, making it difficult to build an accurate similarity measure between mentions and concepts. Medical knowledge graphs (KG), however, contain rich semantics, including large numbers of synonyms as well as their curated graph structures. To take advantage of this valuable information, we propose a suite of learning tasks designed for training efficient zero-shot entity retrieval models. Without requiring any human annotation, our knowledge-graph-enriched architecture significantly outperforms common zero-shot benchmarks, including BM25 and Clinical BERT, with 7% to 30% higher recall across multiple major medical ontologies, such as UMLS, SNOMED, and ICD-10.

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.06074 [pdf, other]

doi 10.1103/PhysRevLett.130.070601

Quantum Instruction Set Design for Performance

Authors: Cupjin Huang, Tenghui Wang, Feng Wu, Dawei Ding, Qi Ye, Linghang Kong, Fang Zhang, Xiaotong Ni, Zhijun Song, Yaoyun Shi, Hui-Hai Zhao, Chunqing Deng, Jianxin Chen

Abstract: A quantum instruction set is where quantum hardware and software meet. We develop new characterization and compilation techniques for non-Clifford gates to accurately evaluate different quantum instruction set designs. We specifically apply them to our fluxonium processor that supports the mainstream instruction $\mathrm{iSWAP}$ by calibrating and characterizing its square root, $\mathrm{SQiSW}$. We measure a gate fidelity of up to $99.72\%$ with an average of $99.31\%$ and realize Haar random two-qubit gates using $\mathrm{SQiSW}$ with an average fidelity of $96.38\%$. This is an average error reduction of $41\%$ for the former and a $50\%$ reduction for the latter compared to using $\mathrm{iSWAP}$ on the same processor. This shows that designing the quantum instruction set to consist of $\mathrm{SQiSW}$ and single-qubit gates on such platforms leads to a performance boost at almost no cost.

Submitted 28 June, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: 2 figures in main text and 21 figures in Supplementary Materials. This manuscript subsumes version 1 with significant improvements such as experimental demonstration and materials presentation
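$\mathrm{SQiSW}$ is the square root of $\mathrm{iSWAP}$: applying it twice gives $\mathrm{iSWAP}$. A short NumPy check using the standard matrices in the computational basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; these are the textbook forms, and the calibrated gate on the hardware may differ from them by single-qubit phases.

```python
import numpy as np

# iSWAP swaps |01> and |10> with a phase of i.
ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]], dtype=complex)

# Its square root rotates halfway within the {|01>, |10>} subspace.
r = 1 / np.sqrt(2)
SQISW = np.array([[1, 0, 0, 0],
                  [0, r, 1j * r, 0],
                  [0, 1j * r, r, 0],
                  [0, 0, 0, 1]], dtype=complex)


def is_unitary(U):
    """True if U^dagger U equals the identity (up to float tolerance)."""
    return np.allclose(U.conj().T @ U, np.eye(len(U)))
```

In the middle block, squaring $\begin{pmatrix} r & ir \\ ir & r \end{pmatrix}$ gives $\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}$, since $r^2 - r^2 = 0$ and $2ir^2 = i$.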

arXiv:2104.14063 [pdf, ps, other]

Nonuniform Berry-Esseen bound for self-normalized martingales

Authors: Songqi Wu, Lingjie Kong

Abstract: We give a nonuniform Berry-Esseen bound for self-normalized martingales, which bridges the gap between the result of Haeusler (1988) and Fan and Shao (2018). The bound coincides with the nonuniform Berry-Esseen bound of Haeusler and Joos (1988) for standardized martingales. As a consequence, a Berry-Esseen bound is obtained.

Submitted 28 April, 2021; originally announced April 2021.
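For orientation, the classical i.i.d. version of a nonuniform Berry-Esseen bound reads as follows, for $S_n = X_1+\dots+X_n$ with $\mathbb{E}X_1 = 0$, $\sigma^2 = \mathbb{E}X_1^2$, $\rho = \mathbb{E}|X_1|^3 < \infty$ and an absolute constant $C$. This is the standard textbook statement, not the martingale result of the paper; "nonuniform" refers to the decay factor $(1+|x|)^{-3}$, and in the self-normalized setting the deterministic scaling $\sigma\sqrt{n}$ is replaced by the random $V_n = (\sum_{i\le n} X_i^2)^{1/2}$.

```latex
\left|\,\mathbb{P}\!\left(\frac{S_n}{\sigma\sqrt{n}} \le x\right) - \Phi(x)\right|
  \le \frac{C\,\rho}{\sigma^{3}\sqrt{n}\,(1+|x|)^{3}},
  \qquad x \in \mathbb{R}.
```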

arXiv:2104.11957 [pdf, ps, other]

Positive solutions for a coupled nonlinear Kirchhoff-type system with vanishing potentials

Authors: Lingzheng Kong, Haibo Chen

Abstract: In this paper, we consider the strongly coupled nonlinear Kirchhoff-type system with vanishing potentials: \begin{equation*}\begin{cases} -\left(a_1+b_1\int_{\mathbb{R}^3}|\nabla u|^2\,dx\right)\Delta u+\lambda V(x)u=\frac{\alpha}{\alpha+\beta}|u|^{\alpha-2}u|v|^{\beta}, & x\in\mathbb{R}^3,\\ -\left(a_2+b_2\int_{\mathbb{R}^3}|\nabla v|^2\,dx\right)\Delta v+\lambda W(x)v=\frac{\beta}{\alpha+\beta}|u|^{\alpha}|v|^{\beta-2}v, & x\in\mathbb{R}^3,\\ u,v\in \mathcal{D}^{1,2}(\mathbb{R}^3), \end{cases}\end{equation*} where $a_i>0$ are constants, $\lambda,b_i>0$ are parameters for $i=1,2$, $\alpha,\beta>1$ and $\alpha+\beta\leqslant 4$, $V(x)$ and $W(x)$ are nonnegative continuous potentials, and the nonlinear term $F(x,u,v)=|u|^{\alpha}|v|^{\beta}$ is not 4-superlinear at infinity. Such a problem cannot be studied directly by standard variational methods, even by restricting the associated energy functional to the Nehari manifold, because Palais-Smale sequences may not be bounded. Combining some new detailed estimates with a truncation technique, we obtain the existence of positive vector solutions for the above system when $b_1+b_2$ is small and $\lambda$ is large. Moreover, the asymptotic behavior of these vector solutions is also explored as $\mathbf{b}=(b_1,b_2)\to \mathbf{0}$ and $\lambda\to\infty$. In particular, our results extend some known ones in previous papers that only deal with the case $4<\alpha+\beta<6$.

Submitted 2 October, 2022; v1 submitted 24 April, 2021; originally announced April 2021.
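The abstract refers to "the associated energy functional". Under the standard variational setting (our assumption; the paper's exact function space and notation may differ), weak solutions of the system are critical points of the functional below. Its derivative in the $u$-direction, taken along a test function $\varphi$, reproduces the weak form of the first equation, and the $v$-derivative gives the second.

```latex
\begin{aligned}
I_\lambda(u,v) ={}& \frac{a_1}{2}\int_{\mathbb{R}^3}|\nabla u|^2\,dx
  + \frac{b_1}{4}\left(\int_{\mathbb{R}^3}|\nabla u|^2\,dx\right)^2
  + \frac{\lambda}{2}\int_{\mathbb{R}^3}V(x)u^2\,dx \\
 &+ \frac{a_2}{2}\int_{\mathbb{R}^3}|\nabla v|^2\,dx
  + \frac{b_2}{4}\left(\int_{\mathbb{R}^3}|\nabla v|^2\,dx\right)^2
  + \frac{\lambda}{2}\int_{\mathbb{R}^3}W(x)v^2\,dx
  - \frac{1}{\alpha+\beta}\int_{\mathbb{R}^3}|u|^{\alpha}|v|^{\beta}\,dx, \\[4pt]
\langle \partial_u I_\lambda(u,v),\varphi\rangle ={}&
  \left(a_1+b_1\int_{\mathbb{R}^3}|\nabla u|^2\,dx\right)\int_{\mathbb{R}^3}\nabla u\cdot\nabla\varphi\,dx
  + \lambda\int_{\mathbb{R}^3}V(x)\,u\,\varphi\,dx
  - \frac{\alpha}{\alpha+\beta}\int_{\mathbb{R}^3}|u|^{\alpha-2}u\,|v|^{\beta}\varphi\,dx .
\end{aligned}
```

The Kirchhoff terms grow like the fourth power of the gradient norm, while the coupling term has total degree $\alpha+\beta\le4$. This is consistent with the abstract's remark that Palais-Smale sequences need not be bounded in this range.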

arXiv:2104.03121 [pdf, other]

Enriched monoidal categories I: centers

Authors: Liang Kong, Wei Yuan, Zhi-Hao Zhang, Hao Zheng

Abstract: This work is the first one in a series, in which we develop a mathematical theory of enriched (braided) monoidal categories and their representations. In this work, we introduce the notion of the $E_0$-center ($E_1$-center or $E_2$-center) of an enriched (monoidal or braided monoidal) category, and compute the centers explicitly when the enriched (braided monoidal or monoidal) categories are obtained from the canonical constructions. These centers have important applications in the mathematical theory of gapless boundaries of 2+1D topological orders and that of topological phase transitions in physics. They also play very important roles in the higher representation theory, which is the focus of the second work in the series.

Submitted 28 April, 2024; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: 56 pages. published version

arXiv:2103.14415 [pdf]

Liquid Reconfigurable Stealth Window Constructed by Metamaterial Absorber

Authors: Xiangkun Kong, Weihao Lin, Xuemeng Wang, Lei Xing, Shunliu Jiang, Lingqi Kong

Abstract: In this paper, a liquid reconfigurable stealth window constructed from a metamaterial absorber in the microwave band is proposed. The stealth window consists of anti-reflection glass with indium tin oxide (ITO) as the resistive film and a liquid container made of polymethyl methacrylate (PMMA). Since the materials constituting the window are all transparent, the metamaterial, which can be switched through the liquid control system, always maintains high light transmission. When the alcohol is drained, the proposal obtains a transmission passband from 2.3 GHz to 5 GHz with low insertion loss, reaching 0.51 and 0.99 at 2.45 GHz and 5 GHz, respectively. When alcohol is injected, it reflects electromagnetic waves at 2.45 GHz and absorbs them from 4.5 GHz to 10.5 GHz with an absorptivity over 90%, exhibiting the reconfigurable electromagnetic characteristic of switching between a transmission state and an absorption state. Furthermore, the proposed absorber shows good transmission/absorption performance under different polarizations and achieves absorptivity over 90% with alcohol injected at an oblique incidence of 50°. Finally, a prototype window has been fabricated to demonstrate the validity of the proposed structure, which indicates that the proposal has significant implications for smart stealth systems and WLAN communication that require switching of working states in a complex electromagnetic environment.

Submitted 26 March, 2021; originally announced March 2021.
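The absorptivity and insertion-loss figures quoted above are usually derived from a two-port measurement's S-parameters: power that is neither reflected ($|S_{11}|^2$) nor transmitted ($|S_{21}|^2$) counts as absorbed. A minimal sketch of these standard relations; the abstract does not state units for its insertion-loss values, and the dB form here is only the common convention, not necessarily the paper's.

```python
import numpy as np


def absorptivity(s11, s21):
    """A = 1 - |S11|^2 - |S21|^2 for complex (or real) S-parameters."""
    s11 = np.asarray(s11)
    s21 = np.asarray(s21)
    return 1.0 - np.abs(s11) ** 2 - np.abs(s21) ** 2


def insertion_loss_db(s21):
    """Insertion loss in dB, IL = -20*log10|S21| (0 dB means lossless transmission)."""
    return -20.0 * np.log10(np.abs(np.asarray(s21)))
```

For example, a state with $|S_{11}| = 0.1$ and $|S_{21}| = 0.2$ absorbs $1 - 0.01 - 0.04 = 0.95$ of the incident power, above the 90% level cited for the absorption state.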

Showing 301–350 of 649 results for author: Kong, L