Search | arXiv e-print repository

Multi-target and multi-stage liver lesion segmentation and detection in multi-phase computed tomography scans

Authors: Abdullah F. Al-Battal, Soan T. M. Duong, Van Ha Tang, Quang Duc Tran, Steven Q. H. Truong, Chien Phan, Truong Q. Nguyen, Cheolhong An

Abstract: Multi-phase computed tomography (CT) scans use contrast agents to highlight different anatomical structures within the body to improve the probability of identifying and detecting anatomical structures of interest and abnormalities such as liver lesions. Yet, detecting these lesions remains a challenging task as these lesions vary significantly in their size, shape, texture, and contrast with respect to surrounding tissue. Therefore, radiologists need extensive experience to identify and detect these lesions. Segmentation-based neural networks can assist radiologists with this task. Current state-of-the-art lesion segmentation networks use the encoder-decoder design paradigm based on the UNet architecture, where the multi-phase CT scan volume is fed to the network as a multi-channel input. Although this approach utilizes information from all the phases and outperforms single-phase segmentation networks, we demonstrate that its performance is not optimal and can be further improved by incorporating the learning from models trained on each single phase individually. Our approach comprises three stages. The first stage identifies the regions within the liver where there might be lesions at three different scales (4, 8, and 16 mm). The second stage includes the main segmentation model trained using all the phases as well as a segmentation model trained on each of the phases individually. The third stage uses the multi-phase CT volumes together with the predictions from each of the segmentation models to generate the final segmentation map. Overall, our approach improves relative liver lesion segmentation performance by 1.6% while reducing performance variability across subjects by 8% when compared to the current state-of-the-art models.

Submitted 17 April, 2024; originally announced April 2024.
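The third stage combines the multi-phase CT volumes with the outputs of every second-stage model. A minimal sketch of assembling that input as a channel stack, assuming voxel-aligned phases, one probability map per model, and a channels-first layout; the abstract does not state the number of phases, so the 4 phases in the usage line and the function name `stage3_input` are hypothetical.

```python
import numpy as np

def stage3_input(phase_volumes, prediction_maps):
    """Stack CT phases and segmentation outputs along a new channel axis.

    phase_volumes:   list of P arrays of shape (D, H, W), one per contrast phase
    prediction_maps: list of P + 1 arrays of shape (D, H, W): the multi-phase
                     model's output first, then one output per single-phase model
    returns:         array of shape (2P + 1, D, H, W)
    """
    if len(prediction_maps) != len(phase_volumes) + 1:
        raise ValueError("expected one multi-phase model plus one model per phase")
    channels = list(phase_volumes) + list(prediction_maps)
    if any(c.shape != channels[0].shape for c in channels):
        raise ValueError("all volumes must share the same (D, H, W) grid")
    return np.stack(channels, axis=0)

# Usage with 4 hypothetical phases on a small 8 x 32 x 32 grid
rng = np.random.default_rng(0)
phases = [rng.normal(size=(8, 32, 32)) for _ in range(4)]
preds = [rng.random(size=(8, 32, 32)) for _ in range(5)]
x = stage3_input(phases, preds)
print(x.shape)  # (9, 8, 32, 32)
```

Stacking along a leading channel axis mirrors how the abstract describes feeding multi-phase volumes to UNet-style networks as a multi-channel input.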

arXiv:2404.07382

Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

Authors: Chenyang An, Zhibo Chen, Qihao Ye, Emily First, Letian Peng, Jiayun Zhang, Zihan Wang, Sorin Lerner, **gbo Shang

Abstract: Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training, which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with fewer search trials.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Submitted to ACL on Feb. 15, 2024

arXiv:2404.01954

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00523

The algebra of hyperinterpolation-class on the sphere

Authors: Congpei An, Jiashu Ran

Abstract: This paper considers the so-called concept of hyperinterpolation-class, i.e., the set of all operators derived from the hyperinterpolation operator on the unit sphere. Concretely, we select four different elements in the hyperinterpolation-class, namely filtered hyperinterpolation, Lasso hyperinterpolation, hard thresholding hyperinterpolation and generalized hyperinterpolation introduced by Dai [10], to explore their algebraic properties on the sphere. Based on the idea of a discrete (semi) inner product, we propose the concepts of hyper self-adjoint operator, hyper projection operator and hyper algebra. Next, we prove that generalized hyperinterpolation is hyper self-adjoint and commutes with hyperinterpolation. Then we establish corresponding results for the product, sum and difference of hyper projection operators. Finally, we present results on ideals between hard thresholding hyperinterpolation and hyperinterpolation. We also give a preliminary result on involutions and introduce the concepts of hyper $C^{\ast}$-algebra and hyper homomorphism.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 17 pages, 1 figure

MSC Class: 41A10; 41A36; 47L20; 47L80

arXiv:2403.19927

Parameter choice strategies for regularized least squares approximation of noisy continuous functions on the unit circle

Authors: Congpei An, Mou Cai

Abstract: In this paper, we consider a trigonometric polynomial reconstruction of continuous periodic functions from their noisy values at equidistant nodes of the unit circle by a regularized least squares method. We point out that the constructed trigonometric polynomial can be determined explicitly owing to the exactness of the trapezoidal rule. Then a concrete error bound is derived based on an estimate of the Lebesgue constant. In particular, we analyze three regularization parameter choice strategies: Morozov's discrepancy principle, the L-curve and generalized cross-validation. Finally, numerical examples demonstrate that parameters well chosen by the above strategies can improve approximation quality.

Submitted 28 March, 2024; originally announced March 2024.
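Why the exactness of the trapezoidal rule yields an explicit solution can be checked numerically. A minimal sketch, assuming an orthonormal basis on $[0, 2\pi)$, a discrete loss weighted by $2\pi/N$, a diagonal penalty $\lambda \sum_k \mu_k c_k^2$, and $N \ge 2L+1$ nodes; these normalizations and the function names are our assumptions, not the paper's exact setup, and none of the three parameter choice strategies is implemented. Under these assumptions the Gram matrix is the identity, so each coefficient is the discrete Fourier coefficient divided by $1 + \lambda\mu_k$.

```python
import numpy as np

def trig_design(x, L):
    """Orthonormal trigonometric basis on [0, 2*pi) evaluated at nodes x."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2 * np.pi))]
    for k in range(1, L + 1):
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        cols.append(np.sin(k * x) / np.sqrt(np.pi))
    return np.stack(cols, axis=1)

def regularized_trig_fit(f_noisy, L, lam, mu=None):
    """Closed-form coefficients of the penalized least squares fit.

    Minimizes (2*pi/N) * sum_j (p(x_j) - f_j)^2 + lam * sum_k mu_k * c_k^2
    over trigonometric polynomials p of degree <= L at N equidistant nodes.
    With N >= 2L + 1 the trapezoidal rule is exact for all basis products,
    so the Gram matrix is the identity and the normal equations decouple.
    """
    N = len(f_noisy)
    if N < 2 * L + 1:
        raise ValueError("need N >= 2L + 1 nodes")
    x = 2 * np.pi * np.arange(N) / N
    A = trig_design(x, L)
    mu = np.ones(A.shape[1]) if mu is None else np.asarray(mu, dtype=float)
    discrete_coeffs = (2 * np.pi / N) * (A.T @ f_noisy)
    return discrete_coeffs / (1.0 + lam * mu)

# Compare the closed form against a generic linear solve of the normal equations
rng = np.random.default_rng(1)
N, L, lam = 64, 10, 0.05
x = 2 * np.pi * np.arange(N) / N
f = np.exp(np.sin(x)) + 0.1 * rng.standard_normal(N)
c_closed = regularized_trig_fit(f, L, lam)
A = trig_design(x, L)
W = 2 * np.pi / N
c_direct = np.linalg.solve(W * A.T @ A + lam * np.eye(A.shape[1]), W * A.T @ f)
print(np.allclose(c_closed, c_direct))  # True
```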

arXiv:2402.17463

Training-Free Long-Context Scaling of Large Language Models

Authors: Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

Abstract: The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By decomposing the attention computation for long sequences into chunk-based modules, DCA effectively captures the relative positional information of tokens within the same chunk (Intra-Chunk) and across distinct chunks (Inter-Chunk), and integrates seamlessly with Flash Attention. In addition to its impressive extrapolation capability, DCA achieves performance on practical long-context tasks that is comparable to or even better than that of finetuned models. When compared with proprietary models, our training-free 70B model attains 94% of the performance of gpt-3.5-16k, indicating it is a viable open-source alternative. All code and data used in this work are released at https://github.com/HKUNLP/ChunkLlama.

Submitted 29 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.
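The intra-chunk idea, keeping relative distances bounded by reusing position indices inside each chunk, can be illustrated in a few lines. A simplified sketch, assuming positions are taken modulo the chunk size and only causal pairs within the same chunk are kept; it covers only the intra-chunk component (inter-chunk handling and the Flash Attention integration are omitted), and the function name is hypothetical rather than taken from the ChunkLlama repository.

```python
import numpy as np

def intra_chunk_relative_positions(seq_len, chunk_size):
    """Relative distances (query minus key) for a simplified intra-chunk attention.

    Position indices restart at 0 in every chunk (index mod chunk_size), and
    only causal query/key pairs inside the same chunk are kept; every other
    entry is set to -1 to mark it as masked.
    """
    pos = np.arange(seq_len)
    local = pos % chunk_size              # position index reused in every chunk
    chunk_id = pos // chunk_size
    rel = local[:, None] - local[None, :]  # query index minus key index
    same_chunk = chunk_id[:, None] == chunk_id[None, :]
    causal = pos[:, None] >= pos[None, :]
    return np.where(same_chunk & causal, rel, -1)

rel = intra_chunk_relative_positions(seq_len=10, chunk_size=4)
print(rel.max())  # 3: no kept distance reaches the chunk size
```

Because every kept distance stays below `chunk_size`, a chunk size no larger than the pretraining length keeps these relative positions within the range seen during pretraining.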

arXiv:2312.09576

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, ** Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OAR and GTV segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68% to 86.70% for OARs and from 70.42% to 73.44% for GTVs. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)
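The challenge scores are reported as Dice similarity coefficients. A minimal sketch of the standard definition, $2|P \cap T| / (|P| + |T|)$ for a predicted mask $P$ and reference mask $T$, plus a per-structure average over an integer label map; the empty-mask convention and the `mean_dice` helper are our assumptions, not the challenge's official evaluation code.

```python
import numpy as np

def dice(pred, target, empty_value=1.0):
    """Dice similarity coefficient 2|P ∩ T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: scoring this as perfect agreement is a convention, not a rule
        return empty_value
    return 2.0 * np.logical_and(pred, target).sum() / denom

def mean_dice(pred_labels, true_labels, structure_ids):
    """Average per-structure Dice over integer label maps (NumPy arrays, one id per structure)."""
    scores = [dice(pred_labels == s, true_labels == s) for s in structure_ids]
    return float(np.mean(scores))

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```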

arXiv:2312.01173

doi 10.1103/PhysRevD.109.056005

$P$-wave states $T^-_{bb}$ from diquarks

Authors: Zu-Hang Lin, Chun-Sheng An, Cheng-Rong Deng

Abstract: We investigate the $P$-wave states $T^-_{bb}$ in the isospin singlet and three excited modes [excitation occurring in the diquark $[bb]^{s_1}_{c_1}$ ($ρ$-mode), antidiquark $[\bar{u}\bar{d}]^{s_2}_{c_2}$ ($r$-mode) or between them ($λ$-mode)] from diquarks in a quark model. We analyze the dynamical behaviors of the diquark $[bb]^{s_1}_{c_1}$, antidiquark $[\bar{u}\bar{d}]^{s_2}_{c_2}$ and their correlations in the states $T^-_{bb}$ by decomposing the interactions from various sources in the model. The absolutely dominant color-spin configuration (more than $99\%$) in the $ρ$-mode with $1^1P_1$ is $[bb]^0_{\bar{\mathbf{3}}}[\bar{u}\bar{d}]^0_{\mathbf{3}}$. Its energy is about $18$ MeV below the $\bar{B}\bar{B}$ threshold, so it can form a compact bound state. The chromomagnetic and meson-exchange interactions in the antidiquark $[\bar{u}\bar{d}]^0_{\mathbf{3}}$ are responsible for its binding mechanism. The two other excited modes lie above their respective thresholds. The color configuration $\mathbf{6}\otimes\bar{\mathbf{6}}$ needs to be handled carefully in the tetraquark states.

Submitted 27 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: 7 pages, 1 figure, 4 tables, to be published in Phys. Rev. D

Journal ref: Phys. Rev. D 109, 056005 (2024)

arXiv:2310.14146

Cocaine Use Prediction with Tensor-based Machine Learning on Multimodal MRI Connectome Data

Authors: Anru R. Zhang, Ryan P. Bell, Chen An, Runshi Tang, Shana A. Hall, Cliburn Chan, Kareem Al-Khalil, Christina S. Meade

Abstract: This paper considers the use of machine learning algorithms for predicting cocaine use based on magnetic resonance imaging (MRI) connectomic data. The study utilized functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275 individuals, which were then parcellated into 246 regions of interest (ROIs) using the Brainnetome atlas. After data preprocessing, the datasets were transformed into tensor form. We developed a tensor-based unsupervised machine learning algorithm to reduce the size of the data tensor from $275$ (individuals) $\times 2$ (fMRI and dMRI) $\times 246$ (ROIs) $\times 246$ (ROIs) to $275$ (individuals) $\times 2$ (fMRI and dMRI) $\times 6$ (clusters) $\times 6$ (clusters). This was achieved by applying the high-order Lloyd algorithm to group the ROI data into 6 clusters. Features were extracted from the reduced tensor and combined with demographic features (age, gender, race, and HIV status). The resulting dataset was used to train a CatBoost model using subsampling and nested cross-validation techniques, which achieved a prediction accuracy of 0.857 for identifying cocaine users. The model was also compared with other models, and its feature importances were presented. Overall, this study highlights the potential for using tensor-based machine learning algorithms to predict cocaine use based on MRI connectomic data and presents a promising approach for identifying individuals at risk of substance abuse.

Submitted 21 October, 2023; originally announced October 2023.
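The reduction from $246 \times 246$ ROI matrices to $6 \times 6$ cluster matrices can be pictured as block averaging once each ROI has a cluster label. A minimal sketch, assuming the labels are already known and each reduced entry is the mean connectivity between two clusters; the high-order Lloyd algorithm that actually finds the clustering is not reproduced, and `reduce_connectome` is a hypothetical name.

```python
import numpy as np

def reduce_connectome(X, labels, n_clusters):
    """Average ROI-by-ROI connectivity into cluster-by-cluster blocks.

    X:      array (n_subjects, n_modalities, R, R)
    labels: int array (R,), cluster index of each ROI (assumed given)
    returns array (n_subjects, n_modalities, K, K) of block means
    """
    labels = np.asarray(labels)
    M = np.eye(n_clusters)[labels]          # one-hot membership matrix, shape (R, K)
    counts = M.sum(axis=0)
    if np.any(counts == 0):
        raise ValueError("every cluster needs at least one ROI")
    block_sums = np.einsum("nmij,ik,jl->nmkl", X, M, M)
    return block_sums / np.outer(counts, counts)

# Small synthetic example: 5 subjects, 2 modalities, 12 ROIs in 3 clusters
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2, 12, 12))
labels = np.arange(12) % 3
out = reduce_connectome(X, labels, 3)
print(out.shape)  # (5, 2, 3, 3)
```

With the study's dimensions, the same call maps a $(275, 2, 246, 246)$ tensor and 6 labels to shape $(275, 2, 6, 6)$.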

arXiv:2310.11451

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

Abstract: Large Language Models (LLMs) inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has delved into operations on these parameters to manipulate the underlying implicit knowledge (encompassing detection, editing, and merging), there remains an ambiguous understanding regarding their transferability across models with varying scales. In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. To achieve this, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs. Moreover, the LoRA module is used as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer, underscoring the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.

Submitted 8 May, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: ICLR 2024
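LoRA adds a trainable low-rank product $BA$ to a frozen weight $W$. A minimal sketch of that parameterization, plus one hypothetical way to turn a parameter delta into LoRA factors via truncated SVD; the SVD step is our illustration only and is not the paper's sensitivity-based extraction and alignment procedure.

```python
import numpy as np

def lora_factors_from_delta(delta, rank):
    """Hypothetical: compress a weight delta into LoRA factors via truncated SVD.

    Returns B (d_out x r) and A (r x d_in) with B @ A the best rank-r
    approximation of delta in the Frobenius norm.
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s             # scale the kept left singular vectors
    A = sqrt_s[:, None] * Vt[:rank]      # scale the kept right singular vectors
    return B, A

def lora_forward(W, B, A, x, scale=1.0):
    """LoRA-adapted linear layer: (W + scale * B @ A) @ x, with W kept frozen."""
    return W @ x + scale * (B @ (A @ x))

# A rank-2 delta is recovered exactly by rank-2 factors
rng = np.random.default_rng(0)
delta = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
B, A = lora_factors_from_delta(delta, rank=2)
print(np.allclose(B @ A, delta))  # True
```

Computing `B @ (A @ x)` rather than forming `B @ A` keeps the extra cost proportional to the rank.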

arXiv:2310.05209

Scaling Laws of RoPE-based Extrapolation

Authors: Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin

Abstract: The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding (RoPE) is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with LLMs involves modifying RoPE by replacing 10000, the rotary base of $θ_n={10000}^{-2n/d}$ in the original RoPE, with a larger value and providing longer fine-tuning text. In this work, we first observe that fine-tuning a RoPE-based LLM with either a smaller or larger base in pre-training context length could significantly enhance its extrapolation performance. After that, we propose Scaling Laws of RoPE-based Extrapolation, a unified framework from the periodic perspective, to describe the relationship between the extrapolation performance and base value as well as tuning context length. In this process, we also explain the origin of the RoPE-based extrapolation issue by the critical dimension for extrapolation. Besides these observations and analyses, we achieve extrapolation up to 1 million context length within only 16K training length on LLaMA2 7B and 13B.

Submitted 13 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: 26 pages, 12 figures, Accepted by ICLR 2024
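The periodic perspective can be made concrete from $θ_n = \text{base}^{-2n/d}$: dimension pair $n$ rotates with period $2\pi/θ_n$, so only some pairs complete a full rotation within the training length. A minimal sketch that counts those dimensions, assuming head dimension $d = 128$ and a 4096-token training length as inputs; reading this count as the paper's critical dimension is our interpretation, and the function names are hypothetical.

```python
import numpy as np

def rope_thetas(d, base=10000.0):
    """Rotary angles theta_n = base^(-2n/d) for n = 0, ..., d/2 - 1."""
    n = np.arange(d // 2)
    return base ** (-2.0 * n / d)

def dims_with_full_period(d, train_len, base=10000.0):
    """Count dimensions whose rotation period 2*pi/theta_n fits within train_len.

    Each theta_n rotates one pair of dimensions, hence the factor of 2.
    """
    periods = 2 * np.pi / rope_thetas(d, base)
    return 2 * int(np.sum(periods <= train_len))

print(dims_with_full_period(128, 4096))                    # 92
print(dims_with_full_period(128, 4096, base=1_000_000.0))  # 62
```

Raising the base lengthens every period, so fewer dimensions see a complete rotation during training.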

arXiv:2309.07652

Photochemical reaction enabling the engineering of photonic spin-orbit coupling in organic-crystal optical microcavities

Authors: Qian Liang, Xuekai Ma, Jiahuan Ren, Teng Long, Chunling Gu, Cunbin An, Hongbing Fu, Stefan Schumacher, Qing Liao

Abstract: The control and active manipulation of spin-orbit coupling (SOC) in photonic systems is fundamental in the development of modern spin optics and topological photonic devices. Here, we demonstrate the control of an artificial Rashba-Dresselhaus (RD) SOC mediated by photochemical reactions in a microcavity filled with an organic single-crystal of photochromic phase-change character. Splitting of the circular polarization components of the optical modes induced by photonic RD SOC is observed experimentally in momentum space. By applying an ultraviolet light beam, we control the spatial molecular orientation through a photochemical reaction and with that we control the energies of the photonic modes. This way we realize a reversible conversion of spin-splitting of the optical modes with different energies, leading to an optically controlled switching between circularly and linearly polarized emission from our device. Our strategy of in situ and reversible engineering of SOC induced by a light field provides a promising approach to actively design and manipulate synthetic gauge fields towards future on-chip integration in photonics and topological photonic devices.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2307.11088

L-Eval: Instituting Standardized Evaluation for Long Context Language Models

Authors: Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu

Abstract: Recently, there has been growing interest in extending the context length of large language models (LLMs), aiming to effectively process long inputs of one turn or conversations with more extensive histories. While proprietary models such as GPT-4 and Claude can largely preserve the reasoning ability in an extended context, open-source models are still progressing through the early stages of development. To bridge this gap, we propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs) addressing two key aspects: dataset construction and evaluation metrics. On the one hand, we build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs encompassing diverse question styles, domains, and input lengths (3k$\sim$200k tokens). On the other hand, we investigate the effectiveness of evaluation metrics for LCLMs. Results show that popular n-gram matching metrics generally do not correlate well with human judgment, and thus we strongly advocate for length-instruction-enhanced (LIE) evaluation and employing LLM judges. We conducted a comprehensive study of 4 popular commercial LLMs and 12 open-source counterparts using the L-Eval benchmark. Our empirical findings offer useful insights into the study of LCLMs and lay the groundwork for the development of more principled evaluation of these models.

Submitted 4 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:2305.18361

Deep learning network to correct axial and coronal eye motion in 3D OCT retinal imaging

Authors: Yiqian Wang, Alexandra Warter, Melina Cavichini, Varsha Alex, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, Cheolhong An

Abstract: Optical Coherence Tomography (OCT) is one of the most important retinal imaging techniques. However, involuntary motion artifacts still pose a major challenge in OCT imaging that compromises the quality of downstream analysis, such as retinal layer segmentation and OCT Angiography. We propose deep learning based neural networks to correct axial and coronal motion artifacts in OCT based on a single volumetric scan. The proposed method consists of two fully-convolutional neural networks that predict Z and X dimensional displacement maps sequentially in two stages. Experimental results show that the proposed method can effectively correct motion artifacts and achieves smaller error than other methods. Specifically, the method can recover the overall curvature of the retina and generalizes well to various diseases and resolutions.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.13667

Optimizing Non-Autoregressive Transformers with Contrastive Learning

Authors: Chenxin An, Jiangtao Feng, Fei Huang, Xipeng Qiu, Lingpeng Kong

Abstract: Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order. They have achieved remarkable progress in machine translation as well as many other applications. However, a long-standing challenge for NATs is the learning of multi-modality data distribution, which is the main cause of the performance gap between NATs and ATs. In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution. We derive contrastive constraints to stabilize the training process and integrate this resulting objective with the state-of-the-art NAT architecture DA-Transformer. Our model is examined on 3 different tasks, including machine translation, text summarization, and paraphrasing, with 5 benchmarks. Results show that our approach outperforms previous non-autoregressive baselines by a significant margin and establishes new state-of-the-art results for non-autoregressive transformers on all the benchmarks.

Submitted 2 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.05863

Hybrid hyperinterpolation over general regions

Authors: Congpei An, Jiashu Ran, Alvise Sommariva

Abstract: We present an $\ell^2_2+\ell_1$-regularized discrete least squares approximation over general regions under assumptions of hyperinterpolation, named hybrid hyperinterpolation. Hybrid hyperinterpolation, using a soft thresholding operator and a filter function to shrink the Fourier coefficients approximated by a high-order quadrature rule of a given continuous function with respect to some orthonormal basis, is a combination of Lasso and filtered hyperinterpolations. Hybrid hyperinterpolation inherits their features to deal with noisy data once the regularization parameter and the filter function are chosen well. We derive $L_2$ errors in theoretical analysis for hybrid hyperinterpolation to approximate continuous functions with noisy data on sampling points. Numerical examples illustrate the theoretical results and show that well chosen regularization parameters can enhance the approximation quality over the unit sphere and the union of disks.

Submitted 4 July, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 17 pages, 5 figures

MSC Class: 65D15; 65A05; 41A10; 33C5
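The coefficient shrinkage described above combines two operations: a soft thresholding operator $S_t(a) = \operatorname{sign}(a)\max(|a| - t, 0)$ and a filter evaluated at degree over $L$. A minimal sketch of that combination, assuming thresholding is applied first and the filtered value multiplies the result, with per-coefficient weights $\mu$ defaulting to 1 and one common filter shape (1 on $[0,1]$, cosine decay to 0 at 2); these choices are assumptions, and computing the coefficients themselves by quadrature on the sphere or a union of disks is not shown.

```python
import numpy as np

def soft_threshold(a, t):
    """S_t(a) = sign(a) * max(|a| - t, 0), applied elementwise."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def example_filter(x):
    """One smooth filter shape: 1 on [0, 1], cosine decay on (1, 2), 0 from 2 on."""
    x = np.asarray(x, dtype=float)
    decay = 0.5 * (1.0 + np.cos(np.pi * (x - 1.0)))
    return np.where(x <= 1.0, 1.0, np.where(x >= 2.0, 0.0, decay))

def hybrid_coefficients(coeffs, degrees, L, lam, mu=None):
    """Shrink quadrature-based Fourier coefficients: filter(deg / L) * S_{lam * mu}(coeff)."""
    coeffs = np.asarray(coeffs, dtype=float)
    mu = np.ones_like(coeffs) if mu is None else np.asarray(mu, dtype=float)
    shrunk = soft_threshold(coeffs, lam * mu)   # Lasso-style shrinkage
    return example_filter(np.asarray(degrees) / L) * shrunk  # filtered damping

c = np.array([1.0, 0.3, -0.8, 0.05])
print(hybrid_coefficients(c, degrees=[0, 1, 2, 3], L=2, lam=0.1))
```

Setting `lam=0` leaves only filtered hyperinterpolation, and a filter equal to 1 at every degree leaves only the Lasso-style shrinkage.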

arXiv:2301.09050

doi 10.1103/PhysRevMaterials.6.084803

Pressure-induced superconductivity in quasi-one-dimensional semimetal $\mathrm{Ta}_2 \mathrm{PdSe}_6$

Authors: Haiyang Yang, Yonghui Zhou, Liangyu Li, Zheng Chen, Zhuyi Zhang, Shuyang Wang, **g Wang, Xuliang Chen, Chao An, Ying Zhou, Min Zhang, Ranran Zhang, Xiangde Zhu, Lili Zhang, ** Yang, Zhaorong Yang

Abstract: Here we report the discovery of pressure-induced superconductivity in quasi-one-dimensional $\mathrm{Ta}_2 \mathrm{PdSe}_6$, through a combination of electrical transport, synchrotron x-ray diffraction, and theoretical calculations. Our transport measurements show that the superconductivity appears at a critical pressure $P_{\mathrm{c}} \sim 18.3$ GPa and is robust upon further compression up to $62.6$ GPa. The estimated upper critical field $μ_0 H_{\mathrm{c} 2}(0)$ in the pressurized $\mathrm{Ta}_2 \mathrm{PdSe}_6$ is much lower than the Pauli limiting field, in contrast to the case in its isostructural analogs $M_2 \mathrm{Pd}_{\mathrm{x}} X_5$ ($M = \mathrm{Nb}, \mathrm{Ta}$; $X = \mathrm{S}, \mathrm{Se}$). Concomitant with the occurrence of superconductivity, anomalies in pressure-dependent transport properties are observed, including sign reversal of the Hall coefficient, abnormally enhanced resistance, and dramatically suppressed magnetoresistance. Meanwhile, room-temperature synchrotron x-ray diffraction experiments reveal the stability of the pristine monoclinic structure (space group $C 2 / m$) upon compression. Combined with density functional theory calculations, we argue that a pressure-induced Lifshitz transition could be the electronic origin of the emergent superconductivity in $\mathrm{Ta}_2 \mathrm{PdSe}_6$.

Submitted 21 January, 2023; originally announced January 2023.

Comments: 7 pages, 7 figures

Journal ref: Physical Review Materials 6, 084803 (2022)

arXiv:2301.07365 [pdf, other]

doi 10.1140/epjc/s10052-023-11333-0

Production of $X_b$ via $\Upsilon(5S, 6S)$ radiative decays

Authors: Xiao-Yun Wang, Zu-Xin Cai, Gang Li, Shi-Dong Liu, Chun-Sheng An, Ju-Jun Xie

Abstract: We investigate the production of $X_b$ in the process $\Upsilon(5S,6S)\to \gamma X_b$, where $X_b$ is assumed to be a $B {\bar B}^*$ molecular state. Two kinds of meson loops, $B^{(*)}{\bar B}^{(*)}$ and $B_1^{\prime}{\bar B}^{(*)}$, were considered. To explore the rescattering mechanism, we calculated the relevant branching ratios using an effective Lagrangian based on heavy quark symmetry. The branching ratios for $\Upsilon(5S,6S) \to \gamma X_b$ were found to be of the order of $10^{-7}$ to $10^{-6}$. Such sizeable branching ratios might be accessible at Belle II, which would provide important clues to the inner structure of the exotic state $X_b$.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 9 pages, 2 figures, comments are welcome

arXiv:2301.05634 [pdf, ps, other]

doi 10.1103/PhysRevC.107.025205

The quark orbital angular momentum of ground state octet baryons

Authors: **g-Feng Li, Cheng Chen, Gang Li, Chun-Sheng An, Cheng-Rong Deng, Ju-Jun Xie

Abstract: Here we study the quark orbital angular momentum of the ground-state octet baryons employing an extended chiral constituent quark model, in which the baryon wave functions are taken to be a superposition of the traditional $qqq$ and the higher Fock $qqqq\bar{q}$ components. Coupling between the two configurations is estimated using the $^3P_{0}$ quark-antiquark creation mechanism, and the corresponding coupling strength is determined by fitting the sea flavor asymmetry of the nucleon. The numerical results show that the quark angular momenta of the nucleon and of the $\Sigma$, $\Lambda$, and $\Xi$ hyperons are in the range $0.10$–$0.30$. In addition, the quark angular momenta of all the hyperons are slightly smaller than that of the nucleon. In the present model, the fraction of the octet-baryon spin carried by intrinsic quark orbital angular momentum can be as large as $60\%$.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 9 pages, 3 figures, 3 tables

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.06624 [pdf, ps, other]

Regularized Barzilai-Borwein method

Authors: Congpei An, Xin Xu

Abstract: We develop a novel stepsize based on the Barzilai-Borwein (BB) method for solving some challenging optimization problems efficiently, named the regularized BB (RBB) stepsize. We show that the RBB stepsize is the solution of an $\ell_{2}^{2}$-regularized least squares problem. When the regularization term vanishes, the RBB stepsize reduces to the original BB stepsize. The RBB stepsize includes a class of valid stepsizes, such as the other version of the BB stepsize. The global convergence of the corresponding RBB algorithm is proved for convex quadratic optimization problems. We also propose a scheme for adaptively generating the regularization parameters, named the adaptive two-step parameter. An enhanced RBB stepsize is used to solve quadratic and general optimization problems more efficiently. The RBB stepsize can overcome the instability of the BB stepsize in many ill-conditioned optimization problems. Moreover, the RBB stepsize is more robust than the BB stepsize in numerical experiments. Numerical examples clearly show the advantage of using the proposed stepsize to solve some challenging optimization problems.

Submitted 15 April, 2024; v1 submitted 12 November, 2022; originally announced November 2022.
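The abstract does not give the RBB formula, so the sketch below shows only the classical BB baseline that RBB reduces to when the regularization term vanishes: gradient descent on a convex quadratic $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$ with stepsize $\alpha_k = s^\top s / s^\top y$, where $s = x_k - x_{k-1}$ and $y = g_k - g_{k-1}$. The function names and the default parameters are hypothetical choices for illustration, not the authors' code.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bb_minimize(A, b, x0, alpha0=1e-2, tol=1e-10, max_iter=1000):
    """Minimize 0.5 x^T A x - b^T x (A symmetric positive definite)
    using gradient steps with the classical BB stepsize s^T s / s^T y."""
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    alpha = alpha0  # first step uses a fixed, small stepsize
    for k in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            return x, k
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = [ax - bi for ax, bi in zip(matvec(A, x_new), b)]
        s = [u - v for u, v in zip(x_new, x)]
        y = [u - v for u, v in zip(g_new, g)]
        sy = dot(s, y)
        # For SPD A, s^T y = s^T A s > 0 whenever s != 0; the fallback is a safeguard.
        # A regularized (RBB-style) variant would change only this stepsize line.
        alpha = dot(s, s) / sy if sy > 0 else alpha0
        x, g = x_new, g_new
    return x, max_iter
```

For example, `bb_minimize([[1.0, 0, 0], [0, 10.0, 0], [0, 0, 100.0]], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])` should approach the minimizer `[1.0, 0.1, 0.01]`. The BB iterates are typically non-monotone in $f$, which is the kind of instability on ill-conditioned problems that the abstract says RBB is designed to mitigate.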

arXiv:2210.13896 [pdf]

doi 10.1016/j.matt.2023.07.018

Two Distinct Charge Density Wave Orders and Emergent Superconductivity in Pressurized CuTe

Authors: Shuyang Wang, Qing Wang, Chao An, Yonghui Zhou, Ying Zhou, Xuliang Chen, Ning Hao, Zhaorong Yang

Abstract: The discovery of multiple charge-density-wave (CDW) orders in superconducting cuprates and kagome CsV3Sb5 has offered a unique milieu for studying the interplay of CDW and superconductivity and has altered our perspective on their nature. Here, we report a high-pressure study of the quasi-one-dimensional CDW material CuTe through ultralow-temperature (400 mK) electrical transport and temperature-dependent Raman spectroscopy measurements, as well as first-principles calculations. We provide solid evidence that the pristine CDW order (CDW1) transforms into a distinct CDW order (CDW2) at ~6.5 GPa. Calculations show that CDW1 is driven by the nesting effect, whereas CDW2 probably arises from electronic correlations. Strikingly, pressure-induced superconductivity is observed with a dome-like phase diagram, and its transition displays an extraordinary broadening along with the crossover from CDW1 to CDW2. These results demonstrate that pressurized CuTe provides a promising playground for understanding the intricate interplay of multiple CDWs and superconductivity.

Submitted 5 October, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: 21 pages, 4 figures

Journal ref: Matter 6, 3526 (2023)

arXiv:2210.06783 [pdf, ps, other]

doi 10.1103/PhysRevD.106.074026

Radiative decays of the neutral $Z_c(3900)$ and $Z_c(4020)$

Authors: Xiao-Yun Wang, Gang Li, Chun-Sheng An, Ju-Jun Xie

Abstract: We study the radiative decays $Z_c(3900)/Z_c(4020) \to \gamma\chi_{cJ}(\gamma\chi_{cJ}^\prime)$ ($J=0, 1, 2$), assuming that $Z_c(3900)$ and $Z_c(4020)$ couple strongly to the $D\bar D^* + c.c.$ and $D^*{\bar D}^*$ channels, respectively. By considering the contributions of intermediate charmed-meson triangle loops within an effective Lagrangian approach, we find that the calculated partial widths of $Z_c(3900) \to \gamma\chi_{cJ}$ are about a few hundred keV, while those of $Z_c(4020) \to \gamma\chi_{cJ}$ are about tens of keV. The predicted partial widths of $Z_c(3900)\to\gamma\chi_{c0,1}^\prime$ are less than 1 keV, mainly because of the very small phase space. For $Z_c(4020)\to\gamma\chi_{c0,2}^\prime$, the calculated partial widths are usually smaller than 1 keV, whereas for $Z_c(4020)\to\gamma\chi_{c1}^\prime$ the partial width can reach the order of 10 keV. Furthermore, we investigate how the ratios between different decay modes depend on the masses of $Z_c(3900)$ or $Z_c(4020)$; these ratios may be useful quantities for experiments. We hope that these calculations can be tested by future experiments.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 9 pages, 6 figures. Accepted by Physical Review D

arXiv:2210.04841 [pdf, ps, other]

doi 10.1103/PhysRevD.106.096023

Color flux-tube nature of the states $T_{cs}(2900)$ and $T^a_{c\bar{s}}(2900)$

Authors: Jia Wei, Yi-Heng Wang, Chun-Sheng An, Cheng-Rong Deng

Abstract: Inspired by the states $T_{cs0}(2900)^0$, $T_{cs1}(2900)^0$, $T^a_{c\bar{s}0}(2900)^{0}$ and $T^a_{c\bar{s}0}(2900)^{++}$ reported by the LHCb Collaboration, we carry out a systematic investigation of the properties of the ground and $P$-wave states $[cs][\bar{u}\bar{d}]$ and $[cu][\bar{s}\bar{d}]$ with various spin, isospin or $U$-spin, and color combinations in a multiquark color flux-tube model. Matching our results with the spin-parity and masses of the states $T_{cs0}(2900)^0$ and $T_{cs1}(2900)^0$, we can describe them as the compact states $[cs][\bar{u}\bar{d}]$ with $I(J^{P})=1(0^+)$ and $0(1^-)$ in the model, respectively. The ground state $T_{cs0}(2900)^0$ is mainly made of a strongly overlapping axial-vector $[cs]_{\bar{\mathbf{3}}_c}$ and axial-vector $[\bar{u}\bar{d}]_{\mathbf{3}_c}$. The $P$-wave state $T_{cs1}(2900)^0$ dominantly consists of a gradually separated scalar or axial-vector $[cs]_{\bar{\mathbf{3}}_c}$ and a scalar $[\bar{u}\bar{d}]_{\mathbf{3}_c}$ in the shape of a dumbbell. Assuming that the states $T^a_{c\bar{s}0}(2900)^{0}$ and $T^a_{c\bar{s}0}(2900)^{++}$ belong to the same isospin triplet, the mass of the state $\left [[cu]_{\bar{\mathbf{3}}_c}[\bar{s}\bar{d}]_{\mathbf{3}_c}\right ]_{\mathbf{1}_c}$ with symmetric $U$-spin and $J^P=0^+$ is highly consistent with that of the states $T^a_{c\bar{s}0}(2900)^{0}$ and $T^a_{c\bar{s}0}(2900)^{++}$ in the model. After coupling the two color configurations, the state $[cu][\bar{s}\bar{d}]$ is slightly lighter than the states $T^a_{c\bar{s}0}(2900)^{0}$ and $T^a_{c\bar{s}0}(2900)^{++}$. In addition, we discuss the properties of other states in the model.

Submitted 28 November, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: 9 pages, 4 tables, comments are welcome

Journal ref: Phys. Rev. D 106, 096023 (2022)

arXiv:2210.04204 [pdf, ps, other]

Lasso trigonometric polynomial approximation for periodic function recovery in equidistant points

Authors: Congpei An, Mou Cai

Abstract: In this paper, we propose a fully discrete soft thresholding trigonometric polynomial approximation on $[-\pi,\pi]$, named Lasso trigonometric interpolation. This approximation is an $\ell_1$-regularized discrete least squares approximation under the same conditions as classical trigonometric interpolation on an equidistant grid. Lasso trigonometric interpolation is sparse and is also an efficient tool for dealing with noisy data. We theoretically analyze Lasso trigonometric interpolation for continuous periodic functions. The principal results show that the $L_2$ error bound of Lasso trigonometric interpolation is smaller than that of classical trigonometric interpolation, which improves the robustness of trigonometric interpolation. This paper also presents numerical results for Lasso trigonometric interpolation on $[-\pi,\pi]$, with or without the presence of data errors.

Submitted 21 September, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

Comments: 18 pages, 5 figures
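On an equidistant grid of $N = 2n+1$ points the discrete trigonometric basis is orthogonal, so an $\ell_1$-regularized least squares fit decouples coefficient by coefficient and amounts to soft thresholding the classical interpolation coefficients. The sketch below illustrates that idea under our own assumptions: nodes $x_j = -\pi + 2\pi j/N$, the real cosine/sine basis, and a single threshold `lam` for every coefficient. The exact threshold scaling in the paper depends on its normalization and weights, and the function names are hypothetical.

```python
import math


def trig_coeffs(f, n):
    """Classical trigonometric interpolation coefficients of degree n
    on N = 2n + 1 equidistant points in [-pi, pi)."""
    N = 2 * n + 1
    xs = [-math.pi + 2 * math.pi * j / N for j in range(N)]
    fx = [f(x) for x in xs]
    a = [sum(fx) / N]  # constant term
    b = [0.0]          # no sine term for k = 0
    for k in range(1, n + 1):
        a.append(2 / N * sum(v * math.cos(k * x) for v, x in zip(fx, xs)))
        b.append(2 / N * sum(v * math.sin(k * x) for v, x in zip(fx, xs)))
    return a, b


def soft(c, lam):
    """Soft thresholding: shrink |c| by lam, clipping at zero."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def lasso_trig(f, n, lam):
    """Soft-thresholded interpolation coefficients (lam = 0 gives classical interpolation)."""
    a, b = trig_coeffs(f, n)
    return [soft(c, lam) for c in a], [soft(c, lam) for c in b]


def evaluate(a, b, x):
    """Evaluate a0 + sum_k (a_k cos kx + b_k sin kx)."""
    return a[0] + sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
                      for k in range(1, len(a)))
```

For instance, with `f = lambda x: 3 + 2*math.cos(x) + 0.05*math.sin(2*x)`, `n = 3` and `lam = 0.1`, the small $\sin 2x$ coefficient is set to zero while the two large ones shrink by `0.1`, which is the sparsity-versus-bias tradeoff that soft thresholding introduces.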

arXiv:2209.14634 [pdf, other]

Hard thresholding hyperinterpolation over general regions

Authors: Congpei An, Jia-Shu Ran

Abstract: This paper proposes a novel variant of hyperinterpolation, called hard thresholding hyperinterpolation. This approximation scheme of degree $n$ leverages a hard thresholding operator to filter all hyperinterpolation coefficients, which approximate the Fourier coefficients of a continuous function by a quadrature rule with algebraic exactness $2n$. We prove that hard thresholding hyperinterpolation is the unique solution to an $\ell_0$-regularized weighted discrete least squares approximation problem. Hard thresholding hyperinterpolation is not only idempotent and commutative with hyperinterpolation, but also satisfies the Pythagorean theorem. By estimating the reciprocal of the Christoffel function, we demonstrate that the upper bound of the uniform norm of the hard thresholding hyperinterpolation operator is not greater than that of the hyperinterpolation operator. Like Lasso hyperinterpolation, hard thresholding hyperinterpolation possesses denoising and basis selection abilities. To judge the denoising effects of hard thresholding and Lasso hyperinterpolation, this paper derives a criterion that combines the regularization parameter and the product of noise coefficients and signs of hyperinterpolation coefficients. Numerical examples on the spherical triangle and the cube demonstrate the denoising performance of hard thresholding hyperinterpolation.

Submitted 26 November, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: 19 pages, 7 figures

MSC Class: 65D15; 65D05; 41A10; 33C52
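As a one-dimensional illustration (our assumption; the paper works over general regions and reports examples on the spherical triangle and the cube), the sketch below builds degree-$n$ hyperinterpolation on $[-1,1]$ with the Chebyshev weight $1/\sqrt{1-x^2}$. An $(n+1)$-point Gauss-Chebyshev rule is exact up to degree $2n+1$, so it meets the $2n$ exactness requirement. The coefficients are then filtered by a hard thresholding operator that keeps $c$ if $|c| > \lambda$ and sets it to $0$ otherwise. All function names are hypothetical.

```python
import math


def basis(k, x):
    """Orthonormal Chebyshev polynomial p_k for the weight 1/sqrt(1 - x^2) on [-1, 1]."""
    t = math.cos(k * math.acos(max(-1.0, min(1.0, x))))  # T_k(x)
    return t / math.sqrt(math.pi) if k == 0 else t * math.sqrt(2.0 / math.pi)


def hyper_coeffs(f, n):
    """Degree-n hyperinterpolation coefficients via Gauss-Chebyshev quadrature.

    m = n + 1 nodes with equal weights pi/m integrate every polynomial of
    degree <= 2m - 1 = 2n + 1 exactly against the Chebyshev weight.
    """
    m = n + 1
    nodes = [math.cos((2 * j + 1) * math.pi / (2 * m)) for j in range(m)]
    w = math.pi / m
    return [sum(w * f(x) * basis(k, x) for x in nodes) for k in range(n + 1)]


def hard_threshold(coeffs, lam):
    """Keep coefficients with |c| > lam and set the rest to zero."""
    return [c if abs(c) > lam else 0.0 for c in coeffs]


def hyper_eval(coeffs, x):
    """Evaluate sum_k c_k p_k(x)."""
    return sum(c * basis(k, x) for k, c in enumerate(coeffs))
```

Feeding the thresholded polynomial `lambda x: hyper_eval(h, x)` back into `hyper_coeffs` and thresholding again should return the same coefficients up to rounding. This is a numerical view of the idempotence property stated in the abstract: hyperinterpolation reproduces polynomials of degree at most $n$, and the surviving coefficients stay above $\lambda$.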

arXiv:2209.14569 [pdf, other]

COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization

Authors: Chenxin An, Ming Zhong, Zhiyong Wu, Qin Zhu, Xuan**g Huang, Xipeng Qiu

Abstract: Traditional training paradigms for extractive and abstractive summarization systems use only token-level or sentence-level training objectives. However, the output summary is always evaluated at the summary level, which leads to an inconsistency between training and evaluation. In this paper, we propose a Contrastive Learning based re-ranking framework for one-stage summarization called COLO. By modeling a contrastive objective, we show that the summarization model is able to directly generate summaries according to the summary-level score without additional modules and parameters. Extensive experiments demonstrate that COLO boosts the extractive and abstractive results of one-stage systems on the CNN/DailyMail benchmark to 44.58 and 46.33 ROUGE-1, respectively, while preserving parameter efficiency and inference efficiency. Compared with state-of-the-art multi-stage systems, we save more than 100 GPU training hours and obtain a 3 to 8 times speed-up during inference while maintaining comparable results.

Submitted 19 April, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted by COLING 2022
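The abstract does not spell out COLO's contrastive objective. As a hedged illustration of summary-level re-ranking in general, the sketch below assumes a pairwise margin ranking loss of the kind used in earlier summary re-ranking work: candidates are ordered by a summary-level metric such as ROUGE, and each candidate should outscore every lower-ranked one by a margin that grows with the rank gap. Whether COLO uses exactly this form is an assumption, and the function names and the `margin` default are hypothetical.

```python
def ranking_loss(scores, quality, margin=0.01):
    """Pairwise margin ranking loss over candidate summaries (illustrative only).

    scores:  model scores, one per candidate summary
    quality: summary-level metric (e.g. ROUGE) for each candidate
    After sorting by quality (best first), candidate i should beat every
    candidate j > i by at least margin * (j - i); violations are penalized.
    """
    order = sorted(range(len(scores)), key=lambda k: quality[k], reverse=True)
    s = [scores[k] for k in order]
    loss = 0.0
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            loss += max(0.0, s[j] - s[i] + margin * (j - i))
    return loss


def rerank(candidates, scores):
    """At inference, return the candidate with the highest model score."""
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best]
```

When the model already scores candidates in the same order as their metric values with wide enough gaps, the loss is zero; inverted orderings are penalized in proportion to how far apart the mis-ranked candidates are.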

arXiv:2209.13856 [pdf, ps, other]

doi 10.1140/epjc/s10052-022-11111-4

Spectrum of the S-wave fully-heavy tetraquark states

Authors: Jie Zhang, **-Bao Wang, Gang Li, Chun-Sheng An, Cheng-Rong Deng, Ju-Jun Xie

Abstract: In the present work, the spectra of the $S$-wave fully-heavy tetraquark states $QQ\bar{Q}\bar{Q}$ ($Q=c,b$), i.e., $cc\bar{c}\bar{c}$, $bb\bar{b}\bar{b}$, $cc\bar{b}\bar{b}$/$bb\bar{c}\bar{c}$, $bc\bar{c}\bar{c}$/$cc\bar{b}\bar{c}$, $bb\bar{c}\bar{b}$/$cb\bar{b}\bar{b}$, and $bc\bar{b}\bar{c}$, are systematically investigated within a nonrelativistic constituent quark model, in which instanton-induced and one-gluon-exchange interactions are taken into account as the residual spin-dependent hyperfine interaction. Our results show that the states with $cc\bar{c}\bar{c}$ and $bb\bar{b}\bar{b}$ components could be located around $6500$ MeV and $19200$ MeV, respectively. Based on our calculations, the new $X(6900)$ state observed by LHCb may not be a ground-state $cc\bar{c}\bar{c}$ tetraquark, but could be an orbitally or radially excited state of the $cc\bar{c}\bar{c}$ system. On the other hand, the $X(6600)$ state recently reported by CMS and ATLAS can be explained as a ground-state $cc\bar{c}\bar{c}$ tetraquark with spin-parity $J^{PC} = 0^{++}$.

Submitted 12 December, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: Version to appear in Eur. Phys. J. C

arXiv:2209.13786 [pdf, other]

A Parameter-free Nonconvex Low-rank Tensor Completion Model for Spatiotemporal Traffic Data Recovery

Authors: Yang He, Yuheng Jia, Liyang Hu, Chengchuan An, Zhenbo Lu, **gxin Xia

Abstract: Traffic data chronically suffer from missing and corrupted entries, which reduces the accuracy and utility of downstream Intelligent Transportation System (ITS) applications. Noticing the inherent low-rank property of traffic data, numerous studies have formulated missing traffic data recovery as a low-rank tensor completion (LRTC) problem. Due to the non-convexity and discreteness of rank minimization in LRTC, existing methods either replace the rank with convex surrogates that are quite far from the rank function or approximate it with nonconvex surrogates involving many parameters. In this study, we propose a Parameter-Free Non-Convex Tensor Completion model (TC-PFNC) for traffic data recovery, in which a log-based relaxation term is designed to approximate the tensor algebraic rank. Moreover, previous studies usually assumed that the observations are reliable and free of outliers. We therefore extend TC-PFNC to a robust version (RTC-PFNC) by modeling potential traffic data outliers; it can recover missing values from partial and corrupted observations and remove the anomalies in the observations. The numerical solutions of TC-PFNC and RTC-PFNC are derived using the alternating direction method of multipliers (ADMM). Extensive experiments on four real-world traffic datasets demonstrate that the proposed methods outperform other state-of-the-art methods in both missing and corrupted data recovery. The code used in this paper is available at: https://github.com/YoungHe49/T-ITSPFNC.

Submitted 27 September, 2022; originally announced September 2022.

Comments: 10 pages, 7 figures

arXiv:2209.12206 [pdf, ps, other]

Investigations on the charmless decays of $X(3872)$ in intermediate meson loops model

Authors: Yan Wang, Qi Wu, Gang Li, Wen-Hua Qin, Xiao-Hai Liu, Chun-Sheng An, Ju-Jun Xie

Abstract: The charmless decay processes of $X(3872)$ provide a good platform to study the nature and the decay mechanism of $X(3872)$. Assuming a molecular nature of $X(3872)$ as a $\bar{D}D^*$ bound state, we have investigated the charmless decays $X(3872) \to VV$ and $VP$ via intermediate $D^*{\bar D} +c.c.$ meson loops, where $V$ and $P$ stand for light vector and pseudoscalar mesons, respectively. We discuss three cases, i.e., pure neutral components ($\theta=0$), isospin singlet ($\theta=\pi/4$), and dominant neutral components ($\theta= \pi/6$), where $\theta$ is a phase angle describing the proportion of neutral and charged constituents. The proportion of neutral and charged constituents influences the decay widths of $X(3872) \to VV$ and $VP$. With the coupling constant of $X(3872)$ to the $\bar{D}D^*$ channel obtained under the molecular ansatz for the $X(3872)$ resonance, the predicted decay widths of $X(3872)\rightarrow VV$ are about tens of keV, while the decay widths can reach a few hundred keV for $X(3872)\to VP$. The dependence of the ratios between different decay modes of $X(3872)\to VV$ and $X(3872)\to VP$ on the mixing angle $\theta$ is also investigated. It is expected that these theoretical calculations can be tested by future experiments.

Submitted 25 September, 2022; originally announced September 2022.

Comments: 9 pages, 8 figures, submitted to Phys. Rev. D

arXiv:2209.11012 [pdf, ps, other]

Bypassing the quadrature exactness assumption of hyperinterpolation on the sphere

Authors: Congpei An, Hao-Ning Wu

Abstract: This paper focuses on the approximation of continuous functions on the unit sphere by spherical polynomials of degree $n$ via hyperinterpolation. Hyperinterpolation of degree $n$ is a discrete approximation of the $L^2$-orthogonal projection of degree $n$, with its Fourier coefficients evaluated by a positive-weight quadrature rule that exactly integrates all spherical polynomials of degree at most $2n$. This paper aims to bypass this quadrature exactness assumption by replacing it with the Marcinkiewicz--Zygmund property proposed in a previous paper. Consequently, hyperinterpolation can be constructed by a positive-weight quadrature rule (not necessarily with quadrature exactness). This scheme is referred to as unfettered hyperinterpolation. This paper provides a reasonable error estimate for unfettered hyperinterpolation. The error estimate generally consists of two terms: a term representing the error estimate of the original hyperinterpolation with full quadrature exactness, and another term introduced as compensation for the loss of exactness degrees. A guide to controlling the newly introduced term in practice is provided. In particular, if the quadrature points form a quasi-Monte Carlo (QMC) design, a refined error estimate holds. Numerical experiments verify the error estimates and the practical guide.

Submitted 4 October, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 22 pages, 7 figures

MSC Class: 65D32; 41A10; 41A55; 42C10; 33C55

arXiv:2207.05261 [pdf, other]

Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique

Authors: Changnam An, Eunkyung Han, Dongmyeong Noh, Ohkyoon Kwon, Sumi Lee, Hyunshim Han

Abstract: We present an efficient corpus-building framework for sign language translation. Aided by a simple but effective data augmentation technique, our method converts text into annotated forms with minimal information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expressions and gestures play an important role in conveying exact meaning. By considering the linguistic features of sign language, our proposed framework is the first attempt to build a multimodal sign language augmentation corpus (hereinafter referred to as the KoSLA corpus) containing both manual and non-manual modalities. The corpus we built shows promising results in the hospital context, with improved performance on augmented datasets. To overcome data scarcity, we resorted to data augmentation techniques such as synonym replacement to boost the efficiency of our translation model and available data, while maintaining the grammatical and semantic structures of sign language. For experimental support, we verify the effectiveness of the data augmentation technique and the usefulness of our corpus by performing a translation task between normal sentences and sign language annotations with two tokenizers. The results were convincing, with significant BLEU scores on the KoSLA corpus.

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.10104 [pdf, ps, other]

A Generalization of Graham's Estimate on the Barban-Vehov Problem

Authors: Chen An

Abstract: Suppose $\{ \lambda_d\}$ are Selberg's sieve weights and $1 \le w < y \le x$. Graham's estimate on the Barban-Vehov problem shows that $\sum_{1 \le n \le x} \big(\sum_{d|n} \lambda_d\big)^2 = \frac{x}{\log(y/w)} + O\big(\frac{x}{\log^2(y/w)}\big)$. We prove an analogue of this estimate for a sum over ideals of an arbitrary number field $k$. Our asymptotic estimate remains the same; the only difference is that the effective error term may depend on the arithmetic of $k$. Our innovation involves multiple counting results on ideals instead of integers. Notably, some of these results are nontrivial generalizations. Furthermore, we prove a corollary that leads to a new zero density estimate.

Submitted 21 June, 2022; originally announced June 2022.

Comments: 23 pages

MSC Class: 11N45

arXiv:2206.02335 [pdf]

Super-resolution multicolor fluorescence microscopy enabled by an apochromatic super-oscillatory lens with extended depth-of-focus

Authors: Wenli Li, Pei He, Yulong Fan, Yangtao Du, Bo Gao, Zhiqin Chu, Chengxu An, Dangyuan Lei, Weizheng Yuan, Yiting Yu

Abstract: Multicolor super-resolution imaging remains an intractable challenge for both far-field and near-field based super-resolution techniques. A planar super-oscillatory lens (SOL), a far-field subwavelength-focusing diffractive lens device, holds great potential for achieving sub-diffraction-limit imaging at multiple wavelengths. However, conventional SOL devices suffer from a numerical aperture (NA) related intrinsic tradeoff among the depth of focus (DoF), chromatic dispersion, and focal spot size, which is an essential characteristic of common diffractive optical elements. Typically, although increasing the NA improves the resolution, the limited DoF and significant chromatism associated with a high NA can lead to unfavorable degradation of image quality. Here, we apply a multi-objective genetic algorithm (GA) optimization approach to design an apochromatic binary-phase SOL that generates axially jointed multifoci concurrently having prolonged DoF, customized working distance (WD), and suppressed side-lobes yet minimized main-lobe size, optimizing the aforementioned NA-dependent tradeoff. Experimental implementation of this GA-optimized SOL demonstrates simultaneous focusing of blue, green, and red light beams into an optical needle half of the incident wavelength in diameter at a 428 μm WD, resulting in an ultimate resolution better than one third of the incident wavelength in the lateral dimension. By integrating this apochromatic SOL device with a commercial fluorescence microscope, we employ the optical needle to perform, for the first time, three-dimensional super-resolution multicolor fluorescence imaging of the unseen fine structure of neurons in one go. The present study provides not only a practical route to far-field multicolor super-resolution imaging but also a viable approach for constructing imaging systems that avoid complex sample positioning and unfavorable photobleaching.

Submitted 5 June, 2022; originally announced June 2022.

arXiv:2205.14690 [pdf, other]

CoNT: Contrastive Neural Text Generation

Authors: Chenxin An, Jiangtao Feng, Kai Lv, Lingpeng Kong, Xipeng Qiu, Xuanjing Huang

Abstract: Recently, contrastive learning has attracted increasing interest in neural text generation as a new solution to alleviate the exposure bias problem. It introduces a sequence-level training signal, which is crucial for generation tasks that rely on auto-regressive decoding. However, previous methods applying contrastive learning to neural text generation usually lead to inferior performance. In this paper, we analyse the underlying reasons and propose a new Contrastive Neural Text generation framework, CoNT. CoNT addresses the bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects: the construction of contrastive examples, the choice of the contrastive loss, and the decoding strategy. We validate CoNT on five generation tasks with ten benchmarks, including machine translation, summarization, code comment generation, data-to-text generation and commonsense generation. Experimental results show that CoNT clearly outperforms the conventional training framework on all ten benchmarks by a convincing margin. In particular, CoNT surpasses the previously most competitive contrastive learning method for text generation by 1.50 BLEU on machine translation and 1.77 ROUGE-1 on summarization. It achieves new state-of-the-art results on summarization, code comment generation (without external data) and data-to-text generation.

Submitted 3 February, 2023; v1 submitted 29 May, 2022; originally announced May 2022.

Comments: Accepted by NeurIPS 2022
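
A minimal sketch of a pairwise ranking-style contrastive loss, the kind of sequence-level training signal the CoNT abstract refers to. It assumes each candidate output already has a similarity score to the source (e.g. a cosine similarity between representations) and a sequence-level quality score (e.g. sentence BLEU against the reference). The function name, the rank-scaled margin and its default value are illustrative assumptions, not CoNT's exact loss.

```python
from typing import Sequence


def ranking_contrastive_loss(sims: Sequence[float], quality: Sequence[float],
                             margin: float = 0.01) -> float:
    """Hypothetical pairwise ranking loss: better candidates should score higher.

    sims[i]    -- model similarity between the source and candidate i (assumed precomputed)
    quality[i] -- sequence-level quality of candidate i, e.g. sentence BLEU (assumed)
    """
    # Rank candidates from best to worst by their quality score.
    order = sorted(range(len(sims)), key=lambda i: quality[i], reverse=True)
    loss = 0.0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            better, worse = order[a], order[b]
            # The required similarity gap grows with the rank distance of the pair.
            loss += max(0.0, sims[worse] - sims[better] + margin * (b - a))
    return loss


print(ranking_contrastive_loss([0.9, 0.5, 0.1], [30.0, 20.0, 10.0]))  # correctly ordered candidates
print(ranking_contrastive_loss([0.1, 0.5], [30.0, 20.0]))  # the mis-ordered pair is penalized
```

Minimizing such a loss pushes the model to place higher-quality sequences closer to the source, a signal defined over whole sequences rather than over individual tokens as in maximum-likelihood training.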

arXiv:2205.14419 [pdf, ps, other]

doi 10.1103/PhysRevC.105.065204

Investigations on the flavor-dependent axial charges of the octet baryons

Authors: Rui Qi, Jin-Bao Wang, Gang Li, Chun-Sheng An, Cheng-Rong Deng, Ju-Jun Xie

Abstract: We have investigated the axial charges of the ground-state octet baryons within the extended chiral constituent quark model, where all possible compact five-quark Fock components $qqq(q\bar{q})$ $(q=u, d, s)$ in the baryons are considered. The transition couplings between the three- and five-quark components in the baryons are assumed to proceed via the $^{3}P_{0}$ mechanism, which reproduces the sea flavor asymmetry of the proton very well. The numerical results for the flavor-dependent axial charges of the octet baryons are comparable to those predicted by other theoretical approaches. The singlet axial charges of the octet baryons, which indicate the portion of the total baryon spin arising from quark spins, fall in the range $0.45-0.75$ in the present model, consistent with the predictions of lattice QCD and chiral perturbation theory. Interestingly, the light-quark spins $Δu$ and $Δd$ in the $Λ$ baryon take small negative values, whereas they vanish exactly in the traditional three-quark model.

Submitted 28 May, 2022; originally announced May 2022.

Comments: Submitted to Phys. Rev. C. arXiv admin note: text overlap with arXiv:2106.00866

arXiv:2205.08218 [pdf, ps, other]

Is hyperinterpolation efficient in the approximation of singular and oscillatory functions?

Authors: Congpei An, Hao-Ning Wu

Abstract: Singular and oscillatory functions feature in numerous applications. High-accuracy approximation of such functions would greatly help in developing high-order methods for solving applied mathematics problems. This paper demonstrates that hyperinterpolation, a discrete projection method whose coefficients are obtained by evaluating the $L^2$ orthogonal projection coefficients with some numerical integration method, may be inefficient for approximating singular and oscillatory functions: a relatively large number of numerical integration points is necessary for satisfactory accuracy. Moreover, in the spirit of product integration, we propose an efficient modification of hyperinterpolation for such approximation. The proposed approximation scheme, called efficient hyperinterpolation, achieves satisfactory accuracy with fewer numerical integration points than the original scheme, and its implementation is relatively easy. Theorems are also given to explain why efficient hyperinterpolation outperforms the original scheme in such approximation, with the functions assumed to belong to the $L^1(Ω)$, $L^2(Ω)$, and $\mathcal{C}(Ω)$ spaces, respectively. These theorems, as well as numerical experiments on the interval and the sphere, show that efficient hyperinterpolation is more accurate in such approximation than the original scheme when the number of numerical integration points is limited.

Submitted 19 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: A new theorem, Theorem 2.1, is presented

MSC Class: 65D32; 41A10; 41A55
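
A minimal sketch of plain hyperinterpolation on the interval $[-1, 1]$, the scheme whose efficiency this abstract questions. It assumes an $m$-point Gauss-Legendre rule (exact up to degree $2m-1$) and the orthonormal Legendre basis; the helper names are illustrative, and this is the original scheme, not the paper's efficient, product-integration-based variant.

```python
import math
from typing import Callable, List, Tuple


def legendre_p(n: int, x: float) -> Tuple[float, float]:
    """Return (P_n(x), P_{n-1}(x)) via the three-term recurrence, with P_{-1} = 0."""
    prev, cur = 0.0, 1.0
    for k in range(1, n + 1):
        prev, cur = cur, ((2 * k - 1) * x * cur - (k - 1) * prev) / k
    return cur, prev


def gauss_legendre(m: int) -> Tuple[List[float], List[float]]:
    """Nodes and weights of the m-point Gauss-Legendre rule (exact up to degree 2m - 1)."""
    nodes, weights = [], []
    for i in range(m):
        x = math.cos(math.pi * (i + 0.75) / (m + 0.5))  # standard initial guess for root i
        for _ in range(100):  # Newton iteration on P_m
            p, p_prev = legendre_p(m, x)
            dp = m * (p_prev - x * p) / (1 - x * x)  # (1 - x^2) P_m' = m (P_{m-1} - x P_m)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        p, p_prev = legendre_p(m, x)
        dp = m * (p_prev - x * p) / (1 - x * x)
        nodes.append(x)
        weights.append(2 / ((1 - x * x) * dp * dp))
    return nodes, weights


def hyperinterpolate(f: Callable[[float], float], n: int,
                     nodes: List[float], weights: List[float]) -> Callable[[float], float]:
    """Degree-n hyperinterpolant: L^2 projection coefficients replaced by quadrature sums."""
    def q(deg: int, x: float) -> float:  # orthonormal Legendre polynomial on [-1, 1]
        return math.sqrt((2 * deg + 1) / 2) * legendre_p(deg, x)[0]

    fx = [f(x) for x in nodes]
    coeffs = [sum(w * fv * q(deg, x) for x, w, fv in zip(nodes, weights, fx))
              for deg in range(n + 1)]
    return lambda x: sum(c * q(deg, x) for deg, c in enumerate(coeffs))


n = 4
nodes, weights = gauss_legendre(n + 1)  # exactness 2n + 1 >= 2n, as hyperinterpolation requires
p = hyperinterpolate(lambda x: x**3 - 2 * x, n, nodes, weights)
print(p(0.3), 0.3**3 - 2 * 0.3)  # a polynomial of degree <= n is reproduced
```

Because the $(n+1)$-point Gauss rule has exactness $2n+1 \geq 2n$, the hyperinterpolant reproduces every polynomial of degree at most $n$. For a singular target such as $|x|$, the error decreases only slowly as $n$ grows, which is the inefficiency the paper addresses.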

arXiv:2204.13320 [pdf, ps, other]

doi 10.1140/epjc/s10052-022-10673-7

The low-lying hidden- and double-charm tetraquark states in a constituent quark model with Instanton-induced Interaction

Authors: Jin-Bao Wang, Gang Li, Chun-Sheng An, Cheng-Rong Deng, Ju-Jun Xie

Abstract: The spectrum of the low-lying hidden- and double-charm tetraquark states is investigated in a nonrelativistic quark potential model, where the instanton-induced interaction is taken as the residual spin-dependent hyperfine interaction between quarks. The model parameters are fixed by fitting the spectrum of the ground-state hadrons. Our numerical results show that the masses of several of the studied tetraquark states are close to those of experimentally observed exotic meson candidates, which indicates that the corresponding compact tetraquark components may carry considerable probabilities in those observed exotic states.

Submitted 5 August, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Version to appear in Eur. Phys. J. C

arXiv:2203.16943 [pdf]

doi 10.1103/PhysRevB.106.104512

Pressure-induced superconductivity in kagome single crystal Pd3P2S8

Authors: Ying Zhou, Xinyi He, Shuyang Wang, Jing Wang, Xuliang Chen, Yonghui Zhou, Chao An, Min Zhang, Zhitao Zhang, Zhaorong Yang

Abstract: The kagome lattice offers unique opportunities for exploring unusual quantum states of correlated electrons. Here, we report the observation of superconductivity in a kagome single crystal of Pd3P2S8 when a semiconductor-to-metal transition is driven by pressure. High-pressure resistance measurements show that metallization and superconductivity appear simultaneously at about 11 GPa. With increasing pressure, the superconducting critical temperature Tc increases monotonically from 2.6 K to a maximum of 7.7 K at ~52 GPa. Interestingly, superconductivity is retained when the pressure is fully released. Synchrotron XRD and Raman experiments consistently show that the emergence of superconductivity is accompanied by amorphization, and the retention of superconductivity upon decompression can be attributed to the irreversibility of this amorphization.

Submitted 31 March, 2022; originally announced March 2022.

Journal ref: Physical Review B 106, 104512 (2022)

arXiv:2202.13691 [pdf, ps, other]

doi 10.1007/s10543-022-00935-x

On the quadrature exactness in hyperinterpolation

Authors: Congpei An, Hao-Ning Wu

Abstract: This paper investigates the role of quadrature exactness in the approximation scheme of hyperinterpolation. Constructing a hyperinterpolant of degree $n$ requires a positive-weight quadrature rule with exactness degree $2n$. We examine the behavior of such approximation when the required exactness degree $2n$ is relaxed to $n+k$ with $0<k\leq n$. Aided by the Marcinkiewicz--Zygmund inequality, we show that the $L^2$ norm of the exactness-relaxing hyperinterpolation operator is bounded by a constant independent of $n$, and that this approximation scheme converges as $n\rightarrow\infty$ if $k$ is positively correlated with $n$. Thus, the family of candidate quadrature rules for constructing hyperinterpolants can be significantly enriched, and the number of quadrature points can be considerably reduced. As a potential cost, this relaxation may slow the convergence rate of hyperinterpolation, depending on how far the degree of quadrature exactness is reduced. Our theoretical results are corroborated by numerical experiments on three of the best-known quadrature rules: Gauss quadrature, Clenshaw--Curtis quadrature, and spherical $t$-designs.

Submitted 11 July, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: 16 pages, 5 figures, 1 table

MSC Class: 65D32; 41A10; 41A55
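
The arithmetic behind "the number of quadrature points can be considerably reduced" can be made concrete for the Gauss rule, since an $m$-point Gauss-Legendre rule is exact up to degree $2m - 1$. The sketch below (function names are illustrative) counts the points needed for exactness $2n$ versus the relaxed $n + k$; it does not cover Clenshaw--Curtis rules or spherical $t$-designs.

```python
from typing import Optional


def gauss_points_needed(exactness: int) -> int:
    """Smallest m with 2*m - 1 >= exactness (the exactness of an m-point Gauss-Legendre rule)."""
    return (exactness + 2) // 2


def points_for_hyperinterpolant(n: int, k: Optional[int] = None) -> int:
    """Gauss points needed for a degree-n hyperinterpolant.

    k=None:     classical requirement, exactness 2n.
    0 < k <= n: relaxed requirement, exactness n + k.
    """
    if k is None:
        return gauss_points_needed(2 * n)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    return gauss_points_needed(n + k)


for k in (None, 20, 10, 1):
    print(k, points_for_hyperinterpolant(20, k))
```

For $n = 20$ the classical requirement needs 21 Gauss points, while $k = 10$ needs 16 and $k = 1$ only 11, at the possible cost in convergence rate noted in the abstract.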

arXiv:2202.09817 [pdf, other]

$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning

Authors: Yitao Liu, Chenxin An, Xipeng Qiu

Abstract: With the success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Although some parameter-efficient tuning paradigms have been proposed to address this problem, they still require substantial resources to compute gradients in the training phase. In this paper, we propose $\mathcal{Y}$-Tuning, an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks. $\mathcal{Y}$-Tuning learns dense representations for the labels $\mathcal{Y}$ defined in a given task and aligns them to fixed feature representations. Because it tunes neither the features of the input text nor the model parameters, $\mathcal{Y}$-Tuning is both parameter-efficient and training-efficient. For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion parameters, $\mathcal{Y}$-Tuning achieves more than $96\%$ of the performance of full fine-tuning on the GLUE benchmark with only $2\%$ tunable parameters and much lower training cost.

Submitted 7 January, 2023; v1 submitted 20 February, 2022; originally announced February 2022.
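
A heavily simplified sketch of the label-side training idea behind $\mathcal{Y}$-Tuning: the backbone's features are treated as fixed (hard-coded toy vectors stand in for frozen PTM outputs), and only one embedding per label is learned by gradient descent on a softmax cross-entropy loss. The dot-product scorer and all names are assumptions for illustration; the paper's own label-feature alignment module is not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(label_emb: List[Vector], h: Vector) -> Vector:
    # Score each label by the dot product between its embedding and the frozen feature h.
    return [sum(a * b for a, b in zip(row, h)) for row in label_emb]


def train_label_embeddings(features: List[Vector], labels: List[int], num_labels: int,
                           lr: float = 0.5, epochs: int = 50) -> List[Vector]:
    """Learn only the label embeddings; the features (and the backbone) stay frozen."""
    dim = len(features[0])
    label_emb = [[0.0] * dim for _ in range(num_labels)]  # the only trainable parameters
    for _ in range(epochs):
        grads = [[0.0] * dim for _ in range(num_labels)]
        for h, y in zip(features, labels):
            probs = softmax(label_scores(label_emb, h))
            for c in range(num_labels):
                # d(cross-entropy)/d(label_emb[c]) = (p_c - 1[c == y]) * h
                g = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    grads[c][d] += g * h[d]
        for c in range(num_labels):
            for d in range(dim):
                label_emb[c][d] -= lr * grads[c][d] / len(features)
    return label_emb


def predict(label_emb: List[Vector], h: Vector) -> int:
    scores = label_scores(label_emb, h)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "frozen" features for a two-label task (stand-ins for PTM outputs).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
emb = train_label_embeddings(feats, labels, num_labels=2)
print(predict(emb, [1.0, 0.0]), predict(emb, [0.0, 1.0]))
```

Since no gradient flows into the features, no backward pass through the backbone is needed, which is where the training efficiency comes from; the trainable parameter count in this toy is just the number of labels times the feature dimension.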

arXiv:2202.09022 [pdf, other]

TURNER: The Uncertainty-based Retrieval Framework for Chinese NER

Authors: Zhichao Geng, Hang Yan, Zhangyue Yin, Chenxin An, Xipeng Qiu

Abstract: Chinese NER is a difficult undertaking due to the ambiguity of Chinese characters and the absence of word boundaries. Previous work on Chinese NER focuses on lexicon-based methods that introduce boundary information and reduce out-of-vocabulary (OOV) cases during prediction. However, it is expensive to obtain and dynamically maintain high-quality lexicons in specific domains, which motivates us to utilize more general knowledge resources, e.g., search engines. In this paper, we propose TURNER: The Uncertainty-based Retrieval framework for Chinese NER. The idea behind TURNER is to imitate human behavior: we frequently retrieve auxiliary knowledge as assistance when encountering an unknown or uncertain entity. To improve the efficiency and effectiveness of retrieval, we first propose two types of uncertainty sampling methods for selecting the most ambiguous entity-level uncertain components of the input text. Then, the Knowledge Fusion Model re-predicts the uncertain samples by combining retrieved knowledge. Experiments on four benchmark datasets demonstrate TURNER's effectiveness: TURNER outperforms existing lexicon-based approaches and achieves a new SOTA.

Submitted 18 February, 2022; originally announced February 2022.
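
The abstract does not spell out its two uncertainty sampling methods, so the sketch below uses two common, generic measures (predictive entropy and least confidence) to pick the most uncertain entity candidates for retrieval. The candidate spans and probability values are made-up inputs, and the functions are illustrative, not TURNER's.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Predictive entropy of a label distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confidence(probs: List[float]) -> float:
    """1 minus the top label probability (higher = more uncertain)."""
    return 1.0 - max(probs)


def select_uncertain(spans: Dict[str, List[float]], k: int, measure=entropy) -> List[str]:
    """Return the k candidate spans whose label distributions are most uncertain."""
    return sorted(spans, key=lambda s: measure(spans[s]), reverse=True)[:k]


# Hypothetical entity-type distributions (e.g. over PER / LOC / ORG) for three spans.
candidates = {
    "span_A": [0.98, 0.01, 0.01],
    "span_B": [0.40, 0.35, 0.25],
    "span_C": [0.70, 0.20, 0.10],
}
print(select_uncertain(candidates, k=2))
print(select_uncertain(candidates, k=1, measure=least_confidence))
```

Only the selected spans would then be sent to retrieval, which keeps the number of search-engine queries small.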

arXiv:2202.06244 [pdf]

Pressure-induced superconductivity reentrant in transition metal dichalcogenide TiSe2

Authors: Wei Xia, Jiaxuan Wu, Zhongyang Li, Jian Yuan, Chao An, Xia Wang, Na Yu, Zhiqiang Zou, Gang Liu, Chunyin Zhou, Jiajia Feng, Lili Zhang, Zhaohui Dong, Bin Chen, Zhaorong Yang, Zhenhai Yu, Hanghui Chen, Yanfeng Guo

Abstract: Through either element intercalation or the application of pressure, the transition metal dichalcogenide 1T-TiSe2 exhibits superconductivity in proximity to a charge density wave (CDW) quantum critical point (QCP), thus providing an ideal avenue to study the correlation between these two symmetry-breaking exotic quantum electronic states. We report herein that, in addition to the well-known superconducting dome that emerges within the low pressure range of 2-4 GPa and peaks at a maximal Tc of about 1.8 K, pressure induces another, separate superconducting transition starting around 15 GPa with a substantially higher Tc that reaches 5.6 K at about 21.5 GPa. High-pressure X-ray diffraction and Raman spectroscopy measurements reveal that the superconductivity reentrance is caused by a first-order structural phase transition (from space group P-3m1 to space group Pnma), which is also supported by density functional theory calculations. A comparative theoretical calculation also reveals that the conventional phonon-mediated mechanism can account for the superconductivity of 1T-TiSe2 under low pressure, while the electron-phonon coupling of 4O-TiSe2 under high pressure is too weak to induce superconductivity with a Tc as high as 5.6 K. This implies that the emergent superconductivity in 4O-TiSe2 may have an unconventional origin. Our findings could open a new window toward the discovery of more exotic quantum states in transition metal dichalcogenides via high pressure.

Submitted 13 February, 2022; originally announced February 2022.

Comments: 15 pages, 4 figures

arXiv:2201.02979 [pdf, ps, other]

doi 10.1088/1361-6420/acd4e1

Enhanced total variation minimization for stable image reconstruction

Authors: Congpei An, Hao-Ning Wu, Xiaoming Yuan

Abstract: Total variation (TV) regularization has greatly boosted various variational models for image processing tasks. We propose to combine the backward diffusion process from earlier image-enhancement literature with TV regularization, and show that the resulting enhanced TV minimization model is particularly effective at reducing the loss of contrast. The main purpose of this paper is to establish stable reconstruction guarantees for the enhanced TV model from noisy subsampled measurements under two sampling strategies: non-adaptive sampling for general linear measurements and variable-density sampling for Fourier measurements. In particular, under some weaker restricted isometry property conditions, the enhanced TV minimization model is shown to have tighter reconstruction error bounds than various TV-based models when the noise level is significant and the number of measurements is limited. Advantages of the enhanced TV model are also validated numerically by preliminary experiments on the reconstruction of some synthetic, natural, and medical images.

Submitted 19 August, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

Comments: 29 pages, 8 figures

MSC Class: 94A08; 94A20; 68U10; 68Q25
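
To make the terms concrete, the sketch computes the discrete anisotropic TV of a small image from forward differences, plus an "enhanced" variant that subtracts a quadratic gradient term. The enhanced form $\|\nabla x\|_1 - \frac{\alpha}{2}\|\nabla x\|_2^2$ is our assumption about how a backward-diffusion term (the negative of the Dirichlet energy) combines with TV; consult the paper for the exact model. Function names and the weight `alpha` are illustrative.

```python
from typing import List

Image = List[List[float]]


def forward_differences(img: Image) -> List[float]:
    """Horizontal and vertical forward differences of a 2D list (no wrap-around)."""
    rows, cols = len(img), len(img[0])
    diffs = []
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                diffs.append(img[i][j + 1] - img[i][j])
            if i + 1 < rows:
                diffs.append(img[i + 1][j] - img[i][j])
    return diffs


def anisotropic_tv(img: Image) -> float:
    """Plain anisotropic TV: the l1 norm of the discrete gradient."""
    return sum(abs(d) for d in forward_differences(img))


def enhanced_tv(img: Image, alpha: float) -> float:
    """ASSUMED enhanced TV: ||grad x||_1 - (alpha / 2) * ||grad x||_2^2."""
    d = forward_differences(img)
    return sum(abs(v) for v in d) - alpha / 2 * sum(v * v for v in d)


edge = [[0.0, 0.0], [1.0, 1.0]]  # a single horizontal edge of height 1
print(anisotropic_tv(edge), enhanced_tv(edge, alpha=0.5))
```

For a single jump of height $t$, the assumed enhanced term charges $t - \frac{\alpha}{2}t^2$ instead of $t$, so large, high-contrast jumps are penalized relatively less than under plain TV; this is one intuition for the reduced loss of contrast.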

arXiv:2110.06754 [pdf, ps, other]

doi 10.1016/j.acha.2022.07.002

The springback penalty for robust signal recovery

Authors: Congpei An, Hao-Ning Wu, Xiaoming Yuan

Abstract: We propose a new penalty, the springback penalty, for constructing models to recover an unknown signal from incomplete and inaccurate measurements. Mathematically, the springback penalty is a weakly convex function. It inherits various theoretical and computational advantages of both the benchmark convex $\ell_1$ penalty and many of its non-convex surrogates that have been well studied in the literature. We establish exact and stable recovery theory for the recovery model using the springback penalty, for sparse and nearly sparse signals respectively, and derive an easily implementable difference-of-convex algorithm. In particular, we show its theoretical superiority to some existing models through a sharper recovery bound in scenarios where the level of measurement noise is large or the number of measurements is limited. We also demonstrate its numerical robustness regardless of the varying coherence of the sensing matrix. The springback penalty is particularly favorable when the incomplete and inaccurate measurements are collected by coherence-hidden or -static sensing hardware, owing to its theoretical recovery guarantee under severe measurement conditions, its computational tractability, and its numerical robustness for ill-conditioned sensing matrices.

Submitted 19 August, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: 26 pages, 8 figures

MSC Class: 94A12; 65K10; 90C26

Journal ref: Applied and Computational Harmonic Analysis 61 (2022), pp.319-346
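
A minimal sketch of the springback penalty and a difference-of-convex (DC) iteration for it. It assumes the penalty has the form $R_\alpha(x) = \|x\|_1 - \frac{\alpha}{2}\|x\|_2^2$ and, to stay short, uses the identity as the sensing matrix (pure denoising), so each DC step reduces to soft-thresholding. Both simplifications are ours; the paper's algorithm handles general sensing matrices.

```python
import math
from typing import List


def springback_penalty(x: List[float], alpha: float) -> float:
    """Assumed form: ||x||_1 - (alpha / 2) * ||x||_2^2."""
    return sum(abs(v) for v in x) - alpha / 2 * sum(v * v for v in x)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def springback_denoise_dca(b: List[float], lam: float, alpha: float,
                           iters: int = 200) -> List[float]:
    """DC iteration for min_x 0.5*||x - b||^2 + lam * (||x||_1 - alpha/2 * ||x||^2).

    Convex part:  0.5*||x - b||^2 + lam*||x||_1
    Concave part: -(lam*alpha/2)*||x||^2, linearized at the current iterate,
    which turns each step into soft-thresholding of b + lam*alpha*x_k.
    This sketch assumes lam * alpha < 1 so the overall objective stays convex.
    """
    if lam * alpha >= 1:
        raise ValueError("this sketch assumes lam * alpha < 1")
    x = [0.0] * len(b)
    for _ in range(iters):
        x = [soft_threshold(bi + lam * alpha * xi, lam) for bi, xi in zip(b, x)]
    return x


print(springback_penalty([1.0, -2.0], alpha=0.5))
print(springback_denoise_dca([3.0, 0.5, -2.0], lam=1.0, alpha=0.2))
```

With `alpha = 0` the update is plain soft-thresholding (the $\ell_1$ case), which would map the entry 3.0 to 2.0. Here the fixed point for $|b_i| > \lambda$ is $(b_i - \lambda\,\mathrm{sign}(b_i))/(1 - \lambda\alpha)$, so 3.0 maps to 2.5: large entries are shrunk less, while entries with $|b_i| \leq \lambda$ are still set to zero.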

arXiv:2110.06408 [pdf, ps, other]

doi 10.1103/PhysRevD.104.094008

The $Ω_{cc}$ resonances with negative parity in the chiral constituent quark model

Authors: Jin-Bao Wang, Gang Li, Cheng-Rong Deng, Chun-Sheng An, Ju-Jun Xie

Abstract: The spectrum of the low-lying $Ω_{cc}$ resonances with negative parity, which are assumed to be dominated by $sccq\bar{q}$ pentaquark components, is investigated using the chiral constituent quark model. Energies of the $Ω_{cc}$ resonances are obtained by considering the hyperfine interaction between quarks via Goldstone-boson exchange. Possible $sccq\bar{q}$ configurations with spin-parity $1/2^{-}$, $3/2^{-}$ and $5/2^{-}$ are taken into account. Numerical results show that the lowest $Ω_{cc}$ resonances with negative parity may lie at $4050 \pm 100$ MeV. In addition, the transitions of the $Ω_{cc}$ resonances to a pseudoscalar meson and a ground-state baryon are investigated within the chiral Lagrangian approach. We expect that these $Ω_{cc}$ resonances could be observed in the $\bar{D}Ξ_{c}$ channel by future experiments.

Submitted 14 October, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: typos corrected

arXiv:2109.13770 [pdf, other]

Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health

Authors: Andrew Lee, Jonathan K. Kummerfeld, Lawrence C. An, Rada Mihalcea

Abstract: Many statistical models have high accuracy on test benchmarks, but are not explainable, struggle in low-resource scenarios, cannot be reused for multiple tasks, and cannot easily integrate domain expertise. These factors limit their use, particularly in settings such as mental health, where it is difficult to annotate datasets and model outputs have significant impact. We introduce a micromodel architecture to address these challenges. Our approach allows researchers to build interpretable representations that embed domain knowledge and provide explanations throughout the model's decision process. We demonstrate the idea on multiple mental health tasks: depression classification, PTSD classification, and suicide risk assessment. Our systems consistently produce strong results, even in low-resource scenarios, and are more interpretable than alternative methods.

Submitted 28 September, 2021; originally announced September 2021.

Comments: To appear in Findings of EMNLP 2021
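
A toy sketch of the micromodel idea as the abstract describes it: several small, individually interpretable detectors each encode one piece of domain knowledge, and their outputs form a representation that can be read directly as an explanation. The keyword-matching detectors, their names and keyword lists are hypothetical stand-ins; the paper's micromodels are not necessarily keyword-based.

```python
from typing import Callable, Dict, List

Micromodel = Callable[[str], bool]


def keyword_micromodel(keywords: List[str]) -> Micromodel:
    """A hypothetical micromodel that fires if any keyword appears in the text."""
    lowered = [k.lower() for k in keywords]
    return lambda text: any(k in text.lower() for k in lowered)


# Hypothetical micromodels, each encoding one piece of domain knowledge.
MICROMODELS: Dict[str, Micromodel] = {
    "sleep_problems": keyword_micromodel(["insomnia", "can't sleep"]),
    "hopelessness": keyword_micromodel(["hopeless", "no point"]),
    "fatigue": keyword_micromodel(["exhausted", "no energy"]),
}


def micromodel_features(text: str) -> Dict[str, int]:
    """Binary representation: one interpretable dimension per micromodel."""
    return {name: int(model(text)) for name, model in MICROMODELS.items()}


def explain(text: str) -> List[str]:
    """The explanation is simply the list of micromodels that fired."""
    return [name for name, fired in micromodel_features(text).items() if fired]


post = "I can't sleep and everything feels hopeless"
print(micromodel_features(post))
print(explain(post))
```

A downstream classifier (for example a small linear model) trained on these features inherits their interpretability, and the same micromodels can be reused across tasks.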

arXiv:2109.07943 [pdf, other]

RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization

Authors: Chenxin An, Ming Zhong, Zhichao Geng, Jianqiang Yang, Xipeng Qiu

Abstract: Existing summarization systems mostly generate summaries relying purely on the content of the source document. However, even humans usually need some references or exemplars to fully understand the source document and write summaries in a particular format. How to find high-quality exemplars and incorporate them into summarization systems is still challenging and worth exploring. In this paper, we propose RetrievalSum, a novel retrieval-enhanced abstractive summarization framework consisting of a dense Retriever and a Summarizer. First, several closely related exemplars are retrieved as supplementary input to help the generation model understand the text more comprehensively. Furthermore, the retrieved exemplars can also guide the model to capture the writing style of a specific corpus. We validate our method on a wide range of summarization datasets across multiple domains and with two backbone models: BERT and BART. Results show that our framework obtains significant ROUGE-1 improvements of 1.38 to 4.66 over powerful pre-trained models and achieves a new state of the art on BillSum. Human evaluation demonstrates that our retrieval-enhanced model better captures domain-specific writing style.

Submitted 13 December, 2021; v1 submitted 16 September, 2021; originally announced September 2021.
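
A minimal sketch of the retrieve-then-summarize input construction the abstract describes: exemplars are ranked by cosine similarity between dense embeddings, and the top-k are appended to the source. The toy two-dimensional embeddings, the `[SEP]` separator and all function names are assumptions; in the paper a trained dense Retriever produces the embeddings and a BERT- or BART-based Summarizer consumes the input.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_exemplars(query_vec: Vector, corpus: List[Tuple[str, Vector]], k: int = 2) -> List[str]:
    """corpus holds (exemplar_text, embedding) pairs; return the top-k texts by cosine similarity."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_summarizer_input(source: str, exemplars: List[str], sep: str = " [SEP] ") -> str:
    # Hypothetical input format: the source document followed by the retrieved exemplars.
    return sep.join([source] + exemplars)


corpus = [
    ("bill summary A", [1.0, 0.0]),
    ("news summary", [0.0, 1.0]),
    ("bill summary B", [0.9, 0.1]),
]
top = retrieve_exemplars([1.0, 0.05], corpus, k=2)
print(top)
print(build_summarizer_input("source document", top))
```

Appending exemplars from the same corpus is what lets the Summarizer pick up corpus-specific writing style, as the abstract notes.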

arXiv:2106.00866 [pdf, ps, other]

doi 10.1103/PhysRevD.103.114018

The axial charges of proton within an extended chiral constituent quark model

Authors: Jin-Bao Wang, Gang Li, Chun-Sheng An, Ju-Jun Xie

Abstract: We have studied the isovector, octet and singlet axial charges of the proton in an extended chiral constituent quark model, where all possible $uudq\bar{q}$~($q=u,d,s$) five-quark Fock components in the proton wave function are taken into account. The $^3P_0$ quark-antiquark creation mechanism is assumed to account for the transition coupling between the three- and five-quark components in the proton, and the corresponding transition coupling strength is fixed by fitting the data on the intrinsic sea flavor asymmetry $\bar{d}-\bar{u}$ of the proton. With all parameters fixed by empirical values, the probabilities of the intrinsic five-quark Fock components in the proton wave function are found to be $\sim 30-50\%$, which leads to numerical results for the quark spins $Δu$, $Δd$ and $Δs$, as well as for the axial charges of the proton, that are consistent with the experimental data and with predictions by other theoretical approaches.

Submitted 1 June, 2021; originally announced June 2021.

Comments: 9 pages, 4 tables

Journal ref: Phys. Rev. D 103, 114018 (2021)
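
For reference, the three axial charges named in this abstract are conventionally defined in terms of the quark spin fractions as below (the octet charge is written without the $1/\sqrt{3}$ factor some authors include). The second line gives the textbook values for the naive SU(6) three-quark proton ($Δu = 4/3$, $Δd = -1/3$, $Δs = 0$), i.e. the baseline that five-quark components modify; these are not results of the paper.

```latex
\begin{aligned}
g_A^{(3)} &= \Delta u - \Delta d, &
g_A^{(8)} &= \Delta u + \Delta d - 2\Delta s, &
g_A^{(0)} &= \Delta u + \Delta d + \Delta s,\\
g_A^{(3)} &= \tfrac{4}{3} + \tfrac{1}{3} = \tfrac{5}{3}, &
g_A^{(8)} &= \tfrac{4}{3} - \tfrac{1}{3} - 0 = 1, &
g_A^{(0)} &= \tfrac{4}{3} - \tfrac{1}{3} + 0 = 1.
\end{aligned}
```

For comparison, the measured isovector charge is about 1.27, and the singlet charge extracted from polarized deep-inelastic scattering is well below 1.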

arXiv:2104.03057 [pdf, other]

Enhancing Scientific Papers Summarization with Citation Graph

Authors: Chenxin An, Ming Zhong, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang

Abstract: Previous work on text summarization in the scientific domain mainly focused on the content of the input document and seldom considered its citation network. However, scientific papers are full of uncommon domain-specific terms, making it almost impossible for a model to understand their true meaning without the help of the relevant research community. In this paper, we redefine the task of scientific paper summarization by utilizing citation graphs and propose a citation graph-based summarization model, CGSum, which can incorporate information from both the source paper and its references. In addition, we construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains and 661K citation relationships. The entire dataset constitutes a large connected citation graph. Extensive experiments show that our model achieves competitive performance compared with pretrained models, even with a simple architecture. The results also indicate that the citation graph is crucial for better understanding the content of papers and generating high-quality summaries.

Submitted 7 April, 2021; originally announced April 2021.

Comments: accepted by AAAI 2021

Showing 1–50 of 106 results for author: An, C