
The FRB-searching pipeline of the Tianlai Cylinder Pathfinder Array

Authors: Zijie Yu, Furen Deng, Shijie Sun, Chenhui Niu, Jixia Li, Fengquan Wu, Wei-Yang Wang, Yougang Wang, Shifan Zuo, Lin Shu, Jie Hao, Xiaohui Liu, Reza Ansari, Ue-Li Pen, Albert Stebbins, Peter Timbie, Xuelei Chen

Abstract: This paper presents the design, calibration, and survey strategy of the Fast Radio Burst (FRB) digital backend and its real-time data processing pipeline employed in the Tianlai Cylinder Pathfinder array. The array, consisting of three parallel cylindrical reflectors equipped with 96 dual-polarization feeds, is a radio interferometer designed for drift scans of the northern celestial hemisphere. The FRB digital backend forms 96 digital beams, which together cover approximately 40 square degrees within their 3 dB beam widths. Our pipeline can automatically search for FRBs, detect them in quasi-real time, and classify FRB candidates. The current FRB-searching pipeline has an overall recall rate of 88%. During the commissioning phase, we successfully detected signals emitted by four well-known pulsars: PSR B0329+54, B2021+51, B0823+26, and B2020+28. We report the first discovery of an FRB by our array, designated FRB 20220414A. We also use numerical simulations to investigate the optimal arrangement of the digitally formed beams for maximizing the detection rate.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 27 pages, 21 figures, 7 tables, RAA accepted
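The abstract does not spell out how candidates are searched, but FRB searches in general must correct for the frequency-dependent dispersion delay before a pulse can be detected. The Python sketch below illustrates that standard step under stated assumptions: it uses the cold-plasma delay $\Delta t \approx 4.148808\,\mathrm{ms} \times \mathrm{DM} \times (\nu_{lo}^{-2} - \nu_{hi}^{-2})$ (with $\nu$ in GHz and DM in pc cm$^{-3}$) and a brute-force incoherent dedispersion at a single trial DM. The function names, the example 700-800 MHz band, the DM, and the sample time are illustrative assumptions, not details of the Tianlai pipeline.

```python
"""Sketch of incoherent dedispersion for an FRB search (illustrative only)."""
from typing import List, Sequence

# Dispersion constant in ms GHz^2 cm^3 pc^-1 (cold-plasma approximation).
K_DM_MS = 4.148808


def dispersion_delay_ms(dm: float, f_lo_ghz: float, f_hi_ghz: float) -> float:
    """Delay (ms) of the low-frequency edge relative to the high-frequency edge."""
    if not 0 < f_lo_ghz < f_hi_ghz:
        raise ValueError("need 0 < f_lo_ghz < f_hi_ghz")
    return K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)


def channel_shifts(dm: float, freqs_ghz: Sequence[float], dt_ms: float) -> List[int]:
    """Integer sample shifts that align every channel to the highest frequency."""
    f_ref = max(freqs_ghz)
    return [round(K_DM_MS * dm * (f ** -2 - f_ref ** -2) / dt_ms) for f in freqs_ghz]


def dedisperse(dynspec: Sequence[Sequence[float]], shifts: Sequence[int]) -> List[float]:
    """Move each channel's time series earlier by its delay, then sum channels."""
    n_t = len(dynspec[0])
    series = [0.0] * n_t
    for chan, s in zip(dynspec, shifts):
        if s < 0:
            raise ValueError("shifts must be non-negative (reference = highest frequency)")
        for t in range(n_t - s):
            series[t] += chan[t + s]
    return series


if __name__ == "__main__":
    # Hypothetical band edges and DM; prints the sweep across the band.
    print(f"{dispersion_delay_ms(500.0, 0.7, 0.8):.1f} ms")  # prints 992.2 ms
```

A real search repeats the dedispersion over a grid of trial DMs and thresholds the resulting time series; faster tree or FFT-based dedispersion algorithms exist, and the one used by the array is not stated here.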

arXiv:2406.06592

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data. This algorithm swiftly identifies the first error in the Chain of Thought (CoT) with binary search and balances the positive and negative examples, thereby ensuring both efficiency and quality. As a result, we are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM).
Utilizing this fully automated process supervision alongside the weighted self-consistency algorithm, we have enhanced the instruction-tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark, a 36% relative improvement over the 51% base model performance. Additionally, the entire process operates without any human intervention, making our method both financially and computationally cost-effective compared to existing methods.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 18 pages, 5 figures, 1 table
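The abstract states that OmegaPRM locates the first error in a chain of thought with binary search. The sketch below shows only that search step. It assumes a hypothetical oracle `prefix_ok(k)` that reports whether the first k steps can still lead to a correct answer (in practice such an oracle would have to be estimated, for example by Monte Carlo rollouts), and it assumes correctness is monotone along the chain: once a step is wrong, every longer prefix is wrong. It is not the paper's MCTS algorithm.

```python
from typing import Callable, List, Optional


def first_error_step(n_steps: int, prefix_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the 1-based index of the first wrong step, or None if all steps are fine.

    Assumes prefix_ok(k) is True for every k before the first error and False
    from the first error onward; the empty prefix (k = 0) is taken to be fine.
    Uses O(log n_steps) oracle calls instead of one call per step.
    """
    if prefix_ok(n_steps):
        return None
    lo, hi = 0, n_steps  # invariant: prefix of length lo is fine, prefix hi is not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    calls: List[int] = []

    def oracle(k: int) -> bool:  # hypothetical: steps 1-6 are fine, step 7 is wrong
        calls.append(k)
        return k < 7

    print(first_error_step(10, oracle), len(calls))  # prints: 7 4
```

The saving matters because each oracle call stands for several expensive rollouts: a linear scan over a 10-step solution could need up to 10 calls, while the binary search above needs about $\lceil \log_2 10 \rceil + 1$.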

arXiv:2405.16178

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Authors: Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, **dong Chen

Abstract: Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly with the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates the latency introduced by long-range attention over retrieved documents. Then, LLMs selectively decode the output auto-regressively by attending only to highly relevant caches, which are chosen by prompting LLMs with special control tokens. Notably, Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. This sparse mechanism reduces the number of documents loaded during decoding, accelerating the inference of the RAG system. Additionally, filtering out undesirable contexts enhances the model's focus on relevant context, inherently improving its generation quality. Evaluation results on two datasets show that Sparse RAG can strike an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across both short- and long-form generation tasks.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.11502

CTGNN: Crystal Transformer Graph Neural Network for Crystal Material Property Prediction

Authors: Zijian Du, Luozhijie **, Le Shu, Yan Cen, Yuanfeng Xu, Yongfeng Mei, Hao Zhang

Abstract: The combination of deep learning algorithms and materials science has made significant progress in predicting novel materials and understanding various behaviours of materials. Here, we introduce a new model called the Crystal Transformer Graph Neural Network (CTGNN), which combines the advantages of the Transformer model and graph neural networks to address the complexity of structure-property relations in materials data. Compared to state-of-the-art models, CTGNN incorporates a graph network structure for capturing local atomic interactions and dual-Transformer structures to model intra-crystal and inter-atomic relationships comprehensively. Benchmarks carried out with the proposed CTGNN indicate that it significantly outperforms existing models such as CGCNN and MEGNET in the prediction of formation energy and bandgap. Our work highlights the potential of CTGNN to enhance the performance of property prediction and accelerate the discovery of new materials, particularly perovskite materials.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 17 pages

arXiv:2405.07429

JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation

Authors: Xubo Luo, Xue Wan, Yixing Gao, Yaolin Tian, Wei Zhang, Leizheng Shu

Abstract: Visual localization of unmanned aerial vehicles (UAVs) in planetary scenes aims to estimate the absolute pose of the UAV in the world coordinate system from satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning are reduced. To accurately determine the position of the UAV in a planetary scene in the absence of a global navigation satellite system (GNSS), this paper proposes JointLoc, which estimates the real-time UAV position in the world coordinate system by adaptively fusing the absolute 2-degree-of-freedom (2-DoF) pose and the relative 6-degree-of-freedom (6-DoF) pose. Extensive comparative experiments were conducted on a proposed planetary UAV image cross-modal localization dataset, which contains three types of typical Martian topography generated via a simulation engine as well as real Martian UAV images from the Ingenuity helicopter. JointLoc achieved a root-mean-square error of 0.237 m on trajectories of up to 1,000 m, compared to 0.594 m and 0.557 m for ORB-SLAM2 and ORB-SLAM3, respectively. The source code will be available at https://github.com/LuoXubo/JointLoc.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 8 pages
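The JointLoc abstract reports accuracy as a root-mean-square error (RMSE) over trajectories. As a worked illustration of that metric only, the sketch below computes the RMSE of per-pose Euclidean position errors between an estimated and a ground-truth trajectory. It assumes both trajectories are already time-synchronized and expressed in the same frame; the alignment step that evaluation protocols often apply first is deliberately omitted, and the paper's exact protocol is not stated in the abstract.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, ...]


def trajectory_rmse(estimated: Sequence[Position], ground_truth: Sequence[Position]) -> float:
    """RMSE of per-pose Euclidean position errors (same units as the inputs)."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    if any(len(p) != len(q) for p, q in zip(estimated, ground_truth)):
        raise ValueError("positions must have matching dimensions")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up 3D positions in metres, only to show the call.
    est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, -0.1, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(round(trajectory_rmse(est, gt), 4))
```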

arXiv:2403.09030

An AI-Driven Approach to Wind Turbine Bearing Fault Diagnosis from Acoustic Signals

Authors: Zhao Wang, Xiaomeng Li, Na Li, Longlong Shu

Abstract: This study aimed to develop a deep learning model for classifying bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was constructed and trained on audio data from five predefined fault types, which were used for both training and validation. To create the dataset, raw audio signals were collected and processed in frames to capture time- and frequency-domain information. The model achieved very high accuracy on the training samples and generalized well during validation. On the test samples, the model achieved an overall classification accuracy exceeding 99.5%, with a false positive rate of less than 1% for normal status. The findings of this study support the diagnosis and maintenance of bearing faults in wind turbine generators, with the potential to enhance the reliability and efficiency of wind power generation.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2401.16980

doi 10.1103/PhysRevB.109.054516

Two-Dimensional Phase-Fluctuating Superconductivity in Bulk-Crystalline NdO$_{0.5}$F$_{0.5}$BiS$_2$

Authors: C. S. Chen, J. Küspert, I. Biało, J. Mueller, K. W. Chen, M. Y. Zou, D. G. Mazzone, D. Bucher, K. Tanaka, O. Ivashko, M. v. Zimmermann, Qisi Wang, Lei Shu, J. Chang

Abstract: We present a combined growth and transport study of superconducting single-crystalline NdO$_{0.5}$F$_{0.5}$BiS$_2$. We report evidence of two-dimensional superconductivity with significant phase fluctuations of preformed Cooper pairs preceding the superconducting transition. This result is based on three key observations. (1) The resistive superconducting transition temperature $T_c$ (defined by resistivity $\rho \rightarrow 0$) increases with increasing disorder. (2) As $T \rightarrow T_c$, the conductivity diverges significantly faster than expected from Gaussian fluctuations in two and three dimensions. (3) Non-Ohmic resistance behavior is observed in the superconducting state. Altogether, our observations are consistent with a temperature regime of phase-fluctuating superconductivity. The crystal structure, with magnetic ordering tendencies in the NdO$_{0.5}$F$_{0.5}$ layers and (super)conductivity in the BiS$_2$ layers, is likely responsible for the two-dimensional phase fluctuations. As such, NdO$_{0.5}$F$_{0.5}$BiS$_2$ falls into the class of unconventional "laminar" bulk superconductors that includes cuprate materials and 4Hb-TaS$_2$.

Submitted 24 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.09755

Crystal Transformer Based Universal Atomic Embedding for Accurate and Transferable Prediction of Materials Properties

Authors: Luozhijie **, Zijian Du, Le Shu, Yongfeng Mei, Hao Zhang

Abstract: In this work, we propose a novel approach to generate universal atomic embeddings that significantly improves their representational power and accuracy, which ultimately improves the accuracy of property prediction. Moreover, we demonstrate the excellent transferability of universal atomic embeddings across different databases and various property tasks. Our approach centers on developing the CrystalTransformer model. Unlike traditional methods, this model is not built on a graph network architecture; instead, it uses the Transformer architecture to extract latent atomic features. This allows CrystalTransformer to mitigate the inherent topological information bias of graph neural networks while maximally preserving atomic chemical information, making it more accurate in encoding complex atomic features and thereby offering a deeper understanding of the atoms in materials. We highlight the advantages of CrystalTransformer in generating universal atomic embeddings through comparisons with current mainstream graph neural network models. Furthermore, through various experiments we validate that universal atomic embeddings improve the accuracy of property predictions and demonstrate their transferability across different databases and property tasks.
As another key aspect of our study, we uncover strong physical interpretability in the universal atomic embeddings through clustering and correlation analysis, indicating their immense potential as atomic fingerprints.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 24 pages, 5 figures

arXiv:2401.07382

Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation

Authors: Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

Abstract: Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals: typically, there is only a single reward for an entire output. This sparsity can lead to inefficient and unstable learning. To address this challenge, our paper introduces a novel framework that uses the critique capability of Large Language Models (LLMs) to produce intermediate-step rewards during RL training. Our method couples a policy model with a critic language model, which provides comprehensive feedback on each part of the output. This feedback is then translated into token- or span-level rewards that can guide the RL training process. We investigate this approach in two settings: one where the policy model is smaller and is paired with a more powerful critic model, and another where a single language model fulfills both roles. We assess our approach on three text generation tasks: sentiment control, language model detoxification, and summarization. Experimental results show that incorporating artificial intrinsic rewards significantly improves both the sample efficiency and the overall performance of the policy model, as supported by both automatic and human evaluation.

Submitted 19 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.
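The abstract says critic feedback is translated into token- or span-level rewards but does not give the mapping. The sketch below shows one simple possibility, stated as an assumption: each critiqued span `(start, end, score)` contributes its score either at the span's last token or spread evenly over its tokens, and an optional sequence-level reward is added at the final token. The function name and the scoring scheme are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# (start, end_exclusive, score) for one critiqued span of the output tokens.
Span = Tuple[int, int, float]


def spans_to_token_rewards(
    n_tokens: int,
    spans: Sequence[Span],
    final_reward: float = 0.0,
    spread: bool = False,
) -> List[float]:
    """Turn span-level critic scores into a dense per-token reward vector.

    spread=False puts each span's score on its last token; spread=True divides
    it evenly over the span. final_reward (e.g. a sequence-level score) is
    added to the last token.
    """
    rewards = [0.0] * n_tokens
    for start, end, score in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span ({start}, {end}) for {n_tokens} tokens")
        if spread:
            per_token = score / (end - start)
            for i in range(start, end):
                rewards[i] += per_token
        else:
            rewards[end - 1] += score
    if n_tokens > 0:
        rewards[-1] += final_reward
    return rewards


if __name__ == "__main__":
    # Hypothetical critique of a 6-token output: first clause fine, second clause toxic.
    print(spans_to_token_rewards(6, [(0, 3, 0.5), (3, 6, -1.0)], final_reward=0.2))
```

Placing the score on a span's last token keeps the reward at the point where the span is complete, while spreading it gives every token in the span a signal; which choice works better for a given RL algorithm is an empirical question.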

arXiv:2401.04546

Multi-condensate lengths with degenerate excitation gaps in BaNi$_2$As$_2$ revealed by muon spin relaxation study

Authors: Kaiwen Chen, Zihao Zhu, Yaofeng Xie, Adrian D. Hillier, James S. Lord, Pengcheng Dai, Lei Shu

Abstract: The recently discovered (Ba,Sr)Ni$_2$As$_2$ family provides an ideal platform for investigating the interaction between electronic nematicity and superconductivity. Here we report muon spin relaxation ($μ$SR) measurements on BaNi$_2$As$_2$. Transverse-field $μ$SR experiments indicate that the temperature dependence of the superfluid density is best fitted with a single-band $s$-wave model. On the other hand, the magnetic penetration depth $λ$ depends on the magnetic field, which contradicts the single-band fully gapped scenario. Zero-field $μ$SR experiments indicate the absence of a spontaneous magnetic field in the superconducting state, showing that time-reversal symmetry is preserved in the superconducting state. Our $μ$SR experiments suggest that BaNi$_2$As$_2$ is a fully gapped multiband superconductor. The superconducting gap amplitudes of the bands are nearly the same, while different bands exhibit different coherence lengths. The present work helps to elucidate the controversial superconducting properties of this parent compound, paving the way for further research on doping the system with Sr to enhance superconductivity.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by Phys. Rev. B

arXiv:2311.16344

Spatially Adaptive Cloth Regression with Implicit Neural Representations

Authors: Lei Shu, Vinicius Azevedo, Barbara Solenthaler, Markus Gross

Abstract: The accurate representation of fine-detailed cloth wrinkles poses significant challenges in computer graphics. The inherently non-uniform structure of cloth wrinkles requires intricate discretization strategies, which are frequently characterized by high computational demands and complex methodologies. To address this, this paper introduces a novel anisotropic cloth regression technique that capitalizes on the potential of implicit neural representations of surfaces. Our first core contribution is an innovative mesh-free sampling approach, designed to reduce the reliance on traditional mesh structures, thereby offering greater flexibility and accuracy in capturing fine cloth details. Our second contribution is a novel adversarial training scheme, designed to strike a balance between the sampling and simulation objectives. The adversarial approach ensures that the wrinkles are represented with high fidelity while maintaining computational efficiency. Our results on various cloth-object interaction scenarios show that, under the same memory constraints, our method consistently surpasses traditional discrete representations, particularly when modelling highly detailed localized wrinkles.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 16 pages, 13 figures

MSC Class: 68T07; ACM Class: I.3.0

arXiv:2311.15717

Evidence of spin density waves in La$_3$Ni$_2$O$_{7-δ}$

Authors: Kaiwen Chen, Xiangqi Liu, Jiachen Jiao, Muyuan Zou, Yixuan Luo, Qiong Wu, Ningyuan Zhang, Yanfeng Guo, Lei Shu

Abstract: The recently discovered superconductivity with critical temperature $T_c$ up to 80 K in the double-layer nickelate La$_3$Ni$_2$O$_{7-δ}$ under pressure has drawn great attention. Here we report a positive muon spin relaxation ($μ^+$SR) study of polycrystalline La$_3$Ni$_2$O$_{6.92}$ at ambient pressure. Zero-field $μ^+$SR experiments reveal magnetic order in La$_3$Ni$_2$O$_{6.92}$ with $T_{N}=154\ \rm{K}$. Weak transverse-field $μ^+$SR measurements confirm the bulk nature of the magnetism. In addition, a small quantity of oxygen deficiency can greatly broaden the internal magnetic field distribution sensed by muons.

Submitted 13 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.09204

Fusion-Eval: Integrating Assistant Evaluators with LLMs

Authors: Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, **dong Chen, Lei Meng

Abstract: Evaluating natural language systems poses significant challenges, particularly in the realms of natural language understanding and high-level reasoning. In this paper, we introduce 'Fusion-Eval', an innovative approach that leverages Large Language Models (LLMs) to integrate insights from various assistant evaluators. The LLM is given the example to evaluate along with scores from the assistant evaluators, each of which specializes in assessing distinct aspects of responses. Fusion-Eval achieves a 0.962 system-level Kendall-Tau correlation with humans on SummEval and a 0.744 turn-level Spearman correlation on TopicalChat, both significantly higher than those of baseline methods. These results highlight Fusion-Eval's significant potential in the realm of natural language system evaluation.

Submitted 6 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.
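Fusion-Eval is scored by a system-level Kendall-Tau correlation with human judgments: each system receives one aggregate score from the evaluator and one from humans, and the two rankings are compared. As a sketch of that statistic, the code below computes Kendall's tau-b (the tie-corrected variant) by counting concordant and discordant pairs in $O(n^2)$. Which variant the paper used is not stated in the abstract, and the example scores are made up. For real use, `scipy.stats.kendalltau` computes tau-b by default.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equally long score lists (ties handled)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied in x / not tied in y
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        untied_x += dx != 0
        untied_y += dy != 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    if denom == 0:
        raise ValueError("tau-b is undefined when one input is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Made-up per-system averages: automatic evaluator vs. human ratings.
    evaluator = [0.71, 0.64, 0.80, 0.55]
    human = [3.9, 3.5, 4.2, 3.6]
    print(round(kendall_tau_b(evaluator, human), 3))
```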

arXiv:2311.09179

SiRA: Sparse Mixture of Low Rank Adaptation

Authors: Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, **dong Chen, Lei Meng

Abstract: Parameter Efficient Tuning has been a prominent approach to adapting Large Language Models to downstream tasks. Most previous works consider adding dense trainable parameters, where all parameters are used to adapt a certain task. Using LoRA as an example, we found this less effective empirically: introducing more trainable parameters does not help. Motivated by this, we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low rank adaptation. SiRA leverages the Sparse Mixture of Experts (SMoE) to boost the performance of LoRA. Specifically, it enforces top-$k$ expert routing with a capacity limit restricting the maximum number of tokens each expert can process. We propose a novel and simple expert dropout on top of the gating network to reduce over-fitting. Through extensive experiments, we verify that SiRA performs better than LoRA and other mixture-of-experts approaches across different single-task and multitask settings.

Submitted 15 November, 2023; originally announced November 2023.
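The SiRA abstract describes top-$k$ expert routing with a per-expert capacity limit, plus an expert dropout on the gating network. The pure-Python sketch below illustrates those two ideas on precomputed gate logits. The dropout form (randomly masking whole experts' logits before the softmax) and the choice to keep raw, unrenormalized gate weights are assumptions made for illustration, not details given in the abstract; tokens whose chosen experts are already full are simply left unassigned here.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # finite as long as at least one logit is finite
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    gate_logits: Sequence[Sequence[float]],
    k: int,
    capacity: int,
    expert_dropout: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[List[Tuple[int, float]]]:
    """Assign each token to at most k experts; each expert takes <= capacity tokens.

    Returns, per token (in order), a list of (expert_index, gate_weight) pairs.
    """
    rng = rng or random.Random(0)
    n_experts = len(gate_logits[0])
    load = [0] * n_experts
    assignments = []
    for logits in gate_logits:
        keep = [True] * n_experts
        if expert_dropout > 0.0:
            # Hypothetical expert dropout: hide whole experts from the gate.
            keep = [rng.random() >= expert_dropout for _ in range(n_experts)]
            if not any(keep):
                keep[rng.randrange(n_experts)] = True
        masked = [v if kept else -math.inf for v, kept in zip(logits, keep)]
        probs = softmax(masked)
        ranked = sorted(range(n_experts), key=lambda e: probs[e], reverse=True)
        chosen = []
        for e in ranked[:k]:
            if probs[e] > 0.0 and load[e] < capacity:  # skip dropped or full experts
                load[e] += 1
                chosen.append((e, probs[e]))
        assignments.append(chosen)
    return assignments


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.0]] * 3  # three tokens that all prefer expert 0
    routed = route_top_k(logits, k=1, capacity=2)
    print([[e for e, _ in a] for a in routed])  # prints [[0], [0], []]
```

The capacity limit bounds the work per expert regardless of how skewed the gate is, at the cost of dropping some token-expert assignments, as the third token in the usage example shows.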

arXiv:2310.04815

Critique Ability of Large Language Models

Authors: Liangchen Luo, Zi Lin, Yinxiao Liu, Lei Shu, Yun Zhu, **gbo Shang, Lei Meng

Abstract: Critical thinking is essential for rational decision-making and problem-solving. This skill hinges on the ability to provide precise and reasoned critiques and is a hallmark of human intelligence. In the era of large language models (LLMs), this study explores the ability of LLMs to deliver accurate critiques across various tasks. We are interested in this topic because a capable critic model could serve not only as a reliable evaluator but also as a source of supervision signals for model tuning. In particular, if a model can self-critique, it has the potential for autonomous self-improvement. To examine this, we introduce a unified evaluation framework for assessing the critique abilities of LLMs. We develop a benchmark called CriticBench, which comprises 3K high-quality natural language queries and corresponding model responses, and we annotate the correctness of these responses. The benchmark covers tasks such as math problem-solving, code completion, and question answering. We evaluate multiple LLMs on the collected dataset, and our analysis reveals several noteworthy insights: (1) Critique is generally challenging for most LLMs, and this capability often emerges only when models are sufficiently large. (2) In particular, self-critique is especially difficult. Even top-performing LLMs struggle to achieve satisfactory performance. (3) Models tend to have lower critique accuracy on problems where they are most uncertain.
To this end, we introduce a simple yet effective baseline named self-check, which leverages self-critique to improve task performance for various models. We hope this study serves as an initial exploration into understanding the critique abilities of LLMs and informs future research, including the development of more proficient critic models and the application of critiques across diverse tasks.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.16982

Superconducting Properties of La$_2$(Cu$_{1-x}$Ni$_x$)$_5$As$_3$O$_2$: A $\rm μ$SR Study

Authors: Qiong Wu, Kaiwen Chen, Zihao Zhu, Cheng Tan, Yanxing Yang, Xin Li, Toni Shiroka, Xu Chen, Jiangang Guo, Xiaolong Chen, Lei Shu

Abstract: We report the results of muon spin rotation and relaxation ($\rm μ$SR) measurements on the recently discovered layered Cu-based superconducting material La$_{2}$(Cu$_{1-x}$Ni$_{x}$)$_{5}$As$_{3}$O$_{2}$ ($x =$ 0.40, 0.45). Transverse-field $\rm μ$SR experiments on both samples show that the temperature dependence of the superfluid density is best described by a two-band model. The absolute values of the zero-temperature magnetic penetration depth $λ_{\rm ab}(0)$ were found to be 427(1.7) nm and 422(1.5) nm for $x =$ 0.40 and 0.45, respectively. Both compounds are located between the unconventional and the standard BCS superconductors in the Uemura plot. Zero-field $\rm μ$SR measurements show no evidence of time-reversal symmetry (TRS) breaking in the superconducting state.

Submitted 29 September, 2023; originally announced September 2023.

Journal ref: Phys. Rev. B 107, 214502 (2023)
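The two-band description of the superfluid density mentioned in this abstract is commonly written in $μ$SR work as a weighted sum of two $s$-wave contributions, $\rho_s(T) = w\,\rho_1(T) + (1-w)\,\rho_2(T)$, where $\rho_s(T) = \lambda^{-2}(T)/\lambda^{-2}(0)$. The sketch below evaluates that form numerically under the usual clean-limit assumptions: each band's normalized superfluid density is $1 - \int_0^\infty \mathrm{sech}^2\!\left(\sqrt{x^2 + a^2}\right) dx$ with $a = \Delta(T)/(2 k_B T)$, and the gap follows the widely used interpolation $\Delta(T) = \Delta_0 \tanh\{1.82[1.018(T_c/T - 1)]^{0.51}\}$. The $T_c$, gap values, and weight are placeholders, not the fitted values of the paper, and no fitting is performed.

```python
import math

K_B_MEV_PER_K = 0.08617333  # Boltzmann constant in meV/K


def sech2(y: float) -> float:
    """sech(y)^2 computed without overflow for large |y|."""
    z = math.exp(-2.0 * abs(y))
    return 4.0 * z / (1.0 + z) ** 2


def gap_mev(t_k: float, tc_k: float, gap0_mev: float) -> float:
    """Interpolated BCS-like gap Delta(T); zero at and above Tc."""
    if t_k >= tc_k:
        return 0.0
    if t_k <= 0.0:
        return gap0_mev
    return gap0_mev * math.tanh(1.82 * (1.018 * (tc_k / t_k - 1.0)) ** 0.51)


def rho_s_wave(t_k: float, tc_k: float, gap0_mev: float,
               x_max: float = 40.0, n: int = 8000) -> float:
    """Normalized clean-limit s-wave superfluid density lambda^-2(T)/lambda^-2(0)."""
    if t_k >= tc_k:
        return 0.0
    if t_k <= 0.0:
        return 1.0
    a = gap_mev(t_k, tc_k, gap0_mev) / (2.0 * K_B_MEV_PER_K * t_k)
    h = x_max / n
    # Midpoint rule for integral_0^x_max sech^2(sqrt(x^2 + a^2)) dx.
    integral = h * sum(sech2(math.hypot((i + 0.5) * h, a)) for i in range(n))
    return 1.0 - integral


def rho_two_band(t_k: float, tc_k: float, gap1_mev: float, gap2_mev: float,
                 weight1: float) -> float:
    """Weighted sum of two s-wave bands sharing the same Tc."""
    return (weight1 * rho_s_wave(t_k, tc_k, gap1_mev)
            + (1.0 - weight1) * rho_s_wave(t_k, tc_k, gap2_mev))


if __name__ == "__main__":
    tc = 2.0                                  # placeholder Tc in K
    big = 1.764 * K_B_MEV_PER_K * tc          # weak-coupling BCS gap, placeholder
    for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(frac, round(rho_two_band(frac * tc, tc, big, 0.4 * big, 0.7), 4))
```

In a fit, $w$, the two gaps, and $T_c$ would be adjusted so that $\rho_s(T)$ matches $\lambda^{-2}(T)/\lambda^{-2}(0)$ extracted from the transverse-field data.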

arXiv:2309.16947

Muon Spin Relaxation Study of frustrated Tm$_3$Sb$_3$Mg$_2$O$_{14}$ with kagomé lattice

Authors: Yanxing Yang, Kaiwen Chen, Zhaofeng Ding, Adrian D. Hillier, Lei Shu

Abstract: The structure and magnetic properties of the rare-earth Tm$^{3+}$ kagomé lattice compound Tm$_3$Sb$_3$Mg$_2$O$_{14}$ are studied by X-ray diffraction, magnetic susceptibility, and muon spin relaxation ($μ$SR) experiments. A small amount of Tm/Mg site-mixing disorder is revealed. DC magnetic susceptibility measurements show that the Tm$^{3+}$ magnetic moments are antiferromagnetically correlated, with a negative Curie-Weiss temperature of -26.3 K. Neither long-range magnetic order nor a spin-glass transition is observed by DC and AC magnetic susceptibility, which is confirmed by $μ$SR experiments down to 0.1 K. However, zero-field $μ$SR experiments indicate the emergence of short-range magnetic order, and longitudinal-field $μ$SR measurements show the absence of spin dynamics at low temperatures. Comparison with Tm$_3$Sb$_3$Zn$_2$O$_{14}$, another Tm-based kagomé lattice with much more site-mixing disorder, suggests that the gapless spin-liquid-like behaviors in Tm$_3$Sb$_3$Zn$_2$O$_{14}$ can be induced by disorder. Samples with perfect geometrical frustration are urgently needed to establish whether a quantum spin liquid (QSL) exists in this kind of material with a rare-earth kagomé lattice.

Submitted 28 September, 2023; originally announced September 2023.

Journal ref: Chin. Phys. Lett. 39 (2022) 107502
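A Curie-Weiss temperature such as the -26.3 K quoted above is usually extracted by fitting $\chi = C/(T - \theta_{\rm CW})$ over a paramagnetic temperature window, where $1/\chi$ is linear in $T$. The sketch below fits noise-free synthetic data generated from that law (the Curie constant and temperature window are hypothetical, not this paper's data) and converts $C$ to an effective moment with the CGS relation $\mu_{\rm eff} \approx \sqrt{8C}\,\mu_B$ ($C$ in emu K/mol).

```python
import numpy as np


def curie_weiss_fit(T, chi):
    """Fit chi = C / (T - theta) as the straight line 1/chi = T/C - theta/C.

    Returns (C, theta).
    """
    slope, intercept = np.polyfit(T, 1.0 / chi, 1)
    C = 1.0 / slope
    theta = -intercept * C  # intercept = -theta / C
    return C, theta


def effective_moment(C):
    """Effective moment in Bohr magnetons from C in emu K/mol (CGS units)."""
    return np.sqrt(8.0 * C)


# Synthetic Curie-Weiss data; C_true is a hypothetical value.
T = np.linspace(50.0, 300.0, 26)
C_true, theta_true = 7.0, -26.3
chi = C_true / (T - theta_true)
C_fit, theta_fit = curie_weiss_fit(T, chi)
mu_eff = effective_moment(C_fit)
```

A negative fitted $\theta$ indicates antiferromagnetic correlations; with real data the result depends on the chosen fit window, especially when crystal-field levels are thermally populated.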

arXiv:2309.08026 [pdf, other]

Determinants of successful mitigation in coupled social-climate dynamics

Authors: Longmei Shu, Feng Fu

Abstract: Understanding the impact of human behavior is crucial for successful mitigation of climate change across the globe. To shed light on this issue, here we couple the forest dieback model with human behavior. Using evolutionary game theory, we build a time-delay system in which forest growth is affected by both temperature and human mitigation choices, the latter being informed by temperature forecasts. Simulations of the coupled system over 200 years reveal varying outcomes: the forest dies out and no one is a mitigator, the forest dies out and everyone is a mitigator, or the forest survives and everyone is a mitigator. In rare cases no one is a mitigator and yet the forest survives, but with low coverage. We also find occasional oscillations in which the proportion of mitigators varies between 0 and 1. Our results are based on simple models but offer insights into the determinants of the behavior changes desired in social-climate dynamics.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.11807 [pdf, other]

Towards an On-device Agent for Text Rewriting

Authors: Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, **dong Chen, Lei Meng

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, their large size makes them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting is a formidable challenge because it requires balancing a small size against retaining the emergent capabilities of the LLM, which requires costly data collection. To address this challenge, we introduce a new instruction tuning approach for building a mobile-centric text rewriting model. Our strategies enable the generation of high-quality training data without any human labeling. In addition, we propose a heuristic reinforcement learning framework that substantially enhances performance without requiring preference data. To further bridge the performance gap with the larger server-side model, we propose an effective approach that combines the mobile rewrite agent with the server model using a cascade. To tailor the text rewriting tasks to mobile scenarios, we introduce MessageRewriteEval, a benchmark that focuses on text rewriting for messages through natural language instructions. Through empirical experiments, we demonstrate that our on-device model surpasses the current state-of-the-art LLMs in text rewriting while maintaining a significantly reduced model size. Notably, we show that our proposed cascading approach improves model performance.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.00063 [pdf, ps, other]

Isospectral Reductions of Non-negative Matrices

Authors: Alexandre Baraviera, Pedro Duarte, Longmei Shu, Maria Joana Torres

Abstract: Isospectral reduction is an important tool for network/matrix analysis, as it reduces the dimension of a matrix/network while preserving all its eigenvalues and eigenvectors. The main contribution of this manuscript is an algorithmic scheme, based on isospectral reduction, to approximate the stationary measure of a stochastic matrix. This scheme can be advantageous when there is more than one eigenvalue near 1, precisely the case where iterative methods perform poorly. In addition, we give a partial explanation of why this scheme should work well, showing that in some situations isospectral reduction improves the spectral gap.

Submitted 27 September, 2023; v1 submitted 31 July, 2023; originally announced August 2023.
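The basic isospectral reduction of $M$ over a retained index set $S$ (complement $\bar S$) is $R_S(M,\lambda) = M_{SS} + M_{S\bar S}(\lambda I - M_{\bar S\bar S})^{-1}M_{\bar S S}$, whose entries are rational functions of $\lambda$. The sketch below only evaluates it at a scalar $\lambda$; it is not the authors' approximation scheme. For a row-stochastic matrix at $\lambda = 1$ it coincides with Meyer's stochastic complement, whose stationary vector is the restriction of the full stationary vector to $S$, renormalized, which illustrates why the reduction is useful for stationary measures. The example chain is hypothetical.

```python
import numpy as np


def isospectral_reduction(M, S, lam):
    """Evaluate R_S(M, lam) = M_SS + M_SB (lam*I - M_BB)^{-1} M_BS at a scalar lam.

    S lists the retained indices and B is their complement. lam must not be an
    eigenvalue of M_BB, otherwise the inverse does not exist.
    """
    M = np.asarray(M, dtype=float)
    S = list(S)
    B = [i for i in range(M.shape[0]) if i not in S]
    M_SS, M_SB = M[np.ix_(S, S)], M[np.ix_(S, B)]
    M_BS, M_BB = M[np.ix_(B, S)], M[np.ix_(B, B)]
    # Solve the linear system instead of forming the inverse explicitly.
    inner = np.linalg.solve(lam * np.eye(len(B)) - M_BB, M_BS)
    return M_SS + M_SB @ inner


def stationary(P):
    """Stationary row vector of a row-stochastic matrix (left eigenvector for eigenvalue 1)."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


# Hypothetical 3-state chain: keep states 0 and 1, eliminate state 2.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
R = isospectral_reduction(P, [0, 1], 1.0)
pi_full = stationary(P)
pi_reduced = stationary(R)  # equals pi_full[:2] / pi_full[:2].sum()
```

The eigenvalue preservation follows from the Schur-complement identity $\det(\lambda I - M) = \det(\lambda I - M_{\bar S\bar S})\,\det(\lambda I - R_S(M,\lambda))$, so every eigenvalue of $M$ that is not an eigenvalue of $M_{\bar S\bar S}$ remains an eigenvalue of the reduced problem.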

arXiv:2305.15685 [pdf, other]

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Authors: Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, **dong Chen, Lei Meng

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation. However, as LLMs are primarily trained on final text results rather than intermediate revisions, it might be challenging for them to perform text rewriting tasks. Most studies of rewriting tasks focus on a particular transformation type within the boundaries of single sentences. In this work, we develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks using diverse wording and structures expressed through natural language, including 1) generating rewriting instruction data from Wiki edits and public corpora through instruction generation and chain-of-thought prompting; and 2) collecting comparison data for reward model training through a new ranking function. To facilitate this research, we introduce OpenRewriteEval, a novel benchmark that covers a wide variety of rewriting types expressed through natural language instructions. Our results show significant improvements over a variety of baselines. The public repository is available on GitHub under Google Research (https://github.com/google-research/google-research/tree/master/rewritelm).

Submitted 19 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Journal ref: AAAI 2024

arXiv:2304.11658 [pdf, other]

Capturing Fine-grained Semantics in Contrastive Graph Representation Learning

Authors: Lin Shu, Chuan Chen, Zibin Zheng

Abstract: Graph contrastive learning defines a contrastive task to pull similar instances close and push dissimilar instances apart. It learns discriminative node embeddings without supervised labels and has attracted increasing attention in the past few years. Nevertheless, existing graph contrastive learning methods ignore the differences between the diverse semantics that exist in graphs; they therefore learn coarse-grained node embeddings, which leads to sub-optimal performance on downstream tasks. To bridge this gap, we propose a novel Fine-grained Semantics enhanced Graph Contrastive Learning (FSGCL) method. Concretely, FSGCL first introduces a motif-based graph construction, which employs graph motifs to extract the diverse semantics in graphs from the perspective of input data. Then, a semantic-level contrastive task is explored to further enhance the utilization of fine-grained semantics from the perspective of model training. Experiments on five real-world datasets demonstrate the superiority of FSGCL over state-of-the-art methods. To make the results reproducible, we will make our code public on GitHub after this paper is accepted.

Submitted 23 April, 2023; originally announced April 2023.
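The generic contrastive task described above ("pull similar instances close and push dissimilar instances apart") is often implemented with an InfoNCE-style loss. The sketch below is that generic loss in NumPy, not FSGCL's motif-based or semantic-level objective; the embeddings are random and the temperature `tau = 0.5` is a hypothetical choice. Row `i` of `z1` and row `i` of `z2` are treated as two views of node `i` (the positive pair), and all other rows of `z2` act as negatives.

```python
import numpy as np


def info_nce(z1, z2, tau=0.5):
    """Mean InfoNCE loss where (z1[i], z2[i]) is the positive pair for node i."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                              # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # cross-entropy on the diagonal


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # views agree
mismatched = info_nce(z, np.roll(z, 1, axis=0))               # every positive pair is wrong
```

Minimizing this loss raises the similarity of each positive pair relative to all negatives in the batch, which is the "pull close / push apart" behavior in the abstract.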

arXiv:2301.08986 [pdf, other]

Adapting a Language Model While Preserving its General Knowledge

Authors: Zixuan Ke, Yijia Shao, Haowei Lin, Hu Xu, Lei Shu, Bing Liu

Abstract: Domain-adaptive pre-training (DA-training for short), also known as post-training, aims to train a pre-trained general-purpose language model (LM) on an unlabeled corpus from a particular domain so that end-tasks in that domain achieve improved performance. However, existing DA-training methods are in some sense blind, as they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus. This paper shows that the existing methods are suboptimal and proposes a novel method for a more informed adaptation of the knowledge in the LM by (1) soft-masking the attention heads based on their importance, to best preserve the general knowledge in the LM, and (2) contrasting the representations of the general and the full (both general and domain) knowledge to learn an integrated representation with both general and domain-specific knowledge. Experimental results demonstrate the effectiveness of the proposed approach.

Submitted 21 January, 2023; originally announced January 2023.

Comments: EMNLP 2022

arXiv:2210.05549 [pdf, other]

Continual Training of Language Models for Few-Shot Learning

Authors: Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, Bing Liu

Abstract: Recent work applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or post-training an LM on an unlabeled domain corpus can produce even better performance for end-tasks in that domain. This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora, expanding its knowledge without forgetting its previous skills. The goal is to improve few-shot end-task learning in these domains. The resulting system, called CPT (Continual PostTraining), is to our knowledge the first continual post-training system. Experimental results verify its effectiveness.

Submitted 11 October, 2022; originally announced October 2022.

Journal ref: EMNLP 2022

arXiv:2210.03272 [pdf, other]

doi 10.1088/1674-4527/ac977c

A Fast Transient Backend to Detect FRBs with the Tianlai Dish Pathfinder Array

Authors: Zijie Yu, Furen Deng, Shijie Sun, Chenhui Niu, Jixia Li, Fengquan Wu, Wei-Yang Wang, Yougang Wang, Hui Feng, Lin Shu, Jie Hao, Reza Ansari, Albert Stebbins, Xuelei Chen

Abstract: The Tianlai Dish Pathfinder array is a radio interferometer array consisting of 16 six-meter dish antennas. The integration time of its original digital backend, designed for the HI intensity mapping experiment, is at the level of seconds. A new digital backend with millisecond response has been added to enable searches for fast radio bursts (FRBs) during its observations. This paper describes the design and calibration of this backend and its real-time search pipeline. The backend can form 16 digital beams for each linear polarisation, covering an area of 19.6 square degrees. The search pipeline can search for, record and classify FRBs automatically in real time. During commissioning, we captured signal pulses from the pulsars PSR B0329+54 and B2021+51.

Submitted 6 October, 2022; originally announced October 2022.

Comments: 16 pages, 14 figures, RAA accepted

Journal ref: Research in Astronomy and Astrophysics, 22, 125007 (2022)
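Forming digital beams amounts to phasing and summing the antenna voltages toward each chosen direction. The sketch below is a generic phase-shift (narrowband delay-and-sum) beamformer for a hypothetical uniform linear array of 16 elements with half-wavelength spacing at a hypothetical 750 MHz; the real Tianlai dish layout is two-dimensional and its backend design is described in the paper, so this only illustrates the principle.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light, m/s


def steering_vector(positions_m, freq_hz, theta_rad):
    """Geometric phase of a plane wave arriving from angle theta (from zenith) at each element."""
    k = 2.0 * np.pi * freq_hz / C_LIGHT
    return np.exp(1j * k * positions_m * np.sin(theta_rad))


def beam_power(voltages, positions_m, freq_hz, steer_rad):
    """Phase each element toward steer_rad, sum coherently, and return |sum|^2."""
    weights = np.conj(steering_vector(positions_m, freq_hz, steer_rad))
    return np.abs(np.sum(weights * voltages)) ** 2


freq = 750e6                                 # hypothetical observing frequency, Hz
wavelength = C_LIGHT / freq
positions = np.arange(16) * wavelength / 2   # hypothetical 16-element line, lambda/2 spacing
source = np.deg2rad(10.0)
v = steering_vector(positions, freq, source)  # unit-amplitude plane wave, no noise

beam_angles = np.deg2rad(np.linspace(-30, 30, 16))  # 16 digital beams, 4 degrees apart
powers = [beam_power(v, positions, freq, a) for a in beam_angles]
on_source = beam_power(v, positions, freq, source)  # N^2 = 256 for a perfectly phased sum
```

A beam pointed at the source adds the 16 voltages in phase and reaches $N^2$ in power, while beams pointed elsewhere add them with residual phase errors and collect less.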

arXiv:2209.04277 [pdf]

Flexo-photovoltaic effect and above-bandgap photovoltage in halide perovskites

Authors: Zhiguo Wang, Shengwen Shu, Xiaoyong Wei, Renhong Liang, Shanming Ke, Longlong Shu, Gustau Catalan

Abstract: Halide perovskites have outstanding photovoltaic properties which have been optimized through interfacial engineering. However, as these materials approach the limits imposed by the physics of semiconductor junctions, it is urgent to explore alternatives, such as the bulk photovoltaic effect, whose physical origin is different and not bound by the same limits. In this context, we focus on the flexo-photovoltaic effect, a type of bulk photovoltaic effect that was recently observed in oxides under strain gradients. We have measured the flexo-photovoltaic effect of MAPbBr3 and MAPbI3 crystals under bending and found it to be orders of magnitude larger than for SrTiO3, the benchmark flexo-photovoltaic oxide. For sufficiently large strain gradients, photovoltages bigger than the bandgap can be produced. Bulk photovoltaic effects are additive and, for MAPbI3, the flexo-photovoltage exists on top of a native bulk photovoltage that is hysteretic, consistent with the electrically switchable macroscopic polarization of this material. The results suggest that harnessing the flexo-photovoltaic effect through strain gradient engineering can provide a functional leap forward for halide perovskites.

Submitted 4 January, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: 20 pages, 11 figures

arXiv:2208.13685 [pdf, other]

FedEgo: Privacy-preserving Personalized Federated Graph Learning with Ego-graphs

Authors: Taolin Zhang, Chuan Chen, Yaomin Chang, Lin Shu, Zibin Zheng

Abstract: As special information carriers containing both structural and feature information, graphs are widely used in graph mining, e.g., with Graph Neural Networks (GNNs). However, in some practical scenarios, graph data are stored separately across multiple distributed parties and may not be directly shared due to conflicts of interest. Hence, federated graph neural networks have been proposed to address such data silo problems while preserving the privacy of each party (or client). Nevertheless, differing graph data distributions among parties, known as statistical heterogeneity, may degrade the performance of naive federated learning algorithms such as FedAvg. In this paper, we propose FedEgo, a federated graph learning framework based on ego-graphs that tackles these challenges, in which each client trains its local model while also contributing to the training of a global model. FedEgo applies GraphSAGE over ego-graphs to make full use of the structural information and uses Mixup to address privacy concerns. To deal with statistical heterogeneity, we integrate personalization into learning and propose an adaptive mixing coefficient strategy that enables clients to achieve their optimal personalization. Extensive experimental results and in-depth analysis demonstrate the effectiveness of FedEgo.

Submitted 9 September, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: 25 pages, submitted to ACM Transactions on Knowledge Discovery from Data (TKDD)
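FedAvg, the baseline named in the abstract, aggregates client parameters by a sample-size-weighted average, and a common way to personalize is to blend each client's local parameters with the global ones. The sketch below shows both steps on flat NumPy parameter vectors; the fixed mixing coefficient `alpha` is a hypothetical stand-in, since the abstract does not specify how FedEgo's adaptive coefficient is computed.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Global parameters as the sample-size-weighted mean of client parameters."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_params)  # shape (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def personalize(local, global_, alpha):
    """Blend local and global parameters; alpha = 1 keeps the purely local model."""
    return alpha * local + (1.0 - alpha) * global_


params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 300, 100]                     # hypothetical local sample counts
g = fedavg(params, sizes)                   # -> array([0.4, 0.8])
p0 = personalize(params[0], g, alpha=0.5)   # -> array([0.7, 0.4])
```

Under statistical heterogeneity a single global vector such as `g` can fit no client well, which is why a per-client blend like `p0` is used.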

arXiv:2208.13306 [pdf, other]

doi 10.1098/rspa.2022.0567

Eco-Evolutionary Dynamics of Bimatrix Games

Authors: Longmei Shu, Feng Fu

Abstract: Feedback between strategies and the environment is common in social-ecological, evolutionary-ecological, and even psychological-economic systems. Utilizing common resources is always a dilemma for community members, as in the tragedy of the commons. Here we consider replicator dynamics with feedback-evolving games, where the payoffs switch between two different matrices. Although each payoff matrix on its own represents an environment where cooperators and defectors cannot coexist stably, we show that it is possible to design appropriate switching control laws and achieve persistent oscillations of strategy abundance. This result should help guide the widespread problem of population state control in microbial experiments and other social problems with eco-evolutionary feedback loops.

Submitted 28 August, 2022; originally announced August 2022.
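To make the switching idea concrete, the sketch below uses a simplified single-population, two-strategy replicator equation $\dot x = x(1-x)(f_C - f_D)$, where $x$ is the fraction of cooperators. One payoff matrix (a prisoner's dilemma) drives $x$ toward 0 and the other drives it toward 1, so neither sustains coexistence on its own; a hypothetical hysteresis rule switches matrices at $x = 0.2$ and $x = 0.8$. The matrices, thresholds and Euler integration are illustrative assumptions, not the control laws derived in the paper, which treats bimatrix games.

```python
import numpy as np

A_DEFECT = np.array([[3.0, 0.0], [5.0, 1.0]])  # prisoner's dilemma: x -> 0
A_COOP = np.array([[4.0, 2.0], [3.0, 1.0]])    # cooperation dominates: x -> 1


def payoff_gap(A, x):
    """f_C - f_D for cooperator fraction x; rows of A are (C, D), columns the opponent."""
    f = A @ np.array([x, 1.0 - x])
    return f[0] - f[1]


def simulate(x0=0.5, low=0.2, high=0.8, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = x(1-x)(f_C - f_D) with a hysteresis switching rule."""
    x, cooperative_env, switches = x0, True, 0
    xs = [x]
    for _ in range(steps):
        if cooperative_env and x >= high:
            cooperative_env, switches = False, switches + 1
        elif not cooperative_env and x <= low:
            cooperative_env, switches = True, switches + 1
        A = A_COOP if cooperative_env else A_DEFECT
        x += dt * x * (1.0 - x) * payoff_gap(A, x)
        xs.append(x)
    return np.array(xs), switches


trajectory, n_switches = simulate()
```

Each matrix alone sends the population to a pure state, but the feedback rule keeps $x$ cycling between the two thresholds, a persistent oscillation of strategy abundance in the sense of the abstract.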

arXiv:2207.00702 [pdf, ps, other]

doi 10.1103/PhysRevB.106.014409

Spin excitations in the quantum dipolar magnet Yb(BaBO$_3$)$_3$

Authors: C. Y. Jiang, Y. X. Yang, Y. X. Gao, Z. T. Wan, Z. H. Zhu, T. Shiroka, C. S. Chen, Q. Wu, X. Li, J. C. Jiao, K. W. Chen, Y. Bao, Z. M. Tian, L. Shu

Abstract: We report results of magnetization, specific-heat and muon-spin relaxation measurements on single crystals of the disorder-free Yb$^{3+}$ triangular-lattice compound Yb(BaBO$_3$)$_3$. The magnetization experiments show anisotropic magnetic properties, with Curie-Weiss temperatures $θ_{\perp}=-1.40$~K ($H \perp c$) and $θ_{\parallel}=-1.16$~K ($H \parallel c$) determined from low-temperature data. The absence of both long-range antiferromagnetic order and spin freezing is confirmed down to 0.27 K in zero field. A two-level Schottky anomaly due to the opening of the ground-state Kramers doublet is observed in the low-temperature specific heat for applied magnetic fields $μ_0H > 0.7$~T. In zero field, the increase of both $C_{\rm mag}/T$ and the muon spin relaxation rate $λ$ below 1~K is due to electronic spin excitations, which often exist in quantum magnets where the dipole-dipole interaction creates an anisotropy of magnetic properties. The spin excitations are also supported by the unusual maximum in the field dependence of $λ$, caused by the field-induced increase of the density of excitations. We argue that the dipolar interaction is dominant and induces the spin dynamics in the quantum magnet Yb(BaBO$_3$)$_3$.

Submitted 1 July, 2022; originally announced July 2022.

Comments: accepted by Phys. Rev. B
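The two-level Schottky anomaly mentioned above has a closed form. For a Kramers doublet split by a field into two nondegenerate levels separated by $\Delta$, the molar heat capacity is $C = R\,(\Delta/T)^2 e^{\Delta/T}/(1+e^{\Delta/T})^2$, with a maximum of about $0.44R$ near $T \approx 0.42\,\Delta/k_B$. The sketch below evaluates it with $\Delta/k_B = g_{\rm eff}\,\mu_B\,\mu_0H/k_B$; the effective g-factor and field are hypothetical placeholders, not values from the paper.

```python
import numpy as np

R_GAS = 8.314462618     # molar gas constant, J/(mol K)
MU_B_OVER_KB = 0.67171  # Bohr magneton / Boltzmann constant, K/T


def zeeman_gap_kelvin(g_eff, field_tesla):
    """Zeeman splitting of an effective spin-1/2 doublet, in kelvin."""
    return g_eff * MU_B_OVER_KB * field_tesla


def schottky_two_level(T, gap_kelvin):
    """Molar heat capacity (J/mol K) of two nondegenerate levels split by gap_kelvin."""
    x = gap_kelvin / np.asarray(T, dtype=float)
    e = np.exp(-x)  # equivalent to x^2 e^x / (1 + e^x)^2, but overflow-safe for x >= 0
    return R_GAS * x**2 * e / (1.0 + e) ** 2


gap = zeeman_gap_kelvin(g_eff=3.0, field_tesla=2.0)  # hypothetical g_eff and field
T = np.linspace(0.05, 10.0, 20000)
C = schottky_two_level(T, gap)
T_peak = T[np.argmax(C)]
```

Because the gap scales linearly with field, the peak position moves to higher temperature as the field increases, which is how such an anomaly is usually identified in field-dependent specific-heat data.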

arXiv:2203.13238 [pdf, other]

Open-set Recognition via Augmentation-based Similarity Learning

Authors: Sepideh Esmaeilpour, Lei Shu, Bing Liu

Abstract: The primary assumption of conventional supervised learning or classification is that the test samples are drawn from the same distribution as the training samples, which is called closed set learning or classification. In many practical scenarios, this is not the case, because the test data contain unknown or unseen class samples; this is called the open set scenario, and the unknowns need to be detected. This problem is referred to as the open set recognition problem and is important in safety-critical applications. We propose to detect unknowns (or unseen class samples) by learning pairwise similarities. The proposed method works in two steps. It first learns a closed set classifier using the seen classes that appeared in training, and then learns how to compare seen classes with pseudo-unseen samples (automatically generated unseen class samples). The pseudo-unseen samples are generated by performing distribution-shifting augmentations on the seen (training) samples. We call our method OPG (Open set recognition based on Pseudo unseen data Generation). The experimental evaluation shows that the learned similarity-based features can successfully distinguish seen from unseen samples in benchmark datasets for open set recognition.

Submitted 21 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.12839 [pdf]

doi 10.1073/pnas.2216367120

Probing FeSi, a d-electron topological Kondo insulator candidate, with magnetic field, pressure, and microwaves

Authors: Alexander Breindel, Yuhang Deng, Camilla M. Moir, Yuankan Fang, Sheng Ran, Hongbo Lou, Shubin Li, Qiaoshi Zeng, Lei Shu, Christian T. Wolowiec, Ivan K. Schuller, Priscila F. S. Rosa, Zachary Fisk, John Singleton, M. Brian Maple

Abstract: Recently, evidence for a conducting surface state below 19 K was reported for the correlated d-electron small-gap semiconductor FeSi. In the work reported herein, the conducting surface state and the bulk phase of FeSi were probed via electrical resistivity measurements as a function of temperature T, magnetic field B up to 60 T and pressure P up to 7.6 GPa, and by means of a magnetic field modulated microwave spectroscopy (MFMMS) technique. The properties of FeSi were also compared to those of the Kondo insulator SmB6 to address the question of whether FeSi is a d-electron analogue of an f-electron Kondo insulator and, in addition, a topological Kondo insulator. The overall behavior of the magnetoresistance MR of FeSi at temperatures above and below the onset temperature T_S (19 K) of the conducting surface state is similar to that of SmB6. The two energy gaps inferred from the resistivity data in the semiconducting regime increase with pressure up to about 7 GPa, followed by a drop that coincides with a sharp suppression of T_S. This behavior is similar to that reported for SmB6, except that the two energy gaps in SmB6 decrease with pressure before dropping abruptly at T_S. The MFMMS measurements showed a sharp feature at T_S (19 K) for FeSi, but no such feature was observed at T_S (4.5 K) for SmB6. The absence of a feature at T_S for SmB6 may be due to experimental issues and will be the subject of a future investigation.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2202.02976 [pdf, other]

Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang

Abstract: Recent advances in deep learning have led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite continuous gains in accuracy, backward compatibility is also an important aspect for industrial applications, yet it has received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor. This work studies model update regression in structured prediction tasks. We choose syntactic dependency parsing and conversational semantic parsing as representative examples of structured prediction tasks in NLP. First, we measure and analyze model update regression in different model update settings. Next, we explore and benchmark existing techniques for reducing model update regression, including model ensembles and knowledge distillation. We further propose a simple and effective method, Backward-Congruent Re-ranking (BCR), which takes into account the characteristics of structured prediction. Experiments show that BCR mitigates model update regression better than model ensemble and knowledge distillation approaches.

Submitted 8 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: NeurIPS 2022
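Model update regression is commonly quantified by the negative flip rate: the fraction of test examples that the old model got right and the new model gets wrong. The sketch below computes it alongside accuracy for hypothetical per-example predictions; for structured outputs such as parse trees, "correct" would be an exact-match or task-specific criterion, which is an assumption here rather than the paper's exact metric definition.

```python
from typing import Sequence


def accuracy(pred: Sequence, gold: Sequence) -> float:
    """Fraction of examples whose prediction equals the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def negative_flip_rate(old: Sequence, new: Sequence, gold: Sequence) -> float:
    """Fraction of examples the old model got right but the new model gets wrong."""
    flips = sum(o == g and n != g for o, n, g in zip(old, new, gold))
    return flips / len(gold)


gold = ["A", "B", "C", "D", "E"]
old = ["A", "B", "X", "D", "Y"]  # 3 of 5 correct
new = ["A", "Z", "C", "D", "E"]  # 4 of 5 correct, but regresses on the second example
```

Here the new model is more accurate overall (0.8 versus 0.6) yet still has a negative flip rate of 0.2, which is exactly the backward-compatibility failure the abstract describes.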

arXiv:2202.01924 [pdf, other]

Zero-Shot Aspect-Based Sentiment Analysis

Authors: Lei Shu, Hu Xu, Bing Liu, Jiahua Chen

Abstract: Aspect-based sentiment analysis (ABSA) typically requires in-domain annotated data for supervised training/fine-tuning. Scaling ABSA to a large number of new domains is a big challenge. This paper aims to train a unified model that can perform zero-shot ABSA without using any annotated data for a new domain. We propose a method called contrastive post-training on review Natural Language Inference (CORN). ABSA tasks can then be cast into NLI for zero-shot transfer. We evaluate CORN on ABSA tasks ranging from aspect extraction (AE) and aspect sentiment classification (ASC) to end-to-end aspect-based sentiment analysis (E2E ABSA); the results show that ABSA can be conducted without any human-annotated ABSA data.

Submitted 14 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

arXiv:2201.12978 [pdf, other]

Muon Spin Relaxation Study of Spin Dynamics in Quantum Spin Liquid Candidate H$_3$LiIr$_2$O$_6$

Authors: Yan-Xing Yang, Liang-Long Huang, Zi-Hao Zhu, Chang-Sheng Chen, Qiong Wu, Zhao-Feng Ding, Cheng Tan, Pabi K. Biswas, Adrian D. Hillier, You-Guo Shi, Da-Peng Yu, Cai Liu, Le Wang, Fei Ye, Jia-Wei Mei, Lei Shu

Abstract: We present detailed thermodynamic and muon spin relaxation ($μ$SR) studies of the quantum spin liquid (QSL) candidate H$_3$LiIr$_2$O$_6$. In agreement with the low-temperature thermodynamic evidence (\textit{e.g.} bulk magnetization and heat capacity) for the absence of a magnetic transition, zero-field (ZF)-$μ$SR measurements indicate the absence of static magnetic ordering or spin freezing down to our lowest temperature of 80~mK. Both ZF- and longitudinal-field (LF)-$μ$SR measurements reveal persistent spin fluctuations at low temperatures. These results provide well-established evidence of a QSL state in H$_3$LiIr$_2$O$_6$. Furthermore, the observed time-field scaling of the $μ$SR spectra, $A(t)\sim A(t/H^{0.46})$, and the low-temperature power-law specific heat coefficient $C/T \sim T^{-0.57}$ indicate a finite density of states of the form $N(E) \sim E^{-0.5}$, in good agreement with the disorder-induced states in the Kitaev spin liquid.

Submitted 30 January, 2022; originally announced January 2022.
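The step from the specific-heat exponent to the density of states follows from a scaling argument. Assuming excitations with a power-law density of states $N(E) \propto E^{-\nu}$ and a thermal occupation $f$ that depends only on $E/k_BT$ (for example the Fermi function), substituting $E = k_BT\,u$ gives:

```latex
\begin{aligned}
U(T) &= \int_0^\infty E\, N(E)\, f\!\left(\frac{E}{k_B T}\right) dE
      \;\propto\; \int_0^\infty E^{1-\nu}\, f\!\left(\frac{E}{k_B T}\right) dE
      = (k_B T)^{2-\nu} \int_0^\infty u^{1-\nu} f(u)\, du, \\
C(T) &= \frac{dU}{dT} \;\propto\; T^{1-\nu}
\quad\Longrightarrow\quad \frac{C}{T} \;\propto\; T^{-\nu}.
\end{aligned}
```

The remaining integral is a finite constant when $\nu < 2$ and $f$ decays exponentially, so $\nu = 0.5$ predicts $C/T \propto T^{-0.5}$, close to the measured exponent of $-0.57$.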

arXiv:2201.09435 [pdf, other]

doi 10.1103/PhysRevB.105.174418

Three-dimensional Sandglass Magnet with Non-Kramers ions

Authors: Yan-Xing Yang, Yao Wang, Zhao-Feng Ding, Adrian D. Hillier, Lei Shu

Abstract: Magnetic susceptibility, specific heat, and muon spin relaxation ($μ$SR) measurements have been performed on the newly synthesized three-dimensional sandglass-type lattice Tm$_3$SbO$_7$, where two inequivalent sets of non-Kramers Tm$^{3+}$ ions (Tm$^{3+}_1$ and Tm$^{3+}_2$) show crystal electric field effects in different temperature ranges. The existence of an ordered or a glassy state down to 0.1~K in zero field is excluded. The low-energy properties of Tm$_3$SbO$_7$ are dominated by the lowest non-Kramers quasi-doublet of $\rm Tm^{3+}_1$, and the energy splitting is regarded as an intrinsic transverse field. Therefore, the low-temperature paramagnetic behavior of Tm$_3$SbO$_7$ is explained by a transverse-field Ising model, which is supported by quantitative simulation of the specific heat data. In addition, the perturbation from Tm$^{3+}_2$ may play an important role in accounting for the low-temperature spin dynamics observed by $μ$SR.

Submitted 3 May, 2022; v1 submitted 23 January, 2022; originally announced January 2022.
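
The abstract treats the splitting of the lowest Tm$^{3+}_1$ quasi-doublet as an intrinsic transverse field. In the single-site limit of a transverse field Ising model, $H = -Γσ^x$, the two levels $\pmΓ$ are split by $Δ = 2Γ$, and the heat capacity reduces to a two-level Schottky anomaly. The Python sketch below computes only that non-interacting limit; the Ising couplings are deliberately omitted, the function names and the 1 K splitting are illustrative, and this is not the quantitative simulation reported in the paper.

```python
import math


def schottky_heat_capacity(temperature: float, splitting: float) -> float:
    """Molar heat capacity C/R of independent two-level ions.

    Both levels are assumed singly degenerate (a split non-Kramers quasi-doublet);
    `temperature` and `splitting` share the same energy units with k_B = 1.
    """
    if temperature <= 0.0:
        return 0.0
    x = abs(splitting) / temperature
    e = math.exp(-x)  # exp(-x) underflows harmlessly to 0 at large x instead of overflowing
    # C/R = x^2 e^x / (1 + e^x)^2, rewritten as x^2 e^-x / (1 + e^-x)^2
    return x * x * e / (1.0 + e) ** 2


def peak_temperature(splitting: float, n_points: int = 20000) -> float:
    """Grid search for the temperature of the Schottky maximum."""
    temps = [splitting * (0.05 + 2.0 * i / n_points) for i in range(1, n_points + 1)]
    return max(temps, key=lambda t: schottky_heat_capacity(t, splitting))


delta = 1.0  # hypothetical splitting, in kelvin
t_peak = peak_temperature(delta)
print(f"peak at T = {t_peak:.3f} K, C/R = {schottky_heat_capacity(t_peak, delta):.3f}")
```

The maximum falls near $T \approx 0.42\,Δ$ with $C \approx 0.44\,R$, the textbook Schottky values; interactions between ions shift and broaden this peak.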

arXiv:2112.11646 [pdf, other]

doi 10.1038/s41534-022-00598-0

Contextuality in infinite one-dimensional translation-invariant local Hamiltonians: strengths and limits

Authors: Kaiyan Yang, Xiao Zeng, Yu**g Luo, Guowu Yang, Lan Shu, Miguel Navascués, Zizhu Wang

Abstract: In recent years there has been growing interest in treating many-body systems as Bell scenarios, where lattice sites play the role of distant parties and only near-neighbor statistics are accessible. We investigate contextuality arising from three Bell scenarios in infinite, translation-invariant 1D models: nearest-neighbor with two dichotomic observables per site; nearest- and next-to-nearest-neighbor with two dichotomic observables per site; and nearest-neighbor with three dichotomic observables per site. For the first scenario, we give strong evidence that it cannot exhibit contextuality, not even in non-signaling physical theories beyond quantum mechanics. For the second one, we identify several low-dimensional models that reach the ultimate quantum limits, paving the way for self-testing ground states of quantum many-body systems. For the last scenario, which generalizes the Heisenberg model, we give strong evidence that, in order to exhibit contextuality, the dimension of the local quantum system must be at least 3.

Submitted 21 July, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

Comments: 16 pages, updated version with new results

Journal ref: npj Quantum Information 8, 89 (2022)

arXiv:2112.10021 [pdf, other]

Continual Learning with Knowledge Transfer for Sentiment Classification

Authors: Zixuan Ke, Bing Liu, Hao Wang, Lei Shu

Abstract: This paper studies continual learning (CL) for sentiment classification (SC). In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain. Two natural questions are: Can the system transfer the knowledge learned in the past from the previous tasks to the new task to help it learn a better model for the new task? And, can old models for previous tasks be improved in the process as well? This paper proposes a novel technique called KAN to achieve these objectives. KAN can markedly improve the SC accuracy of both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of KAN is demonstrated through extensive experiments.

Submitted 18 December, 2021; originally announced December 2021.

Journal ref: ECML-PKDD 2020

arXiv:2112.06523 [pdf]

Fluctuating magnetic droplets immersed in a sea of quantum spin liquid

Authors: Z. H. Zhu, B. L. Pan, L. P. Nie, J. M. Ni, Y. X. Yang, C. S. Chen, Y. Y. Huang, E. J. Cheng, Y. J. Yu, A. D. Hillier, X. H. Chen, T. Wu, Y. Zhou, S. Y. Li, L. Shu

Abstract: The search for quantum spin liquids (QSLs), exotic magnetic states with strongly fluctuating and highly entangled spins down to zero temperature, is a main theme in current condensed matter physics. However, there is so far no smoking-gun evidence for deconfined spinons in any QSL candidate. Disorder and competing exchange interactions may prevent the formation of an ideal QSL state on frustrated spin lattices. Here we report comprehensive and systematic measurements of the magnetic susceptibility, ultra-low-temperature specific heat, muon spin relaxation (muSR), nuclear magnetic resonance (NMR), and thermal conductivity of NaYbSe2 single crystals, in which Yb3+ ions with effective spin-1/2 form a perfect triangular lattice. None of these complementary techniques finds evidence of long-range magnetic order down to its respective base temperature. Instead, specific heat, muSR and NMR measurements suggest the coexistence of quasi-static and dynamic spins in NaYbSe2. Scattering from these quasi-static spins may account for the absence of magnetic thermal conductivity. We therefore propose a scenario of fluctuating ferrimagnetic droplets immersed in a sea of QSL. This situation may be quite common on the way to an ideal QSL, and it provides a brand-new platform to study how a QSL state survives impurities and coexists with other magnetically ordered states.

Submitted 13 December, 2021; originally announced December 2021.

Journal ref: The Innovation 4, 100459 (2023)

arXiv:2112.06252 [pdf, ps, other]

The factorizations of $H^ρ(\mathbb{R}^n)$ via multilinear Calderón-Zygmund operators on weighted Lebesgue spaces

Authors: Dinghuai Wang, Rongxiang Zhu, Lisheng Shu

Abstract: We extend the recently much-studied Hardy factorization theorems to the weighted case. The key point of this paper is to establish the factorization theorems without individual conditions on the weight functions. As a direct application, we obtain characterizations of the $BMO(\mathbb{R}^n)$ space and Lipschitz spaces via the weighted boundedness of commutators of multilinear Calderón-Zygmund operators with genuinely multilinear weights.

Submitted 12 December, 2021; originally announced December 2021.

Comments: 22 pages

arXiv:2112.05970 [pdf, other]

doi 10.1088/1367-2630/ac48ea

Muon spin rotation and relaxation study on topological noncentrosymmetric superconductor PbTaSe$_2$

Authors: Z. H. Zhu, C. Tan, J. Zhang, P. K. Biswas, A. D. Hillier, M. X. Wang, Y. X. Yang, C. S. Chen, Z. F. Ding, S. Y. Li, L. Shu

Abstract: Topological superconductivity is an exotic phenomenon arising from a symmetry-protected topological surface state, in which a quantum system has an energy gap in the bulk but supports gapless excitations confined to its boundary. Symmetries, including central (inversion) symmetry and time-reversal symmetry (TRS), together with their relations to topology, are crucial for topological superconductivity. We report muon spin relaxation/rotation ($μ$SR) experiments on the topological noncentrosymmetric superconductor PbTaSe$_2$ to study its TRS and gap symmetry. Zero-field $μ$SR experiments indicate the absence of an internal magnetic field in the superconducting state, consistent with previous $μ$SR results. Furthermore, transverse-field $μ$SR measurements reveal that PbTaSe$_2$ has an isotropic, three-dimensional, fully gapped, single-band superconducting gap. The fully gapped result can help understand the pairing mechanism and further classify the topological superconductivity in this system.

Submitted 11 December, 2021; originally announced December 2021.
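
Transverse-field μSR data are commonly interpreted by fitting the temperature dependence of the superfluid density. A minimal Python sketch of the clean-limit model for a single isotropic s-wave gap follows. It assumes the weak-coupling value $Δ(0) = 1.764\,k_BT_c$, a widely used interpolation formula for $Δ(T)$, and reduced units ($k_B = T_c = 1$); these choices and the function names are illustrative assumptions, not the fit reported for PbTaSe$_2$. The usual integral $ρ_s(T)/ρ_s(0) = 1 + 2\int_{Δ}^{\infty} (\partial f/\partial E)\,E/\sqrt{E^2-Δ^2}\,dE$ is rewritten with $E = \sqrt{ε^2+Δ^2}$, which removes the square-root singularity at the gap edge.

```python
import math


def bcs_gap(t: float, delta0: float = 1.764) -> float:
    """Approximate BCS gap in units of k_B*Tc at reduced temperature t = T/Tc (interpolation formula)."""
    if t >= 1.0:
        return 0.0
    if t <= 0.0:
        return delta0
    return delta0 * math.tanh(1.82 * (1.018 * (1.0 / t - 1.0)) ** 0.51)


def sech2(y: float) -> float:
    """sech^2(y) evaluated through exp(-2|y|) so that large |y| cannot overflow."""
    e = math.exp(-2.0 * abs(y))
    return 4.0 * e / (1.0 + e) ** 2


def superfluid_density(t: float, n_steps: int = 4000, cutoff: float = 40.0) -> float:
    """Normalized superfluid density rho_s(t)/rho_s(0) for one isotropic s-wave gap.

    rho = 1 - (1 / 2t) * integral_0^inf sech^2(sqrt(eps^2 + gap^2) / 2t) d eps,
    truncated at eps = cutoff * t and evaluated with the trapezoidal rule.
    """
    if t <= 0.0:
        return 1.0
    gap = bcs_gap(t)
    h = cutoff * t / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        eps = i * h
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end-point weights
        total += weight * sech2(math.sqrt(eps * eps + gap * gap) / (2.0 * t))
    return 1.0 - total * h / (2.0 * t)


for t in (0.2, 0.5, 0.8, 1.0):
    print(f"T/Tc = {t:.1f}: rho_s/rho_s(0) = {superfluid_density(t):.3f}")
```

The curve stays flat near 1 at low temperature, because quasiparticles are only thermally activated across the full gap, and vanishes at $T_c$; that flat low-temperature plateau is the usual fingerprint of a fully gapped state in such fits.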

arXiv:2112.02714 [pdf, other]

CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks

Authors: Zixuan Ke, Bing Liu, Hu Xu, Lei Shu

Abstract: This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks in a particular CL setting called domain incremental learning (DIL). Each task is from a different domain or product. The DIL setting is particularly suited to ASC because in testing the system need not know the task/domain to which the test data belongs. To our knowledge, this setting has not been studied before for ASC. This paper proposes a novel model called CLASSIC. The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task ids in testing. Experimental results show the high effectiveness of CLASSIC.

Submitted 5 December, 2021; originally announced December 2021.

Journal ref: EMNLP 2021

arXiv:2112.02706 [pdf, other]

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

Authors: Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, Lei Shu

Abstract: Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus perform poorly in KT. Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not have much shared knowledge. Another observation is that most current CL methods do not use pre-trained models, but it has been shown that such models can significantly improve the end task performance. For example, in natural language processing, fine-tuning a BERT-like pre-trained language model is one of the most effective approaches. However, for CL, this approach suffers from serious CF. An interesting question is how to make the best use of pre-trained models for CL. This paper proposes a novel model called CTR to solve these problems. Our experimental results demonstrate the effectiveness of CTR.

Submitted 5 December, 2021; originally announced December 2021.

Journal ref: NeurIPS 2021

arXiv:2111.04198 [pdf, other]

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Tian Lan, Lei Shu, Ehsan Shareghi, Nigel Collier

Abstract: Masked language models (MLMs) such as BERT and RoBERTa have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of distinct tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach.

Submitted 28 April, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: Camera-ready for NAACL 2022
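
An "anisotropic distribution of token representations that occupies a narrow subset" of the space is often quantified by the average cosine similarity between representations. A minimal Python sketch on synthetic vectors follows; the metric, the function name, and the data are illustrative assumptions, not TaCL's training objective or its evaluation protocol.

```python
import math
import random


def mean_pairwise_cosine(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all distinct pairs of vectors.

    Near 0: directions are spread out (isotropic).
    Near 1: vectors crowd into a narrow cone (anisotropic).
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    total, pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            total += dot / (norms[i] * norms[j])
            pairs += 1
    return total / pairs


rng = random.Random(0)
isotropic = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(100)]
# Adding a large shared offset pushes every vector toward one common direction.
anisotropic = [[x + 5.0 for x in v] for v in isotropic]
print(round(mean_pairwise_cosine(isotropic), 3), round(mean_pairwise_cosine(anisotropic), 3))
```

Random Gaussian vectors give a value close to 0, while the same vectors shifted by a common offset give a value close to 1, mirroring the "narrow cone" picture described in the abstract.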

arXiv:2111.02724 [pdf]

Tea Chrysanthemum Detection under Unstructured Environments Using the TC-YOLO Model

Authors: Chao Qi, Junfeng Gao, Simon Pearson, Helen Harman, Kunjie Chen, Lei Shu

Abstract: Tea chrysanthemum detection at the flowering stage is one of the key components in developing selective chrysanthemum harvesting robots. However, detecting flowering chrysanthemums in unstructured field environments is challenging, given variations in illumination, occlusion and object scale. In this context, we propose a highly fused and lightweight deep learning architecture based on YOLO for tea chrysanthemum detection (TC-YOLO). First, in the backbone and neck components, the method uses the Cross-Stage Partially Dense Network (CSPDenseNet) as the main network and embeds custom feature fusion modules to guide the gradient flow. In the final head component, the method combines the recursive feature pyramid (RFP) multiscale fusion reflow structure and the Atrous Spatial Pyramid Pooling (ASPP) module with atrous (dilated) convolution to achieve the detection task. The resulting model was tested on 300 field images: on an NVIDIA Tesla P100 GPU, at an inference speed of 47.23 FPS on 416 × 416 images, TC-YOLO achieves an average precision (AP) of 92.49% on our own tea chrysanthemum dataset. In addition, this method (13.6M) can be deployed on a single mobile GPU, and it could be further developed as a perception system for a selective chrysanthemum harvesting robot in the future.

Submitted 4 November, 2021; originally announced November 2021.
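
The reported average precision (AP) summarizes the precision-recall curve of ranked detections. A minimal Python sketch of all-point interpolated AP (the PASCAL VOC-style definition) follows. It assumes each detection has already been labeled a true or false positive, for example by an IoU threshold against ground-truth boxes; the function name and example values are hypothetical, and this is not necessarily the exact evaluation code used for TC-YOLO.

```python
def average_precision(scores: list[float], is_true_positive: list[bool], num_ground_truth: int) -> float:
    """All-point interpolated AP: area under the precision envelope of the PR curve."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at the same or higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision  # rectangle between successive recall levels
        prev_recall = recall
    return ap


# Hypothetical example: three detections, two ground-truth flowers.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2))
```

In the example, the false positive at rank two lowers precision to 2/3 for the second half of the recall range, so AP is 5/6 rather than 1.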

arXiv:2110.00964 [pdf, ps, other]

New function classes of Morrey-Campanato type and their applications

Authors: Dinghuai Wang, Lisheng Shu

Abstract: The aim of this paper is to introduce and investigate some new function classes of Morrey-Campanato type. Let $0<p<\infty$ and $0\leq λ<n+p$. We say that $f\in \mathcal{\bar{L}}^{p,λ}(Ω)$ if $$\sup_{x_{0}\in Ω,ρ>0}ρ^{-λ}\int_{Ω(x_{0},ρ)}\big|f(x)-|f|_{Ω(x_{0},ρ)}\big|^pdx<\infty,$$ where $Ω(x_{0},ρ)=Q(x_{0},ρ)\cap Ω$ and $Q(x,ρ)$ denotes a cube in $\mathbb{R}^n$. Some basic properties and characterizations of these classes are presented. If $0\leq λ<n$, the space is equivalent to the related Morrey space. If $λ=n$, then $f \in \mathcal{\bar{L}}^{p,n}(Ω)$ if and only if $f\in BMO(Ω)$ with $f^{-}\in L^{\infty}(Ω)$, where $f^{-}=-\min\{0,f\}$. If $n<λ\leq n+p$, the $\mathcal{\bar{L}}^{p,λ}(Ω)$ functions give an integral characterization of the nonnegative Hölder continuous functions. As applications, this paper gives unified criteria for the necessity of bounded commutators of maximal functions.

Submitted 3 October, 2021; originally announced October 2021.

Comments: 29 pages

arXiv:2109.14739 [pdf, other]

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai, Yi Zhang

Abstract: Pre-trained language models have recently been shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem, which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In addition, we introduce a new dialogue multi-task pre-training strategy that allows the model to learn the primary TOD task completion skills from heterogeneous dialog corpora. We extensively test our model on three benchmark TOD tasks, including end-to-end dialogue modelling, dialogue state tracking, and intent classification. Experimental results show that PPTOD achieves a new state of the art on all evaluated tasks in both high-resource and low-resource scenarios. Furthermore, comparisons against previous SOTA methods show that the responses generated by PPTOD are more factually correct and semantically coherent as judged by human annotators.

Submitted 1 March, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: Camera-ready for ACL2022 main conference

arXiv:2109.02748 [pdf, other]

doi 10.1609/aaai.v36i6.20610

Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP

Authors: Sepideh Esmaeilpour, Bing Liu, Eric Robertson, Lei Shu

Abstract: In an out-of-distribution (OOD) detection problem, samples of known classes (also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution (OOD) detection, which still performs the same two tasks in testing but involves no training beyond using the given known class names. This paper proposes a novel yet simple method (called ZOC) to solve the problem. ZOC builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin.

Submitted 22 March, 2022; v1 submitted 6 September, 2021; originally announced September 2021.
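
The abstract describes a confidence score computed from both the known class names and the generated candidate unknown names. One plausible form is sketched below in Python under stated assumptions: given cosine similarities between an image embedding and text embeddings of each label (as a CLIP-like model would produce), apply a softmax over all labels and use the total probability assigned to the candidate unknown names as the OOD score. The scale factor of 100 is an assumption (roughly the logit scale CLIP applies to cosine similarities), the function name is hypothetical, and the similarity values are made up; this is not claimed to be ZOC's exact formula.

```python
import math


def ood_score(known_sims: list[float], unknown_sims: list[float], scale: float = 100.0) -> float:
    """Softmax mass on candidate unknown labels; higher means more likely out-of-distribution."""
    logits = [scale * s for s in known_sims + unknown_sims]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    return sum(exps[len(known_sims):]) / sum(exps)


# Hypothetical similarities: an image that matches a known class well ...
print(ood_score(known_sims=[0.31, 0.22], unknown_sims=[0.18, 0.20]))
# ... and one that matches a generated candidate name better than any known class.
print(ood_score(known_sims=[0.19, 0.17], unknown_sims=[0.30, 0.21]))
```

Because the score is a share of one softmax, it always lies between 0 and 1, and a single threshold on it separates "looks like a known class" from "looks like something else".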

arXiv:2102.09271 [pdf, other]

Intrinsic new properties of a quantum spin liquid

Authors: Yanxing Yang, Xin Li, Cheng Tan, Zihao Zhu, Jian Zhang, Zhaofeng Ding, Qiong Wu, Changshen Chen, Toni Shiroka, Yuanhua Xia, Douglas E. MacLaughlin, Chandra M. Varma, Lei Shu

Abstract: Quantum fluctuations are expected to lead to highly entangled spin-liquid states in certain two-dimensional spin-1/2 compounds. We have synthesized and measured thermodynamic properties and muon spin relaxation rates in the copper-based two-dimensional triangular-lattice spin liquids Lu$_3$Cu$_2$Sb$_3$O$_{14}$ and Lu$_3$CuZnSb$_3$O$_{14}$. The former is the least disordered of this kind discovered to date. Magnetic entropy generation at high temperatures has been ruled out after carefully correcting for the lattice specific heat. Surprisingly, roughly half of the magnetic entropy is missing down to temperatures of $O(10^{-3})$ times the exchange energy, independent of magnetic field up to $gμ_B H \gtrsim k_BΘ_W$, where $Θ_W$ is the Weiss temperature. The magnetic specific heat divided by temperature, $C_M(T)/T$, and the muon spin relaxation rate $λ(T)$ are both temperature-independent at low temperatures, followed by logarithmic decreases with increasing temperature. This behavior can be simply characterized by scale-invariant time-dependent fluctuations with a single parameter. Since no cooperative effects due to impurities are observed, the measured properties are intrinsic. They are evidence that in Lu$_3$Cu$_2$Sb$_3$O$_{14}$ massive quantum fluctuations lead either to a gigantic specific heat peak from singlet excitations at very low temperatures or, perhaps less likely, to an extensively degenerate, possibly topological, singlet ground state.

Submitted 21 July, 2022; v1 submitted 18 February, 2021; originally announced February 2021.
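
The "missing entropy" statement compares the released magnetic entropy, $S_M(T) = \int_0^T (C_M/T')\,dT'$, with the full value expected for the spins. A minimal Python sketch follows, assuming two spin-1/2 Cu ions per Lu$_3$Cu$_2$Sb$_3$O$_{14}$ formula unit (so the full entropy is $2R\ln 2$ per mole of formula units) and a trapezoidal integral over tabulated $C_M/T$ data. The function names are illustrative, no measured data are reproduced, and any entropy released below the lowest tabulated temperature is not captured.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def magnetic_entropy(temps: list[float], c_over_t: list[float]) -> float:
    """Trapezoidal estimate of S = integral of (C_M/T) dT over the tabulated temperature range."""
    s = 0.0
    for k in range(1, len(temps)):
        s += 0.5 * (c_over_t[k] + c_over_t[k - 1]) * (temps[k] - temps[k - 1])
    return s


def entropy_fraction(temps: list[float], c_over_t: list[float], spins_per_formula_unit: int = 2) -> float:
    """Released entropy as a fraction of the full spin-1/2 value n * R * ln 2."""
    return magnetic_entropy(temps, c_over_t) / (spins_per_formula_unit * R * math.log(2))


# Sanity check on synthetic data: a two-level Schottky term with a 1 K gap releases R ln 2.
temps = [10 ** (-3 + 6 * k / 20000) for k in range(20001)]  # log-spaced, 1 mK to 1000 K
schottky = [R * (1 / t) ** 2 * math.exp(-1 / t) / (1 + math.exp(-1 / t)) ** 2 / t for t in temps]
print(entropy_fraction(temps, schottky, spins_per_formula_unit=1))
```

For real data, a fraction that saturates near 0.5 at the highest measured temperature would correspond to the "roughly half" quoted in the abstract.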

arXiv:2011.00169 [pdf, other]

Understanding Pre-trained BERT for Aspect-based Sentiment Analysis

Authors: Hu Xu, Lei Shu, Philip S. Yu, Bing Liu

Abstract: This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language modeling, trained on an unlabeled corpus without annotations of aspects or opinions, can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We found that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicate an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at https://github.com/howardhsu/BERT-for-RRC-ABSA.

Submitted 30 October, 2020; originally announced November 2020.

Comments: COLING 2020

arXiv:2009.12046 [pdf, other]

Controllable Text Generation with Focused Variation

Authors: Lei Shu, Alexandros Papangelis, Yi-Chia Wang, Gokhan Tur, Hu Xu, Zhaleh Feizollahi, Bing Liu, Piero Molino

Abstract: This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebooks, which allows for both controllability and diversity, while at the same time generating fluent text. We evaluate FVN on two text generation datasets with annotated content and style, and show state-of-the-art performance as assessed by automatic and human evaluations.

Submitted 25 September, 2020; originally announced September 2020.
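
FVN's disjoint, per-attribute discrete latent spaces rest on codebook lookups. The Python sketch below shows only the generic vector-quantization step, mapping a continuous vector to its nearest codebook entry, with one hypothetical codebook per attribute. FVN's actual encoders, codebook sizes, and training losses are not given in the abstract and are not reproduced here; the names and values are illustrative.

```python
def quantize(vector: list[float], codebook: list[list[float]]) -> tuple[int, list[float]]:
    """Return (index, entry) of the codebook entry nearest to `vector` in squared L2 distance."""
    def sq_dist(code: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, code))

    best = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))
    return best, codebook[best]


# Hypothetical, disjoint codebooks: one per attribute, so codes are never shared across attributes.
codebooks = {
    "content": [[0.0, 0.0], [1.0, 1.0]],
    "style": [[-1.0, 0.0], [0.0, -1.0], [2.0, 2.0]],
}
print(quantize([0.9, 1.2], codebooks["content"]))
print(quantize([0.1, -0.8], codebooks["style"]))
```

Keeping a separate codebook per attribute means a code index selected for "style" can never be confused with one selected for "content", which is one simple way to keep the latent spaces disjoint.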

Showing 1–50 of 156 results for author: Shu, L