
A Note on Generalized Repunit Numerical Semigroups

Authors: Feihu Liu, Guoce Xin, Suting Ye, **g**g Yin

Abstract: Let $A=(a_1, a_2, \dots, a_n)$ be relatively prime positive integers with $a_i\geq 2$. The Frobenius number $F(A)$ is the largest integer not belonging to the numerical semigroup $\langle A\rangle$ generated by $A$. The genus $g(A)$ is the number of positive integers that are not in $\langle A\rangle$. The Frobenius problem is to find $F(A)$ and $g(A)$ for a given sequence $A$. In this note, we study the Frobenius problem for $A=\left(a,ba+d,b^2a+\frac{b^2-1}{b-1}d,\dots,b^ka+\frac{b^k-1}{b-1}d\right)$ and obtain formulas for $F(A)$ and $g(A)$ when $a\geq k-1$. Our formulas simplify further in special cases, such as repunit, Mersenne and Thabit numerical semigroups. The idea is similar to that in [LiuXin23, arXiv:2306.03459].

Submitted 19 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2306.03459
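To make the definitions in the abstract above concrete, here is a minimal sketch (not the authors' method) that computes $F(A)$ and $g(A)$ numerically through the Apéry set of the smallest generator $m$, using the standard identities $F(A) = \max \mathrm{Ap} - m$ and $g(A) = \frac{1}{m}\sum \mathrm{Ap} - \frac{m-1}{2}$. The helper names are illustrative only.

```python
import heapq
from functools import reduce
from math import gcd


def generalized_repunit_generators(a, b, d, k):
    """(a, ba+d, b^2 a + (b^2-1)/(b-1) d, ..., b^k a + (b^k-1)/(b-1) d)."""
    if b < 2:
        raise ValueError("b must be at least 2")
    # (b**i - 1) // (b - 1) = 1 + b + ... + b**(i-1) is an exact integer.
    return [b**i * a + (b**i - 1) // (b - 1) * d for i in range(k + 1)]


def apery_set(gens):
    """w[r] = least element of <gens> congruent to r modulo m = min(gens).

    Dijkstra's algorithm on residues 0..m-1: adding generator g moves
    residue r to (r + g) % m at cost g.
    """
    m = min(gens)
    w = [None] * m
    w[0] = 0
    heap = [(0, 0)]
    while heap:
        dist, r = heapq.heappop(heap)
        if dist > w[r]:
            continue  # stale heap entry
        for g in gens:
            nd = dist + g
            nr = nd % m
            if w[nr] is None or nd < w[nr]:
                w[nr] = nd
                heapq.heappush(heap, (nd, nr))
    return w


def frobenius_and_genus(gens):
    """Return (F(A), g(A)) from the Apery set of the smallest generator."""
    if reduce(gcd, gens) != 1:
        raise ValueError("generators must be relatively prime")
    m = min(gens)
    w = apery_set(gens)
    frobenius = max(w) - m
    # sum(w) is congruent to m(m-1)/2 mod m, so this division is exact.
    genus = (sum(w) - m * (m - 1) // 2) // m
    return frobenius, genus
```

For example, $a=3$, $b=2$, $d=1$, $k=2$ gives $A=(3,7,15)$ (the Mersenne case $a=2^2-1$), and `frobenius_and_genus([3, 7, 15])` returns `(11, 6)`. Such a brute-force check is handy for testing closed formulas on small parameters.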

arXiv:2306.09616 [pdf, other]

Probing the Anomalous Hall Transport and Magnetic Reversal of Chiral-Lattice Antiferromagnet Co$_{1/3}$NbS$_2$

Authors: **fan Gu, Yuxuan Peng, Shiqi Yang, Huan Wang, Shenyong Ye, Hanwen Wang, Yan** Li, Tianlong Xia, **bo Yang, Yu Ye

Abstract: Antiferromagnets exhibiting a giant anomalous Hall effect (AHE) and anomalous Nernst effect (ANE) have recently attracted broad interest, not only for their potential applications in future electronic devices, but also because of the rich physics arising from the Berry curvature near the Fermi level. $\rm{Co_{1/3}NbS_2}$, formed by intercalating $\rm{Co^{2+}}$ ions between $\rm{NbS_2}$ layers, is a quasi-two-dimensional layered antiferromagnet with a chiral lattice. A large AHE has been observed in $\rm{Co_{1/3}NbS_2}$, but its origin is under debate. In this letter, we report a large AHE and ANE in exfoliated $\rm{Co_{1/3}NbS_2}$ flakes. By analyzing the thermoelectric data via the Mott relation, we determine that the observed large AHE and ANE result primarily from the intrinsic Berry curvature. We also observe the magnetic domains in $\rm{Co_{1/3}NbS_2}$ by reflective magnetic circular dichroism measurements. Combined with electrical transport measurements, these observations confirm that the magnetic reversal in $\rm{Co_{1/3}NbS_2}$ is governed by domain wall motion, and that the critical field ($H_c$) exhibits a memory effect across consecutive magnetic field sweeps. Our work provides insight into the topological properties of $\rm{Co_{1/3}NbS_2}$ and paves the way for studying the spin configuration and magnetic domain dynamics in this fascinating antiferromagnet.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 8 pages, 4 figures
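The abstract above invokes the Mott relation without stating it. For reference, a sketch in one common convention (sign conventions for the Nernst coefficient $S_{xy}$ and the carrier charge $e$ differ between groups, and the authors' exact analysis may differ): the transverse thermoelectric conductivity $\alpha_{xy}$ is assembled from the measured Seebeck ($S_{xx}$), Nernst, longitudinal and Hall data, and the Mott relation ties its anomalous part to the energy derivative of the anomalous Hall conductivity at the Fermi level $\varepsilon_F$.

```latex
\alpha_{xy} = \sigma_{xx} S_{xy} + \sigma_{xy} S_{xx},
\qquad
\alpha_{xy}^{A} = \frac{\pi^{2} k_B^{2} T}{3e}
  \left.\frac{\partial \sigma_{xy}^{A}}{\partial \varepsilon}\right|_{\varepsilon = \varepsilon_F}
```

The first identity follows from $\alpha = \sigma S$ and assumes in-plane isotropy ($S_{yy} = S_{xx}$).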

arXiv:2306.05927 [pdf]

doi 10.1038/s41467-024-49325-7

Emergent normal fluid in the superconducting ground state of overdoped cuprates

Authors: Shusen Ye, Miao Xu, Hongtao Yan, Zi-Xiang Li, Changwei Zou, Xintong Li, Yiwen Chen, Xingjiang Zhou, Dung-Hai Lee, Yayu Wang

Abstract: The microscopic mechanism for the disappearance of superconductivity in overdoped cuprates is still hotly debated. Here we use scanning tunneling spectroscopy to investigate the evolution of the quasiparticle interference phenomenon in $\rm Bi_2Sr_2CuO_{6+δ}$ over a wide range of hole densities. We find that when the system enters the overdoped regime, a peculiar quasiparticle interference wavevector with a quarter-circle pattern starts to emerge even at zero bias, and its intensity grows with increasing doping level. Its energy dispersion is incompatible with the octet model for d-wave superconductivity, but is highly consistent with the scattering interference of gapless normal carriers. The weight of the gapless quasiparticle interference is mainly located at the antinodes and is independent of temperature. We propose that the normal fluid emerges from pair-breaking scattering between flat antinodal bands in the quantum ground state, which is the primary cause of the reduced superfluid density and the suppression of superconductivity in overdoped cuprates.

Submitted 9 June, 2023; originally announced June 2023.

Journal ref: Nature Communications 15, 4939 (2024)

arXiv:2306.05926 [pdf]

doi 10.1038/s41567-023-02100-9

The emergence of global phase coherence from local pairing in underdoped cuprates

Authors: Shusen Ye, Changwei Zou, Hongtao Yan, Yu Ji, Miao Xu, Zehao Dong, Yiwen Chen, Xingjiang Zhou, Yayu Wang

Abstract: In conventional metal superconductors such as aluminum, the large number of weakly bound Cooper pairs become phase coherent as soon as they start to form. The cuprate high critical temperature ($T_c$) superconductors, in contrast, belong to a distinctly different category. To account for the high $T_c$, the attractive pairing interaction is expected to be strong and the coherence length short. Being doped Mott insulators, the cuprates are known to have low superfluid density and are thus susceptible to phase fluctuations. It has been proposed that pairing and phase coherence may occur separately in cuprates, and that $T_c$ corresponds to the phase coherence temperature controlled by the superfluid density. To elucidate the microscopic processes of pairing and phase ordering in cuprates, here we use scanning tunneling microscopy to image the evolution of electronic states in underdoped $\rm Bi_2La_xSr_{2-x}CuO_{6+δ}$. Even in the insulating sample, we observe a smooth crossover from Mott-insulator-type to superconductor-type spectra on small islands with chequerboard order, together with emerging quasiparticle interference patterns following the octet model. Each chequerboard plaquette contains approximately two holes and exhibits a stripy internal structure that strongly influences the superconducting features. Across the insulator-to-superconductor boundary, the local spectra remain qualitatively the same while the quasiparticle interference becomes long-ranged. These results suggest that the chequerboard plaquette with internal stripes plays a crucial role in local pairing in cuprates, and that global phase coherence is established once its spatial occupation exceeds a threshold.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.04893 [pdf, other]

Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization

Authors: Shuo Ye, Shujian Yu, Wen** Hou, Yu Wang, Xinge You

Abstract: Fine-grained visual categorization (FGVC) is a challenging task due to the similar visual appearance of different species. Previous studies implicitly assume that the training and test data share the same underlying distribution, and that features extracted by modern backbone architectures remain discriminative and generalize well to unseen test data. However, we show empirically that these conditions do not always hold on benchmark datasets. To this end, we combine the merits of invariant risk minimization (IRM) and the information bottleneck (IB) principle to learn invariant and minimum sufficient (IMS) representations for FGVC, such that the overall model can always discover the most succinct and consistent fine-grained features. We apply the matrix-based Rényi's $α$-order entropy to simplify and stabilize the training of IB; we also design a "soft" environment partition scheme to make IRM applicable to the FGVC task. To the best of our knowledge, we are the first to address the problem of FGVC from a generalization perspective and to develop a new information-theoretic solution accordingly. Extensive experiments demonstrate the consistent performance gain offered by our IMS.

Submitted 9 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Manuscript accepted by CVIU; code is available on GitHub
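The abstract above relies on the matrix-based Rényi $α$-order entropy. A minimal sketch of that functional, assuming a Gaussian kernel: build the Gram matrix, normalize it to unit trace as $A_{ij} = K_{ij}/(n\sqrt{K_{ii}K_{jj}})$, and evaluate $S_α(A) = \frac{1}{1-α}\log_2 \sum_i λ_i(A)^α$. The kernel width `sigma` and `alpha=1.01` are assumptions, and this is not the authors' training code.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def matrix_renyi_entropy(x, alpha=1.01, sigma=1.0):
    """Matrix-based Renyi alpha-order entropy (bits) of the samples in x."""
    k = gaussian_gram(x, sigma)
    n = k.shape[0]
    # Unit-trace normalization: A_ij = K_ij / (n * sqrt(K_ii * K_jj)).
    diag = np.sqrt(np.diag(k))
    a = k / (n * np.outer(diag, diag))
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    eigvals = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return float(np.log2(np.sum(eigvals**alpha)) / (1.0 - alpha))
```

Sanity checks: eight mutually distant samples give $A = I/8$ and an entropy of $\log_2 8 = 3$ bits, while identical samples give an entropy of 0.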

arXiv:2306.03459 [pdf, other]

A Combinatorial Model of Numerical Semigroup

Authors: Feihu Liu, Guoce Xin, Suting Ye, **g**g Yin

Abstract: Let $A=(a_1, a_2, \dots, a_n)$ be relatively prime positive integers with $a_i\geq 2$. The Frobenius number $F(A)$ is the largest integer not belonging to the numerical semigroup $\langle A\rangle$ generated by $A$. The genus $g(A)$ is the number of positive integers that are not in $\langle A\rangle$. The Frobenius problem is to find $F(A)$ and $g(A)$ for a given sequence $A$. In this paper, we study the Frobenius problem for $A=\left(a,h_1a+b_1d,h_2a+b_2d,\dots,h_ka+b_kd\right)$ under some restrictions. An innovation is that $d$ can be a negative integer. In particular, when $A=\left(a,ba+d,b^2a+\frac{b^2-1}{b-1}d,\dots,b^ka+\frac{b^k-1}{b-1}d\right)$, we obtain formulas for $F(A)$ and $g(A)$ when $a\geq k-1-\frac{d-1}{b-1}$. Our formulas simplify further in special cases, such as Mersenne, Thabit and repunit numerical semigroups. We obtain explicit closed formulas for generalized Mersenne, Thabit and repunit numerical semigroups and some more general numerical semigroups. Finally, we partially solve an open problem for the Proth numerical semigroup.

Submitted 7 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.02346 [pdf, other]

CDLT: A Dataset with Concept Drift and Long-Tailed Distribution for Fine-Grained Visual Categorization

Authors: Shuo Ye, Yufeng Shi, Ruxin Wang, Yu Wang, Jiamiao Xu, Chuanwu Yang, Xinge You

Abstract: Data is the foundation for the development of computer vision, and the establishment of datasets plays an important role in advancing the techniques of fine-grained visual categorization (FGVC). Existing FGVC datasets generally assume that each collected instance has fixed characteristics and that the distribution of different categories is relatively balanced. In contrast, real-world scenarios show that the characteristics of instances tend to vary with time and exhibit a long-tailed distribution. Hence, the collected datasets may mislead the optimization of fine-grained classifiers, resulting in poor performance in real applications. Starting from real-world conditions and to promote the practical progress of fine-grained visual categorization, we present a Concept Drift and Long-Tailed Distribution dataset. Specifically, the dataset is collected by gathering 11195 images of 250 instances in different species for 47 consecutive months in their natural contexts. The collection process involves dozens of crowd workers for photographing and domain experts for labelling. Extensive baseline experiments using state-of-the-art fine-grained classification models demonstrate the issues of concept drift and long-tailed distribution present in the dataset, which require the attention of future research.

Submitted 4 June, 2023; originally announced June 2023.

arXiv:2305.14877 [pdf, other]

Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis

Authors: Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee, Minjoon Seo

Abstract: Previous works in prompt engineering for large language models have introduced different gradient-free probability-based prompt selection methods that aim to choose the optimal prompt among the candidates for a given task, but have failed to provide a comprehensive and fair comparison among them. In this paper, we propose a unified framework to interpret and evaluate the existing probability-based prompt selection methods by performing extensive experiments on 13 common and diverse NLP tasks. We find that each of the existing methods can be interpreted as some variant of the method that maximizes mutual information between the input and the predicted output (MI). Utilizing this finding, we develop several other combinatorial variants of MI and increase the effectiveness of the oracle prompt selection method from 87.79% to 94.98%, measured as the ratio of the performance of the selected prompt to that of the optimal oracle prompt. Furthermore, considering that all the methods rely on the output probability distribution of the model, which might be biased, we propose a novel calibration method called Calibration by Marginalization (CBM) that is orthogonal to the existing methods and helps increase the prompt selection effectiveness of the best method to 96.85%, achieving 99.44% of the oracle prompt F1 without calibration.

Submitted 8 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: TACL 2024 (Pre-MIT Press publication version)
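As an illustration of the MI criterion named in the abstract above, here is a minimal sketch in its common form $I(X;Y) = H(Y) - H(Y\mid X)$, estimated from a model's output distributions over unlabeled inputs; the prompt with the largest estimate is selected. The function names are hypothetical, and the sketch does not include the paper's combinatorial variants or CBM calibration.

```python
import numpy as np


def _entropy(p, axis=-1):
    """Shannon entropy in nats; terms with p = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=axis)


def mutual_information(probs):
    """Estimate I(X; Y) = H(Y) - H(Y | X) for one candidate prompt.

    probs: shape (n_inputs, n_labels); row i is p(y | x_i, prompt).
    """
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # p(y) averaged over the inputs
    return float(_entropy(marginal) - _entropy(probs, axis=1).mean())


def select_prompt(probs_per_prompt):
    """Index of the candidate prompt with the highest estimated MI."""
    scores = [mutual_information(p) for p in probs_per_prompt]
    return int(np.argmax(scores))
```

A prompt that yields the same uniform prediction for every input scores 0, whereas one that yields confident but input-dependent predictions scores up to $\ln(\text{number of labels})$, so the latter is preferred.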

arXiv:2305.14405 [pdf, other]

NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

Authors: Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Yiran Li, An Zou

Abstract: The inherent diversity of computation types within individual Deep Neural Network (DNN) models imposes a corresponding need for a varied set of computation units within hardware processors. This diversity poses a significant constraint on computation efficiency during the execution of different neural networks. In this study, we present NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations. This transformation seamlessly enables the execution of various DNN models using a single General-Purpose Matrix Multiplication (GEMM) accelerator. Extensive experimental results spanning different DNN models demonstrate that our approach preserves network accuracy while providing both generality and application-specific levels of computation efficiency. This allows a broad spectrum of DNN models to be executed using a single GEMM accelerator, eliminating the need for additional special function units.

Submitted 8 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 11 pages, 6 figures, Submitted to 41st International Conference on Machine Learning
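The abstract above does not say how nonlinear operations are turned into linear matrix operations. One widely used way to fit a nonlinear activation such as GELU onto a multiply-add datapath is a piecewise-linear approximation with precomputed per-segment slopes and intercepts. The sketch below illustrates only that generic idea; the range $[-4, 4]$ and the 64 segments are arbitrary assumptions, not NeuralMatrix's design.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    x = np.asarray(x, dtype=float)
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def build_pwl_table(fn, lo=-4.0, hi=4.0, segments=64):
    """Breakpoints, slopes and intercepts of a piecewise-linear fit to fn."""
    xs = np.linspace(lo, hi, segments + 1)
    ys = fn(xs)
    slopes = np.diff(ys) / np.diff(xs)
    intercepts = ys[:-1] - slopes * xs[:-1]
    return xs, slopes, intercepts


def pwl_eval(x, table):
    """Evaluate the fit with one multiply-add per element: y = k[i] * x + b[i]."""
    xs, slopes, intercepts = table
    x = np.asarray(x, dtype=float)
    # Segment lookup; inputs outside [lo, hi] extrapolate the end segments.
    idx = np.clip(np.searchsorted(xs, x, side="right") - 1, 0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]
```

With these settings the approximation stays within about $2\times10^{-3}$ of exact GELU on $[-6, 6]$; more segments trade table size for accuracy.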

arXiv:2305.14045 [pdf, other]

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

Authors: Seungone Kim, Se June Joo, Doyoung Kim, Joel Jang, Seonghyeon Ye, Jamin Shin, Minjoon Seo

Abstract: Language models (LMs) with fewer than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning, in contrast to large LMs, when solving unseen tasks. In this work, we aim to equip smaller LMs with step-by-step reasoning capability by instruction tuning with CoT rationales. To achieve this goal, we first introduce a new instruction-tuning dataset called the CoT Collection, which augments the existing Flan Collection (which includes only 9 CoT tasks) with an additional 1.84 million rationales across 1,060 tasks. We show that CoT fine-tuning Flan-T5 (3B & 11B) with the CoT Collection enables smaller LMs to have better CoT capabilities on unseen tasks. On the BIG-Bench-Hard (BBH) benchmark, we report an average improvement of +4.34% (Flan-T5 3B) and +2.60% (Flan-T5 11B) in zero-shot task accuracy. Furthermore, we show that instruction tuning with the CoT Collection gives LMs stronger few-shot learning capabilities on 4 domain-specific tasks, resulting in an improvement of +2.24% (Flan-T5 3B) and +2.37% (Flan-T5 11B), even outperforming ChatGPT prompted with demonstrations up to the maximum length by a +13.98% margin. Our code, the CoT Collection data, and model checkpoints are publicly available.

Submitted 14 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 (Main Conference)

arXiv:2305.05861 [pdf]

Template-based eukaryotic genome editing directed by SviCas3

Authors: Wang-Yu Tong, Yong Li, Shou-Dong Ye, An-**g Wang, Yan-Yan Tang, Mei-Li Li, Zhong-Fan Yu, Ting-Ting Xia, Qing-Yang Liu, Si-Qi Zhu

Abstract: RNA-guided gene editing based on the CRISPR-Cas system is currently the most effective genome editing technique. Here, we report that the SviCas3 from the subtype I-B-Svi Cas system in Streptomyces virginiae IBL14 is an RNA-guided and DNA-guided DNA endonuclease suitable for the HDR-directed gene and/or base editing of eukaryotic cell genomes. The genome editing efficiency of SviCas3 guided by DNA is no less than that of SviCas3 guided by RNA. In particular, t-DNA, as a template and a guide, does not require a proto-spacer-adjacent motif, demonstrating that CRISPR, as the basis for crRNA design, is not required for the SviCas3-mediated gene and base editing. This discovery will broaden our understanding of enzyme diversity in CRISPR-Cas systems, will provide important tools for the creation and modification of living things and the treatment of human genetic diseases, and will usher in a new era of DNA-guided gene editing and base editing.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 113 pages, 12 figures and 4 tables

arXiv:2305.05131 [pdf, ps, other]

Solvable subgroup theorem, length function and topological entropy

Authors: Shengkui Ye

Abstract: We prove a general solvable subgroup theorem in terms of length functions. As applications, we obtain a solvable subgroup theorem in dynamical systems: any solvable group of finite Hirsch length acting on a smooth manifold with uniformly positive topological entropies must be virtually $\mathbb{Z}^n$.

Submitted 8 May, 2023; originally announced May 2023.

Comments: no figures

MSC Class: 37C05; 20F65

arXiv:2304.09039 [pdf, other]

The Frobenius Formula for $A=(a,ha+d,ha+b_2d,...,ha+b_kd)$

Authors: Feihu Liu, Guoce Xin, Suting Ye, **g**g Yin

Abstract: Given relatively prime positive integers $A=(a_1, a_2, \dots, a_n)$, the Frobenius number $g(A)$ is the largest integer not representable as a linear combination of the $a_i$'s with nonnegative integer coefficients. We find that the "Stable" property introduced for the square sequence $A=(a,a+1,a+2^2,\dots, a+k^2)$ naturally extends to $A(a)=(a,ha+dB)=(a,ha+d,ha+b_2d,\dots,ha+b_kd)$. This gives a parallel characterization of $g(A(a))$ as a "congruence class function" modulo $b_k$ when $a$ is large enough. For an orderly sequence $B=(1,b_2,\dots,b_k)$, we find a good bound for $a$. In particular, we calculate $g(a,ha+dB)$ for $B=(1,2,b,b+1)$, $B=(1,2,b,b+1,2b)$, $B=(1,b,2b-1)$ and $B=(1,2,\dots,k,K)$. Our idea also applies to the case $B=(b_1,b_2,\dots,b_k)$ with $b_1> 1$.

Submitted 26 April, 2023; v1 submitted 18 April, 2023; originally announced April 2023.
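For experimenting with the family $A(a)=(a,ha+dB)$ in the abstract above (which writes the Frobenius number as $g(A)$), a minimal brute-force sieve can serve as a cross-check on small cases. It relies on the classical upper bound $g(A)\le(a_1-1)(a_n-1)-1$ for sorted generators and does not implement the paper's "congruence class function" characterization; the helper names are illustrative.

```python
from functools import reduce
from math import gcd


def shifted_family(a, h, d, bs):
    """A(a) = (a, h*a + d*b_1, ..., h*a + d*b_k) for B = bs."""
    return [a] + [h * a + d * b for b in bs]


def frobenius_by_sieve(gens):
    """Largest integer not representable as a nonnegative combination of gens."""
    gens = sorted(gens)
    if reduce(gcd, gens) != 1:
        raise ValueError("generators must be relatively prime")
    if gens[0] == 1:
        return -1
    # Every n >= (a_1 - 1)(a_n - 1) is representable, so sieve up to that bound.
    limit = (gens[0] - 1) * (gens[-1] - 1)
    representable = [False] * (limit + 1)
    representable[0] = True
    for n in range(1, limit + 1):
        representable[n] = any(n >= g and representable[n - g] for g in gens)
    return max(n for n in range(limit + 1) if not representable[n])
```

The square sequence is the case $h=d=1$, $B=(1,4,\dots,k^2)$; for instance `shifted_family(5, 1, 1, [1, 4])` is `[5, 6, 9]`, whose Frobenius number is 13. Looping over $a$ lets one tabulate $g(A(a))$ by residue class of $a$.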

arXiv:2304.02944 [pdf, other]

Design and performance testing of a T0 detector for the CSR External-target Experiment

Authors: D. Hu, X. Wang, M. Shao, Y. Zhou, S. Ye, L. Zhao, Y. Sun, J. Lu, H. Xu

Abstract: The Cooling Storage Ring (CSR) External-target Experiment (CEE) at the Heavy Ion Research Facility in Lanzhou (HIRFL), China, is the first multi-purpose nuclear physics experimental device to operate in the giga-electronvolt (GeV) energy range. The primary goals of the CEE are to study the bulk properties of dense matter and to understand the quantum chromodynamics (QCD) phase diagram by measuring, with a large acceptance, the charged particles produced in heavy-ion collisions in the target region. The CEE is a spectrometer running on the HIRFL-CSR that focuses on measuring charged final-state particles. The time-of-flight (TOF) system is critical for identifying charged particles in the GeV energy region. In the CEE spectrometer, the TOF system consists of three parts (T0, internal TOF, and external TOF), which are used for final-state particle identification. The T0 detector provides a high-precision start time for the TOF system by measuring the crossing time of the heavy-ion beam. This study details the design, performance simulation, and performance testing of the T0 detector. The simulation results and heavy-ion beam test show that the T0 detector prototype has an excellent time resolution, better than 30 ps, which fulfills the requirements of the CEE.

Submitted 6 April, 2023; originally announced April 2023.

Comments: 20 pages, 19 figures
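To show why the T0 start time described in the abstract above matters, here is a minimal sketch of time-of-flight particle identification with standard relativistic kinematics: $\beta = L/(ct)$ and $m^2 = p^2(1/\beta^2 - 1)$, where $t$ is the stop time minus the T0 start time. The combined timing resolution is taken as the quadrature sum of independent start and stop contributions, which is an assumption; all numbers used below (path length, momentum, a 40 ps stop resolution) are hypothetical.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in m/ns


def beta_from_tof(path_m, tof_ns):
    """beta = L / (c t), with t = t_stop - t_start (T0)."""
    return path_m / (tof_ns * C_M_PER_NS)


def mass_squared(p_gev, path_m, tof_ns):
    """m^2 in GeV^2/c^4 from momentum p in GeV/c: m^2 = p^2 (1/beta^2 - 1)."""
    beta = beta_from_tof(path_m, tof_ns)
    return p_gev**2 * (1.0 / beta**2 - 1.0)


def tof_resolution(sigma_t0_ps, sigma_stop_ps):
    """Total timing resolution, assuming independent start and stop terms."""
    return math.hypot(sigma_t0_ps, sigma_stop_ps)
```

For example, a hypothetical 30 ps T0 combined with a 40 ps stop detector gives `tof_resolution(30.0, 40.0) == 50.0` ps, which shows how the start-time resolution feeds directly into the separation of particle species.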

arXiv:2304.02745 [pdf, other]

Analysis of Dynamic Voronoi Diagrams in the Hilbert Metric

Authors: Madeline Bumpus, Xufeng Caesar Dai, Auguste H. Gezalyan, Sam Munoz, Renita Santhoshkumar, Songyu Ye, David M. Mount

Abstract: The Hilbert metric is a projective metric defined on a convex body which generalizes the Cayley-Klein model of hyperbolic geometry to any convex set. In this paper we analyze Hilbert Voronoi diagrams in the dynamic setting. In addition, we introduce dynamic visualization software for Voronoi diagrams in the Hilbert metric on user-specified convex polygons.

Submitted 1 July, 2024; v1 submitted 5 April, 2023; originally announced April 2023.
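For readers new to the metric in the abstract above, here is a minimal sketch of the Hilbert distance between interior points $p, q$ of a convex polygon: if the line through $p$ and $q$ meets the boundary at $a$ (beyond $p$) and $b$ (beyond $q$), then $d(p,q) = \tfrac12 \ln\frac{|qa|\,|pb|}{|pa|\,|qb|}$, where the factor $\tfrac12$ is a common normalization. The boundary points are found by clipping the line against each edge's half-plane. This is illustrative only and is not the authors' visualization software.

```python
import math


def hilbert_distance(polygon, p, q):
    """Hilbert distance between interior points p, q of a convex polygon.

    polygon: vertices in counter-clockwise order, as (x, y) tuples.
    """
    if p == q:
        return 0.0
    dx, dy = q[0] - p[0], q[1] - p[1]
    t_lo, t_hi = -math.inf, math.inf  # range of t with p + t (q - p) inside
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        nx, ny = -(y1 - y0), x1 - x0  # inward normal for CCW orientation
        num = nx * (p[0] - x0) + ny * (p[1] - y0)  # >= 0 since p is inside
        den = nx * dx + ny * dy
        if den > 0:
            t_lo = max(t_lo, -num / den)
        elif den < 0:
            t_hi = min(t_hi, -num / den)
    # a sits at t_lo <= 0 and b at t_hi >= 1; lengths scale with |q - p|.
    return 0.5 * math.log(((1 - t_lo) * t_hi) / ((-t_lo) * (t_hi - 1)))
```

In the square $[-1,1]^2$, the distance from $(0,0)$ to $(0.5,0)$ is $\tfrac12\ln 3$, and the result is symmetric in $p$ and $q$ as a metric should be.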

arXiv:2303.11651 [pdf]

AlphaMat: A Material Informatics Hub Connecting Data, Features, Models and Applications

Authors: Zhilong Wang, Junfei Cai, An Chen, Yanqiang Han, Kehao Tao, Simin Ye, Shiwei Wang, Imran Ali, **** Li

Abstract: The development of modern civil industry, energy and information technology is inseparable from the rapid exploration of new materials, which is hampered by months to years of painstaking attempts, resulting in only a small fraction of materials being determined in a vast chemical space. Artificial intelligence (AI)-based methods are promising for addressing this gap, but face many challenges such as data scarcity and inaccurate material descriptor coding. Here, we develop an AI platform, AlphaMat, that connects materials and applications. AlphaMat is not limited by the data scale (from $10^1$ to $10^6$) and can design structural and component descriptors that are effective for docking with various AI models. With prediction times of milliseconds and high accuracy, AlphaMat exhibits a strong ability to model at least 12 common attributes (formation energy, band gap, ionic conductivity, magnetism, phonon property, bulk modulus, dielectric constant, adsorption energy, etc.), resulting in an unexplored material database with over 117,000 entries. We further demonstrate the ability of AlphaMat to mine and design materials, successfully discovering thousands of new materials in photonics, batteries, catalysts, and capacitors from the largest inorganic compound databases, which cover all elements in the periodic table. This work proposes the first material informatics hub that does not require users to have strong programming knowledge to build AI models to design materials. Users can either directly retrieve our database or easily build AI models through AlphaMat to discover and design the required materials. AlphaMat can shorten the cycle of database construction and material discovery by at least decades, and its effective use will facilitate the application of AI technology in materials science and lead scientific and technological progress to a new height.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.07375 [pdf, other]

doi 10.3847/1538-4357/ace1eb

On the Tidal Capture of White Dwarfs by Intermediate-mass Black Holes in Dense Stellar Environments

Authors: Claire S. Ye, Giacomo Fragione, Rosalba Perna

Abstract: Intermediate-mass black holes (IMBHs) are the missing link between stellar-mass and supermassive black holes, widely believed to reside in at least some dense star clusters, but not yet observed directly. Tidal disruptions of white dwarfs (WDs) are luminous only for black holes less massive than $\sim 10^5\,M_{\odot}$, therefore providing a unique smoking gun that could finally prove the existence of IMBHs beyond any reasonable doubt. Here, we investigate the tidal captures of WDs by IMBHs in dense star clusters, and estimate a typical rate of $\sim 1\,{\rm Myr}^{-1}$ for galactic nuclei and $\sim 0.01\,{\rm Myr}^{-1}$ for globular clusters. Following the capture, the WD inspirals onto the IMBH, producing gravitational waves detectable out to $\sim100$ Mpc by LISA for $\sim 10^4\,M_{\odot}$ IMBHs. The subsequent tidal stripping/disruption of the WD can also release bright X-ray and gamma-ray emission with luminosities $\gtrsim10^{40}\,\rm{erg\,s^{-1}}$, detectable by Chandra, Swift, and upcoming telescopes such as the Einstein Probe.

Submitted 12 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: 13 pages, 8 figures. Published at ApJ
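The $\sim 10^5\,M_{\odot}$ threshold quoted in the abstract above can be motivated with a back-of-the-envelope Newtonian estimate: a star is disrupted outside the horizon only if its tidal radius $r_t = R_\star (M/m_\star)^{1/3}$ exceeds the Schwarzschild radius $r_s = 2GM/c^2$, which gives $M_{\max} = (R_\star c^2/2G)^{3/2} m_\star^{-1/2}$. This sketch ignores black-hole spin and order-unity prefactors in $r_t$, and the white-dwarf mass ($0.6\,M_\odot$) and radius ($0.0125\,R_\odot$) are assumed typical values; it is not the paper's calculation.

```python
import math

G_CGS = 6.674e-8      # gravitational constant, cm^3 g^-1 s^-2
C_CGS = 2.998e10      # speed of light, cm s^-1
M_SUN_G = 1.989e33    # solar mass, g
R_SUN_CM = 6.957e10   # solar radius, cm


def max_bh_mass_for_outside_disruption(m_star_g, r_star_cm):
    """BH mass (g) at which the tidal radius equals the Schwarzschild radius.

    Setting R (M/m)^(1/3) = 2 G M / c^2 and solving for M gives
    M = (R c^2 / (2 G))^(3/2) / sqrt(m); heavier holes swallow the star whole.
    """
    return (r_star_cm * C_CGS**2 / (2.0 * G_CGS)) ** 1.5 / math.sqrt(m_star_g)
```

With the assumed white-dwarf parameters this gives roughly $2\times10^5\,M_\odot$, consistent in order of magnitude with the threshold stated in the abstract.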

arXiv:2303.03645 [pdf, other]

Filter Pruning based on Information Capacity and Independence

Authors: Xiaolong Tang, Shuo Ye, Yufeng Shi, Tianheng Hu, Qinmu Peng, Xinge You

Abstract: Filter pruning has gained widespread adoption for compressing and speeding up convolutional neural networks (CNNs). However, existing approaches are still far from practical applications due to biased filter selection and heavy computation cost. This paper introduces a new filter pruning method that selects filters in an interpretable, multi-perspective, and lightweight manner. Specifically, we evaluate the contributions of filters from both individual and overall perspectives. For the amount of information contained in each filter, a new metric called information capacity is proposed. Inspired by information theory, we utilize interpretable entropy to measure the information capacity, and develop a feature-guided approximation process. For correlations among filters, another metric called information independence is designed. Since these metrics are evaluated in a simple but effective way, we can identify and prune the least important filters with less computation cost. We conduct comprehensive experiments on benchmark datasets employing various widely used CNN architectures to evaluate the performance of our method. For instance, on ILSVRC-2012, our method outperforms state-of-the-art methods by reducing FLOPs by 77.4% and parameters by 69.3% for ResNet-50 with only a minor decrease in accuracy of 2.64%.

Submitted 12 June, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS). The code will be available at https://github.com/txl-hub/ICI
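The abstract does not specify the entropy estimator or the feature-guided approximation, so the following is only a generic sketch of scoring filters by the information they carry. It assumes that a filter's information capacity can be approximated by the Shannon entropy of a fixed-bin histogram of its output activations; `activation_entropy`, `filters_to_prune`, and the binning scheme are hypothetical, not the authors' code, and the information-independence metric is not modeled.

```python
import math
from typing import List, Sequence


def activation_entropy(values: Sequence[float], num_bins: int = 16) -> float:
    """Shannon entropy (bits) of a fixed-bin histogram of one filter's activations."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant output carries no information under this proxy
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def filters_to_prune(feature_maps: List[Sequence[float]], ratio: float) -> List[int]:
    """Indices of the lowest-entropy filters; prunes int(len * ratio) of them."""
    scores = [activation_entropy(fm) for fm in feature_maps]
    k = int(len(scores) * ratio)
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]
```

For example, `filters_to_prune(maps, 0.3)` would return the indices of the 30% least informative filters under this entropy proxy.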

arXiv:2303.03131 [pdf, other]

Video Question Answering Using CLIP-Guided Visual-Text Attention

Authors: Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, Xudong Jiang

Abstract: Cross-modal learning of video and text plays a key role in Video Question Answering (VideoQA). In this paper, we propose a visual-text attention mechanism that uses Contrastive Language-Image Pre-training (CLIP), trained on a large number of general-domain language-image pairs, to guide cross-modal learning for VideoQA. Specifically, we first extract video features using a TimeSformer and text features using BERT from the target application domain, and use CLIP to extract a pair of visual-text features from the general-knowledge domain through domain-specific learning. We then propose Cross-domain Learning to extract the attention information between visual and linguistic features across the target domain and the general domain. The set of CLIP-guided visual-text features is integrated to predict the answer. The proposed method is evaluated on the MSVD-QA and MSRVTT-QA datasets and outperforms state-of-the-art methods.

Submitted 8 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Submitted to the 2023 IEEE International Conference on Image Processing (ICIP 2023)

ACM Class: I.2.10

arXiv:2303.03105 [pdf, other]

Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset

Authors: Weikai Kong, Shuhong Ye, Chenglin Yao, Jianfeng Ren

Abstract: Deep neural networks facilitate video question answering (VideoQA), but real-world applications on video streams such as CCTV and live broadcasts place higher demands on the solver. To address the challenges of VideoQA on long videos of unknown length, we define a new set of problems called Online Open-ended Video Question Answering (O^2VQA). It requires an online state-updating mechanism that lets the solver decide whether the collected information is sufficient to conclude an answer. We then propose a Confidence-based Event-centric Online Video Question Answering (CEO-VQA) model to solve this problem. Furthermore, a dataset called Answer Target in Background Stream (ATBS) is constructed to evaluate this newly developed online VideoQA application. Experimental results show that the proposed method achieves a significant performance gain over a baseline VideoQA method that watches the whole video.

Submitted 7 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted for publication at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
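The online setting described above can be illustrated with a minimal early-stopping loop: the solver updates its state after each event and answers as soon as its confidence passes a threshold. This is a hypothetical sketch of the O^2VQA problem setup, not the CEO-VQA model; the `Solver` callable, the threshold value, and string-valued events are all assumptions.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical solver: maps the events seen so far to (answer, confidence in [0, 1]).
Solver = Callable[[List[str]], Tuple[str, float]]


def answer_online(stream: Iterable[str], solver: Solver,
                  threshold: float = 0.9) -> Tuple[Optional[str], int]:
    """Consume events one at a time and stop once the solver is confident enough.

    Returns (answer, number_of_events_consumed). If the stream ends first,
    the last (possibly low-confidence) answer is returned.
    """
    state: List[str] = []
    answer: Optional[str] = None
    for event in stream:
        state.append(event)  # online state update
        answer, confidence = solver(state)
        if confidence >= threshold:  # enough evidence: answer early
            return answer, len(state)
    return answer, len(state)
```

Because the stream may be unbounded, the loop never needs the video's total length; it only needs a confidence signal from the solver.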

arXiv:2303.02455 [pdf, other]

DistilPose: Tokenized Pose Regression with Heatmap Distillation

Authors: Suhang Ye, Yingyi Zhang, Jie Hu, Liujuan Cao, Shengchuan Zhang, Lei Shen, Jun Wang, Shouhong Ding, Rongrong Ji

Abstract: In human pose estimation, regression-based methods have dominated in terms of speed, while heatmap-based methods are far ahead in terms of performance. How to take advantage of both schemes remains a challenging problem. In this paper, we propose a novel human pose estimation framework termed DistilPose, which bridges the gap between heatmap-based and regression-based methods. Specifically, DistilPose maximizes the transfer of knowledge from the teacher model (heatmap-based) to the student model (regression-based) through a Token-distilling Encoder (TDE) and Simulated Heatmaps. TDE aligns the feature spaces of heatmap-based and regression-based models by introducing tokenization, while Simulated Heatmaps transfer explicit guidance (distribution and confidence) from teacher heatmaps to student models. Extensive experiments show that DistilPose significantly improves the performance of regression-based models while maintaining efficiency. On the MSCOCO validation dataset, DistilPose-S obtains 71.6% mAP with 5.36M parameters, 2.38 GFLOPs, and 40.2 FPS; it saves 12.95x and 7.16x computational cost and is 4.9x faster than its teacher model, with only a 0.9-point performance drop. Furthermore, DistilPose-L obtains 74.4% mAP on the MSCOCO validation dataset, achieving a new state of the art among predominant regression-based models.

Submitted 16 March, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023
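One plausible reading of "Simulated Heatmaps" is that a regressed keypoint, together with a predicted spread and confidence, is rendered as a Gaussian heatmap that can be compared with the teacher's heatmap. The sketch below assumes an axis-aligned 2D Gaussian whose peak equals the confidence and uses mean squared error for the comparison; these choices and the function names are hypothetical, not DistilPose's actual formulation.

```python
import math
from typing import List

Heatmap = List[List[float]]


def simulated_heatmap(x: float, y: float, sigma_x: float, sigma_y: float,
                      confidence: float, width: int, height: int) -> Heatmap:
    """Render an axis-aligned 2D Gaussian centered at the regressed keypoint (x, y).

    The peak value equals `confidence`, so a less certain prediction has a lower peak.
    """
    return [
        [confidence * math.exp(-((u - x) ** 2) / (2 * sigma_x ** 2)
                               - ((v - y) ** 2) / (2 * sigma_y ** 2))
         for u in range(width)]   # u: column (x) index
        for v in range(height)    # v: row (y) index
    ]


def heatmap_mse(a: Heatmap, b: Heatmap) -> float:
    """Mean squared error between two equally sized heatmaps (e.g. teacher vs. simulated)."""
    diffs = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)
```

Rendering the student's output this way gives it the same form as the teacher's output, so a dense heatmap loss can supervise a coordinate regressor.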

arXiv:2302.14691 [pdf, other]

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

Authors: Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo

Abstract: In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP differs from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe that both base LLMs (i.e., not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, with average improvements of 34.58% and 12.26%, respectively. This implies that the instruction-following ability of LLMs can be improved at inference time with a fixed prompt constructed with simple heuristics. We hypothesize that TAPP helps language models better estimate the output distribution by focusing more on the instruction of the target task during inference. In other words, this ability does not seem to be sufficiently activated in either base LLMs or many instruction-fine-tuned LLMs. All experiments are reproducible from https://github.com/seonghyeonye/TAPP.

Submitted 24 December, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: AAAI 2024
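The key property of TAPP described above is that the prefix is identical for every input, whatever the task. The sketch below shows only that input-construction step; the `Output:` template and the placeholder prefix are assumptions for illustration, and the real TAPP content (see the linked repository) is not reproduced here.

```python
from typing import List, Tuple


def build_tapp_input(prefix: str, instruction: str, instance: str) -> str:
    """Prepend the same fixed prefix to every input, regardless of the target task."""
    return f"{prefix}\n\n{instruction}\n{instance}\nOutput:"


def build_batch(prefix: str, tasks: List[Tuple[str, str]]) -> List[str]:
    """`tasks` holds (instruction, instance) pairs, possibly from unrelated tasks."""
    return [build_tapp_input(prefix, instruction, instance) for instruction, instance in tasks]


# Placeholder prefix (hypothetical); not the prompt used in the paper.
DEMO_PREFIX = "Instruction: Translate to French.\nInput: cat\nOutput: chat"
```

Because the prefix does not depend on the query, it can be built once and reused for every request at inference time.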

arXiv:2302.03202 [pdf, other]

Exploring the Benefits of Training Expert Language Models over Instruction Tuning

Authors: Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo

Abstract: Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training tasks is the key component in making stronger MT LMs. In this work, we report an unexpected finding that an expert LM fine-tuned on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by a mean accuracy of 3.20% and 1.29%, respectively. This finding casts doubt on the previously held belief that simply scaling the number of tasks makes stronger MT LMs. Leveraging this finding, we further show that this distributed approach of training a separate expert LM per training task instead of a single MT LM for zero-shot inference possesses many benefits including (1) avoiding negative task transfer that often occurs during instruction tuning, (2) being able to continually learn new tasks without having to re-train on previous tasks to avoid catastrophic forgetting, and (3) showing compositional capabilities when merging individual experts together. The code is available at https://github.com/joeljang/ELM.

Submitted 8 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2301.08666 [pdf, ps, other]

Haves and Have-Nots: A Theory of Economic Sufficientarianism

Authors: Christopher P. Chambers, Siming Ye

Abstract: We introduce a generalization of the concept of sufficientarianism, intended to rank allocations involving multiple consumption goods. In ranking allocations of goods for a fixed society of agents, sufficientarianism posits that allocations are compared according to the number of individuals whose consumption is deemed sufficient. We base our analysis on a novel ethical concept, which we term sufficientarian judgment. Sufficientarian judgment asserts that if, starting from an allocation in which all agents have identical consumption, a change in one agent's consumption hurts society, then there is no change in any other agent's consumption that could subsequently benefit society. Sufficientarianism is shown to be equivalent to sufficientarian judgment, symmetry, and separability. We investigate our axioms in an abstract environment and in specific economic environments. Finally, we argue formally that sufficientarian judgment is closely related to the leximin principle.

Submitted 18 September, 2023; v1 submitted 20 January, 2023; originally announced January 2023.
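The counting rule above can be made concrete with a toy example. The paper characterizes sufficiency axiomatically rather than through a fixed threshold; purely for illustration, this sketch assumes a hypothetical threshold bundle (an agent's consumption is "sufficient" if it meets the threshold in every good) and ranks allocations by the number of sufficient agents.

```python
from typing import Sequence, Tuple

Bundle = Tuple[float, ...]  # quantities of each consumption good held by one agent


def is_sufficient(bundle: Bundle, threshold: Bundle) -> bool:
    """Hypothetical sufficiency test: every good meets its threshold quantity."""
    return all(q >= t for q, t in zip(bundle, threshold))


def sufficiency_count(allocation: Sequence[Bundle], threshold: Bundle) -> int:
    """Number of agents whose consumption is deemed sufficient."""
    return sum(is_sufficient(b, threshold) for b in allocation)


def weakly_better(a: Sequence[Bundle], b: Sequence[Bundle], threshold: Bundle) -> bool:
    """Allocation a is ranked at least as high as allocation b."""
    return sufficiency_count(a, threshold) >= sufficiency_count(b, threshold)
```

Only the count matters, so permuting agents never changes the ranking, which mirrors the symmetry axiom mentioned above.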

arXiv:2301.01424 [pdf, other]

doi 10.1145/3550469.3555426

Scene Synthesis from Human Motion

Authors: Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu

Abstract: Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scenes people reside in and interact with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.

Submitted 3 January, 2023; originally announced January 2023.

Comments: 9 pages, 8 figures. Published in SIGGRAPH Asia 2022. Sifan Ye and Yixing Wang contributed equally. Huazhe Xu and Jiajun Wu contributed equally.

arXiv:2212.10212 [pdf, other]

doi 10.1103/PhysRevLett.130.233602

Probing two driven double quantum dots strongly coupled to a cavity

Authors: Si-Si Gu, Sigmund Kohler, Yong-Qiang Xu, Rui Wu, Shun-Li Jiang, Shu-Kun Ye, Ting Lin, Bao-Chuan Wang, Hai-Ou Li, Gang Cao, Guo-** Guo

Abstract: We experimentally and theoretically study a driven hybrid circuit quantum electrodynamics (cQED) system beyond the dispersive coupling regime. Treating the cavity as part of the driven system, we develop a theory applicable to such strongly coupled systems and to multi-qubit systems. The fringes measured for a single driven double quantum dot (DQD)-cavity setting and the enlarged splittings of the hybrid Floquet states in the presence of a second DQD are well reproduced by our model. This opens a path to studying Floquet states of multi-qubit systems with arbitrarily strong coupling and offers a new perspective for understanding strongly driven hybrid systems.

Submitted 20 December, 2022; originally announced December 2022.

Comments: 9 pages, 6 figures

Journal ref: Phys. Rev. Lett. 130, 233602 (2023)

arXiv:2212.09996 [pdf, other]

A marginalized three-part interrupted time series regression model for proportional data

Authors: Shangyuan Ye, Maricela Cruz, Yuchen Hu, Yun Yu

Abstract: Interrupted time series (ITS) designs are often used to evaluate the effectiveness of health policy interventions while accounting for the temporal dependence of outcomes. When the outcome of interest is a percentage or percentile, the data can be highly skewed, bounded in $[0, 1]$, and have many zeros or ones. A three-part Beta regression model is commonly used to separate zeros, ones, and positive values explicitly by three submodels. However, incorporating temporal dependence into the three-part Beta regression model is challenging. In this article, we propose a marginalized zero-one-inflated Beta time series model that captures the temporal dependence of outcomes through a copula and allows investigators to examine covariate effects on the marginal mean. We investigate its practical performance using simulation studies and apply the model to a real ITS study.

Submitted 19 December, 2022; originally announced December 2022.
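Modeling the marginal mean is useful because, in a zero-one-inflated Beta distribution, the overall mean mixes all three parts: with $P(Y=0)=p_0$, $P(Y=1)=p_1$, and a Beta component with mean $\mu$, $E[Y] = p_1 + (1 - p_0 - p_1)\mu$. The sketch below checks this identity by simulation. It assumes the common mean-precision Beta parameterization ($\alpha = \mu\phi$, $\beta = (1-\mu)\phi$). It does not model covariates, the marginalized link, or the copula-based temporal dependence used in the paper.

```python
import random
from typing import List


def zoib_marginal_mean(p0: float, p1: float, mu: float) -> float:
    """E[Y] when P(Y=0)=p0, P(Y=1)=p1, and otherwise Y ~ Beta with mean mu."""
    if p0 < 0 or p1 < 0 or p0 + p1 > 1:
        raise ValueError("need p0, p1 >= 0 and p0 + p1 <= 1")
    if not 0 < mu < 1:
        raise ValueError("Beta mean must lie in (0, 1)")
    return p1 + (1 - p0 - p1) * mu


def sample_zoib(p0: float, p1: float, mu: float, phi: float, n: int, seed: int = 0) -> List[float]:
    """Draw n values; phi is the Beta precision, so alpha = mu*phi and beta = (1 - mu)*phi."""
    rng = random.Random(seed)
    out: List[float] = []
    for _ in range(n):
        u = rng.random()
        if u < p0:
            out.append(0.0)  # structural zero
        elif u < p0 + p1:
            out.append(1.0)  # structural one
        else:
            out.append(rng.betavariate(mu * phi, (1 - mu) * phi))  # interior value
    return out
```

For instance, with $p_0 = 0.2$, $p_1 = 0.1$, and $\mu = 0.5$, the marginal mean is $0.1 + 0.7 \times 0.5 = 0.45$, and the sample mean of many draws should be close to that value.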

arXiv:2212.03242 [pdf, other]

Robust Point Cloud Segmentation with Noisy Annotations

Authors: Shuquan Ye, Dongdong Chen, Songfang Han, **g Liao

Abstract: Point cloud segmentation is a fundamental task in 3D. Despite recent progress on point cloud segmentation with the power of deep networks, current learning methods based on the clean-label assumption may fail with noisy labels. Yet class labels are often mislabeled at both the instance level and the boundary level in real-world datasets. In this work, we take the lead in addressing instance-level label noise by proposing a Point Noise-Adaptive Learning (PNAL) framework. Unlike noise-robust methods for image tasks, our framework is noise-rate blind, so that it can cope with the spatially variant noise rate specific to point clouds. Specifically, we propose a point-wise confidence selection to obtain reliable labels from the historical predictions of each point. A cluster-wise label correction with a voting strategy is proposed to generate the best possible label by considering neighbor correlations. To handle boundary-level label noise, we also propose a variant, "PNAL-boundary", with a progressive boundary label cleaning strategy. Extensive experiments demonstrate its effectiveness on both synthetic and real-world noisy datasets. Even with $60\%$ symmetric noise and high-level boundary noise, our framework significantly outperforms its baselines and is comparable to the upper bound trained on completely clean data. Moreover, we cleaned the popular real-world dataset ScanNetV2 for rigorous experiments. Our code and data are available at https://github.com/pleaseconnectwifi/PNAL.

Submitted 6 December, 2022; originally announced December 2022.

Comments: To appear in TPAMI 2022. arXiv admin note: substantial text overlap with arXiv:2107.14230
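The two label-cleaning steps described above (point-wise confidence selection from historical predictions, then cluster-wise voting) can be sketched as follows. The agreement threshold, plain majority voting, and precomputed cluster ids are assumptions made for illustration; the abstract does not give PNAL's exact selection rule or clustering method.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def pointwise_confident_labels(history: Sequence[Sequence[int]],
                               min_agreement: float) -> List[Optional[int]]:
    """history[i] holds the (non-empty) labels predicted for point i over past epochs.

    Keep the most frequent label only if it appears in at least `min_agreement`
    of the epochs; otherwise mark the point as unreliable (None).
    """
    out: List[Optional[int]] = []
    for preds in history:
        label, count = Counter(preds).most_common(1)[0]
        out.append(label if count / len(preds) >= min_agreement else None)
    return out


def cluster_vote(labels: Sequence[Optional[int]], clusters: Sequence[int]) -> List[Optional[int]]:
    """Give every point the majority reliable label of its cluster (None if the cluster has none)."""
    votes: Dict[int, Counter] = {}
    for lab, c in zip(labels, clusters):
        if lab is not None:
            votes.setdefault(c, Counter())[lab] += 1
    return [votes[c].most_common(1)[0][0] if c in votes else None for c in clusters]
```

Note that no noise rate appears anywhere: each point is judged only by the stability of its own prediction history, which is consistent with the "noise-rate blind" property described above.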

arXiv:2211.16504 [pdf, other]

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Authors: Shuquan Ye, Yujia Xie, Dongdong Chen, Yichong Xu, Lu Yuan, Chenguang Zhu, **g Liao

Abstract: This paper focuses on analyzing and improving the commonsense ability of recent popular vision-language (VL) models. Despite their great success, we observe that existing VL models still lack commonsense knowledge/reasoning ability (e.g., "Lemons are sour"), which is a vital component towards artificial general intelligence. Through our analysis, we find that one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL models from the data perspective. Rather than collecting a new VL training dataset, we propose a more scalable strategy, i.e., "Data Augmentation with kNowledge graph linearization for CommonsensE capability" (DANCE). It can be viewed as a type of data augmentation technique that injects commonsense knowledge into existing VL datasets on the fly during training. More specifically, we leverage a commonsense knowledge graph (e.g., ConceptNet) and create variants of the text descriptions in VL datasets via bidirectional sub-graph sequentialization. For better commonsense evaluation, we further propose the first retrieval-based commonsense diagnostic benchmark. Through extensive experiments on representative VL models, we demonstrate that our DANCE technique significantly improves commonsense ability while maintaining performance on vanilla retrieval tasks. The code and data are available at https://github.com/pleaseconnectwifi/DANCE

Submitted 29 November, 2022; originally announced November 2022.

Comments: Code: https://github.com/pleaseconnectwifi/DANCE Project page: shuquanye.com/DANCE_website
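To illustrate what linearizing knowledge-graph triples into text might look like, the sketch below turns (head, relation, tail) triples into sentences in both directions and appends them to a caption. The relation templates and the `augment_caption` helper are hypothetical, and this simple per-triple scheme is only a stand-in for DANCE's bidirectional sub-graph sequentialization.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail), ConceptNet-style

# Hypothetical templates: relation -> (forward sentence, backward sentence).
TEMPLATES: Dict[str, Tuple[str, str]] = {
    "HasProperty": ("{h} is {t}", "{t} is a property of {h}"),
    "IsA": ("{h} is a kind of {t}", "one kind of {t} is {h}"),
}


def linearize(triples: Sequence[Triple]) -> List[str]:
    """Turn each known triple into a forward and a backward sentence; skip unknown relations."""
    sentences: List[str] = []
    for h, r, t in triples:
        if r not in TEMPLATES:
            continue
        fwd, bwd = TEMPLATES[r]
        sentences.append(fwd.format(h=h, t=t))
        sentences.append(bwd.format(h=h, t=t))
    return sentences


def augment_caption(caption: str, entity: str, graph: Sequence[Triple]) -> str:
    """Append linearized facts about `entity` to a caption (caption unchanged if none)."""
    facts = linearize([tr for tr in graph if tr[0] == entity])
    return (caption + " " + ". ".join(facts) + ".") if facts else caption
```

Because augmentation happens on the fly, each training step can sample different facts for the same image-text pair without building a new dataset.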

arXiv:2211.15653 [pdf, other]

doi 10.1007/s11214-023-00984-w

Energetic electron precipitation driven by electromagnetic ion cyclotron waves from ELFIN's low altitude perspective

Authors: V. Angelopoulos, X. -J. Zhang, A. V. Artemyev, D. Mourenas, E. Tsai, C. Wilkins, A. Runov, J. Liu, D. L. Turner, W. Li, K. Khurana, R. E. Wirz, V. A. Sergeev, X. Meng, J. Wu, M. D. Hartinger, T. Raita, Y. Shen, X. An, X. Shi, M. F. Bashir, X. Shen, L. Gan, M. Qin, L. Capannolo , et al. (61 additional authors not shown)

Abstract: We review comprehensive observations of electromagnetic ion cyclotron (EMIC) wave-driven energetic electron precipitation using data from the energetic electron detector on the Electron Losses and Fields InvestigatioN (ELFIN) mission, two polar-orbiting low-altitude spinning CubeSats, measuring 50-5000 keV electrons with good pitch-angle and energy resolution. EMIC wave-driven precipitation exhibits a distinct signature in energy-spectrograms of the precipitating-to-trapped flux ratio: peaks at 0.5 MeV which are abrupt (bursty) with significant substructure (occasionally down to sub-second timescale). Multiple ELFIN passes over the same MLT sector allow us to study the spatial and temporal evolution of the EMIC wave-electron interaction region. Using two years of ELFIN data, we assemble a statistical database of 50 events of strong EMIC wave-driven precipitation. Most reside at L=5-7 at dusk, while a smaller subset exists at L=8-12 at post-midnight. The energies of the peak-precipitation ratio and of the half-peak precipitation ratio (our proxy for the minimum resonance energy) exhibit an L-shell dependence in good agreement with theoretical estimates based on prior statistical observations of EMIC wave power spectra. The precipitation ratio's spectral shape for the most intense events has an exponential falloff away from the peak (i.e., on either side of 1.45 MeV). It too agrees well with quasi-linear diffusion theory based on prior statistics of wave spectra. Sub-MeV electron precipitation observed concurrently with strong EMIC wave-driven 1 MeV precipitation has a spectral shape that is consistent with efficient pitch-angle scattering down to 200-300 keV by much less intense higher frequency EMIC waves. These results confirm the critical role of EMIC waves in driving relativistic electron losses. Nonlinear effects may abound and require further investigation.

Submitted 28 November, 2022; originally announced November 2022.
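The two spectral markers above (the energy of the peak precipitating-to-trapped ratio and the half-peak energy used as a proxy for the minimum resonance energy) can be extracted from a ratio spectrum as sketched below. This is a simplified illustration, not ELFIN's processing: it assumes energy channels sorted in ascending order, performs no interpolation or smoothing between channels, and takes the half-peak point on the low-energy side of the peak. The example channel energies and ratios are made up.

```python
from typing import List, Sequence, Tuple


def precipitation_ratio(precipitating: Sequence[float], trapped: Sequence[float]) -> List[float]:
    """Precipitating-to-trapped flux ratio per energy channel (0.0 where trapped flux is zero)."""
    return [p / t if t > 0 else 0.0 for p, t in zip(precipitating, trapped)]


def peak_and_half_peak(energies_kev: Sequence[float], ratio: Sequence[float]) -> Tuple[float, float]:
    """Return (energy of the peak ratio, lowest channel below the peak still at >= half the peak).

    Assumes ascending energies; walks down from the peak without interpolation.
    """
    i_peak = max(range(len(ratio)), key=lambda i: ratio[i])
    half = ratio[i_peak] / 2
    j = i_peak
    # Step to lower energies while the ratio stays at or above half of the peak.
    while j > 0 and ratio[j - 1] >= half:
        j -= 1
    return energies_kev[i_peak], energies_kev[j]
```

For a made-up spectrum with channels [63, 100, 200, 400, 800, 1500, 3000] keV and ratios [0.05, 0.1, 0.3, 0.6, 0.9, 1.0, 0.4], this returns a peak at 1500 keV and a half-peak energy of 400 keV.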

arXiv:2211.03577 [pdf]

Regrowth-free AlGaInAs MQW polarization controller integrated with sidewall grating DFB laser

Authors: Xiao Sun, Song Liang, Weiqing Cheng, Shengwei Ye, Yiming Sun, Yongguang Huang, Ruikang Zhang, Jichuan Xiong, Xuefeng Liu, John H. Marsh, Lian** Hou

Abstract: We report an AlGaInAs multiple quantum well (MQW) integrated source of polarization-controlled light consisting of a polarization mode converter (PMC), a differential phase shifter (DPS), and a sidewall grating distributed-feedback (DFB) laser. We demonstrate an asymmetrical stepped-height ridge waveguide PMC to realize TE-to-TM polarization conversion and a symmetrical straight waveguide DPS to enable polarization rotation from approximately counterclockwise circular polarization to linear polarization. Based on the identical epitaxial layer scheme, the PMC, DPS, and DFB laser can all be integrated monolithically using only a single step of metalorganic vapor phase epitaxy and two steps of III-V material dry etching. For the DFB-PMC device, a high TE-to-TM polarization conversion efficiency of 98% over a wide range of DFB injection currents is reported at a wavelength of 1555 nm. For the DFB-PMC-DPS device, a 60 degree rotation of the Stokes vector was obtained on the Poincaré sphere over a bias voltage range from 0 V to -4.0 V at IDFB = 170 mA.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2210.10519
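On the Poincaré sphere, a change of polarization state can be quantified as the angle between the two normalized Stokes vectors (S1, S2, S3). Circular polarization sits at a pole and linear polarization on the equator, so a complete circular-to-linear conversion corresponds to 90 degrees. The following is a generic sketch of that calculation, not the authors' analysis code; the example vectors are made up.

```python
import math
from typing import Tuple

Stokes = Tuple[float, float, float]  # (S1, S2, S3), normalized or not


def rotation_angle_deg(a: Stokes, b: Stokes) -> float:
    """Angle between two Stokes vectors on the Poincaré sphere, in degrees."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_angle = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_angle))
```

For example, moving from (1, 0, 0) to (0.5, 0.866, 0) is a 60 degree rotation, while moving from the pole (0, 0, 1) to the equator is 90 degrees.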

arXiv:2210.10519 [pdf]

doi 10.1364/OL.478765

Stepped-height ridge waveguide MQW polarization mode converter monolithically integrated with sidewall grating DFB laser

Authors: Xiao Sun, Weiqing Cheng, Song Liang, Shengwei Ye, Yongguang Huang, Ruikang Zhang, Bocang Qiu, Jichuan Xiong, Xuefeng Liu, John H. Marsh, Lian** Hou

Abstract: We report the first demonstration of a 1555 nm stepped-height ridge waveguide polarization mode converter monolithically integrated with a sidewall grating distributed-feedback (DFB) laser using the identical epitaxial layer scheme. The device shows stable single longitudinal mode (SLM) operation with the output light converted from TE to TM polarization with an efficiency of >94% over a wide range of DFB injection currents (IDFB) from 140 mA to 190 mA. The highest TM mode purity of 98.2% was obtained at IDFB = 180 mA. A particular advantage of this device is that only a single step of metalorganic vapor-phase epitaxy and two steps of III-V material dry etching are required for the whole integrated device fabrication, significantly reducing complexity and cost.

Submitted 7 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.
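TM mode purity is commonly defined as the fraction of output power in the TM mode, P_TM / (P_TE + P_TM); the abstract does not state the exact measurement procedure, so that definition is an assumption here. The sketch also converts a TM/TE polarization extinction ratio in dB into the same quantity, since 10 log10(P_TM / P_TE) = ER implies a purity of 1 / (1 + 10^(-ER/10)).

```python
def tm_mode_purity(p_te: float, p_tm: float) -> float:
    """Fraction of optical power in the TM mode: P_TM / (P_TE + P_TM)."""
    total = p_te + p_tm
    if p_te < 0 or p_tm < 0 or total == 0:
        raise ValueError("powers must be non-negative and not both zero")
    return p_tm / total


def purity_from_extinction_db(er_db: float) -> float:
    """Same quantity from a TM/TE polarization extinction ratio given in dB."""
    return 1.0 / (1.0 + 10 ** (-er_db / 10))
```

Under this definition, a purity of 98.2% means 1.8% of the power remains in the TE mode, which corresponds to an extinction ratio of roughly 17.4 dB.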

arXiv:2210.03029 [pdf, other]

Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt

Authors: Seonghyeon Ye, Joel Jang, Doyoung Kim, Yongrae Jo, Minjoon Seo

Abstract: Enhancing the zero-shot performance of instruction-following models requires heavy computation, either by scaling the total number of training datasets or the model size. In this work, we explore how retrieval of soft prompts obtained through prompt tuning can efficiently assist hard prompts in zero-shot task generalization. Specifically, we train soft prompt embeddings for each prompt through prompt tuning, store the samples of the training instances mapped with the prompt embeddings, and retrieve the corresponding prompt embedding of the training instance closest to the query instance during inference. While adding only 0.007% additional parameters, retrieval of soft prompts enhances the performance of T0 on unseen tasks, outperforming it on 10 out of 11 datasets and improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39 percentage points. We also report an interesting finding: retrieving source embeddings trained on similar answer choice formats is more important than retrieving those trained on similar task types.

Submitted 16 October, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: EMNLP 2023 Findings
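The retrieval step described above (store training-instance embeddings mapped to their soft prompts, then fetch the prompt of the closest instance at inference) can be sketched as a small nearest-neighbor index. Cosine similarity, brute-force search, and string prompt ids are assumptions for illustration; the abstract does not specify the encoder or similarity measure, and `SoftPromptIndex` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SoftPromptIndex:
    """Maps training-instance embeddings to the id of the soft prompt trained for them."""

    def __init__(self) -> None:
        self._keys: List[Sequence[float]] = []
        self._prompt_ids: List[str] = []

    @classmethod
    def from_pairs(cls, pairs: Sequence[Tuple[Sequence[float], str]]) -> "SoftPromptIndex":
        index = cls()
        for embedding, prompt_id in pairs:
            index.add(embedding, prompt_id)
        return index

    def add(self, instance_embedding: Sequence[float], prompt_id: str) -> None:
        self._keys.append(instance_embedding)
        self._prompt_ids.append(prompt_id)

    def retrieve(self, query_embedding: Sequence[float]) -> str:
        """Return the prompt id of the stored instance most similar to the query."""
        if not self._keys:
            raise LookupError("index is empty")
        best = max(range(len(self._keys)), key=lambda i: cosine(query_embedding, self._keys[i]))
        return self._prompt_ids[best]
```

The retrieved id would then select which trained soft-prompt embedding to prepend alongside the hard prompt. That step is model-specific and is not shown.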

arXiv:2210.02969 [pdf, other]

Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners

Authors: Seonghyeon Ye, Doyoung Kim, Joel Jang, Joongbo Shin, Minjoon Seo

Abstract: Meta-training, which fine-tunes the language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved zero-shot task generalization performance. However, meta-trained LMs still struggle to generalize to challenging tasks containing novel labels unseen during meta-training. In this paper, we propose Flipped Learning, an alternative method of meta-training that trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as Flipped, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized Flipped outperforms zero-shot T0-11B and even a 16 times larger 3-shot GPT-3 (175B) on average by 8.4 and 9.7 percentage points, respectively. Flipped gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped comes from improved generalization to novel labels. We release our code at https://github.com/seonghyeonye/Flipped-Learning.

Submitted 6 June, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: ICLR 2023
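The inference rule described above flips the usual direction of scoring: instead of choosing the label with the highest P(label | instruction, input), Flipped picks the label under which the instruction is most likely, P(instruction | input, label). The sketch shows only that selection step; `log_prob` stands in for a hypothetical scorer backed by a Flipped-trained LM, and the scores in the usage example are made up.

```python
from typing import Callable, Sequence

# Hypothetical scorer: log P(instruction | instance, label) under a Flipped-trained LM.
LogProbFn = Callable[[str, str, str], float]


def flipped_predict(instruction: str, instance: str, options: Sequence[str],
                    log_prob: LogProbFn) -> str:
    """Pick the label option under which the LM is most likely to generate the instruction."""
    return max(options, key=lambda label: log_prob(instruction, instance, label))
```

For example, with made-up scores of -3.2 for "positive" and -1.5 for "negative", `flipped_predict` returns "negative".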

arXiv:2209.12711 [pdf, other]

Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts

Authors: Joel Jang, Seonghyeon Ye, Minjoon Seo

Abstract: Previous work has shown that there exists a scaling law between the size of Language Models (LMs) and their zero-shot performance on different downstream NLP tasks. In this work, we show that this phenomenon does not hold when evaluating large LMs on tasks with negated prompts; instead, they show an inverse scaling law. We evaluate 9 different tasks with negated prompts on (1) pretrained LMs (OPT & GPT-3) of varying sizes (125M - 175B), (2) LMs further pretrained to generalize to novel prompts (InstructGPT), (3) LMs provided with few-shot examples, and (4) LMs fine-tuned specifically on negated prompts; all LM types perform worse on negated prompts as they scale and show a large gap from human performance when comparing the average score on both original and negated prompts. By highlighting a critical limitation of existing LMs and methods, we urge the community to develop new approaches to developing LMs that actually follow the given instructions. We provide the code and the datasets to explore negated prompts at https://github.com/joeljang/negated-prompts-for-llms

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.07735 [pdf, other]

Enhance the Visual Representation via Discrete Adversarial Training

Authors: Xiaofeng Mao, Yuefeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue

Abstract: Adversarial Training (AT), which is commonly accepted as one of the most effective approaches for defending against adversarial examples, can largely harm standard performance and thus has limited usefulness in industrial-scale production and applications. Surprisingly, the opposite holds in Natural Language Processing (NLP) tasks, where AT can even benefit generalization. We notice that the merit of AT in NLP tasks could derive from the discrete and symbolic input space. To borrow this advantage from NLP-style AT, we propose Discrete Adversarial Training (DAT). DAT leverages VQGAN to transform the image data into discrete text-like inputs, i.e., visual words. It then minimizes the maximal risk on such discrete images with symbolic adversarial perturbations. We further give an explanation from the perspective of distribution to demonstrate the effectiveness of DAT. As a plug-and-play technique for enhancing the visual representation, DAT achieves significant improvement on multiple tasks including image classification, object detection and self-supervised learning. In particular, the model pre-trained with Masked Auto-Encoding (MAE) and fine-tuned by our DAT without extra data achieves 31.40 mCE on ImageNet-C and 32.77% top-1 accuracy on Stylized-ImageNet, setting a new state of the art. The code will be available at https://github.com/alibaba/easyrobust.
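Turning an image into "visual words" rests on vector quantization: each continuous feature vector is replaced by the index of its nearest codebook entry, as in VQGAN. The sketch below shows only that lookup, assuming squared Euclidean distance and a tiny hypothetical codebook; it omits the VQGAN encoder and decoder as well as the adversarial perturbation itself.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry
    (its discrete 'visual word')."""
    return [min(range(len(codebook)), key=lambda j: squared_distance(f, codebook[j]))
            for f in features]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Look the discrete tokens back up as codebook vectors."""
    return [codebook[i] for i in indices]


codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # hypothetical toy codebook
tokens = quantize([(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)], codebook)
print(tokens, dequantize(tokens, codebook))
```

Because the tokens are discrete, a perturbation in this space swaps one codebook index for another, which is the sense in which the input becomes "text-like".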

Submitted 16 September, 2022; originally announced September 2022.

Comments: Accepted to NeurIPS 2022, https://github.com/alibaba/easyrobust

arXiv:2209.00277 [pdf, other]

doi 10.1145/3503161.3547996

Video-Guided Curriculum Learning for Spoken Video Grounding

Authors: Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren

Abstract: In this paper, we introduce a new task, spoken video grounding (SVG), which aims to localize the desired video fragments from spoken language descriptions. Compared with using text, employing audio requires the model to directly exploit the useful phonemes and syllables related to the video from raw speech. Moreover, we randomly add environmental noises to this speech audio, further increasing the difficulty of this task and better simulating real applications. To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise. Since the model cannot obtain ground-truth video segments during inference, we design a curriculum strategy that gradually shifts the input video from the ground truth to the entire video content during pre-training. Finally, the model can learn how to extract critical visual information from the entire video clip to help understand the spoken language. In addition, we collect the first large-scale spoken video grounding dataset based on ActivityNet, named the ActivityNet Speech dataset. Extensive experiments demonstrate that our proposed video-guided curriculum learning can facilitate the pre-training process to obtain a mutual audio encoder, significantly improving performance on spoken video grounding tasks. Moreover, we show that with noisy audio our model outperforms the method that grounds video using ASR transcripts, further demonstrating the effectiveness of our curriculum strategy.
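One way to picture a curriculum that shifts the input from the ground-truth segment to the whole clip is a visible window that widens over pre-training. The linear schedule and the function name `curriculum_window` below are illustrative assumptions; the abstract does not specify the schedule.

```python
from typing import Tuple


def curriculum_window(step: int, total_steps: int,
                      gt: Tuple[float, float], duration: float) -> Tuple[float, float]:
    """Hypothetical linear schedule: the visible window starts at the
    ground-truth segment (start, end) and expands to the full clip (0, duration)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    start, end = gt
    return (start * (1.0 - progress), end + (duration - end) * progress)


# A 60-second clip whose ground-truth segment is 10s to 20s.
for step in (0, 50, 100):
    print(step, curriculum_window(step, 100, (10.0, 20.0), 60.0))
```

A linear ramp is the simplest choice; a step-wise or cosine schedule would follow the same pattern by changing how `progress` is computed.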

Submitted 1 September, 2022; originally announced September 2022.

Comments: Accepted by ACM MM 2022

arXiv:2208.14531 [pdf, other]

Hollow Rectangular Waveguide-fed Holographic Beamforming Antenna Additively Manufactured (3D Printed) with Conductive Polymer

Authors: Insang Yoo, Jonah Gollub, Shengrong Ye, Allen Gray, Okan Yurduseven, Manohar D. Deshpande, David R. Smith

Abstract: We present the design and fabrication of 3D printed holographic beamforming antennas. The antennas utilize additively manufactured hollow rectangular waveguides that feed radiating rectilinear slots inserted into the upper conducting wall. The lengths of the individual slots are altered to implement a holographic beamforming solution designed using a coupled dipole formalism. For rapid verification, the designed antennas are fabricated using a desktop dual-extrusion fused filament 3D printer. The body of each antenna and its inner conducting surface are printed using polylactic acid and a biodegradable conductive polyester composite material (i.e., Electrifi), respectively; the conducting surface is later coated with a layer of copper to improve surface conductivity and reduce surface roughness. The beamforming performance of the fabricated antennas is confirmed via experiments. The 3D printed metasurface antennas using the proposed fabrication technique illustrate emerging capabilities in the rapid prototyping of complex electromagnetic structures.

Submitted 30 August, 2022; originally announced August 2022.

arXiv:2207.11176 [pdf, ps, other]

doi 10.1007/s43037-023-00268-z

Generalized Hilbert Operator Acting on Weighted Bergman Spaces and on Dirichlet Spaces

Authors: Shanli Ye, Guanghao Feng

Abstract: Let $μ$ be a positive Borel measure on the interval [0,1). For $β> 0$, the generalized Hankel matrix $\mathcal{H}_{μ,β}= (μ_{n,k,β})_{n,k\geq0}$ with entries $μ_{n,k,β}= \int_{[0,1)}\frac{Γ(n+β)}{n!Γ(β)} t^{n+k}dμ(t)$ formally induces the operator $$\mathcal{H}_{μ,β}(f)(z)=\sum_{n=0}^\infty \left(\sum_{k=0}^\infty μ_{n,k,β}a_k\right)z^n$$ on the space of all analytic functions $f(z)=\sum_{k=0}^ \infty a_k z^k$ in the unit disc $\mathbb{D}$. In this paper, we characterize those positive Borel measures on $[0,1)$ such that $\mathcal{H}_{μ,β}(f)(z)= \int_{[0,1)} \frac{f(t)}{(1-tz)^β} dμ(t)$ for all $f$ in the weighted Bergman spaces $A_α^p$ ($0<p<\infty$, $α>-1$), and among them we describe those for which $\mathcal{H}_{μ,β}$ ($β>0$) is a bounded (resp., compact) operator on weighted Bergman spaces and Dirichlet spaces.
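A worked special case helps connect the matrix form and the integral form. Taking $β=1$ and Lebesgue measure $dμ(t)=dt$ (an illustrative choice, not a case singled out in the abstract), the factor $Γ(n+1)/(n!Γ(1))$ equals 1 and the entries reduce to the classical Hilbert matrix. More generally, since $(1-w)^{-β}=\sum_{n\geq 0}\frac{Γ(n+β)}{n!Γ(β)}w^n$ for $|w|<1$, formally interchanging sum and integral recovers the integral form; the paper's contribution is characterizing the measures for which this identity actually holds.

```latex
\[
\mu_{n,k,1} = \frac{\Gamma(n+1)}{n!\,\Gamma(1)} \int_0^1 t^{n+k}\,dt = \frac{1}{n+k+1},
\qquad
\mathcal{H}_{\mu,1}(f)(z) = \int_0^1 \frac{f(t)}{1-tz}\,dt .
\]
\[
\int_{[0,1)} \frac{f(t)}{(1-tz)^{\beta}}\,d\mu(t)
  = \sum_{n=0}^{\infty} \frac{\Gamma(n+\beta)}{n!\,\Gamma(\beta)}\, z^n \int_{[0,1)} t^n f(t)\,d\mu(t)
  = \sum_{n=0}^{\infty} \Bigl( \sum_{k=0}^{\infty} \mu_{n,k,\beta}\, a_k \Bigr) z^n .
\]
```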

Submitted 22 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2206.12024

Journal ref: Banach Journal of Mathematical Analysis, 2023, 38

arXiv:2207.11170 [pdf, ps, other]

Generalized Hilbert Operator Acting on Bloch Type Spaces

Authors: Shanli Ye, Zhihui Zhou

Abstract: Let $μ$ be a positive Borel measure on the interval [0,1). For $α>0$, the Hankel matrix $\mathcal{H}_{μ,α}=(μ_{n,k,α})_{n,k\geq 0}$ with entries $μ_{n,k,α}=\int_{[0,1)}\frac{Γ(n+α)}{n!Γ(α)}t^{n+k}dμ(t)$ formally induces the operator $$\mathcal{H}_{μ,α}(f)(z)=\sum_{n=0}^{\infty}\left(\sum_{k=0}^{\infty} μ_{n, k,α} a_{k}\right)z^{n} $$ on the space of all analytic functions $f(z)=\sum_{k=0}^{\infty}a_{k}z^{k}$ in the unit disc $\mathbb{D}$. In this paper, we characterize the measures $μ$ for which $\mathcal{H}_{μ,α}$ ($α\geq 2$) is a bounded (resp., compact) operator from the Bloch type space $\mathscr{B}_β$ ($0<β<\infty$) into $\mathscr{B}_{α-1}$. We also give a necessary condition for $\mathcal{H}_{μ,α}$ to be a bounded operator on Bloch type spaces in the general case.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.08368 [pdf, ps, other]

A Derivative-Hilbert operator Acting on Dirichlet spaces

Authors: Yun Xu, Shanli Ye, Zhihui Zhou

Abstract: Let $μ$ be a positive Borel measure on the interval $[0,1)$. The Hankel matrix $\mathcal{H}_μ=(μ_{n,k})_{n,k\geq 0}$ with entries $μ_{n,k}=μ_{n+k}$, where $μ_{n}=\int_{[0,1)}t^ndμ(t)$, formally induces the operator $$\mathcal{DH}_μ(f)(z)=\sum_{n=0}^\infty\left(\sum_{k=0}^\infty μ_{n,k}a_k\right)(n+1)z^n , \quad z\in \mathbb{D},$$ where $f(z)=\sum_{n=0}^{\infty}a_nz^n$ is an analytic function in $\mathbb{D}$. In this paper, we characterize those positive Borel measures on $[0, 1)$ for which $\mathcal{DH}_μ$ is bounded (resp., compact) from the Dirichlet spaces $\mathcal{D}_α$ ($0<α\leq 2$) into $\mathcal{D}_β$ ($2\leq β<4$).

Submitted 17 July, 2022; originally announced July 2022.

MSC Class: 47B35; 30H99

arXiv:2207.03504 [pdf, other]

doi 10.3847/1538-4357/ac9cd0

Millisecond Pulsars in Dense Star Clusters: Evolution, Scaling Relations, and the Galactic-Center Gamma-ray Excess

Authors: Claire S. Ye, Giacomo Fragione

Abstract: The number of millisecond pulsars (MSPs) observed in Milky Way globular clusters has increased explosively in recent years, but the underlying population is still uncertain due to observational biases. We use state-of-the-art $N$-body simulations to study the evolution of MSP populations in dense star clusters. These cluster models span a wide range of initial conditions, including different initial masses, metallicities, and virial radii, which nearly cover the full range of properties exhibited by the population of globular clusters in the Milky Way. We demonstrate how different initial cluster properties affect the number of MSPs, for which we provide scaling relations as a function of cluster age and mass. As an application, we use our formulae to estimate the number of MSPs delivered to the Galactic Center from inspiralling globular clusters to probe the origin of the Galactic-Center gamma-ray excess detected by Fermi. We predict about $400$ MSPs in the Galactic Center from disrupted globular clusters, which can potentially explain most of the observed gamma-ray excess.

Submitted 19 November, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: 14 pages, 15 figures. Accepted for publication in ApJ

arXiv:2206.12024 [pdf, ps, other]

A Derivative-Hilbert operator acting on Hardy spaces

Authors: Shanli Ye, Guanghao Feng

Abstract: Let $μ$ be a positive Borel measure on the interval [0,1). The Hankel matrix $\mathcal{H}_μ= (μ_{n,k})_{n,k\geq0}$ with entries $μ_{n,k}= μ_{n+k}$, where $μ_n=\int_{ [0,1)}t^ndμ(t)$, formally induces the operator $$\mathcal{DH}_μ(f)(z)=\sum_{n=0}^\infty \left(\sum_{k=0}^\infty μ_{n,k}a_k\right)(n+1)z^n$$ on the space of all analytic functions $f(z)=\sum_{k=0}^ \infty a_k z^k$ in the unit disc $\mathbb{D}$. We characterize those positive Borel measures on $[0,1)$ such that $\mathcal{DH}_μ(f)(z)= \int_{[0,1)} \frac{f(t)}{(1-tz)^2} dμ(t)$ for all $f$ in the Hardy spaces $H^p$ ($0<p<\infty$), and among them we describe those for which $\mathcal{DH}_μ$ is a bounded (resp., compact) operator from $H^p$ ($0<p<\infty$) into $H^q$ ($q>p$ and $q\geq 1$). We also study the analogous problem in the Hardy spaces $H^p$ ($1\leq p\leq 2$).
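The integral representation can be checked numerically in the simplest setting. The sketch assumes Lebesgue measure on $[0,1)$ (so $μ_n = 1/(n+1)$), a polynomial $f$ given by its coefficient list, and a point $|z|<1$; the truncation length and midpoint-rule step count are arbitrary illustrative choices. It compares the series defining $\mathcal{DH}_μ(f)(z)$ with $\int_{[0,1)} f(t)(1-tz)^{-2}\,dμ(t)$. For $f\equiv 1$ both equal $1/(1-z)$, since $\int_0^1 (1-tz)^{-2}\,dt = 1/(1-z)$.

```python
from typing import Sequence


def moment(n: int) -> float:
    """n-th moment of Lebesgue measure on [0,1): integral of t^n dt = 1/(n+1)."""
    return 1.0 / (n + 1)


def dh_series(a: Sequence[float], z: float, terms: int = 200) -> float:
    """Truncated series sum_n (sum_k mu_{n+k} a_k) (n+1) z^n for polynomial f."""
    total = 0.0
    for n in range(terms):
        inner = sum(a_k * moment(n + k) for k, a_k in enumerate(a))
        total += inner * (n + 1) * z ** n
    return total


def dh_integral(a: Sequence[float], z: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of integral_0^1 f(t) / (1 - t z)^2 dt."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        f_t = sum(a_k * t ** k for k, a_k in enumerate(a))
        total += f_t / (1.0 - t * z) ** 2
    return total * h


coeffs = [1.0, 2.0, 3.0]  # f(t) = 1 + 2t + 3t^2
print(dh_series(coeffs, 0.3), dh_integral(coeffs, 0.3))
```

Agreement here only illustrates the formal identity for one nice measure and one polynomial; it says nothing about the boundedness questions the paper studies.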

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2204.14211 [pdf, other]

TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models

Authors: Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Minjoon Seo

Abstract: Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information that was absent or different during training, a phenomenon called temporal misalignment. This problem is especially challenging because the research community still lacks a coherent dataset for assessing the adaptability of LMs to frequently updated knowledge corpora such as Wikipedia. To this end, we introduce TemporalWiki, a lifelong benchmark for ever-evolving LMs that utilizes the difference between consecutive snapshots of English Wikipedia and English Wikidata for training and evaluation, respectively. The benchmark hence allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time. We also find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than training on the entire snapshot in our benchmark, with 12 times less computational cost, which verifies that factual knowledge in LMs can be safely updated with minimal training data via continual learning. The dataset and the code are available at https://github.com/joeljang/temporalwiki.
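Training on the difference between consecutive snapshots can be sketched with the standard library's `difflib`. This is an illustration, not TemporalWiki's pipeline: snapshots are assumed to be plain dicts mapping article titles to text, brand-new articles are kept whole, and for existing articles only the lines added in the newer snapshot are kept.

```python
import difflib
from typing import Dict, List


def snapshot_diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per article, the lines that are new in the later snapshot."""
    diff: Dict[str, List[str]] = {}
    for title, text in new.items():
        new_lines = text.splitlines()
        if title not in old:
            diff[title] = new_lines  # new article: keep everything
            continue
        old_lines = old[title].splitlines()
        # ndiff prefixes inserted lines with "+ "; strip the two-character prefix.
        added = [line[2:] for line in difflib.ndiff(old_lines, new_lines)
                 if line.startswith("+ ")]
        if added:
            diff[title] = added  # unchanged articles are omitted
    return diff


old_snapshot = {"A": "x\ny", "B": "same"}
new_snapshot = {"A": "x\nz", "B": "same", "C": "fresh"}
print(snapshot_diff(old_snapshot, new_snapshot))
```

Keeping only added lines discards deletions; a fuller pipeline would also need to decide how to treat edited sentences and removed facts.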

Submitted 12 April, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: published at EMNLP 2022

arXiv:2204.10482 [pdf, other]

Recurrent Affine Transformation for Text-to-image Synthesis

Authors: Senmao Ye, Fei Liu, Minkui Tan

Abstract: Text-to-image synthesis aims to generate natural images conditioned on text descriptions. The main difficulty of this task lies in effectively fusing text information into the image synthesis process. Existing methods usually adaptively fuse suitable text information into the synthesis process with multiple isolated fusion blocks (e.g., Conditional Batch Normalization and Instance Normalization). However, isolated fusion blocks not only conflict with each other but also increase the difficulty of training (see the first page of the supplementary material). To address these issues, we propose a Recurrent Affine Transformation (RAT) for Generative Adversarial Networks that connects all the fusion blocks with a recurrent neural network to model their long-term dependency. In addition, to improve semantic consistency between texts and synthesized images, we incorporate a spatial attention model in the discriminator. Being aware of matching image regions, text descriptions supervise the generator to synthesize more relevant image contents. Extensive experiments on the CUB, Oxford-102 and COCO datasets demonstrate the superiority of the proposed model in comparison to state-of-the-art models. Code: https://github.com/senmaoy/Recurrent-Affine-Transformation-for-Text-to-image-Synthesis.git

Submitted 21 April, 2022; originally announced April 2022.

arXiv:2204.10095 [pdf, other]

R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction

Authors: Yu Wang, Shuo Ye, Shujian Yu, Xinge You

Abstract: Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories; its main challenges are large intra-class diversity and subtle inter-class differences. Existing FGVC methods usually select discriminant regions found by a trained model, which tends to neglect other potentially discriminative information. On the other hand, the massive interactions between the sequence of image patches in ViT make the resulting class token contain a lot of redundant information, which may also impact FGVC performance. In this paper, we present a novel approach for FGVC that simultaneously makes use of partial yet sufficient discriminative information in environmental cues and compresses the redundant information in the class token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking threshold, and achieves moderate extraction of background information in the input space. Moreover, we use the Information Bottleneck (IB) approach to guide our network to learn a minimal sufficient representation in the feature space. Experimental results on three widely used benchmark datasets show that our approach outperforms other state-of-the-art approaches and baseline models.

Submitted 21 April, 2022; originally announced April 2022.

arXiv:2204.07169 [pdf, other]

doi 10.3847/2041-8213/ac7ec4

Formation of Low-mass Black Holes and Single Millisecond Pulsars in Globular Clusters

Authors: Kyle Kremer, Claire S. Ye, Fulya Kıroğlu, James C. Lombardi Jr., Scott M. Ransom, Frederic A. Rasio

Abstract: Close encounters between neutron stars and main-sequence stars occur in globular clusters and may lead to various outcomes. Here we study encounters resulting in tidal disruption of the star. Using $N$-body models, we predict the typical stellar masses in these disruptions and the dependence of the event rate on host cluster properties. We find that tidal disruption events occur most frequently in core-collapsed globular clusters and that roughly $25\%$ of the disrupted stars are merger products (i.e., blue straggler stars). Using hydrodynamic simulations, we model the tidal disruptions themselves (over timescales of days) to determine the mass bound to the neutron star and the properties of the accretion disks formed. In general, we find that roughly $80-90\%$ of the initial stellar mass becomes bound to the neutron star following disruption. Additionally, we find that neutron stars receive impulsive kicks of up to about $20\,$km/s as a result of the asymmetry of unbound ejecta; these kicks place these neutron stars on elongated orbits within their host cluster, with apocenter distances well outside the cluster core. Finally, we model the evolution of the (hypercritical) accretion disks on longer timescales (days to years after disruption) to estimate the accretion rate onto the neutron stars and accompanying spin-up. As long as $\gtrsim1\%$ of the bound mass accretes onto the neutron star, millisecond spin periods can be attained. We argue that the growing population of isolated millisecond pulsars observed in globular clusters may have formed, at least in part, through this mechanism. In the case of significant mass growth, some of these neutron stars may collapse to form low-mass ($\lesssim3\,M_{\odot}$) black holes.

Submitted 27 June, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: 14 pages, 4 figures, 2 tables. Accepted for publication in ApJL

arXiv:2204.01499 [pdf, other]

FedRecAttack: Model Poisoning Attack to Federated Recommendation

Authors: Dazhong Rong, Shuai Ye, Ruoyan Zhao, Hon Ning Yuen, Jianhai Chen, Qinming He

Abstract: Federated Recommendation (FR) has received considerable popularity and attention in the past few years. In FR, each user's feature vector and interaction data are kept locally on the user's own client and are thus private to others. Without access to this information, most existing poisoning attacks against recommender systems or federated learning lose validity. Because of this characteristic, FR is commonly considered fairly secure. However, we argue that security improvements in FR are still possible and necessary. To support this view, in this paper we present FedRecAttack, a model poisoning attack on FR that aims to raise the exposure ratio of target items. In most recommendation scenarios, apart from private user-item interactions (e.g., clicks, watches and purchases), some interactions are public (e.g., likes, follows and comments). Motivated by this, FedRecAttack makes use of the public interactions to approximate users' feature vectors, so that the attacker can generate poisoned gradients accordingly and control malicious users to upload them in a well-designed way. To evaluate the effectiveness and side effects of FedRecAttack, we conduct extensive experiments on three real-world datasets of different sizes from two completely different scenarios. Experimental results demonstrate that FedRecAttack achieves state-of-the-art effectiveness while its side effects are negligible. Moreover, even with a small proportion (3%) of malicious users and a small proportion (1%) of public interactions, FedRecAttack remains highly effective, which reveals that FR is more vulnerable to attack than commonly believed.

Submitted 13 October, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: This paper has been accepted by IEEE International Conference on Data Engineering 2022 (Second Research Round)

arXiv:2203.09611 [pdf, other]

doi 10.1080/13658816.2022.2053980

STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity

Authors: Yuhao Kang, Kunlin Wu, Song Gao, Ignavier Ng, **meng Rao, Shan Ye, Fan Zhang, Teng Fei

Abstract: Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering method should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges in discovering repeated geographic patterns while maintaining spatial contiguity. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object, serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. Comparison with several baseline methods shows that STICC significantly outperforms the others in terms of adjusted Rand index and macro-F1 score. Join count statistics are also calculated and show that spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in fields such as geography, remote sensing, transportation, and urban planning.
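STICC is compared with baselines using the adjusted Rand index (ARI). As a reference point for that metric (not for STICC itself), here is a sketch of the standard Hubert and Arabie ARI computed from the contingency table of two labelings. ARI is 1 for identical partitions up to relabeling and about 0 in expectation for random labelings; returning 1.0 when the index is undefined (fewer than two points, or a degenerate denominator) is a convention chosen for this sketch.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two flat clusterings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pairs co-clustered in both labelings, counted via the contingency table.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabeling
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```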

Submitted 30 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Journal ref: International Journal of Geographical Information Science, Year 2022

arXiv:2203.07436 [pdf, other]

SuperAnimal pretrained pose estimation models for behavioral analysis

Authors: Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Steffen Schneider, Maxime Vidal, Tian Qiu, Alexander Mathis, Mackenzie Weygandt Mathis

Abstract: Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation efforts. A common key step for behavioral analysis is first extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models that can be used on over 45 species, without additional human labels. Concretely, we introduce a method to unify the keypoint space across differently labeled datasets (via our generalized data converter) and to train on these diverse datasets in a manner such that the models don't catastrophically forget keypoints given the unbalanced inputs (via our keypoint gradient masking and memory replay approaches). These models show excellent performance across six pose benchmarks. Then, to ensure maximal usability for end users, we demonstrate how to fine-tune the models on differently labeled data and provide tooling for unsupervised video adaptation to boost performance and decrease jitter across frames. When fine-tuned, SuperAnimal models are shown to be 10-100$\times$ more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models in behavioral classification in mice and gait analysis in horses. Collectively, this presents a data-efficient solution for animal pose estimation.
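Keypoint gradient masking, as described, lets datasets that annotate different subsets of a unified keypoint space be trained together. Below is a minimal sketch of the general idea, assuming a hypothetical per-keypoint squared-error loss with `None` marking keypoints absent from a dataset's label set. Masked keypoints contribute no loss term, so in an autodiff framework they would receive no gradient. The actual SuperAnimal implementation may differ.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def masked_keypoint_loss(pred: Sequence[Point],
                         target: Sequence[Optional[Point]]) -> float:
    """Mean squared error over annotated keypoints only (target is not None)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must cover the same keypoint space")
    total = 0.0
    count = 0
    for (px, py), t in zip(pred, target):
        if t is None:
            # Keypoint not labeled in this dataset: skip it so it adds no
            # loss and would therefore get no gradient.
            continue
        tx, ty = t
        total += (px - tx) ** 2 + (py - ty) ** 2
        count += 1
    return total / count if count else 0.0


# Unified space of three keypoints; this dataset does not label the second one.
print(masked_keypoint_loss([(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)],
                           [(0.0, 1.0), None, (5.0, 5.0)]))
```

In a tensor framework the same effect is usually obtained by multiplying the per-keypoint loss by a 0/1 mask before averaging over the annotated entries.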

Submitted 30 December, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Models and demos available at http://modelzoo.deeplabcut.org

Showing 51–100 of 256 results for author: Ye, S