Search | arXiv e-print repository

Weight decomposition of $\mathfrak{sl}_d(\mathbb R)$ with respect to the adjoint representation of $\mathfrak{so}(p,q)$

Abstract: In this concise article, we compute the weight decomposition of $\mathfrak{sl}_d(\mathbb R)$ with respect to the adjoint representation of $\mathfrak{so}(p,q)$, where $d=p+q$, and demonstrate in detail that $\mathfrak{sl}_d(\mathbb R)$ comprises two irreducible $\mathfrak{so}(p,q)$-invariant subspaces. This can be employed to establish the well-known fact that the identity component of $\mathrm{SO}(p,q)$ is a maximal connected subgroup of $\mathrm{SL}_d(\mathbb R)$.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 7 pages

MSC Class: 17B10
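The splitting described in this abstract can be checked numerically. The sketch below is a minimal illustration, not the paper's computation. It assumes the standard conventions $J=\mathrm{diag}(I_p,-I_q)$ and $\mathfrak{so}(p,q)=\{X : X^{\mathsf T}J+JX=0\}$, and all helper names (`signature`, `split`, `in_so_pq`, `j_symmetric`, `random_traceless`) are hypothetical. Every traceless $X$ decomposes as $X=\tfrac12(X-JX^{\mathsf T}J)+\tfrac12(X+JX^{\mathsf T}J)$: the first summand lies in $\mathfrak{so}(p,q)$, the second is traceless with $J$ times it symmetric, and the dimensions $d(d-1)/2$ and $d(d+1)/2-1$ add up to $d^2-1$. The code verifies only this vector-space splitting; the irreducibility of the two pieces is the paper's result and is not tested here.

```python
from fractions import Fraction
import random


def signature(p, q):
    """Diagonal of J = diag(I_p, -I_q), stored as a list of +1/-1 entries."""
    return [1] * p + [-1] * q


def split(X, J):
    """Return (A, B) with A = (X - J X^T J)/2 and B = (X + J X^T J)/2.

    Because J is diagonal, (J X^T J)_{ij} = J_i * X_{ji} * J_j.
    """
    d = len(J)
    T = [[J[i] * X[j][i] * J[j] for j in range(d)] for i in range(d)]
    A = [[(X[i][j] - T[i][j]) / 2 for j in range(d)] for i in range(d)]
    B = [[(X[i][j] + T[i][j]) / 2 for j in range(d)] for i in range(d)]
    return A, B


def in_so_pq(A, J):
    """Check A^T J + J A = 0 entrywise: J_j A_{ji} + J_i A_{ij} = 0."""
    d = len(J)
    return all(J[j] * A[j][i] + J[i] * A[i][j] == 0
               for i in range(d) for j in range(d))


def j_symmetric(B, J):
    """Check that J B is symmetric: J_i B_{ij} = J_j B_{ji}."""
    d = len(J)
    return all(J[i] * B[i][j] == J[j] * B[j][i]
               for i in range(d) for j in range(d))


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def random_traceless(d, rng):
    """Random integer matrix in sl_d with exact Fraction entries."""
    X = [[Fraction(rng.randint(-5, 5)) for _ in range(d)] for _ in range(d)]
    X[d - 1][d - 1] -= trace(X)  # shift one diagonal entry so the trace is zero
    return X


p, q = 2, 3
d = p + q
J = signature(p, q)
X = random_traceless(d, random.Random(0))
A, B = split(X, J)

print(in_so_pq(A, J), j_symmetric(B, J))  # True True
print(trace(A), trace(B))                 # 0 0
# dim so(p,q) + dim of the complement = dim sl_d
print(d * (d - 1) // 2 + (d * (d + 1) // 2 - 1) == d * d - 1)  # True
```

Exact `Fraction` arithmetic is used so that the membership checks can compare against zero without floating-point tolerances.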

arXiv:2402.12660

SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion

Authors: Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, Zhizheng Wu

Abstract: In this study, we present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio provides a visual display of the generation process in diffusion models, showcasing the step-by-step denoising of the noisy spectrum and its transformation into a clean spectrum that captures the desired singer's timbre. The system also facilitates side-by-side comparisons of different conditions, such as source content, melody, and target timbre, highlighting the impact of these conditions on the diffusion generation process and resulting conversions. Through comprehensive evaluations, SingVisio demonstrates its effectiveness in terms of system design, functionality, explainability, and user-friendliness. It offers users of various backgrounds valuable learning experiences and insights into the diffusion model for singing voice conversion.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12517

A Unified-Field Monolithic Fictitious Domain-Finite Element Method for Fluid-Structure-Contact Interactions and Applications to Deterministic Lateral Displacement Problems

Authors: Cheng Wang, Pengtao Sun, Yumiao Zhang, **chao Xu, Yan Chen, Jiarui Han

Abstract: Based upon two overlapped, body-unfitted meshes, a unified-field monolithic fictitious domain-finite element method (UFMFD-FEM) is developed in this paper for moving interface problems of dynamic fluid-structure interactions (FSI) accompanied by high-contrast physical coefficients across the interface and by contact collisions between the structure and the fluidic channel wall when the structure is immersed in the fluid. In particular, the proposed novel numerical method consists of a monolithic, stabilized mixed finite element method within the framework of the fictitious domain/immersed boundary method (IBM) for generic fluid-structure-contact interaction (FSCI) problems in the Eulerian-updated Lagrangian description, involving no-slip interface conditions on the fluid-structure interface and a repulsive contact force on the structural surface when the immersed structure contacts the fluidic channel wall.
The developed UFMFD-FEM for FSI or FSCI problems handles structural motion with large rotational and translational displacements and/or large deformation accurately and efficiently. It is first validated on two benchmark FSI problems and one FSCI model problem, and then against experimental results for a realistic FSCI scenario: the microfluidic deterministic lateral displacement (DLD) problem, in which a cascaded-filter DLD microchip is used in practice to isolate circulating tumor cells (CTCs) from blood cells in the blood fluid. There, in a particulate fluid flowing past the pillar obstacles in the fluidic channel, the effects of fluid-structure interaction and structure collision play significant roles in sorting particles (cells) of different sizes with tilted pillar arrays.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 32 pages, 42 figures, 5 tables, 66 references

MSC Class: 65M22; 65M60; 65M85; 65Z05; 65D17; 70F35; 70F40; 74S05; 74F10; 76M10; 76M30; 76D05; 76D09

arXiv:2402.11142

Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction

Authors: Sizhe Zhou, Yu Meng, Bowen **, Jiawei Han

Abstract: Relation extraction (RE), a crucial task in NLP, aims to identify semantic relationships between entities mentioned in texts. Despite significant advancements in this field, existing models typically rely on extensive annotated data for training, which can be both costly and time-consuming to acquire. Moreover, these models often struggle to adapt to new or unseen relationships. In contrast, few-shot learning settings, which aim to reduce annotation requirements, may offer incomplete and biased supervision for understanding target relation semantics, leading to degraded and unstable performance. To provide the model with accurate and explicit descriptions of the relation types while minimizing annotation requirements, we study the definition-only zero-shot RE setting, where only relation definitions expressed in natural language are used to train an RE model. Motivated by the strong synthetic data generation power of LLMs, we propose a framework, REPaL, which consists of three stages: (1) we utilize LLMs to generate initial seed instances based on relation definitions and unlabeled corpora; (2) we fine-tune a bidirectional Small Language Model (SLM) using these initial seeds to learn the relations for the target domain; (3) we enhance pattern coverage and mitigate the bias resulting from the limited number of initial seeds by incorporating feedback acquired from the SLM's predictions on unlabeled corpora. To accomplish this, we leverage the multi-turn conversation ability of LLMs to generate new instances in follow-up dialogues.
Experiments on two datasets show that REPaL outperforms baseline methods in zero-shot performance by large margins.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 21 pages, 12 Tables, 9 Figures

arXiv:2402.10802

TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models

Authors: Haotian Si, Changhua Pei, Hang Cui, **gwen Yang, Yongqian Sun, Shenglin Zhang, **g**g Li, Haiming Zhang, **g Han, Dan Pei, Jianhui Li, Gaogang Xie

Abstract: Driven by the proliferation of real-world application scenarios and scales, time series anomaly detection (TSAD) has attracted considerable scholarly and industrial interest. However, existing algorithms exhibit a gap in terms of training paradigm, online detection paradigm, and evaluation criteria when compared to the actual needs of real-world industrial systems. First, current algorithms typically train a specific model for each individual time series. In a large-scale online system with tens of thousands of curves, maintaining such a multitude of models is impractical, and the performance of a single unified model for detecting anomalies remains unknown. Second, most TSAD models are trained on the historical part of a time series and tested on its future segment. In distributed systems, however, there are frequent system deployments and upgrades, with new, previously unseen time series emerging daily, and how current TSAD algorithms perform on such newly incoming, unseen time series remains unknown. Lastly, although some papers have conducted detailed surveys, the absence of an online evaluation platform prevents answering questions like "Who is the best at anomaly detection at the current stage?" In this paper, we propose TimeSeriesBench, an industrial-grade benchmark that we continuously maintain as a leaderboard. On this leaderboard, we assess the performance of existing algorithms across more than 168 evaluation settings combining different training and testing paradigms, evaluation metrics, and datasets.
Through our comprehensive analysis of the results, we provide recommendations for the future design of anomaly detection algorithms. To address known issues with existing public datasets, we release an industrial dataset to the public together with TimeSeriesBench. All code, data, and the online leaderboard have been made publicly available.

Submitted 26 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10744

GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models

Authors: Pengcheng Jiang, Jiacheng Lin, Zifeng Wang, Jimeng Sun, Jiawei Han

Abstract: The field of relation extraction (RE) is experiencing a notable shift towards generative relation extraction (GRE), leveraging the capabilities of large language models (LLMs). However, we discovered that traditional RE metrics such as precision and recall fall short in evaluating GRE methods. This shortfall arises because these metrics rely on exact matching with human-annotated reference relations, while GRE methods often produce diverse and semantically accurate relations that differ from the references. To fill this gap, we introduce GenRES for a multi-dimensional assessment in terms of the topic similarity, uniqueness, granularity, factualness, and completeness of the GRE results. With GenRES, we empirically identified that (1) precision/recall fails to justify the performance of GRE methods; (2) human-annotated referential relations can be incomplete; and (3) prompting LLMs with a fixed set of relations or entities can cause hallucinations. Next, we conducted a human evaluation of GRE methods, which shows that GenRES is consistent with human preferences for RE quality. Last, we conducted a comprehensive evaluation of fourteen leading LLMs using GenRES across document-, bag-, and sentence-level RE datasets to set the benchmark for future research in GRE.

Submitted 16 February, 2024; originally announced February 2024.
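The exact-matching problem behind finding (1) can be seen on a toy example. The sketch below assumes relations are (head, relation, tail) string triples and that a prediction counts only if it is identical to a reference triple; the triples and the helper `exact_match_pr` are hypothetical, and none of GenRES's own assessment dimensions are implemented.

```python
def exact_match_pr(predicted, reference):
    """Precision and recall where a predicted triple counts only if it is
    identical to a reference triple (pure exact matching)."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical reference annotation and generated output.
reference = [("Marie Curie", "award_received", "Nobel Prize in Physics")]
generated = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),  # same fact, different label
    ("Marie Curie", "born_in", "Warsaw"),               # true but not annotated
]

print(exact_match_pr(generated, reference))  # (0.0, 0.0)
```

The first generated triple restates the reference fact with a different relation label, and the second is a true fact that the reference simply omits (the incompleteness in finding 2); exact matching scores both as errors.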

arXiv:2402.10436

I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models

Authors: Wenchao Dong, Assem Zhunis, Hyo** Chin, Jiyoung Han, Meeyoung Cha

Abstract: We explored cultural biases (individualism vs. collectivism) in ChatGPT across three Western languages (i.e., English, German, and French) and three Eastern languages (i.e., Chinese, Japanese, and Korean). When ChatGPT adopted an individualistic persona in Western languages, its collectivism scores (i.e., out-group values) exhibited a more negative trend, surpassing its positive orientation towards individualism (i.e., in-group values). Conversely, when a collectivistic persona was assigned to ChatGPT in Eastern languages, a similar pattern emerged, with more negative responses toward individualism (i.e., out-group values) than toward collectivism (i.e., in-group values). The results indicate that when imbued with a particular social identity, ChatGPT discerns in-group and out-group, embracing in-group values while eschewing out-group values. Notably, the negativity towards the out-group, from which prejudices and discrimination arise, exceeded the positivity towards the in-group. The experiment was replicated in the political domain, and the results remained consistent. Furthermore, this replication unveiled an intrinsic Democratic bias in Large Language Models (LLMs), aligning with earlier findings and providing integral insights into mitigating such bias through prompt engineering. Extensive robustness checks were performed using varying hyperparameters and persona setup methods, with or without social identity labels, across other popular language models.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08183

Pixel Sentence Representation Learning

Authors: Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li, Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed

Abstract: Pretrained language models have long been known to be subpar at capturing sentence- and document-level semantics. Though heavily investigated, transferring perturbation-based methods from unsupervised visual representation learning to NLP remains an unsolved problem. This is largely due to the discreteness of subword units brought by the tokenization of language models, which prevents small perturbations of inputs from forming semantics-preserving positive pairs. In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process. Drawing from cognitive and linguistic sciences, we introduce an unsupervised visual sentence representation learning framework, employing visually-grounded text perturbation methods such as typos and word order shuffling, which resonate with human cognitive patterns and enable perturbations to texts to be perceived as continuous. Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision, achieving performance in semantic textual similarity (STS) comparable to existing state-of-the-art NLP methods. Additionally, we unveil our method's inherent zero-shot cross-lingual transferability and a unique leapfrogging pattern across languages during iterative training. To our knowledge, this is the first representation learning method devoid of traditional language models for understanding sentence and document semantics, marking a stride closer to human-like textual comprehension.
Our code is available at https://github.com/gowitheflow-1998/Pixel-Linguist

Submitted 12 February, 2024; originally announced February 2024.
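The two text perturbations named in this abstract can be sketched at the string level. This is a minimal illustration under assumptions: the adjacent-character swap used as a "typo", the windowed word shuffle, the window size of 3, and the function names are all hypothetical, and the paper's pipeline additionally renders the perturbed text as images, which this sketch does not do.

```python
import random


def typo(sentence, rng):
    """Swap two adjacent characters inside one randomly chosen word (hypothetical 'typo')."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return sentence
    i = rng.choice(candidates)
    w = words[i]
    k = rng.randrange(len(w) - 1)
    words[i] = w[:k] + w[k + 1] + w[k] + w[k + 2:]
    return " ".join(words)


def local_shuffle(sentence, rng, window=3):
    """Shuffle word order only within consecutive windows of `window` words."""
    words = sentence.split()
    out = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return " ".join(out)


def positive_pair(sentence, seed=0):
    """Return (anchor, perturbed view) for contrastive training."""
    rng = random.Random(seed)
    return sentence, local_shuffle(typo(sentence, rng), rng)


anchor, view = positive_pair("the quick brown fox jumps over the lazy dog", seed=1)
print(anchor)
print(view)
```

Keeping the shuffle local and the typo to a single swap is a design choice meant to keep each view close to the original, in the spirit of "small perturbations"; larger windows would produce harder positives.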

arXiv:2402.07255

American Sign Language Video to Text Translation

Authors: Parsheeta Roy, Ji-Eun Han, Srishti Chouhan, Bhaavanaa Thumu

Abstract: Sign language-to-text translation is a crucial technology that can break down communication barriers for individuals with hearing difficulties. We replicate and try to improve on a recently published study. We evaluate models using the BLEU and rBLEU metrics to assess translation quality. In our ablation study, we found that the model's performance is significantly influenced by optimizers, activation functions, and label smoothing. Further research aims to refine visual feature capture, enhance decoder utilization, and integrate pre-trained decoders for better translation outcomes. Our source code is available to facilitate replication of our results and encourage future research.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.06177

Hamiltonicity of Sparse Pseudorandom Graphs

Authors: Asaf Ferber, Jie Han, Dingjia Mao, Roman Vershynin

Abstract: We show that every $(n,d,\lambda)$-graph contains a Hamilton cycle for sufficiently large $n$, assuming that $d\geq \log^{10}n$ and $\lambda\leq cd$, where $c=\frac{1}{9000}$. This significantly improves a recent result of Glock, Correia and Sudakov, who obtain a similar result for $d$ that grows polynomially with $n$. The proof is based on the absorption technique combined with a new result regarding the second largest eigenvalue of the adjacency matrix of a subgraph induced by a random subset of vertices. We believe that the latter result is of independent interest and will have further applications.

Submitted 8 February, 2024; originally announced February 2024.
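For reference, an $(n,d,\lambda)$-graph is a $d$-regular graph on $n$ vertices whose adjacency eigenvalues, other than the trivial eigenvalue $d$, all have absolute value at most $\lambda$. The sketch below computes these parameters for circulant graphs on $\mathbb Z_n$, chosen only because their spectrum has the closed form $\sum_{s\in S}\cos(2\pi ks/n)$; the helper names are hypothetical and nothing here comes from the paper's proof.

```python
import math


def circulant_spectrum(n, connections):
    """Adjacency eigenvalues of the circulant graph on Z_n with connection set S.

    For a symmetric S (s in S implies -s in S), eigenvalue k is
    sum over s in S of cos(2*pi*k*s/n), for k = 0, ..., n-1.
    """
    S = {s % n for s in connections}
    if 0 in S or any((-s) % n not in S for s in S):
        raise ValueError("S must be symmetric and must not contain 0")
    return [sum(math.cos(2 * math.pi * k * s / n) for s in S) for k in range(n)]


def nd_lambda(n, connections):
    """Return (n, d, lambda): d = |S| is the degree (eigenvalue k = 0) and
    lambda is the largest |eigenvalue| among the remaining ones."""
    eig = circulant_spectrum(n, connections)
    d = len({s % n for s in connections})
    lam = max(abs(x) for x in eig[1:])
    return n, d, lam


print(nd_lambda(8, range(1, 8)))  # complete graph K_8
print(nd_lambda(8, [1, 7]))       # cycle C_8
```

The complete graph $K_8$ gives $\lambda=1$ up to rounding, while the cycle $C_8$ is bipartite and gives $\lambda=d=2$; both toy graphs are far from the regime $\lambda\leq d/9000$ that the theorem requires.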

arXiv:2402.05604

Protecting logical qubits with dynamical decoupling

Authors: Jia-Xiu Han, Jiang Zhang, Guang-Ming Xue, Haifeng Yu, Guilu Long

Abstract: Demonstrating that logical qubits outperform their physical counterparts is a milestone for achieving reliable quantum computation. Here, we propose to protect logical qubits with a novel dynamical decoupling scheme that implements iSWAP gates on nearest-neighbor physical qubits, and experimentally demonstrate the scheme on superconducting transmon qubits. In our scheme, each logical qubit requires only two physical qubits. A universal set of quantum gates on the logical qubits can be achieved such that each logical gate comprises only one or two physical gates. Our experiments reveal that the coherence time of a logical qubit is extended by up to 366% compared to the better-performing physical qubit. Moreover, to the best of our knowledge, we demonstrate for the first time that multiple logical qubits outperform their physical counterparts in superconducting qubits. We illustrate a set of universal gates through a logical Ramsey experiment and the creation of a logical Bell state. Given its scalable nature, our scheme holds promise as a component of future reliable quantum computation.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures

arXiv:2402.05492

Cosmological Forecast of the Void Size Function Measurement from the CSST Spectroscopic Survey

Authors: Yingxiao Song, Qi Xiong, Yan Gong, Furen Deng, Kwan Chuen Chan, Xuelei Chen, Qi Guo, Jiaxin Han, Guoliang Li, Ming Li, Yun Liu, Yu Luo, Wenxiang Pei, Chengliang Wei

Abstract: The void size function (VSF) contains information about the cosmic large-scale structure (LSS) and can be used to derive the properties of dark energy and dark matter. We predict the VSFs measured from the spectroscopic galaxy survey operated by the China Space Station Telescope (CSST) and study the strength of the resulting cosmological constraints. We employ a high-resolution Jiutian simulation to generate CSST galaxy mock samples based on an improved semi-analytical model. We identify voids in this galaxy catalog using the watershed algorithm without assuming a spherical shape, and estimate the VSFs in different redshift bins from $z=0.5$ to 1.1. We propose a void selection method based on ellipticity, and assume that the void linear underdensity threshold $\delta_{\rm v}$ in the theoretical model is redshift-dependent, setting it as a free parameter in each redshift bin. The Markov Chain Monte Carlo (MCMC) method is adopted to constrain the cosmological and void parameters. We find that the CSST VSF measurement can constrain the cosmological parameters at the few-percent level. The best-fit values of $\delta_{\rm v}$ range from $\sim-0.4$ to $-0.1$ as the redshift increases from 0.5 to 1.1, which differs distinctly from the theoretical value $\delta_{\rm v}\simeq-2.7$ calculated assuming spherical evolution and using particles as tracers. Our method can provide a good reference for void identification and selection in VSF analyses of spectroscopic galaxy surveys.

Submitted 24 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 10 pages, 7 figures, 3 tables. Accepted for publication in MNRAS

arXiv:2402.05382

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

Authors: Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

Abstract: The Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE), which can be trained once but provides customized pre-trained models for diverse downstream tasks. Different from the mixture of experts (MoE), our MoCE trains each expert only with semantically relevant images by using cluster-conditional gates. Thus, each downstream task can be allocated to its customized model pre-trained with data most similar to the downstream data. Experiments on a collection of 11 downstream tasks show that MoCE outperforms the vanilla MAE by 2.45% on average. It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Accepted by ICLR 2023
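A cluster-conditional gate can be pictured as nearest-centroid routing. The sketch below is a simplified, hypothetical illustration rather than MoCE's implementation: it assumes one expert per cluster, hard assignment by squared Euclidean distance to precomputed cluster centroids of image features, and selection of a downstream task's expert by comparing the task's mean feature to those centroids. All function names are illustrative.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_cluster(feature, centroids):
    """Index of the centroid closest to `feature`."""
    return min(range(len(centroids)), key=lambda c: dist2(feature, centroids[c]))


def route_images(features, centroids):
    """Hard cluster-conditional gate: image -> its cluster -> that cluster's expert."""
    return [nearest_cluster(f, centroids) for f in features]


def pick_expert_for_task(task_features, centroids):
    """Choose the expert whose cluster centroid is closest to the task's mean feature."""
    dim = len(task_features[0])
    mean = [sum(f[i] for f in task_features) / len(task_features) for i in range(dim)]
    return nearest_cluster(mean, centroids)


centroids = [[0.0, 0.0], [10.0, 10.0]]  # toy centroids, one expert per cluster
print(route_images([[1, 0], [9, 11], [0, 2]], centroids))    # [0, 1, 0]
print(pick_expert_for_task([[8, 9], [11, 10]], centroids))   # 1
```

Because each expert only sees images routed to its cluster, its pre-training data stays semantically coherent, which is the property the abstract attributes to the cluster-conditional gates.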

arXiv:2402.04671

V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication

Authors: Yuanfang Zhang, Junxuan Li, Kaiqing Luo, Yiying Yang, Jiayi Han, Nian Liu, Denghui Qin, Peng Han, Chengpei Xu

Abstract: Semantic scene completion (SSC) has recently gained popularity because it can provide both semantic and geometric information that can be used directly for autonomous vehicle navigation. However, challenges remain: SSC is often hampered by occlusion and short-range perception due to sensor limitations, which can pose safety risks. This paper proposes a fundamental solution to this problem by leveraging vehicle-to-vehicle (V2V) communication. We propose the first generalized collaborative SSC framework that allows autonomous vehicles to share sensing information from different sensor views to jointly perform SSC tasks. To validate the proposed framework, we further build V2VSSC, the first V2V SSC benchmark, on top of the large-scale V2V perception dataset OPV2V. Extensive experiments demonstrate that by leveraging V2V communication, SSC performance can be increased by 8.3% in geometric IoU and 6.0% in mIoU.

Submitted 7 February, 2024; originally announced February 2024.
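The two metrics quoted in this abstract are typically computed as follows in SSC work. This sketch assumes the common convention that geometric IoU measures occupied-versus-empty agreement and mIoU averages per-class IoU over the non-empty semantic classes, with label 0 meaning empty; whether V2VSSC uses exactly this protocol is an assumption, and the function names and toy labels are hypothetical.

```python
import math


def iou(pred, gt, cls):
    """Intersection over union of class `cls` over flattened voxel labels."""
    inter = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
    return inter / union if union else float("nan")


def ssc_metrics(pred, gt, num_classes, empty=0):
    """Return (geometric IoU, semantic mIoU).

    Geometric IoU treats every non-empty label as 'occupied'; mIoU averages
    per-class IoU over the non-empty classes present in pred or gt.
    """
    occ_pred = [int(p != empty) for p in pred]
    occ_gt = [int(g != empty) for g in gt]
    geo_iou = iou(occ_pred, occ_gt, 1)
    per_class = [iou(pred, gt, c) for c in range(num_classes) if c != empty]
    valid = [x for x in per_class if not math.isnan(x)]
    miou = sum(valid) / len(valid) if valid else float("nan")
    return geo_iou, miou


gt = [0, 1, 1, 2, 2, 0]    # toy ground-truth labels (0 = empty)
pred = [0, 1, 2, 2, 2, 1]  # toy predictions
print(ssc_metrics(pred, gt, num_classes=3))
```

In this toy case one voxel is predicted occupied but is empty, which lowers geometric IoU, while the class-1/class-2 confusion lowers mIoU even though the occupancy is right; this is why the two numbers are reported separately.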

arXiv:2402.03720

Similarity-based Neighbor Selection for Graph LLMs

Authors: Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang

Abstract: Text-attributed graphs (TAGs) present unique challenges for direct processing by Large Language Models (LLMs), yet the extensive commonsense knowledge and robust reasoning capabilities of LLMs offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective graph information integration, further compounded by inconsistencies in dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS effectively improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily. Besides, as an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods. Our comprehensive experiments, adhering to standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets like PubMed in node classification, showcasing LLMs' potential in graph structure understanding. Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at https://github.com/ruili33/SNS.

Submitted 6 February, 2024; originally announced February 2024.
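The core idea of similarity-based neighbor selection can be sketched as ranking a node's neighbors by embedding similarity and keeping the top $k$. In this minimal version, toy vectors stand in for SimCSE embeddings, cosine similarity and top-$k$ truncation are assumed choices, and `select_neighbors` and the prompt layout in `build_prompt` are hypothetical; the paper's "advanced neighbor selection techniques" are not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_neighbors(node, graph, emb, k=2):
    """Keep the k neighbors whose embeddings are most similar to the node's own."""
    ranked = sorted(graph[node], key=lambda nb: cosine(emb[node], emb[nb]), reverse=True)
    return ranked[:k]


def build_prompt(node, neighbors, text):
    """Hypothetical prompt layout: target text, then the selected neighbors' texts."""
    lines = [f"Target: {text[node]}"] + [f"Neighbor: {text[nb]}" for nb in neighbors]
    return "\n".join(lines) + "\nWhich category does the target belong to?"


# Toy 2-D vectors standing in for sentence embeddings.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
graph = {"a": ["b", "c", "d"]}
text = {"a": "paper on insulin", "b": "paper on diabetes",
        "c": "paper on galaxies", "d": "paper on metabolism"}

chosen = select_neighbors("a", graph, emb, k=2)
print(chosen)  # ['b', 'd']
print(build_prompt("a", chosen, text))
```

Dropping the dissimilar neighbor "c" before prompting is the kind of filtering the abstract credits with reducing heterophily and over-squashing; no training is involved, which matches its description of SNS as training-free.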

arXiv:2402.01749

Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models

Authors: Weijia Zhang, **dong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong

Abstract: Machine learning techniques are now integral to the advancement of intelligent urban services, playing a crucial role in elevating the efficiency, sustainability, and livability of urban environments. The recent emergence of foundation models such as ChatGPT marks a revolutionary shift in the fields of machine learning and artificial intelligence. Their unparalleled capabilities in contextual understanding, problem solving, and adaptability across a wide range of tasks suggest that integrating these models into urban domains could have a transformative impact on the development of smart cities. Despite growing interest in Urban Foundation Models (UFMs), this burgeoning field faces challenges such as a lack of clear definitions, systematic reviews, and universalizable solutions. To this end, this paper first introduces the concept of UFMs and discusses the unique challenges involved in building them. We then propose a data-centric taxonomy that categorizes current UFM-related works based on urban data modalities and types. Furthermore, to foster advancement in this field, we present a promising framework aimed at the prospective realization of UFMs, designed to overcome the identified challenges. Additionally, we explore the application landscape of UFMs, detailing their potential impact in various urban contexts. Relevant papers and open-source resources have been collated and are continuously updated at https://github.com/usail-hkust/Awesome-Urban-Foundation-Models.

Submitted 29 January, 2024; originally announced February 2024.

arXiv:2402.01115

Interpretation of Intracardiac Electrograms Through Textual Representations

Authors: William Jongwon Han, Diana Gomez, Avi Alok, Chao**g Duan, Michael A. Rosenberg, Douglas Weber, Emerson Liu, Ding Zhao

Abstract: Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artificial intelligence (AI) have allowed some works to utilize deep learning frameworks to interpret EGMs during AFib. Additionally, language models (LMs) have shown exceptional performance in generalizing to unseen domains, especially in healthcare. In this study, we are the first to leverage pretrained LMs for finetuning of EGM interpolation and AFib classification via masked language modeling. We formulate the EGM as a textual sequence and present competitive performance on AFib classification compared with other representations. Lastly, we provide a comprehensive interpretability study to provide a multi-perspective intuition of the model's behavior, which could greatly benefit clinical use.

Submitted 11 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 18 pages, 9 figures; Accepted to CHIL 2024

ACM Class: I.2.7; J.3

arXiv:2401.17364 [pdf, other]

doi 10.1007/s11433-023-2333-8

HiFAST: an HI data calibration and imaging pipeline for FAST

Authors: Yingjie Jing, Jie Wang, Chen Xu, Ziming Liu, Qingze Chen, Tiantian Liang, Jinlong Xu, Yixian Cao, Jing Wang, Huijie Hu, Chuan-Peng Zhang, Qi Guo, Liang Gao, Mei Ai, Hengqian Gan, Xuyang Gao, Jinlin Han, Ligang Hou, Zhipeng Hou, Peng Jiang, Xu Kong, Fujia Li, Zerui Liu, Li Shao, Hengxing Pan , et al. (8 additional authors not shown)

Abstract: The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has the largest aperture and a 19-beam L-band receiver, making it powerful for investigating the neutral hydrogen atomic gas (HI) in the universe. We present HiFAST (https://hifast.readthedocs.io), a dedicated, modular, and self-contained calibration and imaging pipeline for processing the HI data of FAST. The pipeline consists of frequency-dependent noise diode calibration, baseline fitting, standing wave removal using an FFT-based method, flux density calibration, stray radiation correction, and gridding to produce data cubes. These modules can be combined as needed to process the data from most FAST observation modes: tracking, drift scanning, On-The-Fly mapping, and most of their variants. With HiFAST, the RMS noise levels of the calibrated spectra from all 19 beams were only slightly (~5%) higher than the theoretical expectation. The results for the extended source M33 and the point sources are consistent with the results from Arecibo. The moment maps (0, 1, and 2) of M33 agree well with the results from the Arecibo Galaxy Environment Survey (AGES), with a fractional difference of less than 10%. For a common sample of 221 sources with signal-to-noise ratio S/N > 10 from the Arecibo Legacy Fast ALFA (ALFALFA) survey, the mean fractional difference in the integrated flux density, $S_{\mathrm{int}}$, between the two datasets is approximately 0.005%, with a dispersion of 15.4%.
Further checks on the integrated flux density of 23 sources with seven observations indicate that the variance in the flux density of luminous sources ($S_\mathrm{int} > 2.5$ Jy km s$^{-1}$) is less than 5%. Our tests suggest that the FAST telescope, with the efficient, precise, and user-friendly pipeline HiFAST, will yield numerous significant scientific findings in the investigation of HI in the universe.
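The abstract reports a mean and a dispersion of the fractional difference in $S_{\mathrm{int}}$ between FAST and ALFALFA for matched sources. A small sketch of that comparison, assuming the fractional difference is defined per source as $(S_{\mathrm{FAST}} - S_{\mathrm{ref}})/S_{\mathrm{ref}}$ and the dispersion is the sample standard deviation; both definitions and the flux values below are assumptions for illustration, not taken from the paper.

```python
from statistics import mean, stdev

def fractional_differences(s_fast, s_ref):
    """Per-source fractional difference (S_FAST - S_ref) / S_ref (assumed definition)."""
    if len(s_fast) != len(s_ref):
        raise ValueError("flux lists must be matched source by source")
    return [(a - b) / b for a, b in zip(s_fast, s_ref)]

# Hypothetical integrated flux densities (Jy km/s) for the same four sources.
s_fast = [1.10, 0.95, 2.00, 0.50]
s_ref = [1.00, 1.00, 2.00, 0.50]
diffs = fractional_differences(s_fast, s_ref)
print(f"mean = {mean(diffs):.2%}, dispersion = {stdev(diffs):.2%}")
```

A near-zero mean with a larger dispersion, as in the reported 0.005% and 15.4%, indicates no systematic offset but appreciable source-to-source scatter.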

Submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by SCPMA. 21 pages, 14 figures. The pipeline is accessible at https://hifast.readthedocs.io

arXiv:2401.16600 [pdf, other]

Depth Anything in Medical Images: A Comparative Study

Authors: John J. Han, Ayberk Acar, Callahan Henry, Jie Ying Wu

Abstract: Monocular depth estimation (MDE) is a critical component of many medical tracking and mapping algorithms, particularly from endoscopic or laparoscopic video. However, because ground truth depth maps cannot be acquired from real patient data, supervised learning is not a viable approach to predict depth maps for medical scenes. Although self-supervised learning for MDE has recently gained attention, the outputs are difficult to evaluate reliably and each MDE's generalizability to other patients and anatomies is limited. This work evaluates the zero-shot performance of the newly released Depth Anything Model on medical endoscopic and laparoscopic scenes. We compare the accuracy and inference speeds of Depth Anything with other MDE models trained on general scenes as well as in-domain models trained on endoscopic data. Our findings show that although the zero-shot capability of Depth Anything is quite impressive, it is not necessarily better than other models in both speed and performance. We hope that this study can spark further research in employing foundation models for MDE in medical scenes.
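Zero-shot monocular depth predictions are typically only defined up to scale, so accuracy comparisons like the one described usually align predictions to ground truth before scoring. A minimal sketch of one common choice, median scaling followed by absolute relative error (AbsRel); it assumes predictions proportional to depth, whereas models that output inverse depth or affine-invariant depth would need a different alignment (for example a least-squares scale and shift). The metric choice here is an assumption, not the paper's stated protocol.

```python
from statistics import median

def abs_rel_median_scaled(pred, gt):
    """Mean absolute relative error after aligning pred to gt by the ratio of medians.

    Assumes pred is proportional to depth; pixels with non-positive values are skipped.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not valid:
        raise ValueError("no valid pixels to evaluate")
    scale = median(g for _, g in valid) / median(p for p, _ in valid)
    return sum(abs(p * scale - g) / g for p, g in valid) / len(valid)

# A prediction that is correct up to a global scale scores 0.
print(abs_rel_median_scaled([0.5, 1.0, 1.5], [1.0, 2.0, 3.0]))
```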

Submitted 29 January, 2024; originally announced January 2024.

Comments: 10 pages, 2 figures, 3 tables

arXiv:2401.14016 [pdf, other]

Towards Uncertainty-Aware Language Agent

Authors: Jiuzhou Han, Wray Buntine, Ehsan Shareghi

Abstract: While Language Agents have achieved promising success by placing Large Language Models at the core of a more versatile design that dynamically interacts with the external world, the existing approaches neglect the notion of uncertainty during these interactions. We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification. Compared with other well-known counterparts like ReAct, our extensive experiments across 3 representative tasks (HotpotQA, StrategyQA, MMLU) and various LLM sizes demonstrate that UALA brings a significant improvement of performance, while having a substantially lower reliance on the external world (i.e., reduced number of tool calls and tokens). Our analyses provide various insights including the great potential of UALA compared with agent fine-tuning, and underscore the unreliability of verbalised confidence of LLMs as a proxy for uncertainty.
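The abstract states that UALA uses uncertainty quantification to decide when the agent interacts with the external world, without naming the measure. A minimal sketch, assuming the uncertainty score is the mean negative log-probability of the answer tokens and that the agent calls a tool only when that score exceeds a threshold; the function names and the threshold value are hypothetical.

```python
import math

def answer_uncertainty(token_probs):
    """Mean negative log-probability of the generated answer tokens (one possible measure)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def should_call_tool(token_probs, threshold):
    """Hypothetical policy: query the external tool only if the answer looks uncertain."""
    return answer_uncertainty(token_probs) > threshold

# Threshold chosen by hand here; in practice it would be calibrated on held-out data.
print(should_call_tool([0.9, 0.95, 0.99], threshold=0.5))  # confident answer
print(should_call_tool([0.3, 0.4, 0.5], threshold=0.5))    # uncertain answer
```

Skipping tool calls for confident answers is what would reduce the number of tool calls and tokens the abstract mentions.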

Submitted 30 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Our code and data are at https://uala-agent.github.io. (accepted to ACL 2024 Findings). arXiv admin note: text overlap with arXiv:2310.05915

arXiv:2401.13129 [pdf, other]

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains

Authors: Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, Jiawei Han

Abstract: Accurately typing entity mentions from text segments is a fundamental task for various natural language processing applications. Many previous approaches rely on massive human-annotated data to perform entity typing. Nevertheless, collecting such data in highly specialized science and engineering domains (e.g., software engineering and security) can be time-consuming and costly, not to mention the domain gaps between training and inference data if the model needs to be applied to confidential datasets. In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities). To solve this problem, we propose SEType, which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus using the contextualized representations of pre-trained language models. It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types. Extensive experiments on two datasets covering four domains demonstrate the effectiveness of SEType in comparison with various baselines.

Submitted 20 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: 9 pages; Accepted to AAAI 2024 (Code: https://github.com/yuzhimanhua/SEType)

arXiv:2401.11997 [pdf, other]

PAC.V. The Roles of Mass and Environment in the Quenching of Galaxies

Authors: Yun Zheng, Kun Xu, Y. P. Jing, Donghai Zhao, Hongyu Gao, Xiaolin Luo, Jianxin Han, Yu Yu, Ming Li

Abstract: The roles that mass and environment play in galaxy quenching are still under debate. Leveraging the Photometric objects Around Cosmic webs (PAC) method, we analyze the excess surface distribution $\bar{n}_2w_{\rm{p}}(r_{\rm{p}})$ of photometric galaxies of different colors (rest-frame $u-r$) within the stellar mass range of $10^{9.0}M_{\odot}\sim10^{11.0}M_{\odot}$ around spectroscopic massive central galaxies ($10^{10.9}\sim10^{11.7}M_{\odot}$) in the redshift interval $0<z_s<0.7$, utilizing data from the Hyper Suprime-Cam Subaru Strategic Program and the spectroscopic samples of the Sloan Digital Sky Survey (i.e., the Main, LOWZ, and CMASS samples). We find that both mass and environment quenching contribute to the evolution of companion galaxies. To isolate the environment effect, we quantify the quenched fraction excess (QFE) of companion galaxies encircling massive central galaxies within $0.01h^{-1}{\rm{Mpc}}<r_{\rm{p}}<20h^{-1}\rm{Mpc}$, representing the surplus quenched fraction relative to the average. We find that the high-density halo environment affects star formation quenching out to about three times the virial radius, and this effect becomes stronger at lower redshift. We also find that even after scaling by the virial radius, the environment quenching efficiency is higher for more massive halos or for companion galaxies of higher stellar mass, though the trends are quite weak.
We present a fitting formula that comprehensively captures the QFE across central and companion stellar mass bins, halo-centric distance bins, and redshift bins, offering a valuable tool for constraining galaxy formation models. Furthermore, we have made a quantitative comparison with IllustrisTNG that underscores some important differences, particularly the excessive quenching of low-mass companion galaxies ($<10^{9.5}M_{\odot}$) in TNG.
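The abstract describes the QFE as the surplus quenched fraction of companions relative to the average but does not give its exact normalization. A minimal sketch, assuming the common normalized form $\mathrm{QFE} = (f_{q} - f_{q,\mathrm{avg}})/(1 - f_{q,\mathrm{avg}})$, so that 0 means no environmental effect; a plain difference $f_{q} - f_{q,\mathrm{avg}}$ would be an equally plausible reading. The counts and function names are hypothetical.

```python
def quenched_fraction(n_quenched, n_total):
    """Fraction of galaxies in a bin classified as quenched (e.g., by a u-r color cut)."""
    if n_total <= 0:
        raise ValueError("bin contains no galaxies")
    return n_quenched / n_total

def quenched_fraction_excess(f_q_local, f_q_avg):
    """Assumed QFE form: excess quenched fraction, normalized by the star-forming
    fraction of the average population, so 0 means no environmental effect."""
    if not 0.0 <= f_q_avg < 1.0:
        raise ValueError("average quenched fraction must lie in [0, 1)")
    return (f_q_local - f_q_avg) / (1.0 - f_q_avg)

# Hypothetical counts: companions near a massive central vs. the average population.
f_local = quenched_fraction(60, 100)
f_avg = quenched_fraction(200, 1000)
print(quenched_fraction_excess(f_local, f_avg))
```

Evaluating this per radial bin in $r_{\rm p}$ gives a QFE profile whose decline toward the average would mark the extent of environmental quenching.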

Submitted 22 January, 2024; originally announced January 2024.

Comments: 23 pages, 14 figures. Submitted to ApJ. Comments welcome :-)

arXiv:2401.11423 [pdf, ps, other]

doi 10.1140/epjc/s10052-023-12362-5

Pair production of the singlet vector-like B quark at the CLIC

Authors: Jin-Zhong Han, Yao-Bei Liu, Shi-Yu Xu

Abstract: Vector-like quarks (VLQs) are a common feature of many scenarios of new physics beyond the Standard Model (SM), and they generally decay into a SM third-generation quark together with a SM gauge boson or a Higgs boson. The presence of a new exotic decay mode of VLQs will reduce the branching ratios of these standard decay modes and thus relax the current mass exclusion limits from LHC experiments. Based on a model-independent framework, we investigate the prospect of discovering the pair production of the weak-singlet VLQ-$B$ at the future 3-TeV Compact Linear Collider (CLIC), focusing on final states including one $Z$ boson and four $b$-jets via two decay modes: $Z\to \ell^{+}\ell^{-}$ and $Z\to \nu\bar{\nu}$. By performing a rapid detector simulation of the signal and background events, and considering the initial state radiation and beamstrahlung effects, we obtain the exclusion limit at the 95% confidence level and the $5\sigma$ discovery prospects on the branching ratio of $B\to bZ$ and the VLQ-$B$ mass at the future 3-TeV CLIC with an integrated luminosity of 5 ab$^{-1}$.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 17 pages, 6 figures. arXiv admin note: text overlap with arXiv:2112.15044

Journal ref: Eur. Phys. J. C 84, 61 (2024)

arXiv:2401.11093 [pdf, other]

Learned Image Compression with Dual-Branch Encoder and Conditional Information Coding

Authors: Haisheng Fu, Feng Liang, Jie Liang, Zhenman Fang, Guohe Zhang, Jingning Han

Abstract: Recent advancements in deep learning-based image compression are notable. However, prevalent schemes that employ a serial context-adaptive entropy model to enhance rate-distortion (R-D) performance are markedly slow. Furthermore, the complexities of the encoding and decoding networks are substantially high, rendering them unsuitable for some practical applications. In this paper, we propose two techniques to balance the trade-off between complexity and performance. First, we introduce two branching coding networks to independently learn a low-resolution latent representation and a high-resolution latent representation of the input image, discriminatively representing the global and local information therein. Second, we utilize the high-resolution latent representation as conditional information for the low-resolution latent representation, furnishing it with global information and thus helping to reduce redundancy in the low-resolution information. We do not utilize any serial entropy models. Instead, we employ a parallel channel-wise auto-regressive entropy model for encoding and decoding low-resolution and high-resolution latent representations. Experiments demonstrate that our method is approximately twice as fast in both encoding and decoding compared to the parallelizable checkerboard context model, and it also achieves a 1.2% improvement in R-D performance compared to state-of-the-art learned image compression schemes.
Our method also outperforms classical image codecs including H.266/VVC-intra (4:4:4) and some recent learned methods in rate-distortion performance, as validated by both PSNR and MS-SSIM metrics on the Kodak dataset.

Submitted 21 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted by DCC2024

arXiv:2401.11037 [pdf, other]

Equivariant Graph Neural Operator for Modeling 3D Dynamics

Authors: Minkai Xu, Jiaqi Han, Aaron Lou, Jean Kossaifi, Arvind Ramanathan, Kamyar Azizzadenesheli, Jure Leskovec, Stefano Ermon, Anima Anandkumar

Abstract: Modeling the complex three-dimensional (3D) dynamics of relational systems is an important problem in the natural sciences, with applications ranging from molecular simulations to particle mechanics. Machine learning methods have achieved good success by learning graph neural networks to model spatial interactions. However, these approaches do not faithfully capture temporal correlations since they only model next-step predictions. In this work, we propose Equivariant Graph Neural Operator (EGNO), a novel and principled method that directly models dynamics as trajectories instead of just next-step prediction. Different from existing methods, EGNO explicitly learns the temporal evolution of 3D dynamics where we formulate the dynamics as a function over time and learn neural operators to approximate it. To capture the temporal correlations while keeping the intrinsic SE(3)-equivariance, we develop equivariant temporal convolutions parameterized in the Fourier space and build EGNO by stacking the Fourier layers over equivariant networks. EGNO is the first operator learning framework that is capable of modeling solution dynamics functions over time while retaining 3D equivariance. Comprehensive experiments in multiple domains, including particle simulations, human motion capture, and molecular dynamics, demonstrate the significantly superior performance of EGNO against existing methods, thanks to the equivariant temporal modeling. Our code is available at https://github.com/MinkaiXu/egno.

Submitted 2 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.10189 [pdf, other]

Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction

Authors: Qingyun Wang, Zixuan Zhang, Hongxiang Li, Xuan Liu, Jiawei Han, Huimin Zhao, Heng Ji

Abstract: Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges. First, compared with entity extraction tasks in the general domain, sentences from chemical papers usually contain more entities. Moreover, entity extraction models usually have difficulty extracting entities of long-tailed types. In this paper, we propose Chem-FINESE, a novel sequence-to-sequence (seq2seq) based few-shot entity extraction approach, to address these two challenges. Our Chem-FINESE has two components: a seq2seq entity extractor to extract named entities from the input sentence and a seq2seq self-validation module to reconstruct the original input sentence from extracted entities. Inspired by the fact that a good entity extraction system needs to extract entities faithfully, our new self-validation module leverages entity extraction results to reconstruct the original input sentence. Besides, we design a new contrastive loss to reduce excessive copying during the extraction process. Finally, we release ChemNER+, a new fine-grained chemical entity extraction dataset that is annotated by domain experts with the ChemNER schema. Experiments in few-shot settings with both ChemNER+ and CHEMET datasets show that our newly proposed framework has contributed up to 8.26% and 6.84% absolute F1-score gains respectively.

Submitted 29 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 16 pages. Accepted by Findings of the Association for Computational Linguistics: EACL 2024. Code and resources are available at https://github.com/EagleW/Chem-FINESE

arXiv:2401.08177 [pdf]

Controllable distant interactions at bound state in the continuum

Authors: Haijun Tang, Can Huang, Yuhan Wang, Xiong Jiang, Shumin Xiao, Jiecai Han, Qinghai Song

Abstract: Distant interactions at arbitrary locations and their dynamic control are fundamentally important for realizing large-scale photonic and quantum circuits. Conventional approaches suffer from short coupling distance, poor controllability, fixed locations and low wavelength uniformity, significantly restricting the scalability of photonic and quantum networks. Here, we exploit the intrinsic advantages of optical bound states in the continuum (BICs) and demonstrate an all-in-one solution for dynamically controllable long-range interactions. A BIC metasurface can support a series of finite-sized quasi-BIC microlasers at arbitrary locations. These quasi-BIC microlasers have the same wavelength and are inherently connected through the BIC waveguide. Consequently, the experimental coupling distances increase significantly, from subwavelength to tens of micrometers. Such long-range interaction in the BIC metasurface enables scaling to two-dimensional architectures and ultrafast control of internal laser actions, e.g., non-Hermitian zero-mode lasing and enhanced optical gain. This research should facilitate the advancement of scalable and reconfigurable photonic networks.

Submitted 5 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 12 pages, 4 figures

arXiv:2401.06059 [pdf, other]

Investigating Data Contamination for Pre-training Language Models

Authors: Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo

Abstract: Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks. However, there is increasing concern about whether such capabilities might arise from evaluation datasets being included in the pre-training corpus, a phenomenon known as data contamination, in a manner that artificially increases performance. There has been little understanding of how this potential contamination might influence LMs' performance on downstream tasks. In this paper, we explore the impact of data contamination at the pre-training stage by pre-training a series of GPT-2 models from scratch. We highlight the effect of both text contamination (i.e., the input text of the evaluation samples) and ground-truth contamination (i.e., the prompts asked on the input and the desired outputs) from evaluation data. We also investigate the effects of repeating contamination for various downstream tasks. Additionally, we examine the prevailing n-gram-based definitions of contamination within current LLM reports, pinpointing their limitations and inadequacy. Our findings offer new insights into data contamination's effects on language model capabilities and underscore the need for independent, comprehensive contamination assessments in LLM studies.
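The n-gram-based contamination definitions the abstract examines generally flag an evaluation sample when enough of its n-grams also appear in the pre-training corpus. A minimal sketch of that idea, assuming whitespace tokenization, n = 8, and a 70% overlap threshold; these parameters are illustrative assumptions, and published LLM reports differ in all three choices.

```python
def ngrams(tokens, n):
    """Set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(eval_text, corpus_ngrams, n):
    """Fraction of the evaluation sample's n-grams that also occur in the corpus."""
    grams = ngrams(eval_text.split(), n)
    if not grams:
        return 0.0                      # sample shorter than n tokens
    return len(grams & corpus_ngrams) / len(grams)

def is_contaminated(eval_text, corpus_texts, n=8, threshold=0.7):
    """Flag a sample whose n-gram overlap with the corpus reaches the threshold."""
    corpus_ngrams = set()
    for doc in corpus_texts:
        corpus_ngrams |= ngrams(doc.split(), n)
    return overlap_ratio(eval_text, corpus_ngrams, n) >= threshold

corpus = ["the quick brown fox jumps over the lazy dog"]
print(is_contaminated("quick brown fox jumps", corpus, n=3))
```

A definition of this kind only sees surface overlap, which is one reason such checks can miss paraphrased or partially leaked evaluation data.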

Submitted 11 January, 2024; originally announced January 2024.

Comments: 16 pages, 5 figures

arXiv:2401.05970 [pdf]

On-chip wavelength division multiplexing by angled multimode interferometer fabricated on erbium-doped thin film lithium niobate on insulator

Authors: Jinli Han, Rui Bao, Rongbo Wu, Zhaoxiang Liu, Zhe Wang, Chao Sun, Zhihao Zhang, Mengqi Li, Zhiwei Fang, Min Wang, Haisu Zhang, Ya Cheng

Abstract: Photonic integrated circuits based on erbium-doped thin film lithium niobate on insulator have attracted broad interest, with various waveguide amplifiers and microlasers demonstrated so far. Wideband operation facilitated by the broadband absorption and emission of erbium ions necessitates the functional integration of wavelength filters and multiplexers on the same chip. Here, a low-loss wavelength division multiplexer based on an angled multimode interferometer, operating at the resonant pumping and emission wavelengths of erbium ions (~1480 nm and 1530-1560 nm), is realized on erbium-doped thin film lithium niobate on insulator fabricated by the photolithography-assisted chemomechanical etching technique. The minimum on-chip insertion losses of the fabricated device are <0.7 dB for both wavelength ranges, and a 3-dB bandwidth of >20 nm is measured in the telecom C-band. In addition, direct visualization of the multimode interference pattern by the visible upconversion fluorescence of erbium ions compares well with the simulated light propagation in the multimode interferometer. Spectral tuning of the wavelength division multiplexer by structural design is also demonstrated and discussed.
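The 3-dB bandwidth quoted in the abstract is the spectral width over which a port's transmission stays within 3 dB of its peak. A minimal sketch of reading it off a sampled transmission spectrum, assuming the passband is the contiguous region around the peak and ignoring interpolation between samples; the spectrum below is hypothetical, not measured data from the paper.

```python
def bandwidth_3db(wavelengths_nm, transmission_db):
    """Width of the contiguous band around the peak whose transmission stays within
    3 dB of the peak. Resolution is limited to the sampling grid (no interpolation)."""
    if not wavelengths_nm or len(wavelengths_nm) != len(transmission_db):
        raise ValueError("need matching, non-empty wavelength and transmission lists")
    peak = max(range(len(transmission_db)), key=transmission_db.__getitem__)
    cutoff = transmission_db[peak] - 3.0
    lo = hi = peak
    while lo > 0 and transmission_db[lo - 1] >= cutoff:
        lo -= 1                          # extend toward shorter wavelengths
    while hi < len(transmission_db) - 1 and transmission_db[hi + 1] >= cutoff:
        hi += 1                          # extend toward longer wavelengths
    return wavelengths_nm[hi] - wavelengths_nm[lo]

# Hypothetical sampled transmission (dB) of one output port.
wl = [1520, 1530, 1540, 1550, 1560, 1570, 1580]
t_db = [-10.0, -2.5, -1.0, -0.5, -1.5, -3.2, -8.0]
print(bandwidth_3db(wl, t_db))
```

With finer wavelength sampling, linear interpolation at the two -3 dB crossings would give a less grid-limited estimate.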

Submitted 11 January, 2024; originally announced January 2024.

Comments: 11 pages, 5 figures

arXiv:2401.05893 [pdf, ps, other]

$R^2$ corrections to holographic heavy meson dissociation

Authors: Zhou-Run Zhu, Manman Sun, Rui Zhou, Jinzhong Han

Abstract: In this paper, we study the $R^2$ corrections to the spectral functions of heavy mesons in Gauss-Bonnet gravity. We discuss the effect of the Gauss-Bonnet parameter $\lambda_{GB}$ on the 1S and 2S states of charmonium and bottomonium. We find that $\lambda_{GB}$ reduces the height and increases the width of the 1S peaks. The 2S states of charmonium and bottomonium dissociate gradually as $\lambda_{GB}$ increases. These results indicate that $\lambda_{GB}$ enhances the dissociation of charmonium and bottomonium.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 13 pages, 4 figures

arXiv:2401.05850 [pdf, other]

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection

Authors: Yadong Guan, Jiqing Han, Hongwei Song, Wenjie Song, Guibin Zheng, Tieran Zheng, Yongjun He

Abstract: Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades the feature discrimination. To solve the problem, we propose a disentangled feature learning framework to learn a category-specific representation. Specifically, we employ different projectors to learn the frame-wise features for each category. To ensure that these features do not contain information of other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, considering that the labeled data used by the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method.
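The abstract describes a frame-wise contrastive loss that pulls together frame-wise features of the same category but does not give its form. As a hedged illustration, the sketch below uses a generic supervised-contrastive (InfoNCE-style) loss in which frames sharing a category label are positives and all other frames are negatives; the cosine similarity, the temperature of 0.1, and the function names are assumptions, and features must be non-zero vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def framewise_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: frames sharing a category label are pulled together,
    frames of other categories are pushed apart (generic form, assumed)."""
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue                      # an anchor needs at least one positive
        logits = {j: cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(z) for z in logits.values()))
        # average -log softmax probability over the anchor's positives
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

labels = [0, 0, 1, 1]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
entangled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(framewise_contrastive_loss(separated, labels), framewise_contrastive_loss(entangled, labels))
```

Features that cluster by category (`separated`) yield a much lower loss than features whose same-category frames point in different directions (`entangled`), which is the disentangling pressure the abstract describes.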

Submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.05561 [pdf, other]

TrustLLM: Trustworthiness in Large Language Models

Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang , et al. (45 additional authors not shown)

Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings first show that, in general, trustworthiness and utility (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones.
Third, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: This work is still in progress, and we welcome your contributions

arXiv:2401.05123 [pdf, other]

Individual subpulses of PSR B1916+14 and their polarization properties

Authors: Tao Wang, C. Wang, J. L. Han, N. N. Cai, W. C. **g, Yi Yan, P. F. Wang

Abstract: Individual subpulses of pulsars are regarded as the basic emission components, providing invaluable information for understanding the radio emission process in the pulsar magnetosphere. However, for most pulsars subpulses overlap with each other in rotation phase, making it difficult to study their statistical properties. Among the pulsars observed by the Five-hundred-meter Aperture Spherical radio Telescope, PSR B1916+14 shows a large number of isolated, well-resolved subpulses in high-time-resolution observations, with a typical width of 0.15 ms and high linear polarization. We find that the number distribution of subpulses contributes dominantly to the mean profile. According to the emission geometry, these emission units come from a region roughly 155 km above the polar cap in the pulsar magnetosphere, and the length scale of the basic emission units is approximately 120 m. The deviations of the polarization position angles of these single subpulses from the standard S-shaped curve are closely related to their fractional linear and circular polarization, and the large deviations tend to come from drifting subpulses.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 11 pages, 10 figures, accepted for MNRAS after a minor revision
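The PSR B1916+14 abstract measures how far single-subpulse polarization position angles (PAs) deviate from the "standard S-shaped curve", which is usually modeled with the rotating vector model (RVM). Below is a minimal sketch, assuming the common RVM form tan(ψ − ψ0) = sin α sin(φ − φ0) / (sin ζ cos α − cos ζ sin α cos(φ − φ0)) with ζ = α + β; sign conventions differ between papers, and the function names and the example Stokes values are illustrative, not the authors' pipeline.

```python
import math


def wrap_pa(psi_deg):
    """Wrap a position angle into [-90, 90): PA is only defined modulo 180 deg."""
    return (psi_deg + 90.0) % 180.0 - 90.0


def rvm_position_angle(phi_deg, alpha_deg, beta_deg, phi0_deg=0.0, psi0_deg=0.0):
    """Rotating-vector-model PA (degrees) at rotation phase phi_deg.

    alpha_deg: magnetic inclination angle; beta_deg: impact parameter,
    so the sight-line angle is zeta = alpha + beta (assumed convention).
    """
    alpha = math.radians(alpha_deg)
    zeta = math.radians(alpha_deg + beta_deg)
    dphi = math.radians(phi_deg - phi0_deg)
    num = math.sin(alpha) * math.sin(dphi)
    den = math.sin(zeta) * math.cos(alpha) - math.cos(zeta) * math.sin(alpha) * math.cos(dphi)
    return wrap_pa(psi0_deg + math.degrees(math.atan2(num, den)))


def linear_polarization(I, Q, U):
    """Fractional linear polarization L/I and PA = 0.5 * atan2(U, Q) in degrees."""
    L = math.hypot(Q, U)
    return L / I, math.degrees(0.5 * math.atan2(U, Q))


def pa_deviation(observed_pa_deg, model_pa_deg):
    """Signed PA offset of a subpulse from the model curve, modulo 180 deg."""
    return wrap_pa(observed_pa_deg - model_pa_deg)


# Hypothetical single subpulse: Stokes parameters at rotation phase 0.
frac_lin, pa_obs = linear_polarization(I=1.0, Q=0.0, U=0.5)
pa_model = rvm_position_angle(phi_deg=0.0, alpha_deg=60.0, beta_deg=5.0)
print(frac_lin, pa_obs, pa_model, pa_deviation(pa_obs, pa_model))
```

Wrapping every difference into [−90°, 90°) matters because a PA of +89° and one of −89° are only 2° apart, not 178°.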

arXiv:2401.04316 [pdf, other]

doi 10.1109/TIE.2019.2956414

Robust Control of An Aerial Manipulator Based on A Variable Inertia Parameters Model

Authors: Guangyu Zhang, Yuqing He, Bo Dai, Feng Gu, Jianda Han, Guangjun Liu

Abstract: An aerial manipulator, which is composed of a UAV (Unmanned Aerial Vehicle) and a multi-link manipulator and can perform aerial manipulation, has shown great potential for applications. However, dynamic coupling between the UAV and the manipulator makes it difficult to control the aerial manipulator with high performance. In this paper, the system modeling and control problems of the aerial manipulator are studied. First, a UAV dynamic model is proposed that accounts for the dynamic coupling from the attached manipulator, which is treated as a disturbance to the UAV. In the dynamic model, the disturbance is affected by the variable inertia parameters of the aerial manipulator system. Then, based on the proposed dynamic model, a disturbance-compensation robust $H_{\infty}$ controller is designed to stabilize the flight of the UAV while the manipulator is in operation. Finally, experiments are conducted, and the experimental results demonstrate the feasibility and validity of the proposed control scheme.

Submitted 8 January, 2024; originally announced January 2024.

Journal ref: IEEE Trans. Ind. Electron. 67(2020)9515-9525

arXiv:2401.01490 [pdf]

Chirality tuning and reversing with resonant phase-change metasurfaces

Authors: Xinbo Sha, Kang Du, Yixuan Zeng, Fangxing Lai, Jun Yin, Hanxu Zhang, Bo Song, Jiecai Han, Shumin Xiao, Yuri Kivshar, Qinghai Song

Abstract: Dynamic control of circular dichroism in photonic structures is critically important for compact spectrometers, stereoscopic displays, and information processing that exploits multiple degrees of freedom. Metasurfaces can help miniaturize chiral devices but only produce static and limited chiral responses. While external stimuli are able to tune resonances, their modulations are often weak, and continuously reversing the sign of circular dichroism is extremely challenging. Here, we demonstrate a dynamically tunable chiral response of resonant metasurfaces supporting chiral bound states in the continuum by combining them with phase-change materials. The phase transition between amorphous and crystalline phases allows the chiral response to be controlled and the chirality to be varied rapidly from -0.947 to +0.958, backward and forward, via the chirality continuum. Our demonstrations underpin the rapid development of chiral photonics and its applications.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures

arXiv:2401.01082 [pdf, ps, other]

Quantization effects for multi-component Ginzburg-Landau vortices

Authors: Rejeb Hadiji, Jongmin Han, Juhee Sohn

Abstract: In this paper, we are concerned with $n$-component Ginzburg-Landau equations on $\mathbb{R}^2$. By introducing a diffusion constant for each component, we discuss how the $n$-component equations differ from $n$ copies of the single Ginzburg-Landau equation. The results of Brezis-Merle-Riviere for the single Ginzburg-Landau equation can then be nontrivially extended to the multi-component case. First, we show that if the solutions have their gradients in $L^2$, they are trivial solutions. Second, we prove that if the potential is square summable, then it has quantized integrals, i.e., there is a one-to-one correspondence between the possible values of the potential energy and $\mathbb{N}^n$. Third, we show that different diffusion coefficients in the system are important for obtaining nontrivial solutions of the $n$-component equations.

Submitted 2 January, 2024; originally announced January 2024.
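For orientation, the single-component results of Brezis, Merle and Rivière that the Ginzburg-Landau abstract generalizes can be summarized as follows, for the equation on the plane with the coupling parameter normalized to 1. The $n$-component system with per-component diffusion constants is not reproduced here, since its exact form is not given in the abstract.

$$
\begin{aligned}
&-\Delta u = u\,(1-|u|^2) \quad \text{in } \mathbb{R}^2, \qquad u:\mathbb{R}^2\to\mathbb{C},\\
&\int_{\mathbb{R}^2} |\nabla u|^2 < \infty \;\Longrightarrow\; u \equiv \text{const}, \text{ with } u \equiv 0 \text{ or } |u| \equiv 1,\\
&\int_{\mathbb{R}^2} (1-|u|^2)^2 < \infty \;\Longrightarrow\; \int_{\mathbb{R}^2} (1-|u|^2)^2 = 2\pi d^2 \ \text{ for some } d \in \{0,1,2,\dots\}.
\end{aligned}
$$

The second line is the "gradients in $L^2$ imply trivial solutions" statement, and the third is the quantization whose multi-component analogue is indexed by $\mathbb{N}^n$.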

arXiv:2401.00988 [pdf, other]

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

Authors: Xinpeng Ding, **ahua Han, Hang Xu, Xiaodan Liang, Wei Zhang, Xiaomeng Li

Abstract: The rise of multimodal large language models (MLLMs) has spurred interest in language-based driving tasks. However, existing research typically focuses on limited tasks and often omits key multi-view and temporal information, which is crucial for robust autonomous driving. To bridge these gaps, we introduce NuInstruct, a novel dataset with 91K multi-view video-QA pairs across 17 subtasks, where each task demands holistic information (e.g., temporal, multi-view, and spatial), significantly raising the challenge level. To obtain NuInstruct, we propose a novel SQL-based method to generate instruction-response pairs automatically, inspired by the logical progression of human driving. We further present BEV-InMLLM, an end-to-end method for efficiently deriving instruction-aware Bird's-Eye-View (BEV) features that are language-aligned for large language models. BEV-InMLLM integrates multi-view, spatial awareness, and temporal semantics to enhance MLLMs' capabilities on NuInstruct tasks. Moreover, our proposed BEV injection module is a plug-and-play method for existing MLLMs. Our experiments on NuInstruct demonstrate that BEV-InMLLM significantly outperforms existing MLLMs, e.g., with an improvement of around 9% on various tasks. We plan to release NuInstruct for future research.

Submitted 1 January, 2024; originally announced January 2024.
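The NuInstruct abstract generates instruction-response pairs with an SQL-based method. The general pattern, question templates paired with queries over structured scene annotations, can be sketched with Python's built-in `sqlite3`. The table schema, templates, camera names, and rows below are hypothetical and do not reflect the paper's actual database or subtasks.

```python
import sqlite3

# Hypothetical annotation table: one row per (scene, frame, camera, tracked object).
SCHEMA = """
CREATE TABLE objects (
    scene TEXT, frame INTEGER, camera TEXT,
    track_id INTEGER, category TEXT, distance_m REAL
);
"""

# Each template pairs a natural-language question with the SQL that answers it.
TEMPLATES = [
    ("How many {cat}s are visible across all cameras in frame {frame}?",
     "SELECT COUNT(DISTINCT track_id) FROM objects "
     "WHERE scene = ? AND frame = ? AND category = ?"),
    ("How far away is the nearest {cat} in frame {frame}?",
     "SELECT MIN(distance_m) FROM objects "
     "WHERE scene = ? AND frame = ? AND category = ?"),
]


def generate_qa(conn, scene, frame, cat):
    """Instantiate every template for one scene/frame/category."""
    pairs = []
    for question, sql in TEMPLATES:
        (answer,) = conn.execute(sql, (scene, frame, cat)).fetchone()
        pairs.append((question.format(cat=cat, frame=frame), str(answer)))
    return pairs


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("s1", 0, "CAM_FRONT", 1, "car", 12.0),
        ("s1", 0, "CAM_FRONT_LEFT", 1, "car", 12.1),  # same car seen by two cameras
        ("s1", 0, "CAM_BACK", 2, "car", 30.5),
        ("s1", 0, "CAM_BACK", 3, "pedestrian", 8.0),
    ],
)
for q, a in generate_qa(conn, "s1", 0, "car"):
    print(q, "->", a)
```

This prints answers `2` and `12.0`; the `DISTINCT track_id` clause is what keeps a car seen by two cameras from being counted twice, a small instance of the multi-view reasoning the dataset targets.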

arXiv:2312.17459 [pdf, other]

Measuring the conditional luminosity and stellar mass functions of galaxies by combining the DESI LS DR9, SV3 and Y1 data

Authors: Yirong Wang, Xiaohu Yang, Yizhou Gu, Xiaoju Xu, Haojie Xu, Yuyu Wang, Antonios Katsianis, Jiaxin Han, Min He, Yunliang Zheng, Qingyang Li, Yaru Wang, Wensheng Hong, Jiaqi Wang, Zhenlin Tan, Hu Zou, Johannes Ulf Lange, ChangHoon Hahn, Peter Behroozi, Jessica Nicole Aguilar, Steven Ahlen, David Brooks, Todd Claybaugh, Shaun Cole, Axel de la Macorra, et al. (20 additional authors not shown)

Abstract: In this investigation, we combine the Dark Energy Spectroscopic Instrument Legacy imaging Surveys Data Release 9 (DESI LS DR9), Survey Validation 3 (SV3), and Year 1 (Y1) data sets to estimate the conditional luminosity and stellar mass functions (CLFs and CSMFs) of galaxies across various halo mass bins and redshift ranges. To support our analysis, we use a realistic DESI Mock Galaxy Redshift Survey (MGRS) generated from a high-resolution Jiutian simulation. An extended halo-based group finder is applied to both the MGRS catalogs and the DESI observations. By comparing the r- and z-band luminosity functions (LFs) and stellar mass functions (SMFs) derived from photometric and spectroscopic data, we quantify the impact of photometric redshift (photo-z) errors on the galaxy LFs and SMFs, especially in the low-redshift bin at the low-luminosity/low-mass end. By first evaluating the group finder on the MGRS, we obtain a set of CLF and CSMF measurements from the observational data. We find that at low redshift the faint-end slopes of the CLFs and CSMFs below $10^{9}h^{-2}L_{\odot}$ (or $h^{-2}M_{\odot}$) are in compelling agreement with the subhalo mass functions. After correcting for the cosmic variance effect of our local Universe following arXiv:1809.00523, the faint-end slopes of the LFs/SMFs also turn out to be in good agreement with the slope of the halo mass function.

Submitted 22 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 28 pages, 13 figures, Accepted for publication in ApJ

arXiv:2312.16954 [pdf, other]

Blockchain-based Privacy-Preserving Public Key Searchable Encryption with Strong Traceability

Authors: Yue Han, **guang Han, Weizhi Meng, Jianchang Lai, Ge Wu

Abstract: A public key searchable encryption (PKSE) scheme allows data users to search over encrypted data. To identify illegal users, many traceable PKSE schemes have been proposed. However, existing schemes cannot simultaneously trace the keywords that illegal users searched and protect users' privacy. In some practical applications, tracing both illegal users' identities and the keywords they searched is important for guarding against the abuse of data. It is challenging to bind users' identities and keywords while protecting their privacy. Moreover, existing traceable PKSE schemes do not consider the unforgeability and immutability of trapdoor query records, which can lead to frame-ups and denials. In this paper, to solve these problems, we propose a blockchain-based privacy-preserving PKSE with strong traceability (BP3KSEST) scheme. Our scheme provides the following features: (1) authorized users can authenticate to the trapdoor generation center and obtain trapdoors without revealing their identities and keywords; (2) when data users misbehave in the system, the trusted third party (TTP) can trace both their identities and the keywords they searched; (3) trapdoor query records are unforgeable; (4) trapdoor query records are immutable because the records are stored in a blockchain. Notably, this scheme is suitable for scenarios where privacy must be considered, e.g., electronic health records (EHR). We formalize both the definition and security model of our BP3KSEST scheme and present a concrete construction. Furthermore, the security of the proposed scheme is formally proven. Finally, an implementation and evaluation are conducted to analyze its efficiency.

Submitted 28 December, 2023; originally announced December 2023.
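The BP3KSEST abstract relies on a blockchain to make trapdoor query records immutable. The tamper-evidence part of that argument rests on hash chaining, which can be sketched in a few lines. This is a toy illustration only (SHA-256 over JSON records, with hypothetical field names and placeholder values), not the paper's construction: it has no consensus, signatures, encryption, or tracing.

```python
import hashlib
import json


def _digest(record, prev_hash):
    """SHA-256 over a canonical JSON encoding of the record and the previous hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryLog:
    """Append-only log in which every entry commits to the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, prev_hash, entry_hash)

    def append(self, record):
        prev = self.entries[-1][2] if self.entries else self.GENESIS
        entry_hash = _digest(record, prev)
        self.entries.append((record, prev, entry_hash))
        return entry_hash

    def verify(self):
        """Recompute the chain; editing or reordering any stored record breaks it."""
        prev = self.GENESIS
        for record, stored_prev, entry_hash in self.entries:
            if stored_prev != prev or _digest(record, prev) != entry_hash:
                return False
            prev = entry_hash
        return True


log = QueryLog()
log.append({"user": "pseudonym-17", "trapdoor": "c0ffee"})
log.append({"user": "pseudonym-42", "trapdoor": "deadbeef"})
print(log.verify())                        # True
log.entries[0][0]["trapdoor"] = "forged"   # tamper with a stored record
print(log.verify())                        # False
```

In a real deployment the latest hash would additionally be replicated across many parties, which is what the blockchain contributes beyond this local check.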

arXiv:2312.16486 [pdf, other]

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

Authors: Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu

Abstract: Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges. However, their reliance on substantial computational resources and extensive data collection remains a bottleneck. On the other hand, integrating existing diffusion models, each specialized for different controls and operating in its own latent space, is challenging because of incompatible image resolutions and latent space embedding structures, which hinders their joint use. Addressing these constraints, we present "PanGu-Draw", a novel latent diffusion model designed for resource-efficient text-to-image synthesis that adeptly accommodates multiple control signals. We first propose a resource-efficient Time-Decoupling Training Strategy, which splits the monolithic text-to-image model into structure and texture generators. Each generator is trained using a regimen that maximizes data utilization and computational efficiency, cutting data preparation by 48% and reducing training resources by 51%. Second, we introduce "Coop-Diffusion", an algorithm that enables the cooperative use of various pre-trained diffusion models with different latent spaces and predefined resolutions within a unified denoising process. This allows multi-control image synthesis at arbitrary resolutions without additional data or retraining. Empirical validations of PanGu-Draw show its exceptional prowess in text-to-image and multi-control image generation, suggesting a promising direction for future model training efficiency and generation versatility. The largest 5B T2I PanGu-Draw model is released on the Ascend platform. Project page: https://pangu-draw.github.io

Submitted 28 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: 16 pages, 16 figures
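The PanGu-Draw abstract splits the text-to-image model into structure and texture generators. A minimal sketch of how such a time-decoupled pair could be used at sampling time, assuming the structure generator handles the early, high-noise timesteps and the texture generator the later, low-noise ones; the split point `t_split=500` and the toy generators are hypothetical, and both the actual denoising update and Coop-Diffusion are omitted.

```python
from typing import Callable, List

# A denoiser maps (noisy latent, timestep) to a noise prediction.
Denoiser = Callable[[List[float], int], List[float]]


def make_time_decoupled_denoiser(structure: Denoiser, texture: Denoiser,
                                 t_split: int) -> Denoiser:
    """Dispatch high-noise timesteps (t >= t_split) to the structure generator
    and low-noise timesteps (t < t_split) to the texture generator."""
    def denoise(x: List[float], t: int) -> List[float]:
        return structure(x, t) if t >= t_split else texture(x, t)
    return denoise


# Toy stand-ins that only record which generator handled each step.
calls: List[str] = []


def structure_gen(x: List[float], t: int) -> List[float]:
    calls.append("structure")
    return [0.0] * len(x)


def texture_gen(x: List[float], t: int) -> List[float]:
    calls.append("texture")
    return [0.0] * len(x)


denoise = make_time_decoupled_denoiser(structure_gen, texture_gen, t_split=500)
for t in (999, 750, 500, 250, 0):  # a coarse reverse-diffusion schedule
    denoise([1.0, -1.0], t)
print(calls)
```

Because each generator only ever sees its own timestep range, each can in principle be trained on just that range, which is the kind of saving the abstract attributes to time decoupling.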

arXiv:2312.16145 [pdf, other]

One-Dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Authors: Mengyao Lyu, Yuhong Yang, Haiwen Hong, Hui Chen, Xuan **, Yuan He, Hui Xue, Jungong Han, Guiguang Ding

Abstract: The prevalent use of commercial and open-source diffusion models (DMs) for text-to-image generation prompts risk mitigation to prevent undesired behaviors. Existing concept erasing methods in academia are all based on full-parameter or specification-based fine-tuning, from which we observe the following issues: 1) Generation alteration towards erosion: parameter drift during target elimination causes alterations and potential deformations across all generations, even eroding other concepts to varying degrees, which is more evident when multiple concepts are erased; 2) Transfer inability and deployment inefficiency: previous model-specific erasure impedes the flexible combination of concepts and training-free transfer to other models, resulting in linear cost growth as deployment scenarios increase. To achieve non-invasive, precise, customizable, and transferable elimination, we ground our erasing framework on one-dimensional adapters to erase multiple concepts from most DMs at once across versatile erasing applications. The concept-SemiPermeable structure is injected as a Membrane (SPM) into any DM to learn targeted erasing, and meanwhile the alteration and erosion phenomenon is effectively mitigated via a novel Latent Anchoring fine-tuning strategy. Once obtained, SPMs can be flexibly combined and plugged into other DMs without specific re-tuning, enabling timely and efficient adaptation to diverse scenarios. During generation, our Facilitated Transport mechanism dynamically regulates the permeability of each SPM in response to different input prompts, further minimizing the impact on other concepts. Quantitative and qualitative results across ~40 concepts, 7 DMs and 4 erasing applications demonstrate the superior erasing of SPM. Our code and pre-tuned SPMs are available on the project page https://lyumengyao.github.io/projects/spm.

Submitted 11 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: CVPR 2024
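The SPM abstract builds its erasing framework on "one-dimensional adapters" whose permeability is regulated per prompt. One plausible reading, assumed here and not taken from the paper, is a rank-one update u vᵀ added to a frozen weight W and scaled by a per-adapter gate. The names `adapted_linear`, `u`, `v`, and the gate values are illustrative, and the gate merely stands in for the Facilitated Transport mechanism.

```python
import numpy as np


def adapted_linear(W, x, adapters, gates):
    """Apply a frozen weight W plus several gated rank-one adapters.

    adapters: list of (u, v) vector pairs, each contributing u * (v @ x).
    gates: one scalar in [0, 1] per adapter (0 = adapter fully closed).
    """
    y = W @ x
    for (u, v), g in zip(adapters, gates):
        y = y + g * u * (v @ x)
    return y


W = np.eye(2)
x = np.array([2.0, 3.0])
spm = (np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # hypothetical adapter
print(adapted_linear(W, x, [spm], [0.0]))  # [2. 3.]  (gate closed: base model untouched)
print(adapted_linear(W, x, [spm], [1.0]))  # [5. 3.]
```

Since W itself is never modified, several adapters can be stacked or dropped at inference time, which matches the plug-and-play combination the abstract describes.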

arXiv:2312.15659 [pdf, other]

Perceptual Quality Assessment for Video Frame Interpolation

Authors: **liang Han, Xiongkuo Min, Yixuan Gao, Jun Jia, Lei Sun, Zuowei Cao, Yonglin Luo, Guangtao Zhai

Abstract: Frame quality is significant for both research on and applications of video frame interpolation (VFI). In recent VFI studies, full-reference image quality assessment methods have generally been used to evaluate the quality of VFI frames. However, high-frame-rate reference videos, which full-reference methods require, are difficult to obtain in most VFI applications. To evaluate the quality of VFI frames without reference videos, this paper proposes a no-reference perceptual quality assessment method. This method is more compatible with VFI applications, and its evaluation scores are consistent with human subjective opinions. First, a new quality assessment dataset for VFI was constructed through subjective experiments to collect opinion scores for interpolated frames. The dataset was created from triplets of frames extracted from high-quality videos using 9 state-of-the-art VFI algorithms. The proposed method evaluates the perceptual coherence of frames by incorporating the original pair of VFI inputs. Specifically, the method applies a triplet network architecture, including three parallel feature pipelines, to extract deep perceptual features of the interpolated frame as well as of the original pair of frames. Coherence similarities of the two-way parallel features are jointly calculated and optimized as a perceptual metric. In the experiments, both full-reference and no-reference quality assessment methods were tested on the new dataset. The results show that the proposed method achieves the best performance among all compared quality assessment methods on the dataset.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures

ACM Class: I.4.0
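The VFI abstract scores an interpolated frame by the coherence similarity between its deep features and those of the two input frames. A minimal sketch of that idea, assuming cosine similarity averaged over both inputs and one shared extractor for all three pipelines; `extract_features` is a hypothetical identity placeholder for the paper's network, and the toy frames are synthetic.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def extract_features(frame):
    """Placeholder for a deep feature pipeline (identity here)."""
    return frame.ravel().astype(float)


def coherence_score(prev_frame, interp_frame, next_frame):
    """Mean cosine similarity between the interpolated frame's features and
    those of each input frame, all computed with the same extractor."""
    f_prev, f_mid, f_next = (extract_features(f) for f in (prev_frame, interp_frame, next_frame))
    return 0.5 * (cosine(f_mid, f_prev) + cosine(f_mid, f_next))


# Toy 1D "frames": a Gaussian blob moving from x=20 to x=40.
xs = np.arange(100)


def blob(center):
    return np.exp(-0.5 * ((xs - center) / 5.0) ** 2)


good = coherence_score(blob(20), blob(30), blob(40))  # plausible midpoint
bad = coherence_score(blob(20), blob(60), blob(40))   # object in the wrong place
print(round(good, 3), round(bad, 3))
```

The plausible midpoint scores far higher than the misplaced one, which is the behavior a no-reference metric needs when no ground-truth middle frame exists.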

arXiv:2312.15402 [pdf, other]

An Unfitted Interface Penalty DG--FE Method for Elliptic Interface Problems

Authors: Juan Han, Haijun Wu, Yuanming Xiao

Abstract: We design an unfitted interface penalty DG-FE method (UIPDG-FEM) for an elliptic interface problem, which uses interior penalty discontinuous Galerkin methods locally along the interface, together with additional penalty terms on the interface (also known as Nitsche's trick), to deal with the jump conditions, and uses finite element methods away from the interface. Moreover, element merging is used to keep the condition number of the algebraic system unaffected by the interface position. The proposed UIPDG-FEM not only retains the flexibility of the IPDG method, in particular simplifying the process of merging elements near a complex interface, but also avoids its drawback of a larger number of global degrees of freedom. The convergence rates of the UIPDG-FEM solution are optimal and independent of the interface position. Furthermore, a uniform estimate of the flux value is established in terms of the discontinuous physical coefficients. A two-dimensional merging algorithm is also presented, which is guaranteed to succeed under appropriate assumptions on the interface. Numerical examples are given to verify the theoretical results.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 33 pages, 15 figures
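The UIPDG-FEM abstract merges elements so that tiny pieces of cut cells do not spoil the condition number. A one-dimensional toy version of that idea follows; the threshold `delta` and the merge rule (absorb a sliver into its neighbour on the same side of the interface) are assumptions for illustration, not the paper's two-dimensional algorithm.

```python
def split_and_merge(nodes, gamma, delta=0.3):
    """Split a 1D mesh at interface point gamma and merge small cut pieces.

    nodes: increasing mesh nodes. A piece of the cut cell shorter than
    delta * h (h = length of the cut cell) is merged into its neighbouring
    element on the same side of the interface, if such a neighbour exists.
    Returns (left_elements, right_elements) as lists of (a, b) intervals.
    """
    cells = list(zip(nodes, nodes[1:]))
    left = [(a, b) for a, b in cells if b <= gamma]
    right = [(a, b) for a, b in cells if a >= gamma]
    for a, b in cells:
        if not (a < gamma < b):
            continue  # only the cell strictly containing gamma is cut
        h = b - a
        if gamma - a < delta * h and left:
            left[-1] = (left[-1][0], gamma)  # absorb the sliver on the left
        else:
            left.append((a, gamma))
        if b - gamma < delta * h and right:
            right[0] = (gamma, right[0][1])  # absorb the sliver on the right
        else:
            right.insert(0, (gamma, b))
    return left, right


left, right = split_and_merge([0.0, 0.25, 0.5, 0.75, 1.0], gamma=0.55)
print(left)   # [(0.0, 0.25), (0.25, 0.55)]
print(right)  # [(0.55, 0.75), (0.75, 1.0)]
```

After merging, every element on either side has length at least `delta` times the local mesh size, so no basis function is supported on a vanishingly small region, whatever the interface position.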

arXiv:2312.15397 [pdf, ps, other]

On the equivalence between the effective adjunction conjectures of Prokhorov-Shokurov and of Li

Authors: **gjun Han, Jihao Liu, Qingyuan Xue

Abstract: Prokhorov and Shokurov introduced the famous effective adjunction conjecture, also known as the effective base-point-freeness conjecture. This conjecture asserts that the moduli component of an lc-trivial fibration is effectively base-point-free. Li proposed a variation of this conjecture, which is known as the $\Gamma$-effective adjunction conjecture, and proved that a weaker version of his conjecture is implied by the original Prokhorov-Shokurov conjecture. In this paper, we establish the equivalence of Prokhorov-Shokurov's and Li's effective adjunction conjectures. The key to our proof is the formulation of a uniform rational polytope for canonical bundle formulas, which relies on recent developments in the minimal model program theory of algebraically integrable foliations by Ambro-Cascini-Shokurov-Spicer and Chen-Han-Liu-Xie.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 13 pages. arXiv admin note: text overlap with arXiv:2309.15823

MSC Class: 14E30; 37F75

arXiv:2312.14462 [pdf]

Tailoring Interlayer Chiral Exchange by Azimuthal Symmetry Engineering

Authors: Yu-Hao Huang, Jui-Hsu Han, Wei-Bang Liao, Chen-Yu Hu, Yan-Ting Liu, Chi-Feng Pai

Abstract: Recent theoretical and experimental studies of the interlayer Dzyaloshinskii-Moriya interaction (DMI) have sparked great interest in its implementation in practical magnetic random-access memory (MRAM) devices, due to its capability to mediate long-range chiral spin textures. So far, experimental reports have focused on the observation of interlayer DMI, leaving the development of strategies to control the magnitude of interlayer DMI unaddressed. Here, we introduce an azimuthal symmetry engineering protocol capable of additive/subtractive tuning of interlayer DMI through the control of wedge deposition of separate layers, and demonstrate its capability to mediate field-free spin-orbit torque (SOT) magnetization switching in both orthogonally magnetized and synthetic antiferromagnetically coupled systems. Furthermore, we show that the spatial inhomogeneity brought about by wedge deposition can be suppressed by a specific azimuthal engineering design, which is ideal for practical implementation. Our findings provide guidelines for the effective manipulation of interlayer DMI strength, beneficial for the future design of SOT-MRAM or other spintronic devices utilizing interlayer DMI.

Submitted 25 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 32 pages, 10 figures

arXiv:2312.13986 [pdf]

Deep Learning Enabled Design of Terahertz High-Q Metamaterials

Authors: Shan Yin, Haotian Zhong, Wei Huang, Wentao Zhang, Jiaguang Han

Abstract: Metamaterials open up a new way to manipulate electromagnetic waves and realize various functional devices. Metamaterials with high-quality-factor (Q) resonance responses are widely employed in sensing, detection, and other applications. Traditional metamaterial design involves laborious simulation and optimization, which limits efficiency. High-Q metamaterials with abrupt spectral changes are even harder to inverse-design on demand. In this paper, we propose novel solutions for designing terahertz high-Q metamaterials based on deep learning, including the forward prediction of spectral responses and the inverse design of structural parameters. For the forward prediction, we develop the Electromagnetic Response Transformer (ERT) model to establish the complex mapping relations between the highly sensitive structural parameters and the abrupt spectra, and realize precise prediction of the high-Q resonance in terahertz spectra from given structural parameters. For the inverse design, we introduce the Visual Attention Network (VAN) model, whose large model capacity lets it attentively learn the abrupt shifts in spectral resonances, efficiently reducing errors and achieving highly accurate inverse design of structural parameters according to the expected high-Q resonance responses. Both models exhibit outstanding performance, improving accuracy by one to two orders of magnitude compared with traditional machine learning methods. In addition, our ERT model can be 4000 times faster in computation time than conventional full-wave simulations. Our work provides new avenues for the deep-learning-enabled design of terahertz high-Q metamaterials, with potential applications in various fields such as terahertz communication, sensing, imaging, and functional devices.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 17 pages, 6 figures
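The terahertz metamaterial abstract centres on high-Q resonances. For reference, the quality factor of a single resonance is commonly estimated as Q = f0 / FWHM, the resonance frequency divided by the full width at half maximum. A minimal sketch on a synthetic Lorentzian peak (hypothetical values, not data from the paper), independent of the ERT and VAN models:

```python
import numpy as np


def q_factor(freqs, power):
    """Estimate Q = f0 / FWHM of the strongest peak in a sampled spectrum."""
    i0 = int(np.argmax(power))
    half = power[i0] / 2.0
    lo = i0
    while lo > 0 and power[lo] > half:  # walk left to the half-maximum crossing
        lo -= 1
    hi = i0
    while hi < len(power) - 1 and power[hi] > half:  # walk right
        hi += 1
    # Linear interpolation between the two samples bracketing each crossing.
    f_lo = np.interp(half, [power[lo], power[lo + 1]], [freqs[lo], freqs[lo + 1]])
    f_hi = np.interp(half, [power[hi], power[hi - 1]], [freqs[hi], freqs[hi - 1]])
    return freqs[i0] / (f_hi - f_lo)


# Synthetic Lorentzian resonance at 1.0 THz with a FWHM of 0.01 THz (so Q = 100).
f = np.linspace(0.9, 1.1, 20001)
p = 1.0 / (1.0 + ((f - 1.0) / 0.005) ** 2)
print(round(q_factor(f, p), 1))
```

The narrower the peak, the more samples (or simulation frequency points) are needed to resolve its half-maximum crossings, which is one reason abrupt high-Q spectra are costly to simulate and hard to predict.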

arXiv:2312.13495 [pdf, other]

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

Authors: Jie Han, Yixiong Zou, Haozhao Wang, Jun Wang, Wei Liu, Yao Wu, Tao Zhang, Ruixuan Li

Abstract: Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficient labeled data and then transfer the model to target domains where only scarce labeled data are available. However, transferring experience as a whole usually suffers from the gaps between source and target domains. For instance, transferring experience related to domain-specific knowledge is difficult. To tackle this problem, we propose a new method that explicitly decouples the transfer of experience related to general semantic representations from the transfer of experience related to domain-specific knowledge. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relations and slot-slot relations, respectively. Extensive experiments on the Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 9 pages, 4 figures
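The few-shot abstract reports gains in joint accuracy. In joint intent detection and slot filling, this metric is usually computed at the sentence level. A minimal sketch of that common definition; the utterances and labels below are made up for illustration, and the paper's evaluation script may differ in details.

```python
def joint_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    """Sentence-level accuracy: an utterance counts only if its intent AND
    its entire slot-label sequence are both predicted correctly."""
    n = len(gold_intents)
    correct = sum(
        1
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
        if pi == gi and ps == gs
    )
    return correct / n


gold_i = ["PlayMusic", "GetWeather", "BookRestaurant", "GetWeather"]
pred_i = ["PlayMusic", "GetWeather", "BookRestaurant", "AddToPlaylist"]
gold_s = [["O", "B-artist"], ["O", "B-city"], ["B-party_size", "O"], ["O", "B-city"]]
pred_s = [["O", "B-artist"], ["O", "O"], ["B-party_size", "O"], ["O", "B-city"]]
print(joint_accuracy(pred_i, gold_i, pred_s, gold_s))  # 0.5
```

Because a single wrong slot tag or a wrong intent zeroes out the whole utterance, joint accuracy is much stricter than intent accuracy or slot F1 alone, which helps explain why the reported 1-shot numbers are comparatively low.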

arXiv:2312.12889 [pdf]

Singular Hall response from a correlated ferromagnetic flat nodal-line semimetal

Authors: Woohyun Cho, Yoon-Gu Kang, Jaehun Cha, Dong Hyun David Lee, Do Hoon Kiem, Jaewhan Oh, Jongho Park, Changyoung Kim, Yongsoo Yang, Yeong Kwan Kim, Myung Joon Han, Heejun Yang

Abstract: Topological quantum phases have been largely understood in weakly correlated systems, in which various quantum phenomena such as the spin Hall effect, protected transport of helical fermions, and topological superconductivity have been identified. Robust ferromagnetic order in correlated topological materials attracts particular attention, as it can provide a versatile platform for novel quantum devices. Here, we report a singular Hall response arising from a unique band structure of flat topological nodal lines combined with electron correlation in an itinerant van der Waals ferromagnetic semimetal, Fe3GaTe2, which has a high Curie temperature of $T_c$ = 360 K. A high anomalous Hall conductivity violating the conventional scaling, a resistivity upturn at low temperature, and a large Sommerfeld coefficient are observed in Fe3GaTe2, implying heavy-fermion features in this ferromagnetic topological material. Our circular dichroism angle-resolved photoemission spectroscopy measurements and theoretical calculations support these distinctive electronic features of the material. Thus, low-dimensional Fe3GaTe2, combining electronic correlation, topology, and room-temperature ferromagnetic order, appears to be a promising candidate for robust quantum devices.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.12529 [pdf, other]

doi 10.1051/0004-6361/202348980

The history and mass content of cluster galaxies in the EAGLE simulation

Authors: Cristóbal Sifón, Jiaxin Han

Abstract: We explore the mass content of galaxies residing in galaxy clusters at $z=0$ in the EAGLE hydrodynamical simulation, as well as the galaxies' mass build-up through cosmic time. We use a galaxy catalogue generated with the HBT+ algorithm, which identifies subhaloes consistently over time by tracking their dynamical evolution throughout the simulation. The satellite subhalo-to-stellar mass relation (SHSMR) is well described by a double power law. At stellar masses $9<\log m_\star/\mathrm{M}_\odot<10$, satellites have 20–25% of the subhalo mass of central galaxies at fixed stellar mass. At high stellar masses, the satellite SHSMR is consistent with that of centrals. The satellite SHSMR decreases steeply for satellites closer to the cluster centre, even in projection, broadly consistent with recent weak lensing measurements. The scatter in the satellite SHSMR is larger than that of central galaxies at all cluster masses and cluster-centric distances $R<R_\mathrm{200m}$. The SHSMR scatter decreases with stellar mass by about 12% over an order of magnitude, but this dependence can be explained by the mixing of infall times. There is significant dark matter preprocessing: the most recent infallers into massive clusters had already lost up to 50% of their dark matter by the time of infall, particularly if they fell in indirectly as satellites of another host. In contrast, satellite galaxies are, on average, still gaining stellar mass at the time of infall, and they continue to do so for another 2 Gyr afterwards, although we see evidence of slower growth for indirect infallers.
Overall, pre- and post-processing have similar impacts on the satellite SHSMR. Finally, we provide a simple prescription, convenient for observational weak lensing measurements, to infer the mean mass loss experienced by satellites as a function of cluster-centric distance based on a comparison to central galaxies. [abridged]

Submitted 28 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 19 pages, 16 figures including 2 short appendices. Accepted for publication in A&A. Code is available at https://github.com/cristobal-sifon/eagle-satellites

Journal ref: A&A 686, A163 (2024)

arXiv:2312.11842 [pdf, other]

Neural operator-based super-fidelity: A warm-start approach for accelerating steady-state simulations

Authors: Xu-Hui Zhou, Jiequn Han, Muhammad I. Zafar, Christopher J. Roy, Heng Xiao

Abstract: In recent years, using neural networks to speed up the solution of partial differential equations (PDEs) has gained significant traction in both academic and industrial settings. However, using neural networks as standalone surrogate models raises concerns about the reliability of the solutions, because they depend on data volume, data quality, and training algorithms, which matters especially in precision-critical scientific tasks. This study introduces a novel "super-fidelity" method that uses neural networks to warm-start the solution of steady-state PDEs, ensuring both speed and accuracy. Drawing on super-resolution concepts from computer vision, our approach maps low-fidelity model solutions to high-fidelity targets using a vector-cloud neural network with equivariance (VCNN-e), which preserves all necessary invariance and equivariance properties for scalar and vector solutions. The method adapts well to different spatial resolutions. We tested this approach in two scientific computing scenarios: one with weak nonlinearity, using low-Reynolds-number flows around elliptical cylinders, and another with strong nonlinearity, using high-Reynolds-number flows over airfoils. In both cases, our neural operator-based initialization accelerated convergence by at least a factor of two compared to traditional methods, without sacrificing accuracy. Its robustness is confirmed across various iterative algorithms with different linear equation solvers. The approach also yielded time savings over multiple simulations, even when model development time is included.
Additionally, we propose an efficient training data generation strategy. Overall, our method offers an efficient way to accelerate steady-state PDE solutions using neural operators without loss of accuracy, which is especially relevant in precision-focused scientific applications.

Submitted 18 December, 2023; originally announced December 2023.
