-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 28 June, 2024;
originally announced July 2024.
-
Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission
Authors:
Han Xiao,
Wenqiang Tian,
Shi **,
Wendong Liu,
Jia Shen,
Zhihua Shi,
Zhi Zhang
Abstract:
In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, interference cancellation with superimposed-symbol-aided channel estimation is leveraged in the neural receiver, accompanied by a pre-designed code-division orthogonal pilot mechanism at the transmitter. In addition, to address the complexity issue for inter-vendor collaboration and the generalization problem in practical deployments, this paper also provides, respectively, a fixed SIP (F-SIP) design based on a constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and transmission layers. Simulation results demonstrate that the proposed schemes outperform existing counterparts in block error rate and throughput.
Submitted 27 June, 2024;
originally announced June 2024.
-
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Authors:
Le Zhuo,
Ruoyi Du,
Han Xiao,
Yangguang Li,
Dongyang Liu,
Rongjie Huang,
Wenze Liu,
Lirui Zhao,
Fu-Yun Wang,
Zhanyu Ma,
Xu Luo,
Zehan Wang,
Kaipeng Zhang,
Xiangyang Zhu,
Si Liu,
Xiangyu Yue,
Dingning Liu,
Wanli Ouyang,
Ziwei Liu,
Yu Qiao,
Hongsheng Li,
Peng Gao
Abstract:
Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduce a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all code and model weights, we aim to advance the development of next-generation generative AI capable of universal modeling.
Submitted 5 June, 2024;
originally announced June 2024.
-
Enhancing Tool Retrieval with Iterative Feedback from Large Language Models
Authors:
Qiancheng Xu,
Yongqi Li,
Heming Xia,
Wenjie Li
Abstract:
Tool learning, which aims to enhance and expand the capabilities of large language models (LLMs) with external tools, has gained significant attention recently. Current methods have shown that LLMs can effectively handle a certain number of tools through in-context learning or fine-tuning. However, in real-world scenarios, the number of tools is typically large and irregularly updated, emphasizing the necessity for a dedicated tool retrieval component. Tool retrieval is nontrivial due to the following challenges: 1) complex user instructions and tool descriptions; 2) misalignment between tool retrieval and tool usage models. To address the above issues, we propose to enhance tool retrieval with iterative feedback from the large language model. Specifically, we prompt the tool usage model, i.e., the LLM, to provide feedback for the tool retriever model over multiple rounds, which can progressively improve the tool retriever's understanding of instructions and tools and reduce the gap between the two standalone components. We build a unified and comprehensive benchmark to evaluate tool retrieval models. Extensive experiments indicate that our proposed approach achieves advanced performance in both in-domain and out-of-domain evaluation.
Submitted 25 June, 2024;
originally announced June 2024.
-
Data-Driven Turbulence Modeling Approach for Cold-Wall Hypersonic Boundary Layers
Authors:
Muhammad I. Zafar,
Xuhui Zhou,
Christopher J. Roy,
David Stelter,
Heng Xiao
Abstract:
The wall-cooling effect in hypersonic boundary layers can significantly alter the near-wall turbulence behavior, which is not accurately modeled by traditional RANS turbulence models. To address this shortcoming, this paper presents a turbulence modeling approach for hypersonic flows with cold-wall conditions using an iterative ensemble Kalman method. Specifically, a neural-network-based turbulence model is used to provide a closure mapping from mean flow quantities to Reynolds stress as well as a variable turbulent Prandtl number. Sparse observation data of velocity and temperature are used to train the turbulence model. This approach is analyzed using a direct numerical simulation database for boundary layer flows over a flat plate with a Mach number between 6 and 14 and wall-to-recovery temperature ratios ranging from 0.18 to 0.76. Two training cases are conducted: 1) a single training case with observation data from one flow case, 2) a joint training case where data from two flow cases are simultaneously used for training. Trained models are also tested for generalizability on the remaining flow cases in each of the training cases. The results are also analyzed for insights to inform future work on enhancing the generalizability of the learned turbulence model.
Submitted 25 June, 2024;
originally announced June 2024.
-
CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack
Authors:
Hanfeng Xia,
Haibo Hong,
Ruili Wang
Abstract:
Backdoor attacks involve the injection of a limited quantity of poisoned examples containing triggers into the training dataset. During the inference stage, backdoor attacks can uphold a high level of accuracy for normal examples, yet when presented with trigger-containing instances, the model may erroneously predict them as the targeted class designated by the attacker. This paper explores strategies for mitigating the risks associated with backdoor attacks by examining the filtration of poisoned samples. We primarily leverage two key characteristics of backdoor attacks: the ability for multiple backdoors to exist simultaneously within a single model, and the discovery through Composite Backdoor Attack (CBA) that altering two triggers in a sample to new target labels does not compromise the original functionality of the triggers, yet enables the prediction of the data as a new target class when both triggers are present simultaneously. Therefore, a novel three-stage poisoning data filtering approach, known as Composite Backdoor Poison Filtering (CBPF), is proposed as an effective solution. Firstly, utilizing the identified distinctions in output between poisoned and clean samples, a subset of data is partitioned to include both poisoned and clean instances. Subsequently, benign triggers are incorporated and labels are adjusted to create new target and benign target classes, thereby prompting the poisoned and clean data to be classified as distinct entities during the inference stage. The experimental results indicate that CBPF is successful in filtering out malicious data produced by six advanced attacks on CIFAR10 and ImageNet-12. On average, CBPF attains a notable filtering success rate of 99.91% for the six attacks on CIFAR10. Additionally, the model trained on the uncontaminated samples exhibits sustained high accuracy levels.
Submitted 23 June, 2024;
originally announced June 2024.
-
Privacy Preserving Machine Learning for Electronic Health Records using Federated Learning and Differential Privacy
Authors:
Naif A. Ganadily,
Han J. Xia
Abstract:
An Electronic Health Record (EHR) is an electronic database used by healthcare providers to store patients' medical records which may include diagnoses, treatments, costs, and other personal information. Machine learning (ML) algorithms can be used to extract and analyze patient data to improve patient care. Patient records contain highly sensitive information, such as social security numbers (SSNs) and residential addresses, which introduces a need to apply privacy-preserving techniques for these ML models using federated learning and differential privacy.
Submitted 22 June, 2024;
originally announced June 2024.
-
Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video
Authors:
Zhengbang Yang,
Haotian Xia,
**gxi Li,
Zezhi Chen,
Zhuangdi Zhu,
Weining Shen
Abstract:
Understanding sports is crucial for the advancement of Natural Language Processing (NLP) due to its intricate and dynamic nature. Reasoning over complex sports scenarios has posed significant challenges to current NLP technologies which require advanced cognitive capabilities. Toward addressing the limitations of existing benchmarks on sports understanding in the NLP field, we extensively evaluated mainstream large language models for various sports tasks. Our evaluation spans from simple queries on basic rules and historical facts to complex, context-specific reasoning, leveraging strategies from zero-shot to few-shot learning, and chain-of-thought techniques. In addition to unimodal analysis, we further assessed the sports reasoning capabilities of mainstream video language models to bridge the gap in multimodal sports understanding benchmarking. Our findings highlighted the critical challenges of sports understanding for NLP. We proposed a new benchmark based on a comprehensive overview of existing sports datasets and provided extensive error analysis which we hope can help identify future research priorities in this field.
Submitted 21 June, 2024;
originally announced June 2024.
-
TabularMark: Watermarking Tabular Datasets for Machine Learning
Authors:
Yihao Zheng,
Haocheng Xia,
Junyuan Pang,
**fei Liu,
Kui Ren,
Lingyang Chu,
Yang Cao,
Li Xiong
Abstract:
Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we watermark tabular datasets without significantly compromising their utility for training ML models while preventing attackers from training usable ML models on attacked datasets? In this paper, we propose a hypothesis testing-based watermarking scheme, TabularMark. Data noise partitioning is utilized for data perturbation during embedding, which is adaptable to numerical and categorical attributes while preserving the data utility. For detection, a custom-threshold one-proportion z-test is employed, which can reliably determine the presence of the watermark. Experiments on real-world and synthetic datasets demonstrate the superiority of TabularMark in detectability, non-intrusiveness, and robustness.
Submitted 20 June, 2024;
originally announced June 2024.
-
Warm and Fuzzy Dark Matter: Free Streaming of Wave Dark Matter
Authors:
Rayne Liu,
Wayne Hu,
Huangyu Xiao
Abstract:
Wave or fuzzy dark matter that is produced with relativistic wavenumbers exhibits free streaming effects analogous to warm or hot particle dark matter with relativistic momenta. Axions produced after inflation provide such a warm or mildly relativistic candidate, where the enhanced suppression and observational bounds are only moderately stronger than those from wave propagation of initially cold axions. More generally, the free streaming damping also impacts isocurvature fluctuations from generation in causally disconnected patches. As coherent spatial fluctuations free stream away they leave incoherent and transient superpositions in their wakes. These multiple wave momentum streams are the wave analogue of particle phase space fluctuations or directional collisionless damping of massive neutrinos or hot dark matter. The observable impact on both adiabatic and isocurvature fluctuations of fuzzy dark matter can differ from their cold dark matter counterparts due to free streaming, depending on how warm or hot their momentum distribution is.
Submitted 18 June, 2024;
originally announced June 2024.
-
Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba
Authors:
Ruiqi He,
Yushu He,
Longju Bai,
Jiarui Liu,
Zhenjie Sun,
Zenghao Tang,
He Wang,
Hanchen Xia,
Naihao Deng
Abstract:
Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evaluate human explanations against two state-of-the-art LLMs, GPT-4o and ERNIE Bot, through A/B testing by native Chinese speakers. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and the human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.
Submitted 18 June, 2024;
originally announced June 2024.
-
A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning
Authors:
Lijie Hu,
Liang Liu,
Shu Yang,
Xin Chen,
Hongru Xiao,
Mengdi Li,
Pan Zhou,
Muhammad Asif Ali,
Di Wang
Abstract:
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance of large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, a rigorous explanation for why CoT achieves such success is still lacking. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1) For zero-shot CoT, why does prompting the model with "let's think step by step" significantly impact its outputs? (2) For few-shot CoT, why does providing examples before questioning the model substantially improve its reasoning ability? To answer these questions, we conduct a top-down explainable analysis from the Hopfieldian view and propose a Read-and-Control approach for controlling the accuracy of CoT. Through extensive experiments on seven datasets for three different tasks, we demonstrate that our framework can decipher the inner workings of CoT, localize reasoning errors, and steer the model toward the correct reasoning path.
Submitted 18 June, 2024;
originally announced June 2024.
-
Language and Multimodal Models in Sports: A Survey of Datasets and Applications
Authors:
Haotian Xia,
Zhengbang Yang,
Yun Zhao,
Yuqing Wang,
**gxi Li,
Rhys Tracy,
Zhuangdi Zhu,
Yuan-fang Wang,
Hanjie Chen,
Weining Shen
Abstract:
The recent integration of Natural Language Processing (NLP) and multimodal models has advanced the field of sports analytics. This survey presents a comprehensive review of the datasets and applications driving these innovations post-2020. We review and categorize datasets into three primary types: language-based, multimodal, and convertible datasets. Language-based and multimodal datasets are for tasks involving text or multimodality (e.g., text, video, audio), respectively. Convertible datasets, initially single-modal (video), can be enriched with additional annotations, such as explanations of actions and video descriptions, to become multimodal, offering future potential for richer and more diverse applications. Our study highlights the contributions of these datasets to various applications, from improving fan experiences to supporting tactical analysis and medical diagnostics. We also discuss the challenges and future directions in dataset development, emphasizing the need for diverse, high-quality data to support real-time processing and personalized user experiences. This survey provides a foundational resource for researchers and practitioners aiming to leverage NLP and multimodal models in sports, offering insights into current trends and future opportunities in the field.
Submitted 17 June, 2024;
originally announced June 2024.
-
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens
Authors:
Weiyao Luo,
Suncong Zheng,
Heming Xia,
Weikang Wang,
Yan Lei,
Tianyu Liu,
Shuang Chen,
Zhifang Sui
Abstract:
Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate the chunk's information into the corresponding <SR> token. This enables LLMs to interpret information not only from historical individual tokens but also from the <SR> token, which aggregates the chunk's semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
Submitted 16 June, 2024;
originally announced June 2024.
-
Photonic realization of chiral hinge states in a Chern-insulator stack
Authors:
Han-Rong Xia,
Jia-Zheng Li,
Si-Yu Yuan,
Meng Xiao
Abstract:
Higher-order topological insulators, as a novel family of topological phases, are a hot frontier in condensed matter physics due to their adherence to unconventional bulk-boundary correspondence. A three-dimensional second-order topological insulator can support one-dimensional modes along its hinges (dubbed hinge states). Here, we present a simple and direct method to construct chiral hinge modes based on a Chern-insulator stack. We analyze the existence of the hinge modes through the nontrivial quadrupole indices, and then design a photonic crystal to realize the specific flow pattern of the hinge mode in our model. The experimental results align well with full-wave simulations, clearly demonstrating the existence of chiral hinge states. We also verify the robustness of these hinge states against defects in our photonic system.
Submitted 15 June, 2024;
originally announced June 2024.
-
Twisting of Lie triple systems, $L_\infty$-algebras, and (generalized) matched pairs
Authors:
Jia Zhao,
Haobo Xia
Abstract:
In this paper, we introduce notions of (proto-, quasi-)twilled Lie triple systems and give their equivalent descriptions using the controlling algebra and bidegree convention. Then we construct an $L_\infty$-algebra via a twilled Lie triple system. Besides, we establish the twisting theory of Lie triple systems and then characterize the twisting as a Maurer-Cartan element in the constructed $L_\infty$-algebra. Finally, we clarify the relationship between twilled Lie triple systems and matched pairs, and between twilled Lie triple systems and relative Rota-Baxter operators, so that we obtain the relationship between matched pairs of Lie triple systems and relative Rota-Baxter operators.
Submitted 15 June, 2024;
originally announced June 2024.
-
Axion Stars: Mass Functions and Constraints
Authors:
Jae Hyeok Chang,
Patrick J. Fox,
Huangyu Xiao
Abstract:
The QCD axion and axion-like particles, as leading dark matter candidates, can also have interesting implications for dark matter substructures if the Peccei-Quinn symmetry is broken after inflation. In such a scenario, axion perturbations on small scales will lead to the formation of axion miniclusters at matter-radiation equality, and subsequently the formation of axion stars. Such compact objects open new windows for indirect searches for axions. We compute the axion star mass function based on recent axion minicluster studies and Bose star simulations. Applying this mass function, we find post-inflation axion-like particles with masses $m_a< 3.3 \times 10^{-17}$ eV are constrained by the lack of dynamical heating of stars in ultrafaint dwarfs. We also find that current microlensing surveys are insensitive to QCD axion stars. While we focus on the gravitational detectability of axion stars, our result can be directly applied to other interesting signatures of axion stars, e.g. their decay to photons, that require as input the abundance, mass, and density distribution of axion stars.
Submitted 13 June, 2024;
originally announced June 2024.
-
Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
**ming Guo,
Xiaolin Chen,
**gcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for retinal images were pre-trained with limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Encoded Structure
Authors:
Bangxin Li,
Hengrui Xing,
Chao Huang,
Jin Qian,
Huangqing Xiao,
Linfeng Feng,
Cong Tian
Abstract:
Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on the prompt of the plain text without specifically exploring the significant influence of its structure. In this paper, we focus on studying how prompt structure contributes to the jailbreak attack. We introduce a novel structure-level attack method based on tail structures that are rarely used during LLM training, which we refer to as Uncommon Text-Encoded Structure (UTES). We extensively study 12 UTES templates and 6 obfuscation methods to build an effective automated jailbreak tool named StructuralSleight that contains three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Extensive experiments on existing LLMs show that StructuralSleight significantly outperforms baseline methods. In particular, the attack success rate reaches 94.62% on GPT-4o, which has not been addressed by state-of-the-art techniques.
Submitted 12 June, 2024;
originally announced June 2024.
-
pVACview: an interactive visualization tool for efficient neoantigen prioritization and selection
Authors:
Huiming Xia,
My Hoang,
Evelyn Schmidt,
Susanna Kiwala,
Joshua McMichael,
Zachary L. Skidmore,
Bryan Fisk,
Jonathan J. Song,
Jasreet Hundal,
Thomas Mooney,
Jason R. Walker,
S. Peter Goedegebuure,
Christopher A. Miller,
William E. Gillanders,
Obi L. Griffith,
Malachi Griffith
Abstract:
Neoantigen targeting therapies including personalized vaccines have shown promise in the treatment of cancers. Accurate identification/prioritization of neoantigens is highly relevant to designing clinical trials, predicting treatment response, and understanding mechanisms of resistance. With the advent of massively parallel sequencing technologies, it is now possible to predict neoantigens based on patient-specific variant information. However, numerous factors must be considered when prioritizing neoantigens for use in personalized therapies. Complexities such as alternative transcript annotations, various binding, presentation and immunogenicity prediction algorithms, and variable peptide lengths/registers all potentially impact the neoantigen selection process. While computational tools generate numerous algorithmic predictions for neoantigen characterization, results from these pipelines are difficult to navigate and require extensive knowledge of the underlying tools for accurate interpretation. Due to the intricate nature and number of salient neoantigen features, presenting all relevant information to facilitate candidate selection for downstream applications is a difficult challenge that current tools fail to address. We have created pVACview, the first interactive tool designed to aid in the prioritization and selection of neoantigen candidates for personalized neoantigen therapies. pVACview has a user-friendly and intuitive interface where users can upload, explore, select and export their neoantigen candidates. The tool allows users to visualize candidates using variant, transcript and peptide information. pVACview will allow researchers to analyze and prioritize neoantigen candidates with greater efficiency and accuracy in basic and translational settings. The application is available as part of the pVACtools pipeline at pvactools.org and as an online server at pvacview.org.
Submitted 11 June, 2024;
originally announced June 2024.
-
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding
Authors:
Yiqing Shen,
Zan Chen,
Michail Mamalakis,
Luhan He,
Haiyang Xia,
Tianbin Li,
Yanzhou Su,
Junjun He,
Yu Guang Wang
Abstract:
The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have then attempted to adapt LLMs for protein understanding by integrating a protein sequence encoder with a pre-trained LLM. However, this adaptation raises a fundamental question: "Can LLMs, originally designed for NLP, effectively comprehend protein sequences as a form of language?" Current datasets fall short in addressing this question due to the lack of a direct correlation between protein sequences and corresponding text descriptions, limiting the ability to train and evaluate LLMs for protein understanding effectively. To bridge this gap, we introduce ProteinLMDataset, a dataset specifically designed for further self-supervised pretraining and supervised fine-tuning (SFT) of LLMs to enhance their capability for protein sequence comprehension. Specifically, ProteinLMDataset includes 17.46 billion tokens for pretraining and 893,000 instructions for SFT. Additionally, we present ProteinLMBench, the first benchmark dataset consisting of 944 manually verified multiple-choice questions for assessing the protein understanding capabilities of LLMs. ProteinLMBench incorporates protein-related details and sequences in multiple languages, establishing a new standard for evaluating LLMs' abilities in protein comprehension. The large language model InternLM2-7B, pretrained and fine-tuned on the ProteinLMDataset, outperforms GPT-4 on ProteinLMBench, achieving the highest accuracy score. The dataset and the benchmark are available at https://huggingface.co/datasets/tsynbio/ProteinLMBench.
Submitted 8 June, 2024;
originally announced June 2024.
-
Efficient Exploration of the Rashomon Set of Rule Set Models
Authors:
Martino Ciaperoni,
Han Xiao,
Aristides Gionis
Abstract:
Today, as increasingly complex predictive models are developed, simple rule sets remain a crucial tool to obtain interpretable predictions and drive high-stakes decision making. However, a single rule set provides a partial representation of a learning task. An emerging paradigm in interpretable machine learning aims at exploring the Rashomon set of all models exhibiting near-optimal performance. Existing work on Rashomon-set exploration focuses on exhaustive search of the Rashomon set for particular classes of models, which can be a computationally challenging task. On the other hand, exhaustive enumeration leads to redundancy that often is not necessary, and a representative sample or an estimate of the size of the Rashomon set is sufficient for many applications. In this work, we propose, for the first time, efficient methods to explore the Rashomon set of rule set models with or without exhaustive search. Extensive experiments demonstrate the effectiveness of the proposed methods in a variety of scenarios.
Submitted 5 June, 2024;
originally announced June 2024.
-
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Authors:
Andreas Koukounas,
Georgios Mastrapas,
Michael Günther,
Bo Wang,
Scott Martens,
Isabelle Mohr,
Saba Sturua,
Mohammad Kalim Akram,
Joan Fontanals Martínez,
Saahil Ognawala,
Susana Guzman,
Maximilian Werk,
Nan Wang,
Han Xiao
Abstract:
Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval systems that keep separate embeddings and models for text-only and multimodal tasks. We propose a novel, multi-task contrastive training method to address this issue, which we use to train the jina-clip-v1 model to achieve state-of-the-art performance on both text-image and text-text retrieval tasks.
Submitted 26 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Proof of Quality: A Costless Paradigm for Trustless Generative AI Model Inference on Blockchains
Authors:
Zhenjie Zhang,
Yuyang Rao,
Hao Xiao,
Xiaokui Xiao,
Yin Yang
Abstract:
Generative AI models, such as GPT-4 and Stable Diffusion, have demonstrated powerful and disruptive capabilities in natural language and image tasks. However, deploying these models in decentralized environments remains challenging. Unlike traditional centralized deployment, systematically guaranteeing the integrity of AI model services in fully decentralized environments, particularly on trustless blockchains, is both crucial and difficult. In this paper, we present a new inference paradigm called proof of quality (PoQ) to enable the deployment of arbitrarily large generative models on blockchain architecture. Unlike traditional approaches based on validating inference procedures, such as ZKML or OPML, our PoQ paradigm focuses on the outcome quality of model inference. Using lightweight BERT-based cross-encoders as our underlying quality evaluation model, we design and implement PQML, the first practical protocol for real-world NLP generative model inference on blockchains, tailored for popular open-source models such as Llama 3 and Mixtral. Our analysis demonstrates that our protocol is robust against adversarial but rational participants in ecosystems, where lazy or dishonest behavior results in fewer benefits compared to well-behaving participants. The computational overhead of validating the quality evaluation is minimal, allowing quality validators to complete the quality check within a second, even using only a CPU. Preliminary simulation results show that PoQ consensus is generated in milliseconds, 1,000 times faster than any existing scheme.
Submitted 30 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Improving Data-aware and Parameter-aware Robustness for Continual Learning
Authors:
Hanxi Xiao,
Fan Lyu
Abstract:
The goal of a Continual Learning (CL) task is to learn multiple new tasks sequentially while achieving a balance between the plasticity and stability of new and old knowledge. This paper finds that this insufficiency arises from the ineffective handling of outliers, leading to abnormal gradients and unexpected model updates. To address this issue, we enhance the data-aware and parameter-aware robustness of CL, proposing a Robust Continual Learning (RCL) method. From the data perspective, we develop a contrastive loss based on the concepts of uniformity and alignment, forming a feature distribution that is more applicable to outliers. From the parameter perspective, we present a forward strategy for worst-case perturbation and apply robust gradient projection to the parameters. The experimental results on three benchmarks show that the proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results. The code is available at: https://github.com/HanxiXiao/RCL
Submitted 27 May, 2024;
originally announced May 2024.
-
Initial Burst of Disruptive Efforts Ensuring Scientific Career Viability
Authors:
Shuang Zhang,
Feifan Liu,
Haoxiang Xia
Abstract:
Despite persistent efforts to understand the dynamics of creativity of scientists over careers in terms of productivity, impact, and prize, little is known about the dynamics of scientists' disruptive efforts that affect individual academic careers and drive scientific advance. Drawing on millions of data points over six decades and across nineteen disciplines, associating the publication records of individual scientists with the disruption index, we systematically quantify the temporal pattern of disruptive ideas over individual scientific careers, providing a detailed understanding of the macro phenomenon of scientific stagnation from the individual perspective. We start by checking the relationship between disruption-based and citation-based publication profiles. Next, we observe the finite inequality in the disruptive productivity of scientists, diminishing gradually as the level of disruption increases. We then identify the initial burst phenomenon in disruption dynamics. It is further revealed that while early engagement in high disruption erodes initial productivity, initial high disruption, compared with an initial advantage in productivity or impact, ensures greater subsequent academic viability, evidenced by a longer career span and relatively higher final productivity, but does not necessarily guarantee academic success throughout careers. Further analysis shows that increasing disruptive work is uncorrelated with overall productivity but negatively correlated with the overall impact. However, increasing disruptive work in the early career is associated with higher overall productivity, yet lower overall productivity in the later career. Our research underscores the urgent need for a policy shift that encourages a balance between the pursuit of disruptive efforts and the achievement of impactful outcomes.
Submitted 27 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Scalable Visual State Space Model with Fractal Scanning
Authors:
Lv Tang,
HaoKe Xiao,
Peng-Tao Jiang,
Hao Zhang,
Jinwei Chen,
Bo Li
Abstract:
Foundational models have significantly advanced in natural language processing (NLP) and computer vision (CV), with the Transformer architecture becoming a standard backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher resolution images. To address this challenge, State Space Models (SSMs) like Mamba have emerged as efficient alternatives, initially matching Transformer performance in NLP tasks and later surpassing Vision Transformers (ViTs) in various CV tasks. To improve the performance of SSMs, one crucial aspect is effective serialization of image patches. Existing methods, relying on linear scanning curves, often fail to capture complex spatial relationships and produce repetitive patterns, leading to biases. To address these limitations, we propose using fractal scanning curves for patch serialization. Fractal curves maintain high spatial proximity and adapt to different image resolutions, avoiding redundancy and enhancing SSMs' ability to model complex patterns accurately. We validate our method in image classification, detection, and segmentation tasks, and the superior performance validates its effectiveness.
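To make the idea of fractal patch serialization concrete, here is a minimal sketch that orders the patches of a square grid along a Hilbert curve, one well-known fractal space-filling curve. The abstract does not say which fractal curve the authors use, so the choice of the Hilbert curve, the power-of-two grid size, and the names `hilbert_d2xy` and `hilbert_scan_order` are all assumptions made for illustration.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan_order(order):
    """Return patch coordinates in Hilbert-curve order (hypothetical helper)."""
    n = 1 << order
    return [hilbert_d2xy(order, d) for d in range(n * n)]


# An 8 x 8 patch grid: every patch appears once, and consecutive patches
# in the sequence are always direct grid neighbours.
scan = hilbert_scan_order(3)
```

Unlike a row-by-row raster scan, which jumps across the whole image at the end of each row, consecutive entries in this sequence always differ by one grid step, which is the spatial-proximity property the abstract refers to.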
Submitted 26 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Studying magnetic reconnection with synchrotron polarization statistics
Authors:
Jian-Fu Zhang,
Shi-Min Liang,
Hua-Ping Xiao
Abstract:
Magnetic reconnection is a fundamental process for releasing magnetic energy in space physics and astrophysics. At present, the usual way to investigate the reconnection process is through analytical studies or first-principles numerical simulations. This paper is the first to understand the turbulent magnetic reconnection process by exploring the nature of magnetic turbulence. From the perspective of radio synchrotron polarization statistics, we study how to recover the properties of the turbulent magnetic field by considering the line of sight along different directions of the reconnection layer. We find that polarization intensity statistics can reveal the spectral properties of reconnection turbulence. This work opens up a new way of understanding turbulent magnetic reconnection.
Submitted 20 May, 2024;
originally announced May 2024.
-
Tunable moiré bandgap in hBN-aligned bilayer graphene device with in-situ electrostatic gating
Authors:
Hanbo Xiao,
Han Gao,
Min Li,
Fanqiang Chen,
Qiao Li,
Yiwei Li,
Meixiao Wang,
Fangyuan Zhu,
Lexian Yang,
Feng Miao,
Yulin Chen,
Cheng Chen,
Bin Cheng,
Jianpeng Liu,
Zhongkai Liu
Abstract:
Over the years, great efforts have been devoted to introducing a sizable and tunable band gap in graphene for its potential application in next-generation electronic devices. The primary challenge in modulating this gap has been the absence of a direct method for observing changes of the band gap in momentum space. In this study, we employ an advanced spatial- and angle-resolved photoemission spectroscopy technique to directly visualize the gap formation in bilayer graphene, modulated by both displacement fields and moiré potentials. The application of a displacement field via in-situ electrostatic gating introduces a sizable and tunable electronic bandgap, proportional to the field strength up to 100 meV. Meanwhile, the moiré potential, induced by aligning the underlying hexagonal boron nitride substrate, extends the bandgap by ~20 meV. Theoretical calculations effectively capture the experimental observations. Our investigation provides a quantitative understanding of how these two mechanisms collaboratively modulate the band gap in bilayer graphene, offering valuable guidance for the design of graphene-based electronic devices.
Submitted 24 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection
Authors:
Zhe Huang,
Yizhe Zhao,
Hao Xiao,
Chenyan Wu,
Lingting Ge
Abstract:
Recent advances in multi-view camera-only 3D object detection either rely on an accurate reconstruction of bird's-eye-view (BEV) 3D features or on traditional 2D perspective view (PV) image features. While both have their own pros and cons, few have found a way to stitch them together in order to benefit from "the best of both worlds". To this end, we explore a duo space (i.e., BEV and PV) 3D perception framework, in conjunction with some useful duo space fusion strategies that allow effective aggregation of the two feature representations. To the best of our knowledge, our proposed method, DuoSpaceNet, is the first to leverage two distinct feature spaces and achieves state-of-the-art 3D object detection and BEV map segmentation results on the nuScenes dataset.
Submitted 17 May, 2024;
originally announced May 2024.
-
Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko
, et al. (559 additional authors not shown)
Abstract:
We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32 fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90% confidence level, respectively.
Submitted 14 May, 2024;
originally announced May 2024.
-
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Authors:
Hanguang Xiao,
Feizhong Zhou,
Xingyue Liu,
Tianqi Liu,
Zhipeng Li,
Xin Liu,
Xiaoxuan Huang
Abstract:
Since the release of ChatGPT and GPT-4, large language models (LLMs) and multimodal large language models (MLLMs) have garnered significant attention due to their powerful and general capabilities in understanding, reasoning, and generation, thereby offering new paradigms for the integration of artificial intelligence with medicine. This survey comprehensively overviews the development background and principles of LLMs and MLLMs, as well as explores their application scenarios, challenges, and future directions in medicine. Specifically, this survey begins by focusing on the paradigm shift, tracing the evolution from traditional models to LLMs and MLLMs, summarizing the model structures to provide detailed foundational knowledge. Subsequently, the survey details the entire process from constructing and evaluating to using LLMs and MLLMs with a clear logic. Following this, to emphasize the significant value of LLMs and MLLMs in healthcare, we survey and summarize 6 promising applications in healthcare. Finally, the survey discusses the challenges faced by medical LLMs and MLLMs and proposes a feasible approach and direction for the subsequent integration of artificial intelligence with medicine. Thus, this survey aims to provide researchers with a valuable and comprehensive reference guide from the perspectives of the background, principles, and clinical applications of LLMs and MLLMs.
Submitted 14 May, 2024;
originally announced May 2024.
-
Enabling Roll-up and Drill-down Operations in News Exploration with Knowledge Graphs for Due Diligence and Risk Management
Authors:
Sha Wang,
Yuchen Li,
Hanhua Xiao,
Zhifeng Bao,
Lambert Deng,
Yanfei Dong
Abstract:
Efficient news exploration is crucial in real-world applications, particularly within the financial sector, where numerous control and risk assessment tasks rely on the analysis of public news reports. The current processes in this domain predominantly rely on manual efforts, often involving keyword-based searches and the compilation of extensive keyword lists. In this paper, we introduce NCEXPLORER, a framework designed with OLAP-like operations to enhance the news exploration experience. NCEXPLORER empowers users to use roll-up operations for a broader content overview and drill-down operations for detailed insights. These operations are achieved through integration with external knowledge graphs (KGs), encompassing both fact-based and ontology-based structures. This integration significantly augments exploration capabilities, offering a more comprehensive and efficient approach to unveiling the underlying structures and nuances embedded in news content. Extensive empirical studies through master-qualified evaluators on Amazon Mechanical Turk demonstrate NCEXPLORER's superiority over existing state-of-the-art news search methodologies across an array of topic domains, using real-world news datasets.
Submitted 8 May, 2024;
originally announced May 2024.
-
CloudDiff: Super-resolution ensemble retrieval of cloud properties for all day using the generative diffusion model
Authors:
Haixia Xiao,
Feng Zhang,
Lingxiao Wang,
Wenwen Li,
Bin Guo,
Jun Li
Abstract:
Clouds play a crucial role in the Earth's water and energy cycles, underscoring the importance of high spatiotemporal resolution data on cloud phase and properties for accurate numerical modeling and weather prediction. Currently, the Moderate Resolution Imaging Spectroradiometer (MODIS) provides cloud products with a spatial resolution of 1 km. However, these products suffer from a lengthy revisit cycle. This study develops a generative diffusion model (denoted as CloudDiff) for super-resolution retrieval of high spatiotemporal cloud phase and properties, applicable both day and night. Leveraging 2 km spatial resolution Himawari-8 Advanced Himawari Imager (AHI) thermal infrared (TIR) radiances and viewing geometry as conditions, alongside daytime MODIS products as targets, the model can generate cloud phase (CLP), cloud top height (CTH), cloud optical thickness (COT), and cloud effective radius (CER) at 1 km spatial resolution and 10-minute temporal resolution. The conditional diffusion model can generate sharper images and capture finer local features than deterministic super-resolution approaches. It draws multiple samples based on the underlying probability distribution, enabling retrieval uncertainty assessment. Evaluations show agreement between cloud phase and properties derived from CloudDiff and MODIS cloud products. The ensemble mean is found to enhance retrieval accuracy and credibility, outperforming the deterministic model.
Submitted 7 May, 2024;
originally announced May 2024.
-
SATO: Stable Text-to-Motion Framework
Authors:
Wenshuo Chen,
Hongru Xiao,
Erhang Zhang,
Lijie Hu,
Lei Wang,
Mengyuan Liu,
Chen Chen
Abstract:
Is the Text to Motion model robust? Recent advancements in Text to Motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with the text-to-motion model: its predictions often exhibit inconsistent outputs, resulting in vastly different or even incorrect poses when presented with semantically similar or identical text inputs. In this paper, we undertake an analysis to elucidate the underlying causes of this instability, establishing a clear link between the unpredictability of model outputs and the erratic attention patterns of the text encoder module. Consequently, we introduce a formal framework aimed at addressing this issue, which we term the Stable Text-to-Motion Framework (SATO). SATO consists of three modules, each dedicated to stable attention, stable prediction, and maintaining a balance between accuracy and robustness trade-off. We present a methodology for constructing an SATO that satisfies the stability of attention and prediction. To verify the stability of the model, we introduce a new textual synonym perturbation dataset based on HumanML3D and KIT-ML. Results show that SATO is significantly more stable against synonyms and other slight perturbations while keeping its high accuracy performance.
Submitted 3 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Authors:
Bo Li,
Haoke Xiao,
Lv Tang
Abstract:
In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is at https://asam2024.github.io/.
Submitted 30 April, 2024;
originally announced May 2024.
-
Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition
Authors:
Zhendong Liu,
Haifeng Xia,
Tong Guo,
Libo Sun,
Ming Shao,
Siyu Xia
Abstract:
Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCN-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purposes. Although the global features learned through the procedure are suitable for the general classification, they have difficulty capturing fine-grained action change across adjacent frames -- decisive factors in sports actions. In this paper, we propose a novel ``Cross-block Fine-grained Semantic Cascade (CFSC)'' module to overcome this challenge. In summary, the proposed CFSC progressively integrates shallow visual knowledge into high-level blocks to allow networks to focus on action details. In particular, the CFSC module utilizes the GCN feature maps produced at different levels, as well as aggregated features from preceding levels to consolidate fine-grained features. In addition, a dedicated temporal convolution is applied at each level to learn short-term temporal features, which will be carried over from shallow to deep layers to maximize the leverage of low-level details. This cross-block feature aggregation methodology, capable of mitigating the loss of fine-grained information, has resulted in improved performance. Last, FD-7, a new action recognition dataset for fencing sports, was collected and will be made publicly available. Experimental results and empirical analysis on a public benchmark (FSD-10) and our self-collected dataset (FD-7) demonstrate the advantage of our CFSC module on learning discriminative patterns for action classification over others.
Submitted 30 April, 2024;
originally announced April 2024.
-
Embedded Representation Learning Network for Animating Styled Video Portrait
Authors:
Tianyong Wang,
Xiangyu Liang,
Wangguandong Zheng,
Dan Niu,
Haifeng Xia,
Siyu Xia
Abstract:
Talking head generation has recently attracted considerable attention due to its widespread application prospects, especially for digital avatars and 3D animation design. Inspired by this practical demand, several works explored Neural Radiance Fields (NeRF) to synthesize the talking heads. However, these methods based on NeRF face two challenges: (1) difficulty in generating style-controllable talking heads and (2) displacement artifacts around the neck in rendered images. To overcome these two challenges, we propose a novel generative paradigm \textit{Embedded Representation Learning Network} (ERLNet) with two learning stages. First, the \textit{audio-driven FLAME} (ADF) module is constructed to produce facial expression and head pose sequences synchronized with content audio and style video. Second, given the sequence deduced by the ADF, one novel \textit{dual-branch fusion NeRF} (DBF-NeRF) explores these contents to render the final images. Extensive empirical studies demonstrate that the collaboration of these two stages enables our method to render a more realistic talking head than the existing algorithms.
Submitted 29 April, 2024;
originally announced April 2024.
-
CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation
Authors:
Xiangyu Liang,
Wenlin Zhuang,
Tianyong Wang,
Guangxing Geng,
Guangyue Geng,
Haifeng Xia,
Siyu Xia
Abstract:
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations. The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions. Although lip alignment has seen many related studies, existing methods struggle to synthesize natural and realistic expressions, resulting in a mechanical and stiff appearance of facial animations. Even with some research extracting emotional features from speech, the randomness of facial movements limits the effective expression of emotions. To address this issue, this paper proposes a method called CSTalk (Correlation Supervised) that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions that conform to human facial motion patterns. To generate more intricate animations, we employ a rich set of control parameters based on the metahuman character model and capture a dataset for five different emotions. We train a generative network using an autoencoder structure and input an emotion embedding vector to achieve the generation of user-controlled expressions. Experimental results demonstrate that our method outperforms existing state-of-the-art methods.
Submitted 29 April, 2024;
originally announced April 2024.
-
Measurement of Interstellar Magnetization by Synchrotron Polarization Variance
Authors:
Ning-Ning Guo,
Jian-Fu Zhang,
Hua-Ping Xiao,
Jungyeon Cho,
Xue-Juan Yang
Abstract:
Since synchrotron polarization fluctuations are related to the fundamental properties of the magnetic field, we propose the polarization intensity variance to measure the Galactic interstellar medium (ISM) magnetization. We confirm the method's applicability by comparing it with the polarization angle dispersion and its reliability by measuring the underlying Alfvénic Mach number of MHD turbulence. With the finding of the power-law relation of $\mathcal{A} \propto M_{\rm A}^{2}$ between polarization intensity variance $\mathcal{A}$ and Alfvénic Mach number $M_{\rm A}$, we apply the new technique to the Canadian Galactic Plane Survey (CGPS) data, obtaining the Alfvénic Mach number of the Galactic ISM. Our results show that the low-latitude Galactic ISM is dominated by sub-Alfvénic turbulence, with $M_{\rm A}$ approximately between 0.5 and 1.0.
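The power-law relation quoted in this abstract can be illustrated with a small log-log fit. This is only a sketch on synthetic, noise-free data: the amplitude 0.3 and the helper names `fit_power_law` and `mach_from_variance` are assumptions made for illustration and do not come from the paper.

```python
import math

def fit_power_law(x, y):
    """Least-squares fit of y = amplitude * x**slope in log-log space."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    amplitude = math.exp(mean_y - slope * mean_x)
    return slope, amplitude

def mach_from_variance(variance, amplitude, slope):
    """Invert variance = amplitude * M_A**slope to estimate M_A."""
    return (variance / amplitude) ** (1.0 / slope)

# Synthetic example: variance generated as 0.3 * M_A**2 (0.3 is arbitrary).
mach = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
variance = [0.3 * m ** 2 for m in mach]
slope, amplitude = fit_power_law(mach, variance)
print(slope, amplitude)
```

On real data the fitted slope would carry scatter, so the inversion in `mach_from_variance` should be read as an estimate rather than an exact mapping.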
Submitted 24 April, 2024;
originally announced April 2024.
-
Omnidirectional 3D printing of PEDOT:PSS aerogels with tunable electromechanical performance for unconventional stretchable interconnects and thermoelectrics
Authors:
Hasan Emre Baysal,
Tzu-Yi Yu,
Viktor Naenen,
Stijn De Smedt,
Defne Hiz,
Bokai Zhang,
Heyi Xia,
Isidro Florenciano,
Martin Rosenthal,
Ruth Cardinaels,
Francisco Molina-Lopez
Abstract:
The next generation of soft electronics will expand to the third dimension. This will require the integration of mechanically-compliant three-dimensional functional structures with stretchable materials. This study demonstrates omnidirectional direct ink writing (DIW) of Poly(3,4-ethylenedioxythiophene) polystyrene sulfonate (PEDOT:PSS) aerogels with tunable electrical and mechanical performance, which can be integrated with soft substrates. Several PEDOT:PSS hydrogels were formulated for DIW and freeze-dried directly on stretchable substrates to form integrated aerogels displaying high shape fidelity and minimal shrinkage. The effect of additives and processing on the morphology of the PEDOT:PSS hydrogels and aerogels, and the link with their electromechanical properties, were elucidated. This technology demonstrated 3D-structured stretchable interconnects and planar thermoelectric generators (TEGs) for skin electronics, as well as vertically-printed high aspect ratio thermoelectric pillars with a high ZT value of 3.2 × 10^-3 and ultra-low thermal conductivity of 0.065 W/(m K). Despite their comparatively low ZT, the aerogel pillars outpowered their dense counterparts in realistic energy harvesting scenarios where contact resistances cannot be ignored, and produced up to 26 nW/cm^2 (corresponding to a gravimetric power density of 0.76 mW/kg) for a temperature difference of 15 K. This work suggests promising advancements in soft and energy-efficient electronic systems relevant to soft robotics and wearables.
Submitted 18 April, 2024;
originally announced April 2024.
-
MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition
Authors:
Naichuan Zheng,
Hailun Xia,
Zeyu Liang
Abstract:
In recent years, skeleton-based action recognition, leveraging multimodal Graph Convolutional Networks (GCN), has achieved remarkable results. However, due to their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. To address this issue, we propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN). By merging the energy efficiency of Spiking Neural Network (SNN) with the graph representation capability of GCN, the proposed MK-SGN reduces energy consumption while maintaining recognition accuracy. Firstly, we convert GCN into Spiking Graph Convolutional Network (SGN) and construct a foundational Base-SGN for skeleton-based action recognition, establishing a new benchmark and paving the way for future research exploration. Secondly, we further propose a Spiking Multimodal Fusion module (SMF), leveraging mutual information to process multimodal data more efficiently. Additionally, we introduce a spiking attention mechanism and design a Spatio Graph Convolution module with a Spatial Global Spiking Attention mechanism (SA-SGC), enhancing feature learning capability. Furthermore, we delve into knowledge distillation methods from multimodal GCN to SGN and propose a novel, integrated method that simultaneously focuses on both intermediate layer distillation and soft label distillation to improve the performance of SGN. On two challenging datasets for skeleton-based action recognition, MK-SGN outperforms the state-of-the-art GCN-like frameworks in reducing computational load and energy consumption. Typical GCN methods consume more than 35 mJ per action sample, while MK-SGN reduces energy consumption by more than 98%.
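The soft label distillation mentioned in this abstract is commonly implemented as a temperature-scaled KL divergence between teacher and student class probabilities. The sketch below shows that conventional form only; the abstract does not give the loss actually used by MK-SGN, and the logits, the temperature of 2.0, and the function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened probabilities, scaled by T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl

# Hypothetical logits for one action sample with three classes.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(soft_label_kd_loss(student, teacher))
```

The T**2 factor is the usual convention for keeping gradient magnitudes comparable across temperatures; it is included here as an assumption, not as a detail taken from the paper.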
Submitted 15 April, 2024;
originally announced April 2024.
-
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Authors:
Hongchi Xia,
Zhi-Hao Lin,
Wei-Chiu Ma,
Shenlong Wang
Abstract:
Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following the carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly-realistic renderings in real-time, but also build interactive games on top.
Submitted 15 April, 2024;
originally announced April 2024.
-
GCC: Generative Calibration Clustering
Authors:
Haifeng Xia,
Hai Huang,
Zhengming Ding
Abstract:
Deep clustering as an important branch of unsupervised representation learning focuses on embedding semantically similar samples into the identical feature space. This core demand inspires the exploration of contrastive learning and subspace clustering. However, these solutions always rely on the basic assumption that there are sufficient and category-balanced samples for generating valid high-level representation. This hypothesis actually is too strict to be satisfied for real-world applications. To overcome such a challenge, the natural strategy is utilizing generative models to augment considerable instances. How to use these novel samples to effectively fulfill clustering performance improvement is still difficult and under-explored. In this paper, we propose a novel Generative Calibration Clustering (GCC) method to delicately incorporate feature learning and augmentation into clustering procedure. First, we develop a discriminative feature alignment mechanism to discover intrinsic relationship across real and generated samples. Second, we design a self-supervised metric learning to generate more reliable cluster assignment to boost the conditional diffusion generation. Extensive experimental results on three benchmarks validate the effectiveness and advantage of our proposed method over the state-of-the-art methods.
Submitted 13 April, 2024;
originally announced April 2024.
-
The particle acceleration study in blazar jet
Authors:
Hubing Xiao,
Wenxin Yang,
Yutao Zhang,
Shaohua Zhang,
Junhui Fan,
Liping Fu,
Jianghe Yang
Abstract:
The particle acceleration of blazar jets is crucial to high-energy astrophysics, yet the acceleration mechanism division in blazar subclasses and the underlying nature of these mechanisms remain elusive. In this work, we utilized the synchrotron spectral information (synchrotron peak frequency, $\log ν_{\rm sy}$, and corresponding curvature, $b_{\rm sy}$) of 2705 blazars from the literature and studied the subject of particle acceleration in blazar jets by analysing the correlation between $\log ν_{\rm sy}$ and $1/b_{\rm sy}$. Our results suggested that the entire sample follows an energy-dependent probability acceleration (EDPA). Specifically, the low inverse Compton peak sources (LCPs) follow the fluctuations of fractional gain acceleration (FFGA) mechanism, while the high inverse Compton peak sources (HCPs) follow the EDPA mechanism. Our results indicated that the separation between LCPs and HCPs results from the electron peak Lorentz factor ($γ_{\rm p}$), and the differentiation should originate from different acceleration mechanisms. Moreover, our study revealed a transition in the acceleration mechanism from FFGA to EDPA around $\log ν_{\rm sy} \sim 15$ through a detailed analysis of binned-$\log ν_{\rm sy}$. The mechanism of FFGA dominates the particle acceleration in LCP jets because of stronger jets and the EDPA dominates the particle energy gain in the HCPs due to a more efficient acceleration process.
Submitted 6 April, 2024;
originally announced April 2024.
-
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Authors:
Xiangyang Zhu,
Renrui Zhang,
Bowei He,
Ziyu Guo,
Jiaming Liu,
Han Xiao,
Chaoyou Fu,
Hao Dong,
Peng Gao
Abstract:
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' classes. To tackle these issues, we propose a Non-parametric Network for few-shot 3D Segmentation, Seg-NN, and its Parametric variant, Seg-PN. Without training, Seg-NN extracts dense representations by hand-crafted filters and achieves comparable performance to existing parametric models. Due to the elimination of pre-training, Seg-NN can alleviate the domain gap issue and save a substantial amount of time. Based on Seg-NN, Seg-PN only requires training a lightweight QUEry-Support Transferring (QUEST) module, which enhances the interaction between the support set and query set. Experiments suggest that Seg-PN outperforms the previous state-of-the-art method by +4.19% and +7.71% mIoU on the S3DIS and ScanNet datasets respectively, while reducing training time by 90%, indicating its effectiveness and efficiency.
Submitted 5 April, 2024;
originally announced April 2024.
-
Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation
Authors:
Hui Xiao,
Yuting Hong,
Li Dong,
Diqun Yan,
Jiayan Zhuang,
Junjie Xiong,
Dongtai Liang,
Chengbin Peng
Abstract:
Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision processes. In this paper, we propose an algorithm called Multi-Level Label Correction (MLLC), which aims to use graph neural networks to capture structural relationships in Semantic-Level Graphs (SLGs) and Class-Level Graphs (CLGs) to rectify erroneous pseudo-labels. Specifically, SLGs represent semantic affinities between pairs of pixel features, and CLGs describe classification consistencies between pairs of pixel labels. With the support of proximate pattern information from graphs, MLLC can rectify incorrectly predicted pseudo-labels and can facilitate discriminative feature representations. We design an end-to-end network to train and perform this effective label correction mechanism. Experiments demonstrate that MLLC significantly improves supervised baselines and outperforms state-of-the-art approaches in different scenarios on the Cityscapes and PASCAL VOC 2012 datasets. Specifically, MLLC improves the supervised baseline by at least 5% and 2% with DeepLabV2 and DeepLabV3+ respectively under different partition protocols.
Submitted 9 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation
Authors:
Wangguandong Zheng,
Haifeng Xia,
Rui Chen,
Ming Shao,
Siyu Xia,
Zhengming Ding
Abstract:
Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D research suffers from limitations in broad applications due to the challenges of lacking color information and multi-view content. To overcome them, this paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description. Concretely, Sketch3D first instantiates the given sketch in the reference image through the shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of our Sketch3D in generating realistic 3D assets while preserving consistency with the input.
Submitted 7 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Coexistence of non-Hermitian skin effect and extended states in one-dimensional nonreciprocal lattices
Authors:
Han Xiao,
Qi-Bo Zeng
Abstract:
We study the one-dimensional non-Hermitian lattices with staggered onsite modulations and nonreciprocal hopping up to the next-nearest-neighboring (NNN) sites. Due to the NNN nonreciprocity, the non-Hermitian skin effect (NHSE) in the system under open boundary conditions (OBC) can be energy-dependent, and there will be NHSE edges in the eigenenergy spectrum, which separates the eigenstates localized at the opposite ends of the lattice. We find that the interplay between the nonreciprocal hopping and onsite modulations can reverse the direction of the skin effect and modify the position of the NHSE edge. Moreover, by tuning the system parameters, some of the eigenstates under OBC will become fully extended with the corresponding eigenenergies being imaginary under both open and periodic boundary conditions. Thus, the extended states can coexist with the NHSE in the same system. The NHSE can even be completely dissolved with all the eigenstates being extended when the modulation is imaginary. Our work unveils the intricate interplay between onsite modulations and nonreciprocal hopping in non-Hermitian systems.
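The skin effect described in this abstract can be seen in a much simpler setting than the one the paper studies. The sketch below uses a nearest-neighbor nonreciprocal chain (a Hatano-Nelson-type model) with no onsite modulation and no NNN hopping, so it is not the authors' model; the hopping amplitudes `a` and `b` and all function names are assumptions. Under open boundaries, $(Hψ)_n = a\,ψ_{n-1} + b\,ψ_{n+1}$ has the closed-form eigenstates $ψ_n = (a/b)^{n/2}\sin(nθ_k)$ with $E_k = 2\sqrt{ab}\cos θ_k$ and $θ_k = kπ/(N+1)$, which pile up at one end whenever $a \neq b$.

```python
import math

def hn_obc_eigenpair(n_sites, a, b, k):
    """Analytic OBC eigenpair of (H psi)_n = a*psi_{n-1} + b*psi_{n+1}, a, b > 0."""
    theta = k * math.pi / (n_sites + 1)
    growth = math.sqrt(a / b)  # per-site amplitude growth factor
    energy = 2.0 * math.sqrt(a * b) * math.cos(theta)
    psi = [growth ** n * math.sin(n * theta) for n in range(1, n_sites + 1)]
    return energy, psi

def apply_h(psi, a, b):
    """Apply the open-boundary nonreciprocal hopping operator to psi."""
    n_sites = len(psi)
    out = []
    for i in range(n_sites):
        left = psi[i - 1] if i > 0 else 0.0
        right = psi[i + 1] if i < n_sites - 1 else 0.0
        out.append(a * left + b * right)
    return out

def right_half_weight(psi):
    """Fraction of |psi|^2 carried by the right half of the chain."""
    total = sum(x * x for x in psi)
    half = len(psi) // 2
    return sum(x * x for x in psi[half:]) / total

# Hypothetical amplitudes: a > b pushes every OBC eigenstate to the right end.
energy, psi = hn_obc_eigenpair(20, 1.5, 0.5, 3)
print(energy, right_half_weight(psi))
```

Swapping `a` and `b` moves the weight to the left end, which is the simplest version of the direction reversal that the abstract attributes to the interplay of hopping and onsite modulation.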
Submitted 28 March, 2024;
originally announced March 2024.
-
EL-MLFFs: Ensemble Learning of Machine Learning Force Fields
Authors:
Bangchen Yin,
Yue Yin,
Yuda W. Tang,
Hai Xiao
Abstract:
Machine learning force fields (MLFFs) have emerged as a promising approach to bridge the accuracy of quantum mechanical methods and the efficiency of classical force fields. However, the abundance of MLFF models and the challenge of accurately predicting atomic forces pose significant obstacles in their practical application. In this paper, we propose a novel ensemble learning framework, EL-MLFFs, which leverages the stacking method to integrate predictions from diverse MLFFs and enhance force prediction accuracy. By constructing a graph representation of molecular structures and employing a graph neural network (GNN) as the meta-model, EL-MLFFs effectively captures atomic interactions and refines force predictions. We evaluate our approach on two distinct datasets: methane molecules and methanol adsorbed on a Cu(100) surface. The results demonstrate that EL-MLFFs significantly improves force prediction accuracy compared to individual MLFFs, with the ensemble of all eight models yielding the best performance. Moreover, our ablation study highlights the crucial roles of the residual network and graph attention layers in the model's architecture. The EL-MLFFs framework offers a promising solution to the challenges of model selection and force prediction accuracy in MLFFs, paving the way for more reliable and efficient molecular simulations.
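The stacking idea in this abstract, where a meta-model combines the force predictions of several base MLFFs, can be sketched with a far simpler meta-model than the paper's. The example below swaps the graph neural network for a two-weight linear combiner fitted by least squares, and uses two base predictors instead of eight; all numbers and function names are hypothetical.

```python
def fit_stacking_weights(preds_a, preds_b, targets):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - t)**2)."""
    saa = sum(p * p for p in preds_a)
    sbb = sum(q * q for q in preds_b)
    sab = sum(p * q for p, q in zip(preds_a, preds_b))
    sat = sum(p * t for p, t in zip(preds_a, targets))
    sbt = sum(q * t for q, t in zip(preds_b, targets))
    det = saa * sbb - sab * sab  # nonzero unless the two predictors are collinear
    w_a = (sat * sbb - sab * sbt) / det
    w_b = (saa * sbt - sab * sat) / det
    return w_a, w_b

def mse(preds, targets):
    """Mean squared error between predictions and reference values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Hypothetical force components (arbitrary units) and two base-model predictions.
targets = [-1.0, 0.5, 2.0, -0.3, 1.2]
preds_a = [-0.8, 0.7, 2.3, -0.1, 1.1]
preds_b = [-1.3, 0.4, 1.6, -0.5, 1.0]

w_a, w_b = fit_stacking_weights(preds_a, preds_b, targets)
stacked = [w_a * p + w_b * q for p, q in zip(preds_a, preds_b)]
print(mse(preds_a, targets), mse(preds_b, targets), mse(stacked, targets))
```

On the data used for fitting, the stacked error cannot exceed either base model's error, because the weight pairs (1, 0) and (0, 1) are among the candidates the least-squares fit considers. Whether that gain carries over to unseen structures is a separate question that this sketch does not address.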
Submitted 26 March, 2024;
originally announced March 2024.