
When is the Four-phonon Effect in Half-Heusler Materials more Pronounced?

Authors: Yu Wu, Shengnan Dai, Linxuan Ji, Yimin Ding, Jiong Yang, Liujiang Zhou

Abstract: Suppressed three-phonon scattering processes have been considered to be the direct cause of materials exhibiting significant higher-order four-phonon interactions. However, after calculating the phonon-phonon interactions of 128 Half-Heusler materials in a high-throughput manner, we find that the acoustic phonon bandwidth dominates the three-phonon and four-phonon scattering channels and keeps them roughly co-increasing or co-decreasing. The $aao$ and $aaa$ three-phonon scattering channels in Half-Heusler materials are weakly affected by the acoustic-optical gap and the acoustic bunched features, respectively, only when the acoustic phonon bandwidths are close. Finally, we find that Half-Heusler materials with smaller acoustic bandwidths tend to have a more pronounced four-phonon effect, although three-phonon scattering may not be significantly suppressed in these cases.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00615 [pdf, other]

GC-Bench: An Open and Unified Benchmark for Graph Condensation

Authors: Qingyun Sun, Ziying Chen, Beining Yang, Cheng Ji, Xingcheng Fu, Sheng Zhou, Hao Peng, Jianxin Li, Philip S. Yu

Abstract: Graph condensation (GC) has recently garnered considerable attention due to its ability to reduce large-scale graph datasets while preserving their essential properties. The core concept of GC is to create a smaller, more manageable graph that retains the characteristics of the original graph. Despite the proliferation of graph condensation methods developed in recent years, there has been no comprehensive evaluation or in-depth analysis, which is a major obstacle to understanding progress in this field. To fill this gap, we develop a comprehensive Graph Condensation Benchmark (GC-Bench) to systematically analyze the performance of graph condensation in different scenarios. Specifically, GC-Bench investigates the characteristics of graph condensation along the following dimensions: effectiveness, transferability, and complexity. We comprehensively evaluate 12 state-of-the-art graph condensation algorithms on node-level and graph-level tasks and analyze their performance on 12 diverse graph datasets. Further, we have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research. The GC-Bench library is available at https://github.com/RingBDStack/GC-Bench.

Submitted 30 June, 2024; originally announced July 2024.

Comments: The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

arXiv:2407.00614 [pdf, other]

Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

Abstract: To enable robots to use tools, the first step is to teach robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we propose a granularity-aware affordance feature extraction method for locating functional affordance areas and predicting dexterous coarse gestures. We study the intrinsic mechanisms of human tool use. On the one hand, we use fine-grained affordance features of object-functional finger contact areas to locate functional affordance regions. On the other hand, we use highly activated coarse-grained affordance features in hand-object interaction regions to predict grasp gestures. Additionally, we introduce a model-based post-processing module that includes functional finger coordinate localization, finger-to-end coordinate transformation, and force feedback-based coarse-to-fine grasping. This forms a complete dexterous robotic functional grasping framework, GAAF-Dex, which learns Granularity-Aware Affordances from human-object interaction for tool-based Functional grasping in Dexterous Robotics. Unlike fully-supervised methods that require extensive data annotation, we employ a weakly supervised approach to extract relevant cues from exocentric (Exo) images of hand-object interactions to supervise feature extraction in egocentric (Ego) images.
We have constructed a small-scale dataset, FAH, which includes nearly 6K Exo and Ego images of functional hand-object interactions with 18 commonly used tools performing 6 tasks. Extensive experiments on the dataset demonstrate that our method outperforms state-of-the-art methods. The code will be made publicly available at https://github.com/yangfan293/GAAF-DEX.

Submitted 30 June, 2024; originally announced July 2024.

Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX
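The post-processing module listed in the abstract includes a finger-to-end coordinate transformation, i.e. expressing a point located in a finger frame in the robot end-effector frame. The sketch below shows only the generic rigid-body form of such a step using 4x4 homogeneous transforms, assuming the finger frame's rotation and translation relative to the end effector are already known; the frame conventions and the names `make_transform`, `apply` and `compose` are hypothetical and are not taken from GAAF-Dex.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def make_transform(rotation: Sequence[Sequence[float]], translation: Sequence[float]) -> Matrix:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    return [list(rotation[r]) + [translation[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def apply(T: Matrix, point: Sequence[float]) -> List[float]:
    """Map a 3-D point through T using homogeneous coordinate w = 1."""
    p = list(point) + [1.0]
    return [sum(T[r][c] * p[c] for c in range(4)) for r in range(3)]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Return the matrix product a @ b, i.e. apply b first and then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


# Hypothetical finger frame: rotated 90 degrees about z and offset 0.1 along x.
R_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T_finger_to_end = make_transform(R_z90, [0.1, 0.0, 0.0])
contact_in_end = apply(T_finger_to_end, [1.0, 0.0, 0.0])
```

Chaining several links (for example fingertip to palm to end effector) amounts to composing transforms, which is why `compose` is included.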

arXiv:2407.00611 [pdf, other]

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem

Authors: Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Xuanlei Zhao, James Demmel, Yang You

Abstract: In recent years, Transformer-based Large Language Models (LLMs) have garnered significant attention due to their exceptional performance across a variety of tasks. However, training these models on long sequences presents a substantial challenge in terms of efficiency and scalability. Current methods are constrained either by the number of attention heads, limiting scalability, or by excessive communication overheads. In this paper, we put forward the insight that attention computation can be viewed as a special case of the n-body problem with direct interactions. Based on this concept, this paper introduces WallFacer, an efficient long-sequence training system with a novel multi-dimensional ring sequence parallelism, fostering an efficient communication paradigm and extra tuning space for communication arrangement. Through comprehensive experiments under diverse environments and model settings, we demonstrate that WallFacer significantly surpasses the state-of-the-art method that supports near-infinite sequence length, achieving performance improvements of up to 77.12%.

Submitted 1 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.
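The abstract frames attention as an n-body problem in which every query interacts directly with every key. As a rough illustration of the plain ring-style sequence parallelism such systems build on (not WallFacer's multi-dimensional scheme, whose details are not given here), the sketch below simulates p "devices" in a single process: each keeps one block of queries, key/value blocks rotate around the ring, and partial results are merged with an online softmax so the output equals full attention. The names `ring_attention` and `full_attention` are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: List[Vec], k: List[Vec], v: List[Vec]) -> List[Vec]:
    """Reference softmax(q k^T / sqrt(d)) v computed in one pass."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [dot(qi, kj) / math.sqrt(d) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z for c in range(len(v[0]))])
    return out


def ring_attention(q: List[Vec], k: List[Vec], v: List[Vec], p: int) -> List[Vec]:
    """Simulate p devices; each keeps a query block while K/V blocks rotate around the ring."""
    n, d, dv = len(q), len(q[0]), len(v[0])
    size = (n + p - 1) // p
    blocks = [range(i * size, min((i + 1) * size, n)) for i in range(p)]
    # Per-query online-softmax state: running max, running normalizer, weighted value sum.
    m = [-math.inf] * n
    z = [0.0] * n
    acc = [[0.0] * dv for _ in range(n)]
    for step in range(p):
        for dev in range(p):
            kv_block = blocks[(dev + step) % p]  # K/V block this device holds at this step
            for i in blocks[dev]:
                for j in kv_block:
                    s = dot(q[i], k[j]) / math.sqrt(d)
                    new_m = max(m[i], s)
                    scale = math.exp(m[i] - new_m)  # rescale old state to the new max
                    w = math.exp(s - new_m)
                    z[i] = z[i] * scale + w
                    acc[i] = [a * scale + w * vc for a, vc in zip(acc[i], v[j])]
                    m[i] = new_m
    return [[a / z[i] for a in acc[i]] for i in range(n)]
```

Only K/V blocks move between steps, so each simulated device stores roughly n/p queries; the online-softmax rescaling is what lets blocks arrive in any order without changing the result.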

arXiv:2407.00596 [pdf, other]

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique, which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, and (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, while eliminating the need for manual prompt generation in the conventional Segment Anything Model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs.

Submitted 30 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.19286

arXiv:2407.00579 [pdf, ps, other]

Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems

Authors: Miaomiao Zhu, Pengxu Chen, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu

Abstract: Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, but faces privacy and security challenges due to open wireless propagation. In this paper, an active reconfigurable intelligent surface (RIS) is employed to aid covert communications in a NOMA-inspired ISAC wireless system with the aim of maximizing the covert rate. Specifically, a dual-function base station (BS) transmits the superposition signal to sense multiple targets, while achieving covert and reliable communications for a pair of NOMA covert and public users, respectively, in the presence of a warden. Two superposition transmission schemes, namely, transmission with a dedicated sensing signal (w-DSS) and without a dedicated sensing signal (w/o-DSS), are considered in the formulations of the joint transmission and reflection beamforming optimization problems. Numerical results demonstrate that the active-RIS-aided NOMA-ISAC system outperforms the passive-RIS-aided and without-RIS counterparts in terms of covert rate and the trade-off between covert communication and sensing performance metrics. Finally, the w/o-DSS scheme, which omits the dedicated sensing signal, achieves a higher covert rate than the w-DSS scheme by allocating more transmit power to the covert transmissions, while preserving comparable multi-target sensing performance.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00574 [pdf, other]

OfCaM: Global Human Mesh Recovery via Optimization-free Camera Motion Scale Calibration

Authors: Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Angela Yao

Abstract: Accurate camera motion estimation is critical for estimating human motion in the global space. A standard and widely used method for estimating camera motion is Simultaneous Localization and Mapping (SLAM). However, SLAM only provides a trajectory up to an unknown scale factor. Different from previous attempts that optimize the scale factor, this paper presents Optimization-free Camera Motion Scale Calibration (OfCaM), a novel framework that utilizes prior knowledge from human mesh recovery (HMR) models to directly calibrate the unknown scale factor. Specifically, OfCaM leverages the absolute depth of human-background contact joints from HMR predictions as a calibration reference, enabling the precise recovery of SLAM camera trajectory scale in global space. With this correctly scaled camera motion and HMR's local motion predictions, we achieve more accurate global human motion estimation. To compensate for scenes where we detect SLAM failure, we adopt a local-to-global motion mapping to fuse with previously derived motion to enhance robustness. Simple yet powerful, our method sets a new standard for global human mesh estimation tasks, reducing global human motion error by 60% over the prior SOTA while also demanding orders of magnitude less inference time compared with optimization-based methods.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 12 pages, 7 figures, 4 tables
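SLAM returns camera positions only up to an unknown scale factor, and OfCaM uses HMR-predicted absolute depths of contact joints as the metric reference. A minimal sketch of that idea, assuming we already have pairs of (HMR metric depth, SLAM unscaled depth) for the same contact points; the median-of-ratios estimator and the names `estimate_scale` and `rescale_trajectory` are illustrative assumptions, not the paper's exact procedure.

```python
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float, float]


def estimate_scale(hmr_depths: List[float], slam_depths: List[float]) -> float:
    """Median of per-observation ratios metric_depth / slam_depth (robust to a few outliers)."""
    ratios = [h / s for h, s in zip(hmr_depths, slam_depths) if s > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return median(ratios)


def rescale_trajectory(traj: List[Point], scale: float) -> List[Point]:
    """Apply the recovered scale to every SLAM camera position."""
    return [(scale * x, scale * y, scale * z) for x, y, z in traj]
```

The median is used here only so that a single bad contact-joint depth (for example a mis-detected contact) does not dominate the estimate; any robust estimator would fit the same role.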

arXiv:2407.00564 [pdf, other]

Variational Nonparametric Inference in Functional Stochastic Block Model

Authors: Zuofeng Shang, Peijun Sang, Yang Feng, Chong **

Abstract: We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data in which a network vertex (country) is associated with the annual or quarterly GDP over a certain time period, and MyFitnessPal data in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks are jointly executed. First, we detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and that the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined on both simulated and real datasets.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00496 [pdf, other]

A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation

Authors: Aicheng Gong, Kai Yang, Jiafei Lyu, Xiu Li

Abstract: Task allocation is a key combinatorial optimization problem, crucial for modern applications such as multi-robot cooperation and resource scheduling. Decision makers must allocate entities to tasks reasonably across different scenarios. However, traditional methods assume static attributes and numbers of tasks and entities, often relying on dynamic programming and heuristic algorithms for solutions. In reality, task allocation resembles a Markov decision process, with dynamically changing task and entity attributes. Thus, algorithms must dynamically allocate tasks based on their states. To address this issue, we propose a two-stage task allocation algorithm based on similarity, utilizing reinforcement learning to learn allocation strategies. The proposed pre-assign strategy allows entities to preselect appropriate tasks, effectively avoiding local optima and thereby better finding the optimal allocation. We also introduce an attention mechanism and a hyperparameter network structure to adapt to the changing number and attributes of entities and tasks, enabling our network structure to generalize to new tasks. Experimental results across multiple environments demonstrate that our algorithm effectively addresses the challenges of dynamic task allocation in practical applications. Compared to heuristic algorithms such as genetic algorithms, our reinforcement learning approach better solves dynamic allocation problems and achieves zero-shot generalization to new tasks with good performance. The code is available at https://github.com/yk7333/TaskAllocation.

Submitted 29 June, 2024; originally announced July 2024.
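The abstract describes a similarity-based pre-assign stage in which entities preselect suitable tasks before the learned policy makes the final allocation. The sketch below shows only one plausible form of that first stage, assuming entities and tasks are described by feature vectors and that "similarity" means cosine similarity; the top-k rule and the names `cosine` and `pre_assign` are hypothetical, and the reinforcement-learning stage is omitted.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def pre_assign(entities: Dict[str, List[float]], tasks: Dict[str, List[float]], k: int) -> Dict[str, List[str]]:
    """Each entity keeps its k most similar tasks as candidates for the second stage."""
    candidates = {}
    for name, feat in entities.items():
        ranked = sorted(tasks, key=lambda t: cosine(feat, tasks[t]), reverse=True)
        candidates[name] = ranked[:k]
    return candidates
```

Restricting each entity to a short candidate list shrinks the action space the second-stage policy has to search, which is one way to read the abstract's claim that pre-assignment helps avoid poor local optima.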

arXiv:2407.00487 [pdf, other]

It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization

Authors: Bingdong Li, Zixiang Di, Yanting Yang, Hong Qian, Peng Yang, Hao Hao, Ke Tang, Aimin Zhou

Abstract: In this paper, we introduce a novel approach for large language model merging via black-box multi-objective optimization algorithms. The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models. However, model merging faces two significant challenges: first, existing methods rely heavily on human intuition and customized strategies; second, parameter conflicts often arise during merging, and while methods like DARE [1] can alleviate this issue, they tend to stochastically drop parameters, risking the loss of important delta parameters. To address these challenges, we propose the MM-MO method, which automates the search for optimal merging configurations using multi-objective optimization algorithms, eliminating the need for human intuition. During the configuration search, we use estimated performance across multiple diverse tasks as optimization objectives in order to alleviate parameter conflicts between different source models without losing crucial delta parameters. We conducted comparative experiments with other mainstream model merging methods, demonstrating that our method consistently outperforms them. Moreover, our experiments reveal that even task types not explicitly targeted as optimization objectives show performance improvements, indicating that our method enhances the overall potential of the model rather than merely overfitting to specific task types.
This approach provides a significant advancement in model merging techniques, offering a robust and plug-and-play solution for integrating diverse models into a unified, high-performing model.

Submitted 29 June, 2024; originally announced July 2024.
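MM-MO searches over merging configurations and scores each one on several tasks at once. As a hedged sketch of the two ingredients that description implies, the code below (a) merges models by adding weighted delta parameters to a shared base, a common task-arithmetic style of merging assumed here for illustration, and (b) keeps the candidate configurations that are Pareto-optimal across the task scores. The names `merge`, `dominates` and `pareto_front` are hypothetical; the abstract does not specify MM-MO's merging operator or search algorithm.

```python
from typing import Dict, List, Sequence

Params = Dict[str, float]


def merge(base: Params, models: List[Params], weights: Sequence[float]) -> Params:
    """base + sum_i w_i * (model_i - base), applied parameter by parameter."""
    merged = dict(base)
    for model, w in zip(models, weights):
        for name, value in model.items():
            merged[name] += w * (value - base[name])
    return merged


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the configurations that no other configuration dominates."""
    return [c for c, s in scores.items()
            if not any(dominates(o, s) for d, o in scores.items() if d != c)]
```

In a full search loop, each candidate weight vector would be turned into a merged model with `merge`, evaluated on the objective tasks, and the non-dominated candidates kept for the next round.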

arXiv:2407.00478 [pdf, other]

Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs

Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

Abstract: The scaling law, a strategy that involves the brute-force scaling of the training dataset and learnable parameters, has become a prevalent approach for developing stronger learning models. In this paper, we examine its rationale in terms of learning from relational graphs. We demonstrate that directly adhering to such a scaling law does not necessarily yield stronger models due to architectural incompatibility and representation bottlenecks. To tackle this challenge, we propose a novel framework for learning from relational graphs via knowledge-aware parsimony learning. Our method draws inspiration from the duality between data and knowledge inherent in these graphs. Specifically, we first extract knowledge (like symbolic logic and physical laws) during the learning process, and then apply combinatorial generalization to the task at hand. This extracted knowledge serves as the "building blocks" for achieving parsimony learning. By applying this philosophy to architecture, parameters, and inference, we can effectively achieve versatile, sample-efficient, and interpretable learning. Experimental results show that our proposed framework surpasses methods that strictly follow the traditional scaling-up roadmap. This highlights the importance of incorporating knowledge in the development of next-generation learning technologies.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00467 [pdf, other]

VcLLM: Video Codecs are Secretly Tensor Codecs

Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth has become a significant bottleneck for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressure. Our research finds that video codecs, despite being originally designed for compressing videos, show excellent efficiency when compressing various types of tensors. We demonstrate that video codecs can serve as versatile, general-purpose tensor codecs while achieving state-of-the-art compression efficiency in various tasks. We further make use of the hardware video encoding and decoding modules available on GPUs to create a framework capable of both inference and training with video codecs repurposed as tensor codecs. This greatly reduces the requirement for memory capacity and communication bandwidth, enabling training and inference of large models on consumer-grade GPUs.

Submitted 29 June, 2024; originally announced July 2024.
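Video codecs consume 8-bit pixel frames, so repurposing them for tensors requires mapping floating-point values into that range first. The sketch below shows one plausible preprocessing step, a per-tensor min-max quantization into uint8 "pixels" and its inverse; it is an assumption for illustration only (the abstract does not say how VcLLM maps tensors to frames), the names `to_pixels` and `from_pixels` are hypothetical, and the codec stage itself (hardware encoding and decoding on the GPU) is not shown.

```python
from typing import List, Tuple


def to_pixels(values: List[float]) -> Tuple[bytes, float, float]:
    """Min-max quantize floats into 0..255 so they can be laid out as a grayscale frame."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    pixels = bytes(round((v - lo) / scale) for v in values)
    return pixels, lo, scale


def from_pixels(pixels: bytes, lo: float, scale: float) -> List[float]:
    """Invert the mapping; before any codec loss, each value is off by at most scale / 2."""
    return [lo + p * scale for p in pixels]
```

The `lo` and `scale` values are side information that would have to travel with each encoded frame; any lossy codec error is then added on top of the quantization error bounded here.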

arXiv:2407.00443 [pdf, other]

doi 10.1021/acs.jpcc.4c02349

Electronic Correlations in Multiferroic van der Waals CuCrP$_2$S$_6$: Insights From X-Ray Spectroscopy and DFT

Authors: Yefei Guo, Jiali Yang, Junhao Zhou, Na Zhu, Yichen **, Günther Thiele, Alexei Preobrajenski, Elena Voloshina, Yuriy Dedkov

Abstract: The electronic structure of high-quality van der Waals multiferroic CuCrP$_2$S$_6$ crystals was investigated using photoelectron spectroscopy methods in combination with DFT analysis. Using X-ray photoelectron and near-edge X-ray absorption fine structure (NEXAFS) spectroscopy at the Cu L$_{2,3}$ and Cr L$_{2,3}$ absorption edges, we determine the charge states of the ions in the studied compound. Analyzing the systematic NEXAFS and resonant photoelectron spectroscopy data at the Cu/Cr L$_{2,3}$ absorption edges allowed us to assign the CuCrP$_2$S$_6$ material to a Mott-Hubbard type insulator and to identify different Auger-decay channels (participator vs. spectator) during absorption and autoionization processes. Spectroscopic and theoretical data obtained for CuCrP$_2$S$_6$ are very important for a detailed understanding of the electronic structure and electron-correlation phenomena in different layered materials, which will drive their further applications in areas such as electronics, spintronics, sensing, and catalysis.

Submitted 29 June, 2024; originally announced July 2024.

Journal ref: J. Phys. Chem. C 128, 7830 (2024)

arXiv:2407.00434 [pdf, other]

Brevity is the soul of wit: Pruning long files for code generation

Authors: Aaditya K. Singh, Yu Yang, Kushal Tirumala, Mostafa Elhoushi, Ari S. Morcos

Abstract: Data curation is commonly considered a "secret sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become an increasingly important focus. Specifically, many have shown that de-duplicating data, or sub-selecting higher quality data, can lead to efficiency or performance improvements. Generally, three types of methods are used to filter internet-scale corpora: embedding-based, heuristic-based, and classifier-based. In this work, we contrast the former two in the domain of finetuning LLMs for code generation. We find that embedding-based methods are often confounded by length, and that a simple heuristic, pruning long files, outperforms other methods in compute-limited regimes. Our method can yield up to a 2x efficiency benefit in training (while matching performance) or a 3.5% absolute performance improvement on HumanEval (while matching compute). However, we find that perplexity on held-out long files can increase, raising the question of whether optimizing data mixtures for common coding benchmarks (HumanEval, MBPP) actually best serves downstream use cases. Overall, we hope our work builds useful intuitions about code data (specifically, the low quality of extremely long code files), provides a compelling heuristic-based method for data pruning, and brings to light questions in how we evaluate code generation models.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 15 pages, 5 figures
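The simplest form of the "prune long files" heuristic drops the longest files from a code corpus before finetuning. A minimal sketch, assuming files are given as strings and length is measured in characters; the `keep_fraction` knob and the function name `prune_long_files` are hypothetical, since the abstract does not state the paper's length measure or threshold.

```python
from typing import List


def prune_long_files(files: List[str], keep_fraction: float) -> List[str]:
    """Keep the shortest `keep_fraction` of files, preserving their original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not files:
        return []
    n_keep = max(1, int(len(files) * keep_fraction))
    # Rank file indices by length; ties are broken by original position for determinism.
    ranked = sorted(range(len(files)), key=lambda i: (len(files[i]), i))
    kept = sorted(ranked[:n_keep])
    return [files[i] for i in kept]
```

Measuring length in tokens instead of characters would only change the sort key; the rest of the filter stays the same.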

arXiv:2407.00433 [pdf]

Screening of half-Heuslers with temperature-induced band convergence and enhanced thermoelectric properties

Authors: **yang Xi, Zirui Dong, Menghan Gao, Jun Luo, Jiong Yang

Abstract: Enhancing band convergence is an effective way to optimize the thermoelectric (TE) properties of materials. However, temperature-induced band renormalization is commonly ignored. By employing the recently developed electron-phonon renormalization (EPR) method, we reveal the nature of band renormalization in the half-Heusler (HH) compounds TiCoSb and NbFeSb and identify the key factors for temperature-induced conduction band convergence in HHs. Using these as screening criteria, 3 out of 274 HHs (TiRhBi, TiPtSn, NbPtTl) stand out from our MatHub-3d database. Taking TiPtSn as an example, it shows conduction band convergence at mid-to-high temperatures, resulting in an enhanced Seebeck coefficient S: e.g., at 600 K with an electron concentration of $10^{20}$ cm$^{-3}$, the predicted S with and without the renormalized band is 352.83 μV/K and 289.52 μV/K, respectively. The former is closer to our measured value of 338.79 μV/K. Besides, the effective masses obtained from calculation and experiment both increase with temperature, indicating the existence of band convergence. Our work demonstrates for the first time the significance of including the temperature effect on electronic structure in the design of potential high-performance TE materials.

Submitted 29 June, 2024; originally announced July 2024.
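The Seebeck values quoted for TiPtSn (600 K, $10^{20}$ cm$^{-3}$) can be put in relative terms with a few lines of arithmetic. The three inputs are taken directly from the abstract; the percentages are simply derived from them and are not reported by the authors.

```python
s_renorm = 352.83  # predicted S with renormalized bands (μV/K)
s_static = 289.52  # predicted S without renormalization (μV/K)
s_meas = 338.79    # measured S (μV/K)


def rel(a: float, b: float) -> float:
    """Relative difference of a with respect to b, in percent."""
    return 100.0 * (a - b) / b


print(f"enhancement from renormalization: {rel(s_renorm, s_static):.1f}%")
print(f"renormalized vs measured: {rel(s_renorm, s_meas):+.1f}%")
print(f"unrenormalized vs measured: {rel(s_static, s_meas):+.1f}%")
```

This prints an enhancement of about 21.9%, with the renormalized prediction about 4.1% above the measurement and the unrenormalized one about 14.5% below it, which is the sense in which the former is "closer" to experiment.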

arXiv:2407.00427 [pdf, ps, other]

On the boundedness of degenerate hypergraphs

Authors: Jianfeng Hou, Caiyun Hu, Heng Li, Xizhi Liu, Caihong Yang, Yixiao Zhang

Abstract: We investigate the impact of a high-degree vertex in Turán problems for degenerate hypergraphs (including graphs). We say an $r$-graph $F$ is bounded if there exist constants $α, β>0$ such that for large $n$, every $n$-vertex $F$-free $r$-graph with a vertex of degree at least $α\binom{n-1}{r-1}$ has fewer than $(1-β) \cdot \mathrm{ex}(n,F)$ edges. The boundedness property is crucial for recent works~\cite{HHLLYZ23a,DHLY24} that aim to extend the classical Hajnal--Szemerédi Theorem and the anti-Ramsey theorems of Erdős--Simonovits--Sós. We show that many well-studied degenerate hypergraphs, such as all even cycles, most complete bipartite graphs, and the expansion of most complete bipartite graphs, are bounded. In addition, to prove the boundedness of the expansion of complete bipartite graphs, we introduce and solve a Zarankiewicz-type problem for $3$-graphs, strengthening a theorem by Kostochka--Mubayi--Verstraëte~\cite{KMV15}.

Submitted 29 June, 2024; originally announced July 2024.

Comments: comments are welcome

arXiv:2407.00365 [pdf, other]

Financial Knowledge Large Language Model

Authors: Cehao Yang, Cheng** Xu, Yiyan Qi

Abstract: Artificial intelligence is making significant strides in the finance industry, revolutionizing how data is processed and interpreted. Among these technologies, large language models (LLMs) have demonstrated substantial potential to transform financial services by automating complex tasks, enhancing customer service, and providing detailed financial analysis. First, we introduce IDEA-FinBench, an evaluation benchmark specifically tailored for assessing financial knowledge in LLMs. This benchmark uses questions from two globally respected and authoritative financial professional exams, aiming to comprehensively evaluate the capability of LLMs to directly address exam questions pertinent to the finance sector. Second, we propose IDEA-FinKER, a Financial Knowledge Enhancement framework designed to facilitate the rapid adaptation of general LLMs to the financial domain, introducing a retrieval-based few-shot learning method for real-time context-level knowledge injection, and a set of high-quality financial knowledge instructions for fine-tuning any general LLM. Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs. This system is structured around a scheme of real-time knowledge injection and factual enhancement using external knowledge. IDEA-FinQA consists of three main modules: the data collector, the data querying module, and LLM-based agents tasked with specific functions.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 66 pages

arXiv:2407.00352 [pdf, other]

PhyTracker: An Online Tracker for Phytoplankton

Authors: Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong

Abstract: Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions. Traditional phytoplankton monitoring methods, relying on non-in situ observations, are time-consuming and resource-intensive, limiting timely analysis. To address these limitations, we introduce PhyTracker, an intelligent in situ tracking framework designed for automatic tracking of phytoplankton. PhyTracker overcomes significant challenges unique to phytoplankton monitoring, such as constrained mobility within water flow, inconspicuous appearance, and the presence of impurities. Our method incorporates three innovative modules: a Texture-enhanced Feature Extraction (TFE) module, an Attention-enhanced Temporal Association (ATA) module, and a Flow-agnostic Movement Refinement (FMR) module. These modules enhance feature capture, differentiate between phytoplankton and impurities, and refine movement characteristics, respectively. Extensive experiments on the PMOT dataset validate the superiority of PhyTracker in phytoplankton tracking, and additional tests on the MOT dataset demonstrate its general applicability, outperforming conventional tracking methods. This work highlights key differences between phytoplankton and traditional objects, offering an effective solution for phytoplankton monitoring.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 13 pages, 11 figures

arXiv:2407.00348 [pdf, other]

Accretion of the degenerate Fermi gas onto a Reissner-Nordström black hole

Authors: ** Li, Jiang-he Yang, Siwei Xu

Abstract: We investigate the stationary, spherically symmetric accretion of a degenerate relativistic Fermi gas onto a Reissner-Nordström black hole. The accretion theory is based on the Boyer-Lindquist coordinates and the Fermi gas follows Fermi-Dirac statistics at infinity. We have derived the expression for the particle current density, the stress energy-momentum tensor, and three accretion rates. As the charged particle falls into the black hole, both the mass and the charge of the black hole increase. Consequently, the mass accretion rate and charge accretion rate are proportional to the particle accretion rate. We have also provided analytical results at infinity and numerical results within a finite range for these quantities. Our results indicate that the accretion rate decreases as the charge of the black hole increases. Additionally, we found that the Vlasov gas accretion is no longer an isotropic perfect fluid accretion theory in the Boyer-Lindquist coordinates at infinity, mainly due to non-vanishing non-diagonal terms of the stress energy-momentum tensor. Despite this, the radial pressure remains smaller than the tangential pressure even at infinity. This study also suggests that naked singularities are unavoidable in black hole accretion theory.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00326 [pdf, other]

Teola: Towards End-to-End Optimization of LLM-based Applications

Authors: Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu

Abstract: Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling decisions. We propose fine-grained end-to-end orchestration, which utilizes task primitives as the basic units and represents each query's workflow as a primitive-level dataflow graph. This explicitly exposes a much larger design space, enables optimizations in parallelization and pipelining across primitives of different modules, and enhances scheduling to improve application-level performance. We build Teola, a novel orchestration framework for LLM-based applications that implements this scheme. Comprehensive experiments show that Teola can achieve up to 2.09x speedup over existing systems across various popular LLM applications.
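One payoff of a primitive-level dataflow graph is that independent primitives, even from different modules, become visible as parallel work. A minimal sketch, assuming a workflow is given as a dictionary mapping each primitive to its prerequisites (the function and primitive names are hypothetical, not Teola's API), groups primitives into concurrently runnable stages with a level-by-level topological sort (Kahn's algorithm). Teola's own scheduler also exploits pipelining and other optimizations that this sketch omits.

```python
def parallel_stages(deps):
    """Group primitives of a dataflow graph into stages that can run concurrently.

    deps maps each primitive name to the list of primitives it depends on.
    Uses Kahn's topological sort level by level; raises ValueError on a cycle.
    """
    indegree = {node: 0 for node in deps}
    children = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise KeyError(f"unknown dependency {parent!r} of {node!r}")
            indegree[node] += 1
            children[parent].append(node)

    frontier = sorted(n for n, d in indegree.items() if d == 0)
    stages, seen = [], 0
    while frontier:
        stages.append(frontier)          # everything here has all inputs ready
        seen += len(frontier)
        ready = []
        for node in frontier:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        frontier = sorted(ready)
    if seen != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return stages


if __name__ == "__main__":
    # Hypothetical retrieval-augmented query decomposed into primitives.
    workflow = {
        "rewrite_query": [],
        "embed_query": [],
        "vector_search": ["embed_query"],
        "web_search": ["rewrite_query"],
        "rerank": ["vector_search", "web_search"],
        "llm_generate": ["rerank"],
    }
    print(parallel_stages(workflow))
    # [['embed_query', 'rewrite_query'], ['vector_search', 'web_search'], ['rerank'], ['llm_generate']]
```

Here the two searches land in the same stage even though, in a coarse module-level view, they might belong to different modules and run one after the other.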

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00319 [pdf, other]

doi 10.1038/s41550-024-02309-5

A slightly oblate dark matter halo revealed by a retrograde precessing Galactic disk warp

Authors: Yang Huang, Qikang Feng, Tigran Khachaturyants, Huawei Zhang, Jifeng Liu, Juntai Shen, Timothy C. Beers, Youjun Lu, Song Wang, Haibo Yuan

Abstract: The shape of the dark matter (DM) halo is key to understanding the hierarchical formation of the Galaxy. Despite extensive efforts in recent decades, however, its shape remains a matter of debate, with suggestions ranging from strongly oblate to prolate. Here, we present a new constraint on its present shape by directly measuring the evolution of the Galactic disk warp with time, as traced by accurate distance estimates and precise age determinations for about 2,600 classical Cepheids. We show that the Galactic warp is mildly precessing in a retrograde direction at a rate of $ω= -2.1 \pm 0.5 ({\rm statistical}) \pm 0.6 ({\rm systematic})$ km s$^{-1}$ kpc$^{-1}$ for the outer disk over the Galactocentric radius [$7.5, 25$] kpc, decreasing with radius. This constrains the shape of the DM halo to be slightly oblate with a flattening (minor axis to major axis ratio) in the range $0.84 \le q_Φ \le 0.96$. Given the young nature of the disk warp traced by Cepheids (less than 200 Myr), our approach directly measures the shape of the present-day DM halo. This measurement, combined with other measurements from older tracers, could provide vital constraints on the evolution of the DM halo and the assembly history of the Galaxy.
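To put the quoted precession rate on a timescale, the sketch below converts ω from km s^-1 kpc^-1 to rad Myr^-1 and computes the period 2π/|ω|. It assumes ω is a single constant pattern speed (the abstract notes that the rate actually decreases with radius) and combines the statistical and systematic errors in quadrature, which presumes they are independent. All names are illustrative.

```python
import math

KM_PER_KPC = 3.0856775814913673e16   # kilometres in one kiloparsec
SEC_PER_MYR = 365.25 * 86400 * 1e6   # seconds in one Myr (Julian years)


def kms_per_kpc_to_rad_per_myr(omega):
    """Convert an angular rate from km s^-1 kpc^-1 to rad Myr^-1."""
    return omega * SEC_PER_MYR / KM_PER_KPC


def precession_period_gyr(omega):
    """Time for one full precession cycle at rate omega (km s^-1 kpc^-1), in Gyr.

    The sign of omega (retrograde < 0) only sets the direction, not the period.
    """
    return 2.0 * math.pi / abs(kms_per_kpc_to_rad_per_myr(omega)) / 1e3


def combined_uncertainty(*components):
    """Quadrature sum; valid only if the error components are independent."""
    return math.sqrt(sum(c * c for c in components))


if __name__ == "__main__":
    omega, stat, syst = -2.1, 0.5, 0.6   # values quoted in the abstract
    print(f"precession period ~ {precession_period_gyr(omega):.1f} Gyr")
    print(f"total uncertainty ~ {combined_uncertainty(stat, syst):.2f} km/s/kpc")
```

For ω = -2.1 km s^-1 kpc^-1 this gives a period of roughly 2.9 Gyr and a combined uncertainty of about 0.78 km s^-1 kpc^-1, i.e. the young Cepheid warp has completed only a small fraction of one precession cycle.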

Submitted 29 June, 2024; originally announced July 2024.

Comments: Published in Nature Astronomy on June 27th, 2024. Final published version here: https://www.nature.com/articles/s41550-024-02309-5

arXiv:2407.00308 [pdf]

The role of lattice thermal conductivity suppression by dopants from a holistic perspective

Authors: Shengnan Dai, Shijie Zhang, Ye Sheng, Erting Dong, Sheng Sun, Lili Xi, G. Jeffrey Snyder, **yang Xi, Jiong Yang

Abstract: Dopants play an important role in improving electrical and thermal transport. In the traditional perspective, which has been adopted for decades, a dopant suppresses the lattice thermal conductivity kL by adding a point-defect (PD) scattering term to the phonon relaxation time. In this study, we propose an innovative perspective for solving the kL of defective systems, the holistic approach, i.e., treating the dopant and matrix as a whole. This approach allows us to handle the influence of defects explicitly through calculations on the defective systems themselves, capturing the changes in phonon dispersion, phonon-phonon and electron-phonon interactions, etc., caused by the dopants. The kL reduction between defective MxNb1-xFeSb (M=V, Ti) and NbFeSb is used as an example of the holistic approach, and results comparable with experiments are obtained. Notably, light elemental dopants also induce avoided-crossing behavior, which can be further rationalized by a one-dimensional atomic chain model. Mass and force-constant imbalance generally generates the avoided-crossing phonons, mathematically in a way similar to the coefficients in traditional PD scattering, but along a different direction in kL reduction. Our work provides another perspective for understanding the mechanism by which dopants influence a material's thermal transport.
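How mass imbalance turns a band crossing into a gap can be illustrated with the textbook one-dimensional diatomic chain (alternating masses m1 and m2, a single spring constant K). This is an assumed stand-in, not necessarily the chain model used by the authors, and it leaves out force-constant imbalance. With equal masses the folded acoustic and optical branches touch at the zone edge; unequal masses lift that degeneracy.

```python
import math


def diatomic_branches(k, m1, m2, spring=1.0, a=1.0):
    """Acoustic and optical angular frequencies of a 1D chain with alternating masses.

    Nearest-neighbour springs of constant `spring`; `a` is the length of the
    two-atom unit cell, so the Brillouin-zone edge is k = pi / a.
    omega^2 = K * [s -/+ sqrt(s^2 - 4 sin^2(ka/2) / (m1 m2))], s = 1/m1 + 1/m2.
    """
    s = 1.0 / m1 + 1.0 / m2
    disc = s * s - 4.0 * math.sin(k * a / 2.0) ** 2 / (m1 * m2)
    root = math.sqrt(max(disc, 0.0))                 # clip tiny negative round-off
    w_ac = math.sqrt(spring * max(s - root, 0.0))
    w_op = math.sqrt(spring * (s + root))
    return w_ac, w_op


def zone_edge_gap(m1, m2, spring=1.0, a=1.0):
    """Splitting between optical and acoustic branches at k = pi / a."""
    w_ac, w_op = diatomic_branches(math.pi / a, m1, m2, spring, a)
    return w_op - w_ac


if __name__ == "__main__":
    for ratio in (1.0, 1.2, 2.0):
        print(f"m2/m1 = {ratio}: zone-edge gap = {zone_edge_gap(1.0, ratio):.4f}")
```

The gap vanishes for m1 = m2 and widens as the mass ratio departs from 1; at the zone edge the acoustic branch sits at sqrt(2K/m_heavy) and the optical branch at sqrt(2K/m_light).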

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00300 [pdf, other]

On the near soliton dynamics for the 2D cubic Zakharov-Kuznetsov equations

Authors: Gong Chen, Yang Lan, Xu Yuan

Abstract: In this article, we consider the Cauchy problem for the cubic (mass-critical) Zakharov-Kuznetsov equations in dimension two: $$\partial_t u+\partial_{x_1}(Δu+u^3)=0,\quad (t,x)\in [0,\infty)\times \mathbb{R}^{2}.$$ For initial data in $H^1$ close to the soliton with a suitable space-decay property, we fully describe the asymptotic behavior of the corresponding solution. More precisely, for such initial data, we show that only three behaviors can occur: 1) the solution leaves a tube near the soliton in finite time; 2) the solution blows up in finite time; 3) the solution is global and locally converges to a soliton. In addition, we show that for initial data near a soliton with non-positive energy and above the threshold mass, the corresponding solution blows up as described in Case 2. Our proof is inspired by the techniques developed by Martel-Merle-Raphaël for the mass-critical generalized Korteweg-de Vries (gKdV) equation in a similar context. More precisely, our proof relies on refined modulation estimates and a modified energy-virial Lyapunov functional. The primary challenge in our problem is the lack of coercivity of the Schrödinger operator that appears in the virial-type estimate. To overcome this difficulty, we apply a transform, first introduced in Kenig-Martel [13], to perform the virial computations after converting the original problem to the adjoint one. The coercivity of the Schrödinger operator in the adjoint problem has been numerically verified by Farah-Holmer-Roudenko-Yang [9].
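For orientation, the mass and energy referred to above can be written out. What follows is the standard formal computation for smooth solutions decaying at infinity, offered as background rather than taken from the paper:
$$M(u)=\int_{\mathbb{R}^2}u^2\,dx,\qquad E(u)=\frac12\int_{\mathbb{R}^2}|\nabla u|^2\,dx-\frac14\int_{\mathbb{R}^2}u^4\,dx.$$
Multiplying the equation by $u$ and integrating by parts gives
$$\frac12\frac{d}{dt}M(u)=-\int u\,\partial_{x_1}(\Delta u+u^3)\,dx=\int\partial_{x_1}u\,(\Delta u+u^3)\,dx=\int\partial_{x_1}\Big(-\tfrac12|\nabla u|^2+\tfrac14u^4\Big)\,dx=0,$$
and since the equation reads $\partial_t u=\partial_{x_1}E'(u)$ with $E'(u)=-\Delta u-u^3$, also $\frac{d}{dt}E(u)=\int E'(u)\,\partial_{x_1}E'(u)\,dx=0$. The scaling $u_\lambda(t,x)=\lambda\,u(\lambda^3t,\lambda x)$ maps solutions to solutions and, in two dimensions, leaves the mass unchanged:
$$\|u_\lambda(t)\|_{L^2}^2=\int_{\mathbb{R}^2}\lambda^2u(\lambda^3t,\lambda x)^2\,dx=\|u(\lambda^3t)\|_{L^2}^2,$$
which is why the cubic equation on $\mathbb{R}^2$ is called mass-critical.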

Submitted 28 June, 2024; originally announced July 2024.

Comments: 65 pages

arXiv:2407.00291 [pdf, other]

FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

Abstract: This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging to achieve good performance without knowing the source of the audio clips during evaluation. To address this, we propose a sound event detection method using domain generalization. Our approach integrates features from bidirectional encoder representations from audio transformers and a convolutional recurrent neural network. We focus on three main strategies to improve our method. First, we apply mixstyle to the frequency dimension to adapt the mel-spectrograms from different domains. Second, we use a training loss specific to each dataset for its corresponding classes. This independent learning framework helps the model extract domain-specific features effectively. Lastly, we use the sound event bounding boxes method for post-processing. Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.
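The frequency-dimension mixstyle step can be sketched as follows. This follows the generic frequency-wise MixStyle idea (normalize each frequency bin by per-sample statistics computed over channels and time, then re-scale with statistics interpolated toward those of a randomly chosen batch partner). The (batch, channels, freq, time) layout, the Beta parameter α = 0.3, and applying it to every batch rather than with some probability are assumptions, not details from the report.

```python
import numpy as np


def freq_mixstyle(x, alpha=0.3, eps=1e-6, rng=None):
    """Frequency-wise MixStyle on a batch of spectrograms (sketch).

    x: array of shape (batch, channels, freq, time).
    Per-sample statistics are computed for each frequency bin over channels
    and time, then mixed with those of a randomly permuted batch partner.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = x.shape[0]
    mu = x.mean(axis=(1, 3), keepdims=True)                  # (B, 1, F, 1)
    sig = np.sqrt(x.var(axis=(1, 3), keepdims=True) + eps)   # (B, 1, F, 1)
    x_norm = (x - mu) / sig

    lam = rng.beta(alpha, alpha, size=(b, 1, 1, 1))          # mixing weights
    perm = rng.permutation(b)                                # batch partners
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sig_mix = lam * sig + (1.0 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mels = rng.normal(size=(8, 1, 128, 250))   # hypothetical mel-spectrogram batch
    print(freq_mixstyle(mels, rng=rng).shape)
```

Because only first- and second-order statistics per frequency bin are exchanged, the time-frequency structure of each clip is kept while its "domain" (e.g. recording-device coloration) is perturbed.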

Submitted 28 June, 2024; originally announced July 2024.

Comments: Technical report for DCASE 2024 Challenge Task 4

arXiv:2407.00288 [pdf, ps, other]

A Rank-Two Case of Local-Global Compatibility for $l = p$

Authors: Yuji Yang

Abstract: We prove the classical $l = p$ local-global compatibility conjecture for certain regular algebraic cuspidal automorphic representations of weight 0 for GL$_2$ over CM fields. Using an automorphy lifting theorem, we show that if the automorphic side comes from a twist of Steinberg at $v | l$, then the Galois side has nontrivial monodromy at $v$. Based on this observation, we will give a definition of the Fontaine-Mazur $\mathcal{L}$-invariants attached to certain automorphic representations.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 9 pages

arXiv:2407.00285 [pdf, other]

Imaging of single barium atoms in a second matrix site in solid xenon for barium tagging in a $^{136}$Xe double beta decay experiment

Authors: M. Yvaine, D. Fairbank, J. Soderstrom, C. Taylor, J. Stanley, T. Walton, C. Chambers, A. Iverson, W. Fairbank, S. Al Kharusi, A. Amy, E. Angelico, A. Anker, I. J. Arnquist, A. Atencio, J. Bane, V. Belov, E. P. Bernard, T. Bhatta, A. Bolotnikov, J. Breslin, P. A. Breur, J. P. Brodsky, E. Brown, T. Brunner , et al. (112 additional authors not shown)

Abstract: Neutrinoless double beta decay is one of the most sensitive probes for new physics beyond the Standard Model of particle physics. One of the isotopes under investigation is $^{136}$Xe, which would double beta decay into $^{136}$Ba. Detecting the single $^{136}$Ba daughter provides a sort of ultimate tool in the discrimination against backgrounds. Previous work demonstrated the ability to perform single atom imaging of Ba atoms in a single-vacancy site of a solid xenon matrix. In this paper, the effort to identify signal from individual barium atoms is extended to Ba atoms in a hexa-vacancy site in the matrix and is achieved despite increased photobleaching in this site. Abrupt fluorescence turn-off of a single Ba atom is also observed. Significant recovery of fluorescence signal lost through photobleaching is demonstrated upon annealing of Ba deposits in the Xe ice. Following annealing, it is observed that Ba atoms in the hexa-vacancy site exhibit antibleaching while Ba atoms in the tetra-vacancy site exhibit bleaching. This may be evidence for a matrix site transfer upon laser excitation. Our findings offer a path of continued research toward tagging of Ba daughters in all significant sites in solid xenon.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 9 pages, 8 figures

arXiv:2407.00283 [pdf, other]

Gravitational waveforms from periodic orbits around a quantum-corrected black hole

Authors: Sen Yang, Yu-Peng Zhang, Tao Zhu, Li Zhao, Yu-Xiao Liu

Abstract: Extreme mass-ratio inspirals are crucial sources for future space-based gravitational wave detections. Gravitational waveforms emitted by extreme mass-ratio inspirals are closely related to the orbital dynamics of small celestial objects, which vary with the underlying spacetime geometry. Despite the tremendous success of general relativity, there are unsolved issues such as singularities in both black holes and cosmology. Loop quantum gravity, a theory addressing these singularity problems, offers a framework for regular black holes. In this paper, we focus on periodic orbits of a small celestial object around a supermassive quantum-corrected black hole in loop quantum gravity and compute the corresponding gravitational waveforms. We view the small celestial object as a massive test particle and obtain its four-velocity and effective potential. Our results indicate that the quantum parameter $\hatα$ influences the shape of the effective potential. We explore the effects of quantum corrections on marginally bound orbits, innermost stable circular orbits, and other periodic orbits. Using the numerical kludge scheme, we further explore the gravitational waveforms of the small celestial object along different periodic orbits. The waveforms exhibit distinct zoom and whirl phases in a complete orbital period, closely tied to the quantum parameter $\hatα$. We also perform a spectral analysis of the gravitational waves from these periodic orbits and assess their detectability. With the steady progress of space-based gravitational wave detection programs, our findings will contribute to utilizing extreme mass-ratio inspirals to test and understand the properties of quantum-corrected black holes.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 16 pages, 12 figures, and 2 tables

arXiv:2407.00281 [pdf]

Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator

Authors: Khanh Duy Nguyen, Woojoo Lee, Jianchen Dang, Tongyao Wu, Gabriele Berruto, Chenhui Yan, Chi Ian Jess Ip, Haoran Lin, Qiang Gao, Seng Huat Lee, Binghai Yan, Chaoxing Liu, Zhiqiang Mao, Xiao-Xiao Zhang, Shuolong Yang

Abstract: The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of this mechanism remains elusive, especially in intrinsic MTIs. Here we combine time-resolved photoemission spectroscopy with time-resolved magneto-optical Kerr effect measurements to elucidate the unique electromagnetism at the surface of an intrinsic MTI MnBi2Te4. Theoretical modeling based on 2D Ruderman-Kittel-Kasuya-Yosida interactions captures the initial quenching of a surface-rooted exchange gap within a factor of two but over-estimates the bulk demagnetization by one order of magnitude. This mechanism directly explains the sizable gap in the quasi-2D electronic state and the nonzero residual magnetization in even-layer MnBi2Te4. Furthermore, it leads to efficient light-induced demagnetization comparable to state-of-the-art magnetophotonic crystals, promising an effective manipulation of magnetism and topological orders for future topotronics.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 19 pages, 4 figures

arXiv:2407.00207 [pdf, other]

CIS: Composable Instruction Set for Streaming Applications: Design, Modeling, and Scheduling

Authors: Yu Yang, Jordi Altayó González, Ahmed Hemani

Abstract: The efficiency improvement of hardware accelerators such as single-instruction-multiple-data (SIMD) and coarse-grained reconfigurable architecture (CGRA) empowers the rapid advancement of AI and machine learning applications. These streaming applications consist of numerous vector operations that can be naturally parallelized. Despite the outstanding achievements of today's hardware accelerators, their potential is limited by their instruction set design. Traditional instruction sets, designed for microprocessors and accelerators, focus on computation and pay little attention to instruction composability and instruction-level cooperation. This leads to a rigid instruction set that is difficult to extend and to significant control overhead in hardware. This paper presents an instruction set that is composable in both the spatial and temporal sense and suitable for streaming applications. The proposed instruction set contains significantly fewer instruction types but can still efficiently implement complex multi-level loop structures, which is essential for accelerating streaming applications. It is also a resource-centric instruction set that can be conveniently extended by adding new hardware resources, thus creating a custom heterogeneous computation machine. Besides presenting the composable instruction set, we propose a simple yet efficient instruction scheduling algorithm. We analyzed the scalability of the scheduling algorithm and compared the efficiency of our compiled programs against RISC-V programs. The results indicate that our scheduling algorithm scales linearly, and our instruction set leads to near-optimal execution latency. The mapped applications on CIS are nearly 10 times faster than the RISC-V version.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2407.00203 [pdf, other]

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Authors: Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, **gxiong Li, Xingheng Lyu, Tao Lin, Lin Yang

Abstract: Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology image-text pairs from platforms like PubMed, YouTube, and Twitter, which provide limited, unscalable data with generally suboptimal image quality. In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. We then train a large multimodal model to generate captions for these images, creating PathGen-1.6M, a dataset containing 1.6 million high-quality image-caption pairs. Our approach involves multiple agent models collaborating to extract representative WSI patches, generating and refining captions to obtain high-quality image-text pairs. Extensive experiments show that integrating these generated pairs with existing datasets to train a pathology-specific CLIP model, PathGen-CLIP, significantly enhances its ability to analyze pathological images, with substantial improvements across nine pathology-related zero-shot image classification tasks and three whole-slide image tasks. Furthermore, we construct 200K instruction-tuning samples based on PathGen-1.6M and integrate PathGen-CLIP with the Vicuna LLM to create more powerful multimodal models through instruction tuning. Overall, we provide a scalable pathway for high-quality data generation in pathology, paving the way for next-generation general pathology models.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 13 pages, 3 figures

arXiv:2407.00167 [pdf, other]

Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

Abstract: In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment, with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit-vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Accepted for the AI Applications in Public Health and Social Services workshop at the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

arXiv:2407.00136 [pdf, other]

Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere , et al. (495 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
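The average ratio is quoted with its statistical and systematic uncertainties listed separately. If the two components are treated as independent and added in quadrature (a common convention, assumed here rather than stated in the abstract), the total uncertainty on the average ratio works out to roughly:

$$
\sigma_{\text{tot}} = \sqrt{\sigma_{\text{stat}}^2 + \sigma_{\text{syst}}^2} = \sqrt{0.10^2 + 0.04^2}\,\% = \sqrt{0.0116}\,\% \approx 0.11\%
$$

Under that assumption the measurement is dominated by the statistical uncertainty.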

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

arXiv:2407.00118 [pdf, other]

From Efficient Multimodal Models to World Models: A Survey

Authors: Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang

Abstract: Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful large language models with multimodal learning to perform complex tasks across different data modalities. This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence and as a pathway to world models. We provide an overview of key techniques such as Multimodal Chain of Thought (M-COT), Multimodal Instruction Tuning (M-IT), and Multimodal In-Context Learning (M-ICL). Additionally, we discuss both the fundamental and specific technologies of multimodal models, highlighting their applications, input/output modalities, and design characteristics. Despite significant advancements, the development of a unified multimodal model remains elusive. We discuss the integration of 3D generation and embodied intelligence to enhance world simulation capabilities and propose incorporating external rule systems for improved reasoning and decision-making. Finally, we outline future research directions to address these challenges and advance the field.

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2407.00113 [pdf, other]

doi 10.1145/3637528.3671948

Personalized Federated Continual Learning via Multi-granularity Prompt

Authors: Hao Yu, Xin Yang, Xin Gao, Yan Kang, Hao Wang, Junbo Zhang, Tianrui Li

Abstract: Personalized Federated Continual Learning (PFCL) is a new practical scenario that poses greater challenges in sharing and personalizing knowledge. PFCL not only relies on knowledge fusion for server aggregation at the global spatial-temporal perspective but also needs model improvement for each client according to the local requirements. Existing methods, whether in Personalized Federated Learning (PFL) or Federated Continual Learning (FCL), have overlooked the multi-granularity representation of knowledge, which can be utilized to overcome Spatial-Temporal Catastrophic Forgetting (STCF) and adapt generalized knowledge to each client by coarse-to-fine human cognitive mechanisms. Moreover, it allows shared knowledge to be personalized more effectively, thus serving each client's own purpose. To this end, we propose a novel concept called multi-granularity prompt, i.e., a coarse-grained global prompt acquired through the common model learning process, and a fine-grained local prompt used to personalize the generalized representation. The former focuses on efficiently transferring shared global knowledge without spatial forgetting, and the latter emphasizes specific learning of personalized local knowledge to overcome temporal forgetting. In addition, we design a selective prompt fusion mechanism for aggregating knowledge of global prompts distilled from different clients. By the exclusive fusion of coarse-grained knowledge, we achieve the transmission and refinement of common knowledge among clients, further enhancing the performance of personalization.
Extensive experiments demonstrate the effectiveness of the proposed method in addressing STCF as well as improving personalized performance. Our code is now available at https://github.com/SkyOfBeginning/FedMGP.

Submitted 27 June, 2024; originally announced July 2024.

Comments: Accepted by KDD 2024 Research Track

arXiv:2407.00088 [pdf, other]

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native support for mpGEMM, resort to dequantizing weights for high precision computation. This indirect approach can lead to a significant inference overhead. In this paper, we introduce T-MAC, an innovative lookup table (LUT)-based method designed for efficient low-bit LLM (i.e., weight-quantized LLM) inference on CPUs. T-MAC directly supports mpGEMM without dequantization, while simultaneously eliminating multiplications and reducing the number of additions required. Specifically, T-MAC transforms the traditional data-type-centric multiplication to bit-wise table lookup, and enables a unified and scalable mpGEMM solution. Our LUT-based kernels scale linearly with the weight bit-width. Evaluated on low-bit Llama and BitNet models, T-MAC demonstrates up to 4x increase in throughput and 70% reduction in energy consumption compared to llama.cpp. For BitNet-b1.58-3B, T-MAC delivers a token generation throughput of 30 tokens/s with a single core and 71 tokens/s with eight cores on M2-Ultra, and 11 tokens/s on lower-end devices like Raspberry Pi 5, which significantly exceeds the adult average reading speed.
T-MAC, with its LUT-based computing paradigm, paves the way for the practical deployment of low-bit LLMs on resource-constrained edge devices without compromising computational efficiency. The system is open-sourced at https://github.com/microsoft/T-MAC.
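The idea of replacing multiplication with bit-wise table lookup can be illustrated with a small pure-Python sketch. This is not T-MAC's implementation: it assumes unsigned integer weights with no scales or zero points, a hypothetical group size `g=4`, and plain lists instead of optimized kernels. Each group of `g` activations gets a table of all `2**g` subset sums, built once and shared by every weight row and bit-plane; the inner loop then does only lookups and additions, plus one power-of-two scaling per bit-plane, which is why the work grows linearly with the weight bit-width.

```python
from typing import List, Sequence


def build_tables(acts: Sequence[float], g: int) -> List[List[float]]:
    """For each group of g activations, precompute the sum of every subset (2**g entries)."""
    tables = []
    for start in range(0, len(acts), g):
        group = acts[start:start + g]
        table = []
        for pattern in range(1 << len(group)):
            # Bit i of `pattern` selects activation i of the group.
            table.append(sum(a for i, a in enumerate(group) if (pattern >> i) & 1))
        tables.append(table)
    return tables


def lut_matvec(weight_rows: Sequence[Sequence[int]], acts: Sequence[float],
               bits: int, g: int = 4) -> List[float]:
    """Multiply unsigned `bits`-bit integer weights by float activations via table lookups."""
    tables = build_tables(acts, g)  # built once, shared by all rows and bit-planes
    out = []
    for row in weight_rows:
        assert len(row) == len(acts)
        total = 0.0
        for k in range(bits):  # decompose the weights into one-bit planes
            plane = 0.0
            for gi, start in enumerate(range(0, len(row), g)):
                idx = 0
                for i, w in enumerate(row[start:start + g]):
                    idx |= ((w >> k) & 1) << i  # gather bit k of each weight in the group
                plane += tables[gi][idx]
            total += plane * (1 << k)  # bit-plane k contributes with weight 2**k
        out.append(total)
    return out


acts = [0.5, -1.0, 2.0, 1.5, 0.25, -0.5]
rows = [[3, 1, 0, 2, 1, 3], [0, 2, 3, 1, 2, 0]]
print(lut_matvec(rows, acts, bits=2))  # → [2.25, 6.0]
```

The result matches an ordinary dot product of each row with `acts`; the table cost is paid once per activation vector and amortized over all rows of the weight matrix.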

Submitted 25 June, 2024; originally announced July 2024.

arXiv:2407.00072 [pdf, other]

Pistis-RAG: A Scalable Cascading Framework Towards Content-Centric Retrieval-Augmented Generation

Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. This framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. Each stage contributes to narrowing the search space, prioritizing semantically relevant documents, aligning with the large language model's (LLM) preferences, supporting complex chain-of-thought (CoT) methods, and combining information from multiple sources. Our ranking stage introduces a significant innovation by recognizing that semantic relevance alone may not lead to improved generation quality, due to the sensitivity of the few-shot prompt order, as noted in previous research. This critical aspect is often overlooked in current RAG frameworks. We argue that the alignment issue between LLMs and external knowledge ranking methods is tied to the model-centric paradigm dominant in RAG systems. We propose a content-centric approach, emphasizing seamless integration between LLMs and external information sources to optimize content transformation for specific tasks. Our novel ranking stage is designed specifically for RAG systems, incorporating principles of information retrieval while considering the unique business scenarios reflected in LLM preferences and user feedback. We simulated feedback signals on the MMLU benchmark, resulting in a 9.3% performance improvement.
Our model and code will be open-sourced on GitHub. Additionally, experiments on real-world, large-scale data validate the scalability of our framework.
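A cascading framework narrows candidates stage by stage, with each stage scoring only what the previous one kept. The sketch below shows that funnel for the matching, pre-ranking, and ranking stages named in the abstract; the reasoning and aggregating stages are omitted. All three scoring functions are hypothetical stand-ins (word overlap, length-normalized overlap, and a toy "preference" bonus), not Pistis-RAG's models. The toy ranking stage deliberately prefers a less lexically relevant document, mirroring the abstract's point that semantic relevance alone may not determine what helps generation.

```python
from functools import partial
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def overlap(query: str, doc: str) -> float:
    """Hypothetical matching score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def length_normalized(query: str, doc: str) -> float:
    """Hypothetical pre-ranking score: overlap divided by document length in words."""
    return overlap(query, doc) / len(doc.split())


def preference(query: str, doc: str) -> float:
    """Stand-in for an LLM-preference signal (e.g. learned from feedback); purely illustrative."""
    return overlap(query, doc) + (2.0 if "llm" in doc.lower().split() else 0.0)


def cascade(query: str, docs: Sequence[str],
            stages: Sequence[Tuple[str, Scorer, int]]) -> List[str]:
    """Run each stage in turn, keeping only its top-k candidates for the next stage."""
    candidates = list(docs)
    for _name, scorer, keep in stages:
        candidates = sorted(candidates, key=partial(scorer, query), reverse=True)[:keep]
    return candidates


docs = ["graph condensation benchmark",
        "retrieval augmented generation ranking",
        "augmented generation with llm preferences",
        "phonon scattering",
        "ranking documents for generation"]
stages = [("matching", overlap, 3),
          ("pre-ranking", length_normalized, 2),
          ("ranking", preference, 1)]
print(cascade("retrieval augmented generation", docs, stages))
```

Cheap scorers run on many candidates and expensive ones on few, which is the usual reason to structure retrieval as a cascade.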

Submitted 3 July, 2024; v1 submitted 21 June, 2024; originally announced July 2024.

arXiv:2407.00046 [pdf, other]

Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

Authors: Dewen Guo, Minchen Li, Yin Yang, Guoping Wang, Sheng Li

Abstract: We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint sets that vary in each iteration present additional challenges in algorithm convergence. Our method employs a novel barrier-augmented Lagrangian method to improve system conditioning and solver efficiency by adaptively updating an augmentation constraint set. This enables the utilization of a scalable, inexact Newton-PCG solver with sparse GPU storage, eliminating the need for direct factorization. We further enhance PCG convergence speed with a domain-decomposed warm start strategy based on an eigenvalue spectrum approximated through our in-time assembly. Demonstrating significant scalability improvements, our method makes simulations that were previously impractical with 128 GB of CPU memory feasible with only 8 GB of GPU memory, while running orders of magnitude faster. Additionally, our method adeptly handles stiff problems, surpassing the capabilities of existing GPU-based interior-point methods. Our results, validated across various complex collision scenarios involving intricate geometries and large deformations, highlight the exceptional performance of our approach.
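In an inexact Newton-PCG solver, each Newton step solves its linear system iteratively with preconditioned conjugate gradient (PCG) instead of a direct factorization, and a good initial guess cuts the iteration count. The pure-Python sketch below shows PCG with an optional warm start. It assumes a small dense symmetric positive-definite matrix and a simple Jacobi (diagonal) preconditioner; neither the paper's preconditioner nor its domain-decomposed warm-start strategy is reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def pcg(A: Matrix, b: Sequence[float], x0: Optional[Sequence[float]] = None,
        tol: float = 1e-10, max_iter: Optional[int] = None) -> Tuple[List[float], int]:
    """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite A.

    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    max_iter = 10 * n if max_iter is None else max_iter

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^{-1}
    x = list(x0) if x0 is not None else [0.0] * n  # warm start if a guess is given
    r = [bi - axi for bi, axi in zip(b, matvec(x))]
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:  # converged: residual norm below tolerance
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = pcg(A, b)                   # cold start from zero
x_warm, iters_warm = pcg(A, b, x0=x)   # warm start from an already good guess
print(iters, iters_warm)
```

Only matrix-vector products are needed, which is what makes PCG compatible with sparse storage and avoids the memory cost of a factorization.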

Submitted 4 June, 2024; originally announced July 2024.

Comments: 17 pages, 30 figures

arXiv:2407.00031 [pdf, other]

Supercharging Federated Learning with Flower and NVIDIA FLARE

Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years, each focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in research and industry. Conversely, FLARE has prioritized the creation of an enterprise-ready, resilient runtime environment explicitly designed for FL applications in production environments. In this paper, we describe our initial integration of both frameworks and show how they can work together to supercharge the FL ecosystem as a whole. Through the seamless integration of Flower and FLARE, applications crafted within the Flower framework can effortlessly operate within the FLARE runtime environment without necessitating any modifications. This initial integration streamlines the process, eliminating complexities and ensuring smooth interoperability between the two platforms, thus enhancing the overall efficiency and accessibility of FL applications.

Submitted 21 May, 2024; originally announced July 2024.

arXiv:2407.00020 [pdf, other]

Visual Language Model based Cross-modal Semantic Communication Systems

Authors: Feibo Jiang, Chuanguo Tang, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan

Abstract: Semantic Communication (SC) has emerged as a novel communication paradigm in recent years, successfully transcending the Shannon physical capacity limits through innovative semantic transmission concepts. Nevertheless, extant Image Semantic Communication (ISC) systems face several challenges in dynamic environments, including low semantic density, catastrophic forgetting, and uncertain Signal-to-Noise Ratio (SNR). To address these challenges, we propose a novel Vision-Language Model-based Cross-modal Semantic Communication (VLM-CSC) system. The VLM-CSC comprises three novel components: (1) Cross-modal Knowledge Base (CKB) is used to extract high-density textual semantics from the semantically sparse image at the transmitter and reconstruct the original image based on textual semantics at the receiver. The transmission of high-density semantics contributes to alleviating bandwidth pressure. (2) Memory-assisted Encoder and Decoder (MED) employ a hybrid long/short-term memory mechanism, enabling the semantic encoder and decoder to overcome catastrophic forgetting in dynamic environments when there is a drift in the distribution of semantic features. (3) Noise Attention Module (NAM) employs attention mechanisms to adaptively adjust the semantic coding and the channel coding based on SNR, ensuring the robustness of the CSC system. The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.

Submitted 6 May, 2024; originally announced July 2024.

Comments: 12 pages, 10 figures

arXiv:2406.20087 [pdf, other]

ProgressGym: Alignment with a Millennium of Moral Progress

Authors: Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang

Abstract: Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots. To empower research in progress alignment, we introduce ProgressGym, an experimental framework allowing the learning of moral progress mechanics from history, in order to facilitate future progress in real-world moral decisions. Leveraging 9 centuries of historical text and 18 historical LLMs, ProgressGym enables codification of real-world progress alignment challenges into concrete benchmarks. Specifically, we introduce three core challenges: tracking evolving values (PG-Follow), preemptively anticipating moral progress (PG-Predict), and regulating the feedback loop between human and AI value shifts (PG-Coevolve). Alignment methods without a temporal dimension are inapplicable to these tasks. In response, we present lifelong and extrapolative algorithms as baseline methods of progress alignment, and build an open leaderboard soliciting novel algorithms and challenges.
The framework and the leaderboard are available at https://github.com/PKU-Alignment/ProgressGym and https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard respectively.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.20081 [pdf, other]

Segment Anything without Supervision

Authors: XuDong Wang, Jingfeng Yang, Trevor Darrell

Abstract: The Segment Anything Model (SAM) requires labor-intensive data labeling. We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation that does not require human annotations. UnSAM utilizes a divide-and-conquer strategy to "discover" the hierarchical structure of visual scenes. We first leverage top-down clustering methods to partition an unlabeled image into instance/semantic level segments. For all pixels within a segment, a bottom-up clustering method is employed to iteratively merge them into larger groups, thereby forming a hierarchical structure. These unsupervised multi-granular masks are then utilized to supervise model training. Evaluated across seven popular datasets, UnSAM achieves competitive results with the supervised counterpart SAM, and surpasses the previous state-of-the-art in unsupervised segmentation by 11% in terms of AR. Moreover, we show that supervised SAM can also benefit from our self-supervised labels. By integrating our unsupervised pseudo masks into SA-1B's ground-truth masks and training UnSAM with only 1% of SA-1B, a lightly semi-supervised UnSAM can often segment entities overlooked by supervised SAM, exceeding SAM's AR by over 6.7% and AP by 3.9% on SA-1B.

Submitted 28 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/frank-xwang/UnSAM

arXiv:2406.20078 [pdf, other]

GM-DF: Generalized Multi-Scenario Deepfake Detection

Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we investigate in detail the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of detection accuracy when models are directly trained on combined datasets due to the discrepancy across collection scenarios and generation methods. To address the above issue, a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) is proposed to serve multiple real-world scenarios with a unified model. First, we propose a hybrid expert modeling approach for domain-specific real/forgery feature extraction. In addition, for the shared representation, we use CLIP to extract the common features for better aligning visual and textual features across domains. Meanwhile, we introduce a masked image reconstruction mechanism to force models to capture rich forged details. Finally, we supervise the models via a domain-aware meta-learning strategy to further enhance their generalization capacities. Specifically, we design a novel domain alignment loss to strongly align the distributions of the meta-test domains and meta-train domains. Thus, the updated models are able to represent both specific and common real/forgery features across multiple datasets.
Given the lack of studies on multi-dataset training, we establish a new benchmark leveraging multi-source data to fairly evaluate the models' generalization capacity on unseen scenarios. Both qualitative and quantitative experiments on five datasets, conducted under traditional protocols as well as the proposed benchmark, demonstrate the effectiveness of our approach.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.20038 [pdf, other]

BioMNER: A Dataset for Biomedical Method Entity Recognition

Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources, primarily attributed to the intricate nature of methodological concepts, which necessitate a profound understanding for precise delineation. In this study, we propose a novel dataset for biomedical method entity recognition, employing an automated BioMethod entity recognition and information retrieval system to assist human annotation. Furthermore, we comprehensively explore a range of conventional and contemporary open-domain NER methodologies, including the utilization of cutting-edge large-scale language models (LLMs) customised to our dataset. Our empirical findings reveal that the large parameter counts of language models surprisingly inhibit the effective assimilation of entity extraction patterns pertaining to biomedical methods. Remarkably, the approach, leveraging the modestly sized ALBERT model (only 11MB), in conjunction with conditional random fields (CRF), achieves state-of-the-art (SOTA) performance.
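In an encoder-plus-CRF tagger such as the ALBERT + CRF combination mentioned above, the encoder produces per-token tag scores (emissions) and the CRF adds tag-to-tag transition scores; decoding picks the best whole sequence with the Viterbi algorithm. The sketch below uses a hypothetical BIO tag set and made-up scores (not from the paper, and without start/end transitions) to show how a transition penalty repairs an invalid `O -> I-Method` step that per-token argmax would produce.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from an encoder such as ALBERT).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):  # follow back-pointers to the first token
        path.append(ptr[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


tags = ["O", "B-Method", "I-Method"]
# Hypothetical scores for the tokens "using", "gel", "electrophoresis".
emissions = [[2.0, 0.0, 0.0],
             [0.5, 0.4, 0.6],
             [0.0, 0.2, 1.5]]
transitions = [[0.0, 0.0, -10.0],  # O -> I-Method is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
greedy = [tags[max(range(len(tags)), key=row.__getitem__)] for row in emissions]
print(greedy)                                 # → ['O', 'I-Method', 'I-Method']
print(viterbi(emissions, transitions, tags))  # → ['O', 'B-Method', 'I-Method']
```

The dynamic program keeps one best score per tag per token, so decoding costs O(T × K²) for T tokens and K tags instead of enumerating all K^T sequences.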

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.20016 [pdf, ps, other]

A new method for finding more symmetry relations of Feynman integrals

Authors: Zihao Wu, Yang Zhang

Abstract: We introduce a new method for deriving Feynman integral symmetry relations. By solving the ansatz of momentum transformation in the field of rational functions rather than constants, the method can sometimes find more symmetry relations than some state-of-the-art software. The new method may help to further decrease the number of master integrals in an integral family. Well-chosen gauge conditions are implemented in this method for efficient symmetry searching.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 10 pages, 7 figures

Report number: USTC-ICTS/PCFT-24-20

arXiv:2406.20015 [pdf, other]

ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

Authors: Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen Wan, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

Abstract: Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community has yet to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and breadth. In terms of depth, we propose a multi-level diagnostic process, including (1) solvability detection, (2) solution planning, and (3) missing-tool analysis. For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited functionality tools. Furthermore, we developed seven tasks and collected 700 evaluation samples through multiple rounds of manual annotation. The results show the significant challenges presented by the ToolBH benchmark. The current advanced models Gemini-1.5-Pro and GPT-4o only achieve a total score of 45.3 and 37.0, respectively, on a scale of 100. In this benchmark, larger parameter counts do not guarantee better performance; the training data and response strategies also play a crucial role in tool-enhanced LLM scenarios. Our diagnostic analysis indicates that the primary reason for model errors lies in assessing task solvability. Additionally, open-weight models suffer from performance drops with verbose replies, whereas proprietary models excel with longer reasoning.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19995 [pdf, other]

Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

Authors: Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu

Abstract: This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally compressed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived from the original without the need for retraining from scratch. We detail the implementation of PLRD, which strategically decreases the tensor ranks, thus optimizing the trade-off between model performance and resource usage. The efficacy of PLRD is demonstrated through extensive experiments showing that models trained with the PLRD method on only 1B tokens maintain comparable performance with traditionally trained models while using 0.1% of the tokens. The versatility of PLRD is highlighted by its ability to generate multiple model sizes from a single foundational model, adapting fluidly to varying computational and memory budgets. Our findings suggest that PLRD could set a new standard for the efficient scaling of LLMs, making advanced AI more feasible on diverse platforms.
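A rough illustration of "progressively lower ranks": factor a weight matrix with a truncated SVD, then derive each smaller member of the family from the previous, larger one. This NumPy sketch covers a single matrix only; the paper's rank schedule, which tensors it decomposes, and the training on 1B tokens between steps are not described in the abstract and are not modeled here. The matrix shape and ranks are arbitrary choices for illustration.

```python
import numpy as np


def low_rank_factors(W: np.ndarray, rank: int):
    """Truncated SVD: W ≈ A @ B with A of shape (m, rank) and B of shape (rank, n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B


def progressive_family(W: np.ndarray, ranks):
    """Derive each smaller member from the previous (larger) one, not from scratch."""
    family = {}
    current = W
    for r in sorted(ranks, reverse=True):
        A, B = low_rank_factors(current, r)
        family[r] = (A, B)
        current = A @ B  # the next, lower-rank member starts from this approximation
    return family


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
family = progressive_family(W, [16, 8, 4])
for r, (A, B) in sorted(family.items(), reverse=True):
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"rank {r:2d}: params {A.size + B.size:4d} (dense {W.size}), rel. error {err:.3f}")
```

Because the rank-`r` truncation keeps the top `r` singular directions, truncating it again at a lower rank gives the same result as truncating `W` directly; a learned model would additionally be fine-tuned after each step to recover accuracy.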

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19969 [pdf, other]

Enhancing Terrestrial Net Primary Productivity Estimation with EXP-CASA: A Novel Light Use Efficiency Model Approach

Authors: Guanzhou Chen, Kaiqi Zhang, Xiaodong Zhang, Hong Xie, Haobo Yang, Xiaoliang Tan, Tong Wang, Yule Ma, Qing Wang, **zhou Cao, Weihong Cui

Abstract: The Light Use Efficiency model, epitomized by the CASA model, is extensively applied in the quantitative estimation of vegetation Net Primary Productivity (NPP). However, the classic CASA model is marked by significant complexity: the estimation of environmental stress parameters, in particular, necessitates multi-source observation data, adding to the complexity and uncertainty of the model's operation. Additionally, the saturation effect of the Normalized Difference Vegetation Index (NDVI), a key variable in the CASA model, weakens the accuracy of CASA's NPP predictions in densely vegetated areas. To address these limitations, this study introduces the Exponential-CASA (EXP-CASA) model. The EXP-CASA model improves on the CASA model by introducing novel functions for estimating the fraction of absorbed photosynthetically active radiation (FPAR) and environmental stress, utilizing long-term observational data from FLUXNET and MODIS surface reflectance data. In a comparative analysis of NPP estimation accuracy among four different NPP products, EXP-CASA ($R^2 = 0.68$, $\mathrm{RMSE} = 1.1\,\mathrm{gC\,m^{-2}\,d^{-1}}$) outperforms the others, followed by GLASS-NPP, and lastly MODIS-NPP and classic CASA. Additionally, this research assesses the EXP-CASA model's adaptability to various vegetation indices, evaluates the sensitivity and stability of its parameters over time, and compares its accuracy against other leading NPP estimation products.
The findings reveal that the EXP-CASA model exhibits strong adaptability to diverse vegetation indices and stable model parameters over time. By introducing a novel estimation approach that optimizes model construction, the EXP-CASA model markedly improves the accuracy of NPP estimations and paves the way for global-scale, consistent, and continuous assessment of vegetation NPP.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19959 [pdf, other]

RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on simulated room impulse responses and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data can degrade model performance when the models are applied in real-world scenarios. To bridge this simulation-to-real gap, this paper presents a new, relatively large-scale Real-recorded and annotated Microphone Array speech&Noise (RealMAN) dataset. The proposed dataset is valuable in two respects: 1) benchmarking speech enhancement and localization algorithms in real scenarios; 2) offering a substantial amount of real-world training data for potentially improving the performance of real-world applications. Specifically, a 32-channel array with high-fidelity microphones is used for recording, and a loudspeaker plays the source speech signals. In total, 83 hours of speech (48 hours with a static speaker and 35 hours with a moving speaker) are recorded in 32 different scenes, and 144 hours of background noise are recorded in 31 different scenes. Both speech and noise recording scenes cover various common indoor, outdoor, semi-outdoor, and transportation environments, which enables the training of general-purpose speech enhancement and source localization networks. To obtain task-specific annotations, the azimuth angle of the loudspeaker is annotated with an omnidirectional fisheye camera that automatically detects the loudspeaker.
The direct-path signal, obtained by filtering the source speech signal with an estimated direct-path propagation filter, is set as the target clean speech for speech enhancement.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19939 [pdf, other]

Data-driven methods for flow and transport in porous media: a review

Authors: Guang Yang, Ran Xu, Yusong Tian, Songyuan Guo, **gyi Wu, Xu Chu

Abstract: This review examines current advances in data-driven methods for analyzing flow and transport in porous media, which have applications in energy, chemical engineering, environmental science, and beyond. Despite progress in recent years, current experimental methods and high-fidelity numerical simulations still face challenges such as high computational cost and difficulty in accurately representing complex, heterogeneous structures; state-of-the-art data-driven methods can potentially address these challenges. We analyze the synergistic potential of these methods, discuss their limitations, and suggest how they can be effectively integrated to improve both the fidelity and efficiency of current research. We also discuss future research directions in this field, emphasizing the need for collaborative efforts that combine domain expertise in physics with advanced computational and data-driven methodologies.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19905 [pdf, other]

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Authors: Longrong Yang, Dong Sheng, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

Abstract: The Mixture-of-Experts (MoE) architecture has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces a dense model with a sparse one, achieving comparable performance while activating fewer parameters during inference and thus significantly reducing inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and therefore employ a router to predict the routing of each token. However, these predictions are based solely on sample features and do not truly reveal the optimization direction of tokens, which can lead to severe optimization conflicts between different tokens within an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. Then, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.

Submitted 28 June, 2024; originally announced June 2024.

Showing 151–200 of 59,936 results for author: Yang