Search | arXiv e-print repository

arXiv:2406.12300 [pdf]

IR2QSM: Quantitative Susceptibility Map** via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

Abstract: Quantitative susceptibility map** (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas… ▽ More Quantitative susceptibility map** (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-based IR2QSM method for QSM reconstruction. It is designed by iterating four times of a reverse concatenations and middle recurrent modules enhanced U-net, which could dramatically improve the efficiency of latent feature utilization. Simulated and in vivo experiments were conducted to compare IR2QSM with several traditional algorithms (MEDI and iLSQR) and state-of-the-art deep learning methods (U-net, xQSM, and LPCNN). The results indicated that IR2QSM was able to obtain QSM images with significantly increased accuracy and mitigated artifacts over other methods. Particularly, IR2QSM demonstrated on average the best NRMSE (27.59%) in simulated experiments, which is 15.48%, 7.86%, 17.24%, 9.26%, and 29.13% lower than iLSQR, MEDI, U-net, xQSM, LPCNN, respectively, and led to improved QSM results with fewer artifacts for the in vivo data. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 9 figures

arXiv:2406.12287 [pdf, ps, other]

$L^2$-type invariants for complex smooth quasi-projective varieties -- a survey

Authors: Yongqiang Liu

Abstract: Let $X$ be a complex smooth quasi-projective variety with an epimorphism $ν\colon π_1(X)\twoheadrightarrow \mathbb{Z}^n$. We survey recent developments about the asymptotic behaviour of Betti numbers with any field coefficients and the order of the torsion part of singular integral homology of finite abelian covers of $X$ associated to $ν$, known as the $L^2$-type invariants. We give relations bet… ▽ More Let $X$ be a complex smooth quasi-projective variety with an epimorphism $ν\colon π_1(X)\twoheadrightarrow \mathbb{Z}^n$. We survey recent developments about the asymptotic behaviour of Betti numbers with any field coefficients and the order of the torsion part of singular integral homology of finite abelian covers of $X$ associated to $ν$, known as the $L^2$-type invariants. We give relations between $L^2$-type invariants, Alexander invariants and cohomology jump loci. When $ν$ is orbifold effective, we give explicit formulas for $L^2$-invariants at homological degree one in terms of geometric information of $X$. We also propose several related open questions for hyperplane arrangement complement. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 21 pages. arXiv admin note: text overlap with arXiv:2110.03356

arXiv:2406.12242 [pdf, other]

doi 10.1609/aaai.v38i8.28795

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Authors: Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning, Yunhua Hu, Siqiao Xue

Abstract: Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g.,… ▽ More Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from the forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance and also utilizes an adaptive reconciliation (AR) strategy to maintain coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performances on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating with the task-based optimization module. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12222 [pdf, other]

doi 10.1145/3637528.3671879

BadSampler: Harnessing the Power of Catastrophic Forgetting to Poison Byzantine-robust Federated Learning

Authors: Yi Liu, Cong Wang, Xingliang Yuan

Abstract: Federated Learning (FL) is susceptible to poisoning attacks, wherein compromised clients manipulate the global model by modifying local datasets or sending manipulated model updates. Experienced defenders can readily detect and mitigate the poisoning effects of malicious behaviors using Byzantine-robust aggregation rules. However, the exploration of poisoning attacks in scenarios where such behavi… ▽ More Federated Learning (FL) is susceptible to poisoning attacks, wherein compromised clients manipulate the global model by modifying local datasets or sending manipulated model updates. Experienced defenders can readily detect and mitigate the poisoning effects of malicious behaviors using Byzantine-robust aggregation rules. However, the exploration of poisoning attacks in scenarios where such behaviors are absent remains largely unexplored for Byzantine-robust FL. This paper addresses the challenging problem of poisoning Byzantine-robust FL by introducing catastrophic forgetting. To fill this gap, we first formally define generalization error and establish its connection to catastrophic forgetting, paving the way for the development of a clean-label data poisoning attack named BadSampler. This attack leverages only clean-label data (i.e., without poisoned data) to poison Byzantine-robust FL and requires the adversary to selectively sample training data with high loss to feed model training and maximize the model's generalization error. We formulate the attack as an optimization problem and present two elegant adversarial sampling strategies, Top-$κ$ sampling, and meta-sampling, to approximately solve it. Additionally, our formal error upper bound and time complexity analysis demonstrate that our design can preserve attack utility with high efficiency. Extensive evaluations on two real-world datasets illustrate the effectiveness and performance of our proposed attacks. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD' 24), August 25-29, 2024, Barcelona, Spain

arXiv:2406.12111 [pdf, other]

Precision measurement of the $Ξ^-_b$ baryon lifetime

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1064 additional authors not shown)

Abstract: A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second sys… ▽ More A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2014-010.html (LHCb public pages)

Report number: LHCb-PAPER-2024-010, CERN-EP-2024-139

arXiv:2406.11937 [pdf, other]

Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter

Authors: M. Aamir, B. Acar, G. Adamov, T. Adams, C. Adloff, S. Afanasiev, C. Agrawal, C. Agrawal, A. Ahmad, H. A. Ahmed, S. Akbar, N. Akchurin, B. Akgul, B. Akgun, R. O. Akpinar, E. Aktas, A. AlKadhim, V. Alexakhin, J. Alimena, J. Alison, A. Alpana, W. Alshehri, P. Alvarez Dominguez, M. Alyari, C. Amendola , et al. (550 additional authors not shown)

Abstract: A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles readout by SiPMs and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadr… ▽ More A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles readout by SiPMs and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and it makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated. △ Less

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Prepared for submission to JINST

arXiv:2406.11906 [pdf, other]

NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

Authors: **gbo Zhou, Shaorong Chen, Jun Xia, Sizhe Liu, Tianze Ling, Wenjie Du, Yue Liu, Jianwei Yin, Stan Z. Li

Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this im… ▽ More Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this important task. Firstly, since there is no consensus for the evaluation datasets, the empirical results in different research papers are often not comparable, leading to unfair comparison. Secondly, the current methods are usually limited to amino acid-level or peptide-level precision and recall metrics. In this work, we present the first unified benchmark NovoBench for \emph{de novo} peptide sequencing, which comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics. Recent impressive methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $π$-HelixNovo are integrated into our framework. In addition to amino acid-level and peptide-level precision and recall, we evaluate the models' performance in terms of identifying post-tranlational modifications (PTMs), efficiency and robustness to peptide length, noise peaks and missing fragment ratio, which are important influencing factors while seldom be considered. Leveraging this benchmark, we conduct a large-scale study of current methods, report many insightful findings that open up new possibilities for future development. The benchmark will be open-sourced to facilitate future research and application. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11831 [pdf, other]

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Authors: Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu

Abstract: Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using a large language model as the prompt encoder significantly degrades the… ▽ More Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using a large language model as the prompt encoder significantly degrades the prompt-following ability in image generation. We identified two main obstacles behind this issue. One is the misalignment between the next token prediction training in LLM and the requirement for discriminative prompt features in diffusion models. The other is the intrinsic positional bias introduced by the decoder-only architecture. To deal with this issue, we propose a novel framework to fully harness the capabilities of LLMs. Through the carefully designed usage guidance, we effectively enhance the text representation capability for prompt encoding and eliminate its inherent positional bias. This allows us to integrate state-of-the-art LLMs into the text-to-image generation model flexibly. Furthermore, we also provide an effective manner to fuse multiple LLMs into our framework. Considering the excellent performance and scaling capabilities demonstrated by the transformer architecture, we further design an LLM-Infused Diffusion Transformer (LI-DiT) based on the framework. We conduct extensive experiments to validate LI-DiT across model size and data size. Benefiting from the inherent ability of the LLMs and our innovative designs, the prompt understanding performance of LI-DiT easily surpasses state-of-the-art open-source models as well as mainstream closed-source commercial models including Stable Diffusion 3, DALL-E 3, and Midjourney V6. The powerful LI-DiT-10B will be available through the online platform and API after further optimization and security checks. △ Less

Submitted 21 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11685 [pdf, other]

Edge Classification on Graphs: New Directions in Topological Imbalance

Authors: Xueqi Cheng, Yu Wang, Yunchao Liu, Yuying Zhao, Charu C. Aggarwal, Tyler Derr

Abstract: Recent years have witnessed the remarkable success of applying Graph machine learning (GML) to node/graph classification and link prediction. However, edge classification task that enjoys numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. W… ▽ More Recent years have witnessed the remarkable success of applying Graph machine learning (GML) to node/graph classification and link prediction. However, edge classification task that enjoys numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. We identify a novel `Topological Imbalance Issue', which arises from the skewed distribution of edges across different classes, affecting the local subgraph of each edge and harming the performance of edge classifications. Inspired by the recent studies in node classification that the performance discrepancy exists with varying local structural patterns, we aim to investigate if the performance discrepancy in topological imbalanced edge classification can also be mitigated by characterizing the local class distribution variance. To overcome this challenge, we introduce Topological Entropy (TE), a novel topological-based metric that measures the topological imbalance for each edge. Our empirical studies confirm that TE effectively measures local class distribution variance, and indicate that prioritizing edges with high TE values can help address the issue of topological imbalance. Based on this, we develop two strategies - Topological Reweighting and TE Wedge-based Mixup - to focus training on (synthetic) edges based on their TEs. While topological reweighting directly manipulates training edge weights according to TE, our wedge-based mixup interpolates synthetic edges between high TE wedges. Ultimately, we integrate these strategies into a novel topological imbalance strategy for edge classification: TopoEdge. Through extensive experiments, we demonstrate the efficacy of our proposed strategies on newly curated datasets and thus establish a new benchmark for (imbalanced) edge classification. △ Less

Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11468 [pdf, ps, other]

Fractional Brauer configuration algebras I: definitions and examples

Authors: Nengqun Li, Yuming Liu

Abstract: In 2017, Green and Schroll introduced a generalization of Brauer graph algebras which they call Brauer configuration algebras. In the present paper, we further generalize Brauer configuration algebras to fractional Brauer configuration algebras by generalizing Brauer configurations to fractional Brauer configurations. The fractional Brauer configuration algebras are locally bounded but neither fin… ▽ More In 2017, Green and Schroll introduced a generalization of Brauer graph algebras which they call Brauer configuration algebras. In the present paper, we further generalize Brauer configuration algebras to fractional Brauer configuration algebras by generalizing Brauer configurations to fractional Brauer configurations. The fractional Brauer configuration algebras are locally bounded but neither finite-dimensional nor symmetric in general. We show that if the fractional Brauer configuration is of type S (resp. of type MS), then the corresponding fractional Brauer configuration algebra is a locally bounded Frobenius algebra (resp. a locally bounded special multiserial Frobenius algebra). Moreover, we show that over an algebraically closed field, the class of finite-dimensional indecomposable representation-finite fractional Brauer configuration algebras in type S coincides with the class of basic indecomposable finite-dimensional standard representation-finite self-injective algebras. △ Less

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 29 pages

MSC Class: 18Bxx; 16Gxx

arXiv:2406.11429 [pdf, other]

Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction

Authors: Shilong Li, Ge Bai, Zhang Zhang, Ying Liu, Chenji Lu, Daichi Guo, Ruifang Liu, Yong Sun

Abstract: Predicting unseen relations that cannot be observed during the training phase is a challenging task in relation extraction. Previous works have made progress by matching the semantics between input instances and label descriptions. However, fine-grained matching often requires laborious manual annotation, and rich interactions between instances and label descriptions come with significant computat… ▽ More Predicting unseen relations that cannot be observed during the training phase is a challenging task in relation extraction. Previous works have made progress by matching the semantics between input instances and label descriptions. However, fine-grained matching often requires laborious manual annotation, and rich interactions between instances and label descriptions come with significant computational overhead. In this work, we propose an efficient multi-grained matching approach that uses virtual entity matching to reduce manual annotation cost, and fuses coarse-grained recall and fine-grained classification for rich interactions with guaranteed inference speed. Experimental results show that our approach outperforms the previous State Of The Art (SOTA) methods, and achieves a balance between inference efficiency and prediction accuracy in zero-shot relation extraction tasks. Our code is available at https://github.com/longls777/EMMA. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted to the main conference of NAACL2024

arXiv:2406.11412 [pdf, ps, other]

Some Bounds on the Energy of Graphs with Self-Loops regarding $λ_{1}$ and $λ_{n}$

Authors: Minghua Li, Yue Liu

Abstract: Let $G_{S}$ be a graph with $n$ vertices obtained from a simple graph $G$ by attaching one self-loop at each vertex in $S \subseteq V(G)$. The energy of $G_{S}$ is defined by Gutman et al. as $E(G_{S})=\sum_{i=1}^{n}\left| λ_{i} -\fracσ{n} \right|$, where $λ_{1},\dots,λ_{n}$ are the adjacency eigenvalues of $G_{S}$ and $σ$ is the number of self-loops of $G_{S}$. In this paper, several upper and lo… ▽ More Let $G_{S}$ be a graph with $n$ vertices obtained from a simple graph $G$ by attaching one self-loop at each vertex in $S \subseteq V(G)$. The energy of $G_{S}$ is defined by Gutman et al. as $E(G_{S})=\sum_{i=1}^{n}\left| λ_{i} -\fracσ{n} \right|$, where $λ_{1},\dots,λ_{n}$ are the adjacency eigenvalues of $G_{S}$ and $σ$ is the number of self-loops of $G_{S}$. In this paper, several upper and lower bounds of $E(G_{S})$ regarding $λ_{1}$ and $λ_{n}$ are obtained. Especially, the upper bound $E(G_{S}) \leq \sqrt{n\left(2m+σ-\frac{σ^{2}}{n}\right)}$ $(\ast)$ given by Gutman et al. is improved to the following bound \begin{align*} E(G_{S})\leq \sqrt{n\left(2m+σ-\frac{σ^{2}}{n}\right)-\frac{n}{2}\left(\left |λ_{1}-\fracσ{n}\right |-\left |λ_{n}-\fracσ{n}\right |\right)^{2}}, \end{align*} where $\left| λ_{1}-\fracσ{n}\right| \geq \dots \geq \left| λ_{n}-\fracσ{n}\right|$. Moreover, all graphs are characterized when the equality holds in Gutmans' bound $(\ast)$ by using this new bound. △ Less

Submitted 17 June, 2024; originally announced June 2024.

MSC Class: 05C50; 05C90

arXiv:2406.11370 [pdf, other]

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments

Authors: Han Zhou, Xingchen Wan, Yinhong Liu, Nigel Collier, Ivan Vulić, Anna Korhonen

Abstract: Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In thi… ▽ More Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In this work, we first reveal that the predictive preference of LLMs can be highly brittle and skewed, even with semantically equivalent instructions. We find that fairer predictive preferences from LLMs consistently lead to judgments that are better aligned with humans. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO, which aims to produce fairer preference decisions and improve the alignment of LLM evaluators with human judgments. To this end, we propose a zero-shot learning objective based on the preference decision fairness. ZEPO demonstrates substantial performance improvements over state-of-the-art LLM evaluators, without requiring labeled data, on representative meta-evaluation benchmarks. Our findings underscore the critical correlation between preference fairness and human alignment, positioning ZEPO as an efficient prompt optimizer for bridging the gap between LLM evaluators and human judgments. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures, 1 table (12 pages, 4 figures, 6 tables including references and appendices)

arXiv:2406.11285 [pdf, other]

Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment

Authors: Jie Li, Yi Liu, Chongyang Liu, Xiaoning Ren, Ling Shi, Weisong Sun, Yinxing Xue

Abstract: Large Language Models (LLMs) like OpenAI's GPT series, Anthropic's Claude, and Meta's LLaMa have shown remarkable capabilities in text generation. However, their susceptibility to toxic prompts presents significant security challenges. This paper investigates alignment techniques, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), to mitigate these risks.… ▽ More Large Language Models (LLMs) like OpenAI's GPT series, Anthropic's Claude, and Meta's LLaMa have shown remarkable capabilities in text generation. However, their susceptibility to toxic prompts presents significant security challenges. This paper investigates alignment techniques, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), to mitigate these risks. We conduct an empirical study on refusal patterns across nine LLMs, revealing that models with uniform refusal patterns, such as Claude3, exhibit higher security. Based on these findings, we propose self-distilling and cross-model distilling methods to enhance LLM security. Our results show that these methods significantly improve refusal rates and reduce unsafe content, with cross-model distilling achieving refusal rates close to Claude3's 94.51%. These findings underscore the potential of distillation-based alignment in securing LLMs against toxic prompts. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11282 [pdf, other]

From Pixels to Progress: Generating Road Network from Satellite Imagery for Socioeconomic Insights in Impoverished Areas

Authors: Yanxin Xi, Yu Liu, Zhicheng Liu, Sasu Tarkoma, Pan Hui, Yong Li

Abstract: The Sustainable Development Goals (SDGs) aim to resolve societal challenges, such as eradicating poverty and improving the lives of vulnerable populations in impoverished areas. Those areas rely on road infrastructure construction to promote accessibility and economic development. Although publicly available data like OpenStreetMap is available to monitor road status, data completeness in impoveri… ▽ More The Sustainable Development Goals (SDGs) aim to resolve societal challenges, such as eradicating poverty and improving the lives of vulnerable populations in impoverished areas. Those areas rely on road infrastructure construction to promote accessibility and economic development. Although publicly available data like OpenStreetMap is available to monitor road status, data completeness in impoverished areas is limited. Meanwhile, the development of deep learning techniques and satellite imagery shows excellent potential for earth monitoring. To tackle the challenge of road network assessment in impoverished areas, we develop a systematic road extraction framework combining an encoder-decoder architecture and morphological operations on satellite imagery, offering an integrated workflow for interdisciplinary researchers. Extensive experiments of road network extraction on real-world data in impoverished regions achieve a 42.7% enhancement in the F1-score over the baseline methods and reconstruct about 80% of the actual roads. We also propose a comprehensive road network dataset covering approximately 794,178 km2 area and 17.048 million people in 382 impoverished counties in China. The generated dataset is further utilized to conduct socioeconomic analysis in impoverished counties, showing that road network construction positively impacts regional economic development. The technical appendix, code, and generated dataset can be found at https://github.com/tsinghua-fib-lab/Road_network_extraction_impoverished_counties. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 12 pages, 13 figures, IJCAI2024 (AI and Social Good)

arXiv:2406.11265 [pdf, ps, other]

Balancing Performance and Cost for Two-Hop Cooperative Communications: Stackelberg Game and Distributed Multi-Agent Reinforcement Learning

Authors: Yuanzhe Geng, Erwu Liu, Wei Ni, Rui Wang, Yan Liu, Hao Xu, Chen Cai, Abbas Jamalipour

Abstract: This paper aims to balance performance and cost in a two-hop wireless cooperative communication network where the source and relays have contradictory optimization goals and make decisions in a distributed manner. This differs from most existing works that have typically assumed that source and relay nodes follow a schedule created implicitly by a central controller. We propose that the relays for… ▽ More This paper aims to balance performance and cost in a two-hop wireless cooperative communication network where the source and relays have contradictory optimization goals and make decisions in a distributed manner. This differs from most existing works that have typically assumed that source and relay nodes follow a schedule created implicitly by a central controller. We propose that the relays form an alliance in an attempt to maximize the benefit of relaying while the source aims to increase the channel capacity cost-effectively. To this end, we establish the trade problem as a Stackelberg game, and prove the existence of its equilibrium. Another important aspect is that we use multi-agent reinforcement learning (MARL) to approach the equilibrium in a situation where the instantaneous channel state information (CSI) is unavailable, and the source and relays do not have knowledge of each other's goal. A multi-agent deep deterministic policy gradient-based framework is designed, where the relay alliance and the source act as agents. Experiments demonstrate that the proposed method can obtain an acceptable performance that is close to the game-theoretic equilibrium for all players under time-invariant environments, which considerably outperforms its potential alternatives and is only about 2.9% away from the optimal solution. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11195 [pdf, other]

Resolving Geometric Excitations of Fractional Quantum Hall States

Authors: Yang Liu, Tongzhou Zhao, T. Xiang

Abstract: The quantum dynamics of the intrinsic metric profoundly influence the neutral excitations in the fractional quantum Hall system, as established by Haldane in 2011 \cite{Haldane2011}, and further evidenced by a recent two-photon experiment \cite{Liang2024}. Despite these advancements, a comprehensive understanding of the dynamic properties of these excitations, especially at long wavelengths, conti… ▽ More The quantum dynamics of the intrinsic metric profoundly influence the neutral excitations in the fractional quantum Hall system, as established by Haldane in 2011 \cite{Haldane2011}, and further evidenced by a recent two-photon experiment \cite{Liang2024}. Despite these advancements, a comprehensive understanding of the dynamic properties of these excitations, especially at long wavelengths, continues to elude interest. In this study, we employ tensor-network methods to investigate the neutral excitations of the Laughlin and Moore-Read states on an infinite cylinder. This investigation deepens our understanding of the excitation spectrum in regions where traditional methods do not work effectively. The spectral functions for both states reveal the presence of $S=-2$ geometric excitations. For the first time, we unveil the complex spectra of both neutral fermion and bosonic Girvin-MacDonald-Platzman modes within the excitation continuum by calculating the three-particle density response function for the Moore-Read state. Our findings support the hypothesis of emergent supersymmetry and highlight the potential for detecting neutral fermions in future experiments. △ Less

Submitted 1 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11099 [pdf, ps, other]

doi 10.1103/PhysRevB.109.L220401

Gateway to all-optical spin switching in Heusler ferrimagnets: Pancharatnam-Berry tensor and magnetic moment ratio

Authors: G. P. Zhang, Y. Q. Liu, M. S. Si, Nicholas Allbritton, Y. H. Bai, Wolfgang Hübner, Thomas F. George

Abstract: All-optical spin switching (AOS) is a new phenomenon found in a small group of magnetic media, where a single laser pulse can switch spins from one direction to another, without assistance of a magnetic field, on a time scale much shorter than existing magnetic technology. However, despite intensive efforts over a decade, its underlying working principle remains elusive. Here through manganese-bas… ▽ More All-optical spin switching (AOS) is a new phenomenon found in a small group of magnetic media, where a single laser pulse can switch spins from one direction to another, without assistance of a magnetic field, on a time scale much shorter than existing magnetic technology. However, despite intensive efforts over a decade, its underlying working principle remains elusive. Here through manganese-based Heusler ferrimagnets, we show that a group of flat bands around the Fermi level act as gateway states to form efficient channels or spin switching, where their noncentrosymmetry allows us to correlate the spin dynamics to the second-order optical response. To quantify their efficacy, we introduce the third-rank Pancharatnam-Berry tensor (PB tensor), $\boldsymbolη^{(3)}=\langle i |{\bf p} |m\rangle \langle m|{\bf p} |f\rangle \langle f|{\bf p} |i\rangle,$ where $|i\rangle$, $|m\rangle$ and $|f\rangle$ are initial, intermediate and final band states, respectively, and ${\bf p}$ is the momentum operator. A picture emerges: Those which show AOS, such as the recently discovered Mn$_2$RuGa, always have a large PB tensor element} but have a small sublattice spin moment ratio, consistent with the prior experimental small remanence criterion. This does not only reveal that the delicate balance between the large PB tensor element and the small sublattice spin ratio plays a decisive role in AOS, but also, conceptually, connects the $n$th-order nonlinear optics to $(n+1)$th-rank PB tensors in general. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 20 pages, four figures. Published in Physical Review B Letters

arXiv:2406.11087 [pdf, other]

MemDPT: Differential Privacy for Memory Efficient Language Models

Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Chen Ma, Songhang Deng, Mengchen Fu, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

Abstract: Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on… ▽ More Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on memory resources, which is a matter of significant concern in practice. In this paper, we present an innovative training framework MemDPT that not only reduces the memory cost of large language models but also places a strong emphasis on safeguarding user data privacy. MemDPT provides edge network and reverse network designs to accommodate various differential privacy memory-efficient fine-tuning schemes. Our approach not only achieves $2 \sim 3 \times$ memory optimization but also provides robust privacy protection, ensuring that user data remains secure and confidential. Extensive experiments have demonstrated that MemDPT can effectively provide differential privacy efficient fine-tuning across various task scenarios. △ Less

Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 12 pages first version

arXiv:2406.11045 [pdf, other]

Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving PDEs based on Kolmogorov Arnold Networks

Authors: Yizheng Wang, Jia Sun, **shuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu

Abstract: AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in… ▽ More AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in various forms, such as strong form, energy form, and inverse form. While mathematically equivalent, these forms are not computationally equivalent, making the exploration of different PDE formulations significant in computational physics. Thus, we propose different PDE forms based on KAN instead of MLP, termed Kolmogorov-Arnold-Informed Neural Network (KINN). We systematically compare MLP and KAN in various numerical examples of PDEs, including multi-scale, singularity, stress concentration, nonlinear hyperelasticity, heterogeneous, and complex geometry problems. Our results demonstrate that KINN significantly outperforms MLP in terms of accuracy and convergence speed for numerous PDEs in computational solid mechanics, except for the complex geometry problem. This highlights KINN's potential for more efficient and accurate PDE solutions in AI for PDEs. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10977 [pdf, other]

Toward Optimal LLM Alignments Using Two-Player Games

Authors: Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuan**g Huang, Hang Li, Yang Liu

Abstract: The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. In this paper, we investigate alignment through the… ▽ More The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent. The adversarial agent's task at each step is to generate prompts that expose the weakness of the defensive agent. In return, the defensive agent seeks to improve its responses to these newly identified prompts it struggled with, based on feedback from the reward model. We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents. Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: Our code is released at https://github.com/ruizheng20/gpo

MSC Class: 68

arXiv:2406.10945 [pdf, ps, other]

Tilt stability of Ky-Fan $κ$-norm composite optimization

Authors: Yulan Liu, Shaohua Pan, Wen Song

Abstract: This paper concerns the tilt stability for the minimization of the sum of a twice continuously differentiable matrix-valued function and the Ky-Fan $κ$-norm. By using the expression of second subderivative of the Ky-Fan $κ$-norm, we derive a verifiable criterion to identify the tilt stability of a local minimum for this class of nonconvex and nonsmooth problems. As a byproduct, a practical criteri… ▽ More This paper concerns the tilt stability for the minimization of the sum of a twice continuously differentiable matrix-valued function and the Ky-Fan $κ$-norm. By using the expression of second subderivative of the Ky-Fan $κ$-norm, we derive a verifiable criterion to identify the tilt stability of a local minimum for this class of nonconvex and nonsmooth problems. As a byproduct, a practical criterion is achieved for the tilt stable solution of the nuclear-norm regularized minimization. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 30pages

MSC Class: 49J52; 49J53; 49K40; 90C31; 90C56

arXiv:2406.10941 [pdf, other]

Near-Field Localization and Sensing with Large-Aperture Arrays: From Signal Modeling to Processing

Authors: Zhaolin Wang, Parisa Ramezani, Yuanwei Liu, Emil Björnson

Abstract: The signal processing community is currently witnessing a growing interest in near-field signal processing, driven by the trend towards the use of large aperture arrays with high spatial resolution in the fields of communication, localization, sensing, imaging, etc. From the perspective of localization and sensing, this trend breaks the basic far-field assumptions that have dominated the array sig… ▽ More The signal processing community is currently witnessing a growing interest in near-field signal processing, driven by the trend towards the use of large aperture arrays with high spatial resolution in the fields of communication, localization, sensing, imaging, etc. From the perspective of localization and sensing, this trend breaks the basic far-field assumptions that have dominated the array signal processing research in the past, presenting new challenges and promising opportunities. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 20 pages, 5 figures

arXiv:2406.10939 [pdf, ps, other]

On the Existence of Weighted-cscK Metrics

Authors: Jiyuan Han, Yaxiong Liu

Abstract: In this paper, we prove that on a smooth Kähler manifold, the $\mathbb{G}$-coercivity of the weighted Mabuchi functional implies the existence of weighted-cscK (extremal) metric (firstly studied in \cite{Lah19}), e.g, cscK metric, Kähler--Ricci soliton. Furthermore, we generalize this result to the case of $\mathbb{G}$-equivariant $\mathbb{Q}$-Gorenstein smoothable projective klt varieties. In this paper, we prove that on a smooth Kähler manifold, the $\mathbb{G}$-coercivity of the weighted Mabuchi functional implies the existence of weighted-cscK (extremal) metric (firstly studied in \cite{Lah19}), e.g, cscK metric, Kähler--Ricci soliton. Furthermore, we generalize this result to the case of $\mathbb{G}$-equivariant $\mathbb{Q}$-Gorenstein smoothable projective klt varieties. △ Less

Submitted 16 June, 2024; originally announced June 2024.

MSC Class: 53C55 (Primary) 53C21; 32J27; 58J60 (Secondary)

arXiv:2406.10829 [pdf, other]

Solving Co-Path/Cycle Packing Faster than $3^k$

Authors: Yuxi Liu, Mingyu Xiao

Abstract: The \textsc{Co-Path/Cycle Packing} problem asks whether we can delete at most $k$ vertices from the input graph such that the remaining graph is a collection of induced paths and cycles. \textsc{Co-Path/Cycle Packing} is a fundamental graph problem that has important applications in bioinformatics. Although this problem has been extensively studied in parameterized algorithms, it seems hard to bre… ▽ More The \textsc{Co-Path/Cycle Packing} problem asks whether we can delete at most $k$ vertices from the input graph such that the remaining graph is a collection of induced paths and cycles. \textsc{Co-Path/Cycle Packing} is a fundamental graph problem that has important applications in bioinformatics. Although this problem has been extensively studied in parameterized algorithms, it seems hard to break the running time bound $3^k$. In 2015, Feng et al. provided an $O^*(3^k)$-time randomized algorithm. Recently, Tsur showed that this problem can be solved in $O^*(3^k)$ time deterministically. In this paper, by combining several techniques such as path decomposition, dynamic programming, and branch-and-search methods, we show that \textsc{Co-Path/Cycle Packing} can be solved in $O^*(2.8192^k)$ time. As a by-product, we also show that the \textsc{$d$-Bounded-Degree Vertex Deletion} problem, a generalization of \textsc{Co-Path/Cycle Packing}, can be solved in $O^*((d + 2)^p)$ time if a path decomposition of width $p$ is given, which implies that \textsc{$d$-Bounded-Degree Vertex Deletion} is FPT with parameter $p+d$. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10744

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: The author list and contents need to be verified by all authors

arXiv:2406.10667 [pdf, other]

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Authors: Yuan Pu, Yazhe Niu, Jiyuan Ren, Zhenjie Yang, Hongsheng Li, Yu Liu

Abstract: Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates ra… ▽ More Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly. We identify that this is partially due to the \textit{entanglement} of latent representations with historical information, which results in incompatibility with the auxiliary self-supervised state regularization. To overcome this limitation, we present \textit{UniZero}, a novel approach that \textit{disentangles} latent states from implicit latent history using a transformer-based latent world model. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in latent space. We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark. Furthermore, it significantly outperforms prior baselines in benchmarks that require long-term memory. Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results. The code is available at \textcolor{magenta}{https://github.com/opendilab/LightZero}. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: 32 pages, 16 figures

arXiv:2406.10638 [pdf, other]

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Authors: Yexin Liu, Zhengyang Liang, Yueze Wang, Muyang He, Jian Li, Bo Zhao

Abstract: Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in visual understanding and reasoning, providing sightly reasonable answers, such as image descriptions. This has spurred extensive research on the evaluation of MLLMs. Most evaluation benchmarks assume that incorrect answers indicate a lack of understanding of the visual content. However, our findings reveal that, in… ▽ More Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in visual understanding and reasoning, providing sightly reasonable answers, such as image descriptions. This has spurred extensive research on the evaluation of MLLMs. Most evaluation benchmarks assume that incorrect answers indicate a lack of understanding of the visual content. However, our findings reveal that, in many cases, MLLMs answer questions incorrectly despite correctly understanding the visual content. This suggests that incorrect answers do not necessarily imply a lack of comprehension but may instead result from lacking robustness to leading questions. To comprehensively measure MLLMs' understanding capability and robustness to leading questions, we introduce a MultiModal Robustness benchmark (MMR). MMR contains paired positive and negative questions across 12 categories, meticulously annotated by humans. We evaluate 18 leading MLLMs on the MMB benchmark, revealing that MLLMs suffer from fragility to leading questions despite understanding the visual content. To enhance MLLMs' understanding capability and robustness, we further present a training set with paired positive and negative visual question-answer samples. Experiments verify that MLLMs' robustness can be significantly enhanced by tuning on this new training set. The benchmark, training set, and code can be found at https://github.com/BAAI-DCAI/Multimodal-Robustness-Benchmark. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10591 [pdf, other]

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on… ▽ More Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements for real-world foley audio dubbing task. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobooks dubbing, image/silent video dubbing. Besides, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audio Content Planning, Generation, and Alignment (CPGA) framework is proposed, which includes a content planning module leveraging large language models for complex multi-modal prompts comprehension. Additionally, the training process is optimized using Proximal Policy Optimization based reinforcement learning, significantly improving the alignment and auditory realism of generated foley audio. Experimental results demonstrate that our approach significantly advances the field of foley audio dubbing, providing robust solutions for the challenges of multi-modal dubbing. Even when utilizing the relatively lightweight GPT-2 model, our framework outperforms open-source multimodal large models such as LLaVA, DeepSeek-VL, and Moondream2. The dataset is available at https://github.com/borisfrb/MINT . △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10589 [pdf, other]

Resilience patterns in higher-order meta-population networks

Authors: Yanyi Nie, Yanbing Liu, Qixuan Cao, Tao Lin, Wei Wang

Abstract: Meta-population networks are effective tools for capturing population movement across distinct regions, but the assumption of well-mixed regions fails to capture the reality of population higher-order interactions. As a multidimensional system capturing mobility characteristics, meta-population networks are inherently complex and difficult to interpret when subjected to resilience analysis based o… ▽ More Meta-population networks are effective tools for capturing population movement across distinct regions, but the assumption of well-mixed regions fails to capture the reality of population higher-order interactions. As a multidimensional system capturing mobility characteristics, meta-population networks are inherently complex and difficult to interpret when subjected to resilience analysis based on N-dimensional equations. We propose a higher-order meta-population model that captures large-scale global cross-regional mobility and small-scale higher-order interactions within regions. Remarkably, we extend the dimension-reduction approach, simplifying the N-dimensional higher-order meta-population system into a one-dimensional equation by decomposing different network behaviours into a single universal resilience function, thereby allowing for convenient and accurate prediction of the system resilience. The network structure and human mobility parameters can clearly and simply express the epidemic threshold. Numerical experimental results on both real networks and star networks confirm the accuracy of the proposed dimension-reduction framework in predicting the evolution of epidemic dynamics on higher-order meta-population networks. Additionally, higher-order interactions among populations are shown to lead to explosive growth in the epidemic infection size potentially. Population mobility causes changes in the spatial distribution of infectious diseases across different regions. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10554 [pdf, other]

Causal Inference with Outcomes Truncated by Death and Missing Not at Random

Authors: Wei Li, Yuan Liu, Shanshan Luo, Zhi Geng

Abstract: In clinical trials, principal stratification analysis is commonly employed to address the issue of truncation by death, where a subject dies before the outcome can be measured. However, in practice, many survivor outcomes may remain uncollected or be missing not at random, posing a challenge to standard principal stratification analyses. In this paper, we explore the identification, estimation, an… ▽ More In clinical trials, principal stratification analysis is commonly employed to address the issue of truncation by death, where a subject dies before the outcome can be measured. However, in practice, many survivor outcomes may remain uncollected or be missing not at random, posing a challenge to standard principal stratification analyses. In this paper, we explore the identification, estimation, and bounds of the average treatment effect within a subpopulation of individuals who would potentially survive under both treatment and control conditions. We show that the causal parameter of interest can be identified by introducing a proxy variable that affects the outcome only through the principal strata, while requiring that the treatment variable does not directly affect the missingness mechanism. Subsequently, we propose an approach for estimating causal parameters and derive nonparametric bounds in cases where identification assumptions are violated. We illustrate the performance of the proposed method through simulation studies and a real dataset obtained from a Human Immunodeficiency Virus (HIV) study. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10416 [pdf, other]

Byzantine-Robust Decentralized Federated Learning

Authors: Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong

Abstract: Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bot… ▽ More Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and trust dependency issues. To address challenges, decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine if the received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches those of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks. △ Less

Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: To appear in ACM Conference on Computer and Communications Security 2024 (CCS '24)

arXiv:2406.10272 [pdf, other]

Connected Speech-Based Cognitive Assessment in Chinese and English

Authors: Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

Abstract: We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age… ▽ More We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex by propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved unweighted average recall was 59.2% in diagnosis, and root mean squared error of 2.89 in score prediction. △ Less

Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: To appear in Proceedings of Interspeech 2024

ACM Class: J.3; I.5.4

arXiv:2406.10181 [pdf, other]

Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors

Authors: Siyuan Chen, Zelong Guan, Yudong Liu, Phillip B. Gibbons

Abstract: Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU. In this paper, we present an offloading fram… ▽ More Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU. In this paper, we present an offloading framework, LSP_Offload, that enables near-native speed LLM fine-tuning on commodity hardware through learned subspace projectors. Our data-driven approach involves learning an efficient sparse compressor that minimizes communication with minimal precision loss. Additionally, we introduce a novel layer-wise communication schedule to maximize parallelism between communication and computation. As a result, our framework can fine-tune a 1.3 billion parameter model on a 4GB laptop GPU and a 7 billion parameter model on an NVIDIA RTX 4090 GPU with 24GB memory, achieving only a 31% slowdown compared to fine-tuning with unlimited memory. Compared to state-of-the-art offloading frameworks, our approach increases fine-tuning throughput by up to 3.33 times and reduces end-to-end fine-tuning time by 33.1%~62.5% when converging to the same accuracy. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.10130 [pdf, other]

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

Authors: Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, Tsung-Yi Ho

Abstract: Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases, which may cause negative social impacts or even bring catastrophic results in application. Previous works on this problem mainly focused on using black-box methods such as probing to detect and quantify social biases in PLMs by observing model outputs. As a result, previous debiasing me… ▽ More Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases, which may cause negative social impacts or even bring catastrophic results in application. Previous works on this problem mainly focused on using black-box methods such as probing to detect and quantify social biases in PLMs by observing model outputs. As a result, previous debiasing methods mainly finetune or even pre-train language models on newly constructed anti-stereotypical datasets, which are high-cost. In this work, we try to unveil the mystery of social bias inside language models by introducing the concept of {\sc Social Bias Neurons}. Specifically, we propose {\sc Integrated Gap Gradients (IG$^2$)} to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias. By formalizing undesirable behavior as a distributional property of language, we employ sentiment-bearing prompts to elicit classes of sensitive words (demographics) correlated with such sentiments. Our IG$^2$ thus attributes the uneven distribution for different demographics to specific Social Bias Neurons, which track the trail of unwanted behavior inside PLM units to achieve interoperability. Moreover, derived from our interpretable technique, {\sc Bias Neuron Suppression (BNS)} is further proposed to mitigate social biases. By studying BERT, RoBERTa, and their attributable differences from debiased FairBERTa, IG$^2$ allows us to locate and suppress identified neurons, and further mitigate undesired behaviors. As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09961 [pdf, other]

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan **g, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang

Abstract: We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which repres… ▽ More We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which represent the authentic chart use cases found in scientific papers across various domains(e.g., Physics, Computer Science, Economics, etc). These charts span 18 regular types and 4 advanced types, diversifying into 191 subcategories. Furthermore, we propose multi-level evaluation metrics to provide an automatic and thorough assessment of the output code and the rendered charts. Unlike existing code generation benchmarks, ChartMimic places emphasis on evaluating LMMs' capacity to harmonize a blend of cognitive capabilities, encompassing visual understanding, code generation, and cross-modal reasoning. The evaluation of 3 proprietary models and 11 open-weight models highlights the substantial challenges posed by ChartMimic. Even the advanced GPT-4V, Claude-3-opus only achieve an average score of 73.2 and 53.7, respectively, indicating significant room for improvement. We anticipate that ChartMimic will inspire the development of LMMs, advancing the pursuit of artificial general intelligence. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Data and code are available at https://github.com/ChartMimic/ChartMimic

arXiv:2406.09881 [pdf, other]

A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation

Authors: Yongkang Liu, Ercong Nie, Shi Feng, Zheng Hua, Zifeng Ding, Daling Wang, Yifei Zhang, Hinrich Schütze

Abstract: Current state-of-the-art dialogue systems heavily rely on extensive training datasets. However, challenges arise in domains where domain-specific training datasets are insufficient or entirely absent. To tackle this challenge, we propose a novel data \textbf{A}ugmentation framework for \textbf{M}ulti-\textbf{D}omain \textbf{D}ialogue \textbf{G}eneration, referred to as \textbf{AMD$^2$G}. The AMD… ▽ More Current state-of-the-art dialogue systems heavily rely on extensive training datasets. However, challenges arise in domains where domain-specific training datasets are insufficient or entirely absent. To tackle this challenge, we propose a novel data \textbf{A}ugmentation framework for \textbf{M}ulti-\textbf{D}omain \textbf{D}ialogue \textbf{G}eneration, referred to as \textbf{AMD$^2$G}. The AMD$^2$G framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training. We posit that domain corpora are a blend of domain-agnostic and domain-specific features, with certain representation patterns shared among diverse domains. Domain-agnostic training aims to enable models to learn these common expressive patterns. To construct domain-agnostic dialogue corpora, we employ a \textit{\textbf{de-domaining}} data processing technique used to remove domain-specific features. By mitigating the effects of domain-specific features, the model trained on the de-domained corpora can effectively learn common expression patterns in different domains. Subsequently, we adapt the learned domain-agnostic features to the target domain through domain adaptation training. We conduct experiments on Chinese dialogue datasets from five different domains and show that AMD$^2$G achieves superior performance compared to both direct training on the target domain corpus and collective training on all five domain corpora. Our work underscores AMD$^2$G as a viable alternative solution for low-resource multi-domain dialogue generation. Code and data associated with our work are available on GitHub repository$^{\text 1}$. △ Less

Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: 17pages,ECML-PKDD

Journal ref: 2024 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

arXiv:2406.09841 [pdf, other]

Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

Authors: Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zikun Nie, Hao Zhou, Zaiqing Nie

Abstract: Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular repr… ▽ More Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular representations, due to challenges in explicitly incorporating view information and handling molecular knowledge from heterogeneous sources. To address these issues, we present MV-Mol, a molecular representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs. We utilize text prompts to model view information and design a fusion architecture to extract view-based molecular representations. We develop a two-stage pre-training procedure, exploiting heterogeneous data of varying quality and quantity. Through extensive experiments, we show that MV-Mol provides improved representations that substantially benefit molecular property prediction. Additionally, MV-Mol exhibits state-of-the-art performance in multi-modal comprehension of molecular structures and texts. Code and data are available at https://github.com/PharMolix/OpenBioMed. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures

arXiv:2406.09834 [pdf, other]

How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study

Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs,… ▽ More Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs, the specific problem of deprecated API usage in LLM-based code completion has not been thoroughly investigated. To address this gap, we conducted the first evaluation study on deprecated API usage in LLM-based code completion. This study involved seven advanced LLMs, 145 API map**s from eight popular Python libraries, and 28,125 completion prompts. The study results reveal the \textit{status quo} and \textit{root causes} of deprecated API usage in LLM-based code completion from the perspectives of \textit{model}, \textit{prompt}, and \textit{library}. Based on these findings, we propose two lightweight fixing approaches, \textsc{ReplaceAPI} and \textsc{InsertPrompt}, which can serve as baseline approaches for future research on mitigating deprecated API usage in LLM-based completion. Additionally, we provide implications for future research on integrating library evolution with LLM-driven software development. △ Less

Submitted 3 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09815 [pdf, other]

Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

Authors: Zhenrui Yue, Huimin Zeng, Lanyu Shang, Yifan Liu, Yang Zhang, Dong Wang

Abstract: The rapid propagation of misinformation poses substantial risks to public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods heavily rely on the embedded knowledge within LLMs and / or black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or upon unreliable context.… ▽ More The rapid propagation of misinformation poses substantial risks to public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods heavily rely on the embedded knowledge within LLMs and / or black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or upon unreliable context. In this paper, we propose retrieval augmented fact verification through the synthesis of contrasting arguments (RAFTS). Upon input claims, RAFTS starts with evidence retrieval, where we design a retrieval pipeline to collect and re-rank relevant documents from verifiable sources. Then, RAFTS forms contrastive arguments (i.e., supporting or refuting) conditioned on the retrieved evidence. In addition, RAFTS leverages an embedding model to identify informative demonstrations, followed by in-context prompting to generate the prediction and explanation. Our method effectively retrieves relevant documents as evidence and evaluates arguments from varying perspectives, incorporating nuanced information for fine-grained decision-making. Combined with informative in-context examples as prior, RAFTS achieves significant improvements to supervised and LLM baselines without complex prompts. We demonstrate the effectiveness of our method through extensive experiments, where RAFTS can outperform GPT-based methods with a significantly smaller 7B LLM. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024

arXiv:2406.09798 [pdf, other]

Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang

Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VL… ▽ More Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VLN models trained with panoramic observation, perform better but are difficult to deploy on most monocular robots. For this case, we propose a sim-to-real transfer approach to endow the monocular robots with panoramic traversability perception and panoramic semantic understanding, thus smoothly transferring the high-performance panoramic VLN models to the common monocular robots. In this work, the semantic traversable map is proposed to predict agent-centric navigable waypoints, and the novel view representations of these navigable waypoints are predicted through the 3D feature fields. These methods broaden the limited field of view of the monocular robots and significantly improve navigation performance in the real world. Our VLN system outperforms previous SOTA monocular VLN methods in R2R-CE and RxR-CE benchmarks within the simulation environments and is also validated in real-world environments, providing a practical and high-performance solution for real-world VLN. △ Less

Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: Submitted to CoRL 2024. The code is available at https://github.com/MrZihan/Sim2Real-VLN-3DFF

arXiv:2406.09693 [pdf, other]

Compressed Video Quality Enhancement with Temporal Group Alignment and Fusion

Authors: Qiang Zhu, Yajun Qiu, Yu Liu, Shuyuan Zhu, Bing Zeng

Abstract: In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames. The proposed model consists of the intra-group feature alignment (IntraGFA) module, the inter-group feature fusion (InterGFF) module, and the feature enhancement (FE) module. We form the group of pictures (GoP) by selecting fr… ▽ More In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames. The proposed model consists of the intra-group feature alignment (IntraGFA) module, the inter-group feature fusion (InterGFF) module, and the feature enhancement (FE) module. We form the group of pictures (GoP) by selecting frames from the video according to their temporal distances to the target enhanced frame. With this grou**, the composed GoP can contain either long- or short-term correlated information of neighboring frames. We design the IntraGFA module to align the features of frames of each GoP to eliminate the motion existing between frames. We construct the InterGFF module to fuse features belonging to different GoPs and finally enhance the fused features with the FE module to generate high-quality video frames. The experimental results show that our proposed method achieves up to 0.05dB gain and lower complexity compared to the state-of-the-art method. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09684 [pdf, other]

Explainable AI for Comparative Analysis of Intrusion Detection Models

Authors: Pap M. Corea, Yongxin Liu, Jian Wang, Shuteng Niu, Houbing Song

Abstract: Explainable Artificial Intelligence (XAI) has become a widely discussed topic, the related technologies facilitate better understanding of conventional black-box models like Random Forest, Neural Networks and etc. However, domain-specific applications of XAI are still insufficient. To fill this gap, this research analyzes various machine learning models to the tasks of binary and multi-class class… ▽ More Explainable Artificial Intelligence (XAI) has become a widely discussed topic, the related technologies facilitate better understanding of conventional black-box models like Random Forest, Neural Networks and etc. However, domain-specific applications of XAI are still insufficient. To fill this gap, this research analyzes various machine learning models to the tasks of binary and multi-class classification for intrusion detection from network traffic on the same dataset using occlusion sensitivity. The models evaluated include Linear Regression, Logistic Regression, Linear Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, Decision Trees, and Multi-Layer Perceptrons (MLP). We trained all models to the accuracy of 90\% on the UNSW-NB15 Dataset. We found that most classifiers leverage only less than three critical features to achieve such accuracies, indicating that effective feature engineering could actually be far more important for intrusion detection than applying complicated models. We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness. Data and code available at https://github.com/pcwhy/XML-IntrusionDetection.git △ Less

Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE MeditCom 2024 - WS-05

arXiv:2406.09613 [pdf, other]

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

Authors: Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

Abstract: A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is a challenging task, as it involves inferring 3D information from 2D signals and most importantly, generalizing to rigid objects from unseen categori… ▽ More A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is a challenging task, as it involves inferring 3D information from 2D signals and most importantly, generalizing to rigid objects from unseen categories. However, existing datasets with object-level 3D annotations are often limited by the number of categories or the quality of annotations. Models developed on these datasets become specialists for certain categories or domains, and fail to generalize. In this work, we present ImageNet3D, a large dataset for general-purpose object-level 3D understanding. ImageNet3D augments 200 categories from the ImageNet dataset with 2D bounding box, 3D pose, 3D location annotations, and image captions interleaved with 3D information. With the new annotations available in ImageNet3D, we could (i) analyze the object-level 3D awareness of visual foundation models, and (ii) study and develop general-purpose models that infer both 2D and 3D information for arbitrary rigid objects in natural images, and (iii) integrate unified 3D models with large language models for 3D-related reasoning.. We consider two new tasks, probing of object-level 3D awareness and open vocabulary pose estimation, besides standard classification and pose estimation. Experimental results on ImageNet3D demonstrate the potential of our dataset in building vision models with stronger general-purpose object-level 3D understanding. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09608 [pdf, other]

Human-Machine Interface Evaluation Using EEG in Driving Simulator

Authors: Y. C. Liu, N. Figalova, M. Baumann, K Bengler

Abstract: Automated vehicles are pictured as the future of transportation, and facilitating safer driving is only one of the many benefits. However, due to the constantly changing role of the human driver, users are easily confused and have little knowledge about their responsibilities. Being the bridge between automation and human, the human-machine interface (HMI) is of great importance to driving safety.… ▽ More Automated vehicles are pictured as the future of transportation, and facilitating safer driving is only one of the many benefits. However, due to the constantly changing role of the human driver, users are easily confused and have little knowledge about their responsibilities. Being the bridge between automation and human, the human-machine interface (HMI) is of great importance to driving safety. This study was conducted in a static driving simulator. Three HMI designs were developed, among which significant differences in mental workload using NASA-TLX and the subjective transparency test were found. An electroencephalogram was applied throughout the study to determine if differences in the mental workload could also be found using EEG's spectral power analysis. Results suggested that more studies are required to determine the effectiveness of the spectral power of EEG on mental workload, but the three interface designs developed in this study could serve as a solid basis for future research to evaluate the effectiveness of psychophysiological measures. Marie Sklodowska-Curie Actions; Innovative Training Network (ITN); SHAPE-IT; Grant number 860410; Publication date: [27 July 2023]; DOI: [10.1109/IV55152.2023.10186567] △ Less

Submitted 13 June, 2024; originally announced June 2024.

Journal ref: 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 2023, pp. 1-6

arXiv:2406.09603 [pdf, other]

Workload Assessment of Human-Machine Interface: A Simulator Study with Psychophysiological Measures

Authors: Yuan-Cheng Liu, Nikol Figalova, Juergen Pichen, Philipp Hock, Martin Baumann, Klaus Bengler

Abstract: Human-machine Interface (HMI) is critical for safety during automated driving, as it serves as the only media between the automated system and human users. To enable a transparent HMI, we first need to know how to evaluate it. However, most of the assessment methods used for HMI designs are subjective and thus not efficient. To bridge the gap, an objective and standardized HMI assessment method is… ▽ More Human-machine Interface (HMI) is critical for safety during automated driving, as it serves as the only media between the automated system and human users. To enable a transparent HMI, we first need to know how to evaluate it. However, most of the assessment methods used for HMI designs are subjective and thus not efficient. To bridge the gap, an objective and standardized HMI assessment method is needed, and the first step is to find an objective method for workload measurement for this context. In this study, two psychophysiological measures, electrocardiography (ECG) and electrodermal activity (EDA), were evaluated for their effectiveness in finding differences in mental workload among different HMI designs in a simulator study. Three HMI designs were developed and used. Results showed that both workload measures were able to identify significant differences in objective mental workload when interacting with in-vehicle HMIs. As a first step toward a standardized assessment method, the results could be used as a firm ground for future studies. Marie Skłodowska-Curie Actions; Innovative Training Network (ITN); SHAPE-IT; Grant number 860410; Publication date: [29 Sep 2023]; DOI: [10.54941/ahfe1004172] △ Less

Submitted 13 June, 2024; originally announced June 2024.

Journal ref: AHFE (2023) International Conference. AHFE Open Access, vol 112. AHFE International, USA

arXiv:2406.09475 [pdf, other]

Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the… ▽ More Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09313 [pdf, other]

doi 10.1145/3660803

Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps

Authors: Shuqing Li, Cuiyun Gao, Jian** Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, Michael R. Lyu

Abstract: The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of… ▽ More The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of the user's brain, leading to user discomfort and even adverse health effects. Such issues commonly exist but remain underexplored. We conduct an empirical analysis on 282 SVI bug reports from 15 VR platforms, summarizing 15 types of manifestations. The empirical analysis reveals that automatically detecting SVI issues is challenging, mainly because: (1) lack of training data; (2) the manifestations of SVI issues are diverse, complicated, and often application-specific; (3) most accessible VR apps are closed-source commercial software. Existing pattern-based supervised classification approaches may be inapplicable or ineffective in detecting the SVI issues. To counter these challenges, we propose an unsupervised black-box testing framework named StereoID to identify the stereoscopic visual inconsistencies, based only on the rendered GUI states. StereoID generates a synthetic right-eye image based on the actual left-eye image and computes distances between the synthetic right-eye image and the actual right-eye image to detect SVI issues. We propose a depth-aware conditional stereo image translator to power the image generation process, which captures the expected perspective shifts between left-eye and right-eye images. We build a large-scale unlabeled VR stereo screenshot dataset with larger than 171K images from 288 real-world VR apps for experiments. After substantial experiments, StereoID demonstrates superior performance for detecting SVI issues in both user reports and wild VR apps. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: This work has been accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. DOI: https://doi.org/10.1145/3660803

arXiv:2406.09270 [pdf, other]

Discovery and Extensive Follow-Up of SN 2024ggi, a nearby type IIP supernova in NGC 3621

Authors: Ting-Wan Chen, Sheng Yang, Shubham Srivastav, Takashi J. Moriya, Stephen J. Smartt, Sofia Rest, Armin Rest, Hsing Wen Lin, Hao-Yu Miao, Yu-Chi Cheng, Amar Aryan, Chia-Yu Cheng, Morgan Fraser, Li-Ching Huang, Meng-Han Lee, Cheng-Han Lai, Yu Hsuan Liu, Aiswarya Sankar. K, Ken W. Smith, Heloise F. Stevance, Ze-Ning Wang, Joseph P. Anderson, Charlotte R. Angus, Thomas de Boer, Kenneth Chambers , et al. (23 additional authors not shown)

Abstract: We present the discovery and early observations of the nearby Type II supernova (SN) 2024ggi in NGC 3621 at 6.64 +/- 0.3 Mpc. The SN was caught 5.8 (+1.9 -2.9) hours after its explosion by the ATLAS survey. Early-phase, high-cadence, and multi-band photometric follow-up was performed by the Kinder (Kilonova Finder) project, collecting over 1000 photometric data points within a week. The combined o… ▽ More We present the discovery and early observations of the nearby Type II supernova (SN) 2024ggi in NGC 3621 at 6.64 +/- 0.3 Mpc. The SN was caught 5.8 (+1.9 -2.9) hours after its explosion by the ATLAS survey. Early-phase, high-cadence, and multi-band photometric follow-up was performed by the Kinder (Kilonova Finder) project, collecting over 1000 photometric data points within a week. The combined o- and r-band light curves show a rapid rise of 3.3 magnitudes in 13.7 hours, much faster than SN 2023ixf (another recent, nearby, and well-observed SN II). Between 13.8 and 18.8 hours after explosion SN 2024ggi became bluer, with u-g colour drop** from 0.53 to 0.15 mag. The rapid blueward evolution indicates a wind shock breakout (SBO) scenario. No hour-long brightening expected for the SBO from a bare stellar surface was detected during our observations. The classification spectrum, taken 17 hours after the SN explosion, shows flash features of high-ionization species such as Balmer lines, He I, C III, and N III. Detailed light curve modeling reveals critical insights into the properties of the circumstellar material (CSM). Our favoured model has an explosion energy of 2 x 10^51 erg, a mass-loss rate of 10^-3 solar_mass/yr (with an assumed 10 km/s wind), and a confined CSM radius of 6 x 10^14 cm. The corresponding CSM mass is 0.4 solar_mass. Comparisons with SN 2023ixf highlight that SN 2024ggi has a smaller CSM density, resulting in a faster rise and fainter UV flux. The extensive dataset and the involvement of citizen astronomers underscore that a collaborative network is essential for SBO searches, leading to more precise and comprehensive SN characterizations. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures in manuscript, 6 pages in appendix, submitted to ApJL

arXiv:2406.09264 [pdf, other]

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th… ▽ More Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans) rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans that ensures AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions. △ Less

Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 56 pages

Showing 151–200 of 14,489 results for author: liu, Y